Closed JosiahParry closed 4 weeks ago
I think it may be this part: {n} disjoint connected subgraphs
https://github.com/r-spatial/spdep/blob/main/R%2Fsummary.nb.R#L65 was added to declare the subgraph count. This is done regularly in CARBayes and other downstream modelling packages because if the weights matrix is block diagonal, there could be multiple measures, one for each block. See also the end of https://r-spatial.org/book/14-Areal.html#contiguous-neighbours, where n.comp.nb
is discussed. Maybe igraph
might be faster. Don't trust RStudio about session non-responsiveness. I'll try to check how the function performs both with large n
and with dense neighbours. Maybe run under debug
to check the location of the slow-running line, or run n.comp.nb
on the neighour object.
The implementation by Nicholas Lewin-Koh from 2001 uses Depth First Search; https://en.wikipedia.org/wiki/Depth-first_search gives its time complexity as order of node count plus edge count. The time complexity for igraph::components
seems to be the same: https://igraph.org/c/doc/igraph-Structural.html#igraph_connected_components. igraph::is_connected
has the same time complexity, unfortunately.
In: , I added an igraph
option to m.comp.nb
, house has ~ 25000 points:
> library(spdep)
Loading required package: spData
Loading required package: sf
Linking to GEOS 3.12.2, GDAL 3.9.0, PROJ 9.4.1; sf_use_s2() is TRUE
> data(house)
> house <- sf::st_as_sf(house)
> dim(house)
[1] 25357 25
> k6 <- knn2nb(knearneigh(house, k=6))
> system.time(o <- n.comp.nb(k6, igraph=FALSE))
user system elapsed
1.617 0.004 1.634
> system.time(o <- n.comp.nb(k6, igraph=TRUE))
user system elapsed
2.986 0.058 3.055
> k60 <- knn2nb(knearneigh(house, k=60))
> system.time(o <- n.comp.nb(k60, igraph=FALSE))
user system elapsed
12.218 0.020 12.307
> system.time(o <- n.comp.nb(k60, igraph=TRUE))
user system elapsed
16.574 0.089 16.727
> k600 <- knn2nb(knearneigh(house, k=600))
> system.time(o <- n.comp.nb(k600, igraph=FALSE))
user system elapsed
275.825 0.107 277.011
> system.time(o <- n.comp.nb(k600, igraph=TRUE))
user system elapsed
322.273 1.327 325.135
with |V| always 25357, but here |E| k=6: 183834, k=60: 1776536; k=600: 18267908. So the edge count is driving most of the time complexity. However, the nb
objects are made symmetric inside n.comp.nb
, and make.sym.nb
takes most of the time, which needs to be handled. I'm looking at that now.
> library(spdep)
Loading required package: spData
Loading required package: sf
Linking to GEOS 3.12.2, GDAL 3.9.0, PROJ 9.4.1; sf_use_s2() is TRUE
> data(house)
> house <- sf::st_as_sf(house)
> k6 <- knn2nb(knearneigh(house, k=6))
> system.time(o <- n.comp.nb(k6, igraph=FALSE))
user system elapsed
1.603 0.002 1.618
> system.time(o <- n.comp.nb(k6, igraph=TRUE))
user system elapsed
1.522 0.059 1.596
> k60 <- knn2nb(knearneigh(house, k=60))
> system.time(o <- n.comp.nb(k60, igraph=FALSE))
user system elapsed
11.925 0.006 11.979
> system.time(o <- n.comp.nb(k60, igraph=TRUE))
user system elapsed
6.589 0.128 6.736
> k600 <- knn2nb(knearneigh(house, k=600))
> system.time(o <- n.comp.nb(k600, igraph=FALSE))
user system elapsed
267.720 0.081 268.524
> system.time(o <- n.comp.nb(k600, igraph=TRUE))
user system elapsed
73.950 1.633 75.786
Commit (including some other correction of igraph
use): https://github.com/r-spatial/spdep/commit/80262d0af413a3151de65de4ec35caa1ec4b1da5. Also added in the igraph=FALSE
setting the possibility to interrupt the C function doing the depth-first search.
I dropped the igraph=
argument, and in commit https://github.com/r-spatial/spdep/pull/161/commits/8e8fdbeadf2c8c990e32bf935910f6d42d682d31 detect igraph
and spatialreg
in n.comp.nb
and use them if available. The symmetry attribute in the nb
object is used. When nb
objects are created, an "ncomp"
attribute is added then used subsequently where needed, controlled by get.SubgraphOption
(default TRUE) and get.SubgraphCeiling
(default 10000) being less than |N|+|E|.
Brilliant thank you so much! I think this will be an improvement for users!
Running reverse dependency checks now, should be OK, but I'll also contact packages maintainers using n.comp.nb
to alert them to the changes, and to ask for input before moving forward.
These use n.comp.nb
:
The changes do not cause errors, but maybe they could use attr(, "ncomp")
instead of recalculating the output of n.comp.nb
, as for example now in spatialreg::isCyclical
: https://github.com/r-spatial/spatialreg/blob/d0ce698e65113ad2aecbabd80006e5d20f3039d1/R/cyclical.R#L8-L9
I'm unsure what specifically about the
nb
class print method is causing this, but I've noticed it previously and am encountering it again today.With somewhat large neighbor matrices, spdep will hang in computing for quite some time leading to RStudio reporting that R is not responding.