r-spatial / spdep

Spatial Dependence: Weighting Schemes and Statistics
https://r-spatial.github.io/spdep/
116 stars 26 forks source link

poly2nb doesn't find all neighbors #162

Open JosiahParry opened 5 days ago

JosiahParry commented 5 days ago

Today I ran into a scenario where spdep was not finding all of the neighbors for a(n oddly shaped polygon).

Below is the comparison with the neighborhoods identified via ArcGIS Pro and the neighborhoods identified via spdep. There are 9 missing contiguous neighbors from spdep. I tried increasing the snap distance to no avail.

Attached is a zip containing the shapefile as well as a dbf file of the spatial weights matrix created by ArcGIS Pro.

library(dplyr) 

df <- sf::st_read("tokyo.shp")
#> Simple feature collection with 262 features and 13 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: 266206.6 ymin: -90932.11 xmax: 411400.3 ymax: 37142.75
#> Projected CRS: Tokyo / Japan Plane Rectangular CS VI
# identify queen contiguity neighbors
nb <- spdep::poly2nb(df)

i <- rep.int(df$SOURCE_, lengths(nb))
j <- unlist(nb)

# Count the number of neighbors per location using spdep
spdep_counts <- tibble(SOURCE_ID = i, j) |> 
  filter(j != 0) |> 
  count(SOURCE_ID, sort = TRUE) |> 
  rename(n_spdep = n)

# Read in swm table from Pro
swm <- foreign::read.dbf("tokio-contig-swm.dbf")

# count
pro_counts <- as_tibble(swm) |> 
  select(-Field1, WEIGHT) |> 
  arrange(SOURCE_ID, NID) |> 
  count(SOURCE_ID, sort = TRUE) |> 
  rename(n_pro = n)

left_join(pro_counts, spdep_counts)
#> Joining with `by = join_by(SOURCE_ID)`
#> # A tibble: 262 × 3
#>    SOURCE_ID n_pro n_spdep
#>        <int> <int>   <int>
#>  1        39    12       4
#>  2       183    12      12
#>  3        44    11       1
#>  4         9    10       7
#>  5        45    10       5
#>  6       171    10       9
#>  7         3     9       2
#>  8        40     9       2
#>  9        41     9       9
#> 10        42     9       8
#> # ℹ 252 more rows

geo <- df$geometry

# spdep neighbors
plot(geo[nb[[39]]], lty = 3)
plot(geo[39], add = TRUE)


# ArcGIS Pro Neighbors
pro_nbs <- swm |> 
  filter(SOURCE_ID == 39) |> 
  pull(NID)

plot(geo[pro_nbs], lty = 3)
plot(geo[39], add = TRUE)

tokyo.zip

rsbivand commented 5 days ago

Will look when I get a chance. Could you rephrase the dplyr parts in base R please, I do not use tidy verbs ever, cognitive overload. The first count extraction looks like card, am I wrong?

rsbivand commented 5 days ago

I had some code in 2015 to convert SWM to listw - SWM are a version of S+ SpatialStats "spatial.neighbors" objects, but very oddly ordered. Can you try to find such code - I'll try to dig up the emails - with Mark Janikas in May 2015.

However, the problem is snap=. I think nb <- spdep::poly2nb(df, snap=0.002) is sufficient, the map units are metres, so:

> x <- 0.002
> units::set_units(x, "m")
0.002 [m]
> units::set_units(x, "m") |> units::set_units("mm")
2 [mm]

as the proximate borders between 1 and 8 are only that far apart: image

With 2mm snap:

nb <- spdep::poly2nb(df, snap=0.002)
swm <- foreign::read.dbf("tokio-contig-swm.dbf")
swm1 <- swm[order(swm[,2], swm[,3]),-1]
attr(swm1, "n") <- 262
class(swm1) <- c(class(swm1), "spatial.neighbour")
swm2 <- sn2listw(swm1, style="B", zero.policy=TRUE)
> diffnb(nb, swm2$neighbours)
Neighbour list object:
Number of regions: 262 
Number of nonzero links: 0 
Percentage nonzero weights: 0 
Average number of links: 0 
262 regions with no links:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51
52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75
76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99
100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117
118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135
136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153
154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171
172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189
190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207
208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225
226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243
244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261
262
262 disjoint connected subgraphs
Warning message:
In diffnb(nb, swm2$neighbours) : neighbour object has 262 sub-graphs
> all.equal(nb, swm2$neighbours, check.attributes=FALSE)
[1] TRUE
rsbivand commented 4 days ago

And in general, print.nb should be used after creating a neighbour object anyway (always), to indicate whether the object is asymmetric, whether there are no-neighbour "regions" - maybe correct to "features" or "objects"? - and most recently whether there are disjoint subgraphs:

> df <- sf::st_read("tokyo.shp")
Reading layer `tokyo' from data source `/home/rsb/tmp/bigshape/tokyo.shp' using driver `ESRI Shapefile'
Simple feature collection with 262 features and 13 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 266206.6 ymin: -90932.11 xmax: 411400.3 ymax: 37142.75
Projected CRS: Tokyo / Japan Plane Rectangular CS VI
> nb <- spdep::poly2nb(df)
Warning message:
In spdep::poly2nb(df) : neighbour object has 23 sub-graphs
> nb
Neighbour list object:
Number of regions: 262 
Number of nonzero links: 946 
Percentage nonzero weights: 1.378125 
Average number of links: 3.610687 
10 regions with no links:
101 127 134 135 152 154 167 237 242 243
23 disjoint connected subgraphs

This indicates trouble of some kind, and that many unexpected subgraphs suggests that default snap= is too conservative for this data. We could look at the coordinate units of the input map, and set snap= to 10 mm in both projected and spherical (say at the equator). Might that be worth exploring?

JosiahParry commented 2 days ago

Base R repro

Will look when I get a chance. Could you rephrase the dplyr parts in base R please, I do not use tidy verbs ever, cognitive overload. The first count extraction looks like card, am I wrong?

df <- sf::st_read("tokyo.shp")
swm <- foreign::read.dbf(r"{tokio-contig-swm.dbf}")

nb <- spdep::poly2nb(df)
cards <- spdep::card(nb)

# count nbs spdep
spdep_counts <- data.frame(i = df$SOURCE_, n_spdep = cards)

# count nbs arcgis pro
arc_counts <- aggregate(SOURCE_ID ~ NID, FUN = length, data = swm[, c("SOURCE_ID", "NID")])
colnames(arc_counts) <- c("i", "n_arc")

# join & sort
combined <- merge(spdep_counts, arc_counts)
combined[order(combined$n_arc, decreasing = TRUE),]
#>       i n_spdep n_arc
#> 39   39       4    12
#> 183 183      12    12
#> 44   44       1    11
#> 9     9       7    10
#> 45   45       5    10
#> 171 171       9    10
#> 3     3       2     9
#> 40   40       2     9
#> truncated here....

SWM conversions

"I had some code in 2015 to convert SWM to listw - SWM are a version of S+ SpatialStats "spatial.neighbors" objects, but very oddly ordered. Can you try to find such code..."

Yes! Mark shared this code with me some time ago and wants to be able to share this with the R community so that we can transfer spatial weights to and fro languages & tools. There is the file format which is a binary representation and then the output of the "convert swm to table" tool which is shared in the .dbf format. I suspect there could be utility in conversion function for the both.

Would you be interested in having this functionality housed in spdep?

Snap tolerances

I think this is a problem that is more noticeable with simplified geometries as is the case here. I think the snap tolerance can be increased from 1.5e-08 which is quite small to something a bit bigger based on the units as you suggest. Though, of course, there is an issue of scale and having the snap tolerance be more conservative than less is probably a good thing.

That said, I think most people might not take the time to visually inspect that all of their polygons have the neighbors that they might expect!

10mm feels like a fair amount of wiggle room! It is in most cases just rounding error that one might expect--and is surely less than the error introduced by the polygon simplifications.

rsbivand commented 2 days ago

Yes, I think that adding at least reading the DBF file format and converting to listw would be sensible. I recall that the binary format also worked, and that would also be fine, but both might need maintenance if anything changed in the SWM specification.

Shall I look at a default 10mm snap value for poly2nb?

I agree that visual inspection is unlikely, but might users usually run print.nb after creation of a neighbour object? Should maybe no-neigbour features be added to subgraphs as a warning? Should asymmetric neighbours be added? I guess it would be a matter of documenting what is going on in a simplified way in a vignette?

JosiahParry commented 2 days ago

Yes, I think a default 10mm snap is good! I do think a suppressable warning for no-neighbor features is a good idea. More often than not that might lead to an invalid analysis or misinterpretation of analytic results.