r-spatial / spdep

Spatial Dependence: Weighting Schemes and Statistics
https://r-spatial.github.io/spdep/

Support ArcGIS SWM Tables #163

Closed JosiahParry closed 1 month ago

JosiahParry commented 2 months ago

Per suggestion in #162, I want to provide conversion functions to a listw object from ArcGIS spatial weights matrix representations.

At the moment, I have prototyped a function for us to use internally that might be better served in {spdep}. The function works on .dbf files created from the Convert Spatial Weights Matrix to Table tool in ArcGIS Pro.

The one challenge with this I haven't figured out is that the SWM format requires a Unique ID field and does not use the row index of the features—though in many cases these are the same.

To support this we need to have the row.names attribute set to the unique ID field in the nb object and we should also probably set the row.names attribute in the related sf object.

I have not figured out a good design for this. Should the function also require an sf object to validate against? In that case we might return a list containing the modified sf object and the listw object.
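One possible design, sketched in base R only: the caller supplies the complete vector of unique IDs (for example, taken from the sf object), and the SWM IDs are mapped onto row positions with match(). The helper name swm_to_nb_sketch and the all_ids argument are hypothetical. The column names follow the prototype in this post. Using 0L for a feature without neighbours and storing the IDs in a "region.id" attribute are assumptions about spdep's nb conventions, not something this thread confirms.

```r
# Hypothetical helper: map SWM unique IDs onto row positions.
swm_to_nb_sketch <- function(swm, all_ids) {
  # every ID in the table must appear in the caller-supplied ID vector
  stopifnot(all(swm$SOURCE_ID %in% all_ids), all(swm$NID %in% all_ids))
  nbs <- lapply(all_ids, function(id) {
    hits <- swm$NID[swm$SOURCE_ID == id]
    # assumption: 0L marks a feature without neighbours
    if (length(hits) == 0L) 0L else match(hits, all_ids)
  })
  # assumption: the IDs travel with the object as a "region.id" attribute
  structure(nbs, class = "nb", region.id = as.character(all_ids))
}

# toy table: IDs 10, 20 and 30 are linked; ID 40 has no neighbours
swm <- data.frame(
  SOURCE_ID = c(10, 10, 20, 30),
  NID       = c(20, 30, 10, 10),
  WEIGHT    = c(0.5, 0.5, 1, 1)
)
nb <- swm_to_nb_sketch(swm, all_ids = c(10, 20, 30, 40))
```

Because the neighbours keep the order in which they appear in the table, weights extracted the same way would stay aligned with them.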

This is a repro using the SWM provided in #162.

fp <- "tokio-contig-swm.dbf"

read_swm_dbf <- function(path, unique_id_field = "SOURCE_ID") {
  # read in the dbf file
  swm_raw <- foreign::read.dbf(path)
  # split the swm
  swm_split <- split(swm_raw, swm_raw[[unique_id_field]])
  # extract the neighbors
  nbs <- lapply(swm_split, function(.i) {
    .i[["NID"]]
  })

  # extract the weights
  wts <- lapply(swm_split, function(.i) {
    .i[["WEIGHT"]]
  })

  spdep::nb2listw(
    # add the nb class
    structure(nbs, class = "nb"),
    # specify the weights manually
    glist = wts,
    # We say "B" binary weights style so that
    # spdep doesn't modify them
    style = "B"
  )
}

read_swm_dbf(fp)
#> Characteristics of weights list object:
#> Neighbour list object:
#> Number of regions: 262 
#> Number of nonzero links: 1390 
#> Percentage nonzero weights: 2.02494 
#> Average number of links: 5.305344 
#> 
#> Weights style: B 
#> Weights constants summary:
#>     n    nn   S0   S1    S2
#> B 262 68644 1390 2780 33464

rsbivand commented 2 months ago

@JosiahParry What does an SWM-DBF look like when a feature has no neighbours? I need to check this for listw2sn/sn2listw too, so also note to self.

rsbivand commented 2 months ago

library(spdep)
data(columbus)
nb1005 <- droplinks(col.gal.nb, drop="1005")
lw1005 <- nb2listw(nb1005, zero.policy=TRUE)
lw1005
sn1005 <- listw2sn(lw1005)
lw1005_rt <- sn2listw(sn1005, style="W", zero.policy=TRUE)
lw1005_rt
all.equal(lw1005$weights, lw1005_rt$weights, check.attributes=FALSE)
all.equal(lw1005$neighbours, lw1005_rt$neighbours, check.attributes=FALSE)

Some component attributes differ, will correct. I don't think S-Plus handled no-neighbour observations, as this requires passing the complete set of IDs (here as an attribute of col.gal.nb). I've ordered a copy from my institution's inter-library loan system to check. From an online version in Japanese, it looks as though there was a check.islands function, but my Japanese is non-existent.

rsbivand commented 2 months ago

The 2017 code for reading SWM files directly did handle no-neighbour features. The storage method (then) for each feature after the second header of two 4-byte integers (feature count and row-standardised or not) was:

(int) id
(int) nn # neighbour count
if (nn > 0) {
  nn (int) nhs # neighbour ids
  nn (double) w # weights if header FIXEDWEIGHTS false, else 1 (double) in rep(, nn)
  (double) sum # sum of weights
}

When nn was 0, empty lists were returned.
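The record layout above can be read with readBin(). The following is only a sketch under stated assumptions: 4-byte little-endian integers, 8-byte doubles, and a fixed-weights flag already parsed from the header. The function name read_swm_record_sketch is hypothetical and this is not the 2017 code.

```r
# Sketch: read one feature record from an open binary connection.
# Assumptions: little-endian, 4-byte ints, 8-byte doubles.
read_swm_record_sketch <- function(con, fixed_weights = FALSE) {
  id <- readBin(con, "integer", n = 1L, size = 4L, endian = "little")
  nn <- readBin(con, "integer", n = 1L, size = 4L, endian = "little")
  if (nn == 0L) {
    # no-neighbour feature: nothing else is stored for it
    return(list(id = id, nhs = integer(0), w = numeric(0)))
  }
  nhs <- readBin(con, "integer", n = nn, size = 4L, endian = "little")
  # a single weight is stored when weights are fixed, otherwise nn of them
  n_w <- if (fixed_weights) 1L else nn
  w <- readBin(con, "double", n = n_w, size = 8L, endian = "little")
  if (fixed_weights) w <- rep(w, nn)
  w_sum <- readBin(con, "double", n = 1L, size = 8L, endian = "little")
  list(id = id, nhs = nhs, w = w, sum = w_sum)
}

# build two toy records in memory: id 7 with two neighbours, id 8 with none
buf <- rawConnection(raw(0), "r+")
writeBin(c(7L, 2L, 3L, 5L), buf, size = 4L, endian = "little")
writeBin(c(0.25, 0.75, 1), buf, size = 8L, endian = "little")
writeBin(c(8L, 0L), buf, size = 4L, endian = "little")
bytes <- rawConnectionValue(buf)
close(buf)

con <- rawConnection(bytes)
rec1 <- read_swm_record_sketch(con)
rec2 <- read_swm_record_sketch(con)
close(con)
```

Here rec1 holds neighbours 3 and 5 with weights 0.25 and 0.75, and rec2 has empty neighbour and weight vectors.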

rsbivand commented 1 month ago

The S-Plus format preferred having an nregion attribute, giving clear guidance on the dimensions of the underlying matrix, which would have a size of nregion * nregion. So if any indices in the from or to columns of the data frame were missing (in the help file row.id and col.id), it would be assumed that zero rows/columns were no-neighbour observations. There was also a function check.islands to list no-neighbour observations for a spatial neighbour object.
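A small base R sketch of why the region count matters. The toy data frame uses from, to and weights columns, and find_islands_sketch is a hypothetical helper, not the S-Plus check.islands function. It reports the row indices in 1..nregion that never occur in the from column.

```r
# Hypothetical helper: rows of the nregion x nregion matrix that are all zero,
# i.e. indices that never appear in the "from" column.
find_islands_sketch <- function(sn, nregion) {
  setdiff(seq_len(nregion), unique(sn$from))
}

# toy neighbour table for 5 regions; regions 1 and 5 have no neighbours
sn <- data.frame(
  from    = c(2L, 2L, 3L, 4L),
  to      = c(3L, 4L, 2L, 2L),
  weights = c(0.5, 0.5, 1, 1)
)

islands <- find_islands_sketch(sn, nregion = 5L)  # 1 5

# without nregion, the size can only be guessed from the observed indices,
# which misses a no-neighbour observation at the end of the range
guessed_n <- max(sn$from, sn$to)                  # 4
```

With the count supplied, both the first and the last observation are recognised as islands. The guess from the table alone drops region 5.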

Now https://github.com/r-spatial/spdep/blob/4d3e50ad98003fe834011200634cf1d4fb3b3a5b/R/listw2sn.R#L35 expects an attribute giving the dimensions of the underlying matrix, mimicked in the DBF SWM case here: https://github.com/r-spatial/spdep/blob/4d3e50ad98003fe834011200634cf1d4fb3b3a5b/R/read.gwt2nb.R#L203-L208 Does this make sense? @JosiahParry Could you please provide examples of SWM DBFs exported from, say, the sids and columbus data sets, including ones with no-neighbour observations? Could you also provide the underlying SWM files? The code I have from 2017/18 from Mark (I think) read SWM files, which do provide the observation count.

JosiahParry commented 1 month ago

Hi Roger, my apologies! I've been spread thin lately. I've put time on my calendar to address this tomorrow! I will get back to you then.

JosiahParry commented 1 month ago

Attached is a zip of SWM, DBF, and shapefile files for the California and NC datasets.

swm-compat.zip

rsbivand commented 1 month ago

@JosiahParry thanks! Commit https://github.com/r-spatial/spdep/commit/7d843fefeda438758df9fe750e77d76711b74e71 gets most of the way there, I think, but does not cover the edge cases of the first or last observation having no neighbours. Could you try to create such DBFs with islands as the first and/or last observation(s)?

rsbivand commented 1 month ago

@JosiahParry In https://github.com/r-spatial/spdep/commit/cb43628a80283e286411b03ed73e4388c97183f8 I added examples with first/last observations as islands - then region.id= is needed to span the ID range properly. Is this ready for a PR? If I don't hear back by Friday, I'll create a PR.

rsbivand commented 1 month ago

@JosiahParry https://github.com/r-spatial/spdep/pull/166