r-spatial / spdep

Spatial Dependence: Weighting Schemes and Statistics
https://r-spatial.github.io/spdep/
116 stars 26 forks source link

Setting grouping criteria per group using spdep::skater() #141

Closed ashirwad closed 4 weeks ago

ashirwad commented 5 months ago

@rsbivand, is it possible to set grouping criteria per group using spdep::skater()? So, suppose I want to cluster geometries into 3 groups such that the groups have the following population: [1000, 2000), [2000, 5000), and [5000, Inf). Is it doable with the crit argument? GeoDaCenter/rgeoda#44

rsbivand commented 5 months ago

Please provide a minimal reproducible code example using built-in data.

ashirwad commented 5 months ago

@rsbivand, here's my attempt:

## Borrowed from spdep package vignette
### loading data
bh <- sf::st_read(
  system.file("etc/shapes/bhicv.shp", package = "spdep")[1], quiet = TRUE
)
sf::st_crs(bh) <- "OGC:CRS84"

### data standardized
dpad <- data.frame(scale(as.data.frame(bh)[, 5:8]))

### neighborhood list
bh.nb <- spdep::poly2nb(bh)

### calculating costs
lcosts <- spdep::nbcosts(bh.nb, dpad)

### making listw
nb.w <- spdep::nb2listw(bh.nb, lcosts, style = "B")

### find a minimum spanning tree
mst.bh <- spdep::mstree(nb.w, 5)

### three groups with no restriction
res1 <- spdep::skater(mst.bh[, 1:2], dpad, 2)

## Introducing constraints (my part)
### works fine
res_min_pop_50k <- spdep::skater(
  mst.bh[, 1:2], dpad, 2,
  crit = 50000, vec.crit = bh$Population
)

### throws error
res_pop_breaks <- spdep::skater(
  mst.bh[, 1:2], dpad, 2,
  crit = c(-Inf, 250000, 500000, Inf), # population: [min, 250k), [250k, 500k), and [500k, max]
  vec.crit = bh$Population
)
rsbivand commented 5 months ago

My understanding from ?skater and http://www.dpi.inpe.br/gilberto/papers/assuncao_neves_camara_ijgis.pdf (cross-check @eliaskrainski ) is that crit is either a scalar or a two element vector https://github.com/r-spatial/spdep/blob/b815397c2ce0d58b3b4ca0c36bba7b5c3abbbcc2/R/skater.R#L21-L24 vec.crit will be a unit vector https://github.com/r-spatial/spdep/blob/b815397c2ce0d58b3b4ca0c36bba7b5c3abbbcc2/R/skater.R#L25-L29 if crit is a scalar or two element vector and crit is to be understood as a minimum or (min, max) count of observations in each group. If crit is scalar and vec.crit is not a unit vector but observation population counts, crit is the minimum group population.

https://github.com/r-spatial/spdep/blob/b815397c2ce0d58b3b4ca0c36bba7b5c3abbbcc2/R/skater.R#L65-L66 shows the counting by group of vec.crit in the iterations, if unit vector then counts of observations, or population counts. Using crit with a length > 1 when vec.crit is not the unit vector perhaps should error exit (I don't see any errors, only unhelpful output objects.

From the examples, it seems as though repeated application of skater to an object returned by the function may permit the updating of the object, but I fail to see what a vector of group population counts would mean in practice. Possibly we should add short comments in the documentation and perhaps a warning in the function code if length(crit) > 1L and vect.crit is not the unit vector.

eliaskrainski commented 5 months ago

You can combine a discrete variable with a few levels with the groups from the clustering output.

eliaskrainski commented 5 months ago
library(spdep)

## a map
map <- sf::st_read(system.file(
               "shapes/sids.shp", package="spData")[1])

### neighboorhod list
nbl <- poly2nb(map)

### a variable: estimate of the rate
ni <- map$BIR74 + map$BIR79
yi <- map$SID74 + map$SID79
map$raw <- yi/ni
map$est <- EBlocal(
    ri = yi,
    ni = ni,
    nb = nbl)$est
summary(map$est)

## standardize
dpad <- scale(map$est)

### calculating costs
lcosts <- nbcosts(nbl, dpad)

### making listw
nbw <- nb2listw(nbl, lcosts, style="B")

### find a minimum spanning tree
mst <- mstree(nbw, 5)

### cluster
sk <- skater(
    edges = mst,
    data = dpad,
    ncuts = 3
)
table(sk$groups)
tapply(ni, sk$groups, sum)

### subdivide the groups by ni
summary(ni)
table(b3 <- findInterval(ni, c(3000, 7000)))

### combine the results
newc <- paste0("B", b3, "g", sk$groups)
table(newc, b3)

### add results to the map to plot
map$group1 <- paste0("g", sk$groups)
map$group2 <- newc

library(ggplot2)
library(ggpubr)

gg0 <- ggplot(data = map) + theme_minimal()

ggarrange(gg0 + geom_sf(aes(fill = raw)),
          gg0 + geom_sf(aes(fill = est)),
          gg0 + geom_sf(aes(fill = group1)),
          gg0 + geom_sf(aes(fill = group2)))
ashirwad commented 5 months ago

Thanks, @rsbivand & @eliaskrainski! I will give it a shot!