Closed ashirwad closed 4 weeks ago
Please provide a minimal reproducible code example using built-in data.
@rsbivand, here's my attempt:
## Borrowed from spdep package vignette
### loading data
bh <- sf::st_read(
system.file("etc/shapes/bhicv.shp", package = "spdep")[1], quiet = TRUE
)
sf::st_crs(bh) <- "OGC:CRS84"
### data standardized
dpad <- data.frame(scale(as.data.frame(bh)[, 5:8]))
### neighborhood list
bh.nb <- spdep::poly2nb(bh)
### calculating costs
lcosts <- spdep::nbcosts(bh.nb, dpad)
### making listw
nb.w <- spdep::nb2listw(bh.nb, lcosts, style = "B")
### find a minimum spanning tree
mst.bh <- spdep::mstree(nb.w, 5)
### three groups with no restriction
res1 <- spdep::skater(mst.bh[, 1:2], dpad, 2)
## Introducing constraints (my part)
### works fine
res_min_pop_50k <- spdep::skater(
mst.bh[, 1:2], dpad, 2,
crit = 50000, vec.crit = bh$Population
)
### throws error
res_pop_breaks <- spdep::skater(
mst.bh[, 1:2], dpad, 2,
crit = c(-Inf, 250000, 500000, Inf), # population: [min, 250k), [250k, 500k), and [500k, max]
vec.crit = bh$Population
)
My understanding from ?skater
and http://www.dpi.inpe.br/gilberto/papers/assuncao_neves_camara_ijgis.pdf (cross-check @eliaskrainski ) is that crit
is either a scalar or a two element vector https://github.com/r-spatial/spdep/blob/b815397c2ce0d58b3b4ca0c36bba7b5c3abbbcc2/R/skater.R#L21-L24 vec.crit
will be a unit vector https://github.com/r-spatial/spdep/blob/b815397c2ce0d58b3b4ca0c36bba7b5c3abbbcc2/R/skater.R#L25-L29 if crit
is a scalar or two element vector and crit
is to be understood as a minimum or (min, max) count of observations in each group. If crit
is scalar and vec.crit
is not a unit vector but observation population counts, crit
is the minimum group population.
https://github.com/r-spatial/spdep/blob/b815397c2ce0d58b3b4ca0c36bba7b5c3abbbcc2/R/skater.R#L65-L66 shows the counting by group of vec.crit
in the iterations, if unit vector then counts of observations, or population counts. Using crit
with a length > 1 when vec.crit
is not the unit vector perhaps should error exit (I don't see any errors, only unhelpful output objects.
From the examples, it seems as though repeated application of skater
to an object returned by the function may permit the updating of the object, but I fail to see what a vector of group population counts would mean in practice. Possibly we should add short comments in the documentation and perhaps a warning in the function code if length(crit) > 1L
and vect.crit
is not the unit vector.
You can combine a discrete variable with a few levels with the groups from the clustering output.
library(spdep)
## a map
map <- sf::st_read(system.file(
"shapes/sids.shp", package="spData")[1])
### neighboorhod list
nbl <- poly2nb(map)
### a variable: estimate of the rate
ni <- map$BIR74 + map$BIR79
yi <- map$SID74 + map$SID79
map$raw <- yi/ni
map$est <- EBlocal(
ri = yi,
ni = ni,
nb = nbl)$est
summary(map$est)
## standardize
dpad <- scale(map$est)
### calculating costs
lcosts <- nbcosts(nbl, dpad)
### making listw
nbw <- nb2listw(nbl, lcosts, style="B")
### find a minimum spanning tree
mst <- mstree(nbw, 5)
### cluster
sk <- skater(
edges = mst,
data = dpad,
ncuts = 3
)
table(sk$groups)
tapply(ni, sk$groups, sum)
### subdivide the groups by ni
summary(ni)
table(b3 <- findInterval(ni, c(3000, 7000)))
### combine the results
newc <- paste0("B", b3, "g", sk$groups)
table(newc, b3)
### add results to the map to plot
map$group1 <- paste0("g", sk$groups)
map$group2 <- newc
library(ggplot2)
library(ggpubr)
gg0 <- ggplot(data = map) + theme_minimal()
ggarrange(gg0 + geom_sf(aes(fill = raw)),
gg0 + geom_sf(aes(fill = est)),
gg0 + geom_sf(aes(fill = group1)),
gg0 + geom_sf(aes(fill = group2)))
Thanks, @rsbivand & @eliaskrainski! I will give it a shot!
@rsbivand, is it possible to set grouping criteria per group using
spdep::skater()
? So, suppose I want to cluster geometries into 3 groups such that the groups have the following population: [1000, 2000), [2000, 5000), and [5000, Inf). Is it doable with thecrit
argument? GeoDaCenter/rgeoda#44