Open JosiahParry opened 1 month ago
It took me a few minutes to understand what exactly this function would do in comparison to st_cast(st_union(x), "POLYGON")
so I made a little reprex to show it.
library(sf)
nc = read_sf(system.file("shape/nc.shp", package="sf"))
set.seed(241)
test = nc[sample(nrow(nc), 18),] |> st_geometry()
a = test |> st_union() |> st_as_sf()
a$id = 1:nrow(a)
b = a |> st_cast("POLYGON") |> st_as_sf()
#> Warning in st_cast.sf(a, "POLYGON"): repeating attributes for all
#> sub-geometries for which they may not be constant
b$id = 1:nrow(b)
c = test |> dissolve_boundaries() |> st_as_sf()
#> Loading required namespace: spdep
c$id = 1:nrow(c)
b
#> Simple feature collection with 11 features and 1 field
#> Geometry type: POLYGON
#> Dimension: XY
#> Bounding box: xmin: -83.73952 ymin: 34.30505 xmax: -75.77316 ymax: 36.55716
#> Geodetic CRS: NAD27
#> First 10 features:
#> id x
#> 1 1 POLYGON ((-83.1615 35.05922...
#> 1.1 2 POLYGON ((-76.00897 36.3196...
#> 1.2 3 POLYGON ((-75.97629 36.5179...
#> 1.3 4 POLYGON ((-75.78317 36.2251...
#> 1.4 5 POLYGON ((-82.74389 35.4180...
#> 1.5 6 POLYGON ((-78.95108 36.2338...
#> 1.6 7 POLYGON ((-76.70538 35.4119...
#> 1.7 8 POLYGON ((-76.6949 35.35043...
#> 1.8 9 POLYGON ((-81.659 36.11759,...
#> 1.9 10 POLYGON ((-80.45065 35.7648...
nrow(b)
#> [1] 11
c
#> Simple feature collection with 8 features and 1 field
#> Geometry type: GEOMETRY
#> Dimension: XY
#> Bounding box: xmin: -83.73952 ymin: 34.30505 xmax: -75.77316 ymax: 36.55716
#> Geodetic CRS: NAD27
#> x id
#> 1 MULTIPOLYGON (((-75.97629 3... 1
#> 2 POLYGON ((-77.98668 34.3399... 2
#> 3 POLYGON ((-78.95108 36.2338... 3
#> 4 MULTIPOLYGON (((-76.70538 3... 4
#> 5 POLYGON ((-82.24016 35.4681... 5
#> 6 POLYGON ((-81.659 36.11759,... 6
#> 7 POLYGON ((-80.50825 36.0708... 7
#> 8 POLYGON ((-83.1615 35.05922... 8
nrow(c)
#> [1] 8
cols = paletteer::paletteer_d("tvthemes::gravityFalls")
par(mfrow = c(2,2), mar = c(0,0,1,0))
plot(test, main = "Original geoms")
plot(a, main = "st_union()",
key.pos = NULL, reset = FALSE, pal = cols)
plot(b, main = "st_union |> st_cast('POLYGON')",
key.pos = NULL, reset = FALSE, pal = cols)
plot(c, main = "dissolve_boundaries()",
key.pos = NULL, reset = FALSE, pal = cols)
It would be cool to have this in sf! Would this fit better as a possible st_dissolve()
function or as a special case of st_union()
?
Thanks @loreabad6 that clarifies! My guess is that the only difference between the bottom two plots is a single queen neighbour, and I'm not sure that is what you want with "dissolve".
Forgive me if this is not the right thread to discuss this, but I have always wondered why {sf} didn't have an existing st_dissolve()
function that does something similar to the following:
object_dissolved <- object %>%
group_by(attribute) %>%
summarize(
geometry = sf::st_union(geometry), # or SHAPE in case of data loaded from GDB
across(everything(), first),
.groups = "drop"
)
Here it is easy since I have single aggregation for all fields (i.e. first
), however, it might be complicated to use different aggregate functions for different columns. For instance, I might want to sum()
an attribute called area
, take a mean()
of population
, and calculate median()
of median_household_income
. In that case, it might be much easier to group_by()
and st_union()
and specify aggregate function separately for different columns. While it would be cool to have a st_dissolve()
function, I am unsure how difficult would it be to implement multiple aggregation within the same function.
@ar-puuk , I may have a small tidyverse-free function for what you describe here. Not bullet proof, though...
Could I ask why the aggregate
method for sf
objects is not adequate for these needs?
Could I ask why the
aggregate
method forsf
objects is not adequate for these needs?
I do not want to hijack the initial issue, but the aggregate function is much less intuitive as is, which is not consistent with other SF
functions. If I were to want to take the mean
of numeric columns, sum
the integer column, and take the first
values of the character column, things would get messy really fast. I ended up using group_by()
and summarize()
.
# Select only the numeric columns
numeric_cols <- sapply(wfrc_taz, is.numeric)
# Perform the aggregation only on numeric columns
wfrc_taz_agg <- aggregate(
wfrc_taz[, numeric_cols], # Subset to numeric columns
by = list(CO_NAME = wfrc_taz$CO_NAME), # Aggregate by CO_NAME
FUN = sum
)
However, the function provided by @rCarto seems to be the exact one I was looking for. Thank you so much. The only wish/question I have is if it would work with logical filters such as in the code below. Also, maybe the fun
argument could take in the actual function (again for consistency), rather than a character of the function name.
r <- st_aggregate(nc, "dummy", c(is.character(), is.integer(), is.numeric()), c(first, sum, mean))
In that case, we would expand the by
argument to take in values such as r colnames %in% c("BIR74", "NWBIR74")
and so on to apply different aggregation functions to different sets of columns.
A common GIS task is to dissolve boundaries based on shared boundaries of polygons. Doing this with R always bends my head a little bit.
I think having a utility function in {sf} to do this would be very handy.
Here is a minimal example of how that function might work. Using
spdep
is probably the best for this task because it already has functions for identifying contiguous polygons as well as the connected subgraphs.This is inspired by @cgmossa's PhD work