Hi @JsLth,
Thank you for reporting this bug.
As a short-term solution, I fixed the casting problem you pointed out.
Initially, the `union` case you pointed out was the default in previous package versions and, in my opinion, it is the more intuitive behavior when setting `by_polygon = FALSE`.
About "This improved the regular point distribution quite a lot, but still left some irregularities": I would not be bothered by what seems to be an irregularity. The apparent irregularities are regular grids inside polygons with "weird" shapes. Moreover, I would rather use the `No cast` grid than the `union` grid, because the former will estimate the numerical integrals in all the polygons with similar precision.
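To make the precision argument concrete, here is a minimal sketch that counts how many grid points each polygon receives. It uses plain `sf` on two hypothetical squares and is not how `smile` builds its grids internally; the helper `square()`, the toy polygons, and the grid sizes are assumptions made for illustration.

```r
library(sf)

# Hypothetical helper: an axis-aligned square as an sfg POLYGON.
square <- function(xmin, ymin, side) {
  st_polygon(list(rbind(
    c(xmin, ymin),
    c(xmin + side, ymin),
    c(xmin + side, ymin + side),
    c(xmin, ymin + side),
    c(xmin, ymin)
  )))
}

# Two toy "counties": a large square and a small one attached to it.
polys <- st_sf(
  id = c("large", "small"),
  geometry = st_sfc(square(0, 0, 10), square(10, 0, 2))
)

# One regular grid over the unioned study area (1 x 1 cells),
# keeping only the cell centers that fall inside the area.
study_area <- st_union(polys)
centers <- st_make_grid(study_area, n = c(12, 10), what = "centers")
centers <- centers[lengths(st_intersects(centers, study_area)) > 0]

# Grid points per polygon under the single shared grid.
pts_per_poly <- lengths(st_intersects(polys, centers))
names(pts_per_poly) <- polys$id
pts_per_poly

# For contrast: a separate 5 x 5 grid inside each polygon.
per_poly <- vapply(
  st_geometry(polys),
  function(g) length(st_make_grid(st_sfc(g), n = c(5, 5), what = "centers")),
  integer(1)
)
names(per_poly) <- polys$id
per_poly
```

With the shared grid, the number of points a polygon receives scales with its area, so small polygons are covered by only a handful of points. With one grid per polygon, every polygon gets the same number of points regardless of size.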
Below is a reprex of re-running your code with the updated version of the package (I just pushed it to GitHub, so you will have to reinstall it from there; `remotes::install_github("lcgodoy/smile")` will do it for you).
Meanwhile, I will work on making the `union` grid the default when setting `by_polygon = FALSE`.
Thanks again for reporting this bug.
library(sf)
#> Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE
library(smile)
library(tmap)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
counties <- read_sf("https://sgx.geodatenzentrum.de/wfs_vg250?service=wfs&version=2.0.0&request=GetFeature&TYPENAMES=vg250_krs&outputFormat=application/json")
counties <- counties %>%
  group_by(ags) %>%
  summarise(gen = unique(gen)) %>%
  select(ags)
counties$var <- sample(seq(0, 1, 0.001), nrow(counties))
# Default approach: Cast multipolygons to polygons, then sample
spm <- smile::sf_to_spm(
  counties,
  n_pts = 2000,
  poly_ids = "ags",
  var_ids = "var"
)
#> Warning: st_centroid assumes attributes are constant over geometries
plot(spm$grid, pch = 15)
# No cast approach: Cast all geometries to multipolygons, then cast back
# Note: The same results can be achieved by not casting at all, i.e.
# `st_geometry(sf_obj)` instead of `st_cast(st_geometry(sf_obj), "POLYGON")`
counties_poly <- counties %>%
  st_cast("MULTIPOLYGON") %>%
  st_cast("POLYGON")
#> Warning in st_cast.sf(., "POLYGON"): repeating attributes for all
#> sub-geometries for which they may not be constant
spm_poly <- smile::sf_to_spm(
  counties_poly,
  n_pts = 2000,
  poly_ids = "ags",
  var_ids = "var"
)
#> Warning: st_centroid assumes attributes are constant over geometries
# Union approach: Union all geometries before sampling because if
# by_polygon = FALSE, then everything inside the outer boundaries does not
# really seem to matter?
counties_union <- st_sf(st_union(counties))
spm_union <- smile::sf_to_spm(
  counties_union,
  n_pts = 2000,
  poly_ids = "ags",
  var_ids = "var"
)
grid_data <- bind_rows(
  "Default" = spm$grid,
  "No cast" = spm_poly$grid,
  "Union" = spm_union$grid,
  .id = "type"
)
tm_shape(grid_data) +
  tm_dots() +
  tm_facets("type")
n_pts2 <- 2000 / nrow(counties)
spm_bp <- smile::sf_to_spm(
  counties,
  n_pts = n_pts2,
  by_polygon = TRUE,
  poly_ids = "ags",
  var_ids = "var"
)
#> Warning in st_cast.MULTIPOLYGON(X[[i]], ...): polygon from first part only
grid_data <- bind_rows(
  "Default" = spm$grid,
  # Use the by-polygon result here (the original reused spm$grid)
  "spm_by_poly" = spm_bp$grid,
  "No cast" = spm_poly$grid,
  "Union" = spm_union$grid,
  .id = "type"
)
tm_shape(grid_data) +
  tm_dots() +
  tm_facets("type")
Created on 2024-01-26 with reprex v2.1.0
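The `polygon from first part only` warning in the `by_polygon = TRUE` run is what `sf` emits when a single MULTIPOLYGON geometry (an `sfg`) is cast to `"POLYGON"`. The sketch below contrasts that with casting a geometry column (an `sfc`), which splits the multipolygon into its parts. The toy multipolygon and the helper `sq()` are hypothetical, and the sketch only illustrates `sf` behavior; it makes no claim about where exactly `smile` triggers the warning.

```r
library(sf)

# Hypothetical helper: the ring list of a unit square starting at x = x0.
sq <- function(x0) {
  list(rbind(c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0)))
}

# A multipolygon made of two disjoint unit squares.
mp <- st_multipolygon(list(sq(0), sq(5)))

# Casting the geometry column (sfc) splits the multipolygon into its parts.
parts <- st_cast(st_sfc(mp), "POLYGON")
length(parts)

# Casting the single geometry (sfg) keeps only the first part and warns
# "polygon from first part only"; the warning is suppressed here.
first_only <- suppressWarnings(st_cast(mp, "POLYGON"))
st_area(st_sfc(first_only))
```

The `sfc` cast keeps both squares (total area 2), whereas the `sfg` cast silently drops the second square apart from the warning, so any grid built from it would miss part of the original geometry.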
Hi @lcgodoy and thanks again for developing this package!
I recently got to try out the package to make point estimates from counties in Germany. However, I noticed that grid sampling behaves oddly when dealing with multipolygons. In particular, my problem is with the following lines of `sf_to_spm`: https://github.com/lcgodoy/smile/blob/da1194b137fc6a41e44f9e0aad6080796581393c/R/spm.R#L83-L84
The default behavior (in case of `by_polygon = FALSE`) is to extract the geometries of multipolygons and then cast them to `"POLYGON"`. This creates some undesirable holes in the grid (see below). My first attempt was to re-sample without casting. This improved the regular point distribution quite a lot, but still left some irregularities ("No cast" in the plot below). Finally, I tried to union all polygons (ultimately, only the outer boundaries really matter, right?), and this left me with a perfectly regular grid over the study area.
This leads me to the question of whether it would be reasonable to union all polygon geometries before sampling if `by_polygon = FALSE`. I think this is a common problem for administrative units (which tend to be fragmented into multipolygons sometimes) and might lead to some weird estimates around the problematic areas.
Here is a reprex: