wfmackey / absmapsdata

Use ABS ASGS data easily in R
58 stars 18 forks source link

Geometries for some SA1s are very badly mangled #71

Open nerskin opened 7 months ago

nerskin commented 7 months ago

I've noticed for a handful of SA1s in the dataset sa12021 that the polygon in the geometry column is very wrong.

One example is SA1 80109110916. Below is a map showing this SA1 as it appears in the sa12021 dataset (it's the tiny red sliver in the middle of the plot).

image

If I instead use the SA1 shapefiles published by the ABS, the map looks like this:

image

The SA1s affected by this issue can be identified by comparing the area of the SA1 as computed by sf::st_area with the area given in the variable areasqkm_2021. Reprex:

library(absmapsdata)
library(sf)
#> Linking to GEOS 3.12.1, GDAL 3.8.1, PROJ 9.3.1; sf_use_s2() is TRUE
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(units)
#> udunits database from /opt/homebrew/lib/R/4.3/site-library/units/share/udunits/udunits2.xml

sa12021 |> 
  filter(sa1_code_2021 == "80109110916") |>
  mutate(area_sf = set_units(st_area(geometry),"km^2")) |> 
  select(area_sf,areasqkm_2021)
#> Simple feature collection with 1 feature and 2 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: 149.0832 ymin: -35.33873 xmax: 149.0843 ymax: -35.33796
#> Geodetic CRS:  WGS 84
#>               area_sf areasqkm_2021                       geometry
#> 1 0.0003975106 [km^2]        0.0099 MULTIPOLYGON (((149.0832 -3...

Created on 2024-03-31 with reprex v2.1.0

nerskin commented 7 months ago

I'm guessing this is caused by how the geometries are simplified before being stored in this package, but is it possible to do this in a way that doesn't completely destroy some of the polygons?

wfmackey commented 7 months ago

thanks for raising this.

yes, this is an issue with the simplification level and its interactions with your example SA1 and its neighbouring SA1s.

if i had my time again, i would have better balanced the simplification level for SA1s, or not simplified them at all. maybe a future update could use rmapshaper::ms_simplify(keep = 0.3) rather than rmapshaper::ms_simplify(keep = 0.1). this increases the file size but has with far better results, eg:

image

may also be worth hosting the data objects somewhere with greater storage capacity to allow for unsimplified sf objects to be hosted.

i don't have time to explore alternatives at the moment, but keen to hear if there are any better ideas out there.

nerskin commented 7 months ago

thanks for raising this.

yes, this is an issue with the simplification level and its interactions with your example SA1 and its neighbouring SA1s.

if i had my time again, i would have better balanced the simplification level for SA1s, or not simplified them at all. maybe a future update could use rmapshaper::ms_simplify(keep = 0.3) rather than rmapshaper::ms_simplify(keep = 0.1). this increases the file size but has with far better results, eg:

image

may also be worth hosting the data objects somewhere with greater storage capacity to allow for unsimplified sf objects to be hosted.

i don't have time to explore alternatives at the moment, but keen to hear if there are any better ideas out there.

I think finding a way to host unsimplified objects would be the best long-term solution (maybe as geoparquet files which allow for fast reading and reasonably compact storage)

wfmackey commented 7 months ago

agree. something i can explore down the line (and happy to take PRs from anyone reading this!)