mountainMath / cancensus

R wrapper for calling CensusMapper APIs
https://mountainmath.github.io/cancensus/index.html

Issues with Census Division boundaries #177

Closed cgauvi closed 2 years ago

cgauvi commented 2 years ago

Hi, thanks again for all the great work; cancensus is massively convenient. I've been having some issues with polygon boundaries recently. Here's one problem with the 2021 data:

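library(cancensus)
library(dplyr)
# Quebec (PR 24) census divisions for 2021, keeping only Montréal
mtl <- get_census('CA21',regions = list(PR=24),level='CD',geo_format = 'sf') %>%
 filter(grepl('Montréal',name))
mtl$geometry %>% plot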

[Image: Selection_447]

This boundary is wrong and differs from the one I get from the Statistics Canada website.

I suspect some combination of polygon simplification and water removal code is creating the issue: Montréal seems to get cut off at the Lachine Canal.


Sys.info()
                                             sysname                                              release                                              version 
                                             "Linux"                                  "5.13.0-40-generic" "#45~20.04.1-Ubuntu SMP Mon Apr 4 09:38:31 UTC 2022" 
                                            nodename                                              machine                                                login 
                       "charles-GE63-Raider-RGB-9SE"                                             "x86_64"                                            "charles" 
                                                user                                       effective_user 
                                           "charles"                                            "charles" 

From my renv.lock

"R": {
    "Version": "4.1.2",
    "Repositories": [
      {
        "Name": "CRAN",
        "URL": "https://cloud.r-project.org"
      }
    ]
  }
 "cancensus": {
      "Package": "cancensus",
      "Version": "0.5.0",
      "Source": "Repository",
      "Repository": "CRAN",
      "Hash": "5d7ffdb84e55a35fad15993e8a4efec3",
      "Requirements": [
        "digest",
        "dplyr",
        "httr",
        "jsonlite",
        "rlang"
      ]

mountainMath commented 2 years ago

Looks right to me. Some differences because 2021 geographies on CensusMapper have been clipped to remove larger waterways.

[Image]

(From GeoSuite)

cgauvi commented 2 years ago

Hum.. the map above isn't cartographic though? Maybe this mapview screenshot will make it clearer that the boundaries are different

[Image: Selection_448]

mountainMath commented 2 years ago

Ah, I see it now. Thanks. Yes, that's a problem, looks like the "island" got removed because it's too small. Part of the problem of trying to balance faster mapping and download times with accuracy. I will have to think about how to best handle this.
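
To illustrate how a small part can disappear, here is a toy sketch using sf. It is not the CensusMapper simplification pipeline: the two squares and the `min_area` threshold are made up for illustration, and the real processing may work quite differently.

```r
library(sf)

# Toy geometry: one large "mainland" square plus one small "island" square.
mainland <- st_polygon(list(rbind(c(0, 0), c(10, 0), c(10, 10), c(0, 10), c(0, 0))))
island   <- st_polygon(list(rbind(c(11, 4), c(12, 4), c(12, 5), c(11, 5), c(11, 4))))

# Combine both parts into a single MULTIPOLYGON, as a census division would be.
region <- st_combine(st_sfc(mainland, island))

# Split into individual polygons and measure each part.
parts <- st_cast(region, "POLYGON")
areas <- as.numeric(st_area(parts))

# Hypothetical threshold: drop every part smaller than min_area.
min_area <- 5
kept <- parts[areas >= min_area]

length(parts)  # number of parts before filtering
length(kept)   # number of parts after filtering
```

Any area-based cleanup of this kind removes the island entirely rather than just coarsening its outline, which matches the kind of gap visible in the screenshot.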

One option is to reduce the simplification for the CDs on CensusMapper, another is to separate out mapping geometries from cancensus geometries. CensusMapper retains high-resolution geographies, but right now they aren't getting accessed via cancensus.

Another way to handle it is to add an option to cancensus to ask for high-resolution geographies. That would significantly increase download times and load on the server, but it may be OK as an option that people can turn on when they want higher-level geographies like CDs at high spatial resolution/low simplification.

Thoughts?

cgauvi commented 2 years ago

I think the last option (an option for full-resolution geographies) is the best: it is the most flexible and meets all needs. I also suspect it might be easier to implement, and with the caching I think it makes sense to be able to download higher-resolution datasets. It also follows the convention used in other related packages.

cf. the tigris::counties documentation:

cb     If cb is set to TRUE, download a generalized (1:500k) counties file. Defaults to FALSE (the most detailed TIGER file).

mountainMath commented 2 years ago

Ok, will add that to my to-do list. Leaving this issue open until I have a fix for this.

mountainMath commented 2 years ago

This will get fixed in the upcoming release of cancensus. It adds the ability to download high-resolution geographies, and I have also uploaded a slightly less simplified version of the geographies to CensusMapper as I think I went a little too far.

The following code shows what the new geographies look like:

library(cancensus)
library(dplyr)
library(ggplot2)
# Fetch the Montréal CD at both resolutions and plot them side by side
c("simplified","high") |> 
  lapply(function(resolution)
    get_census("CA21",regions=list(CD=2466),geo_format = 'sf',use_cache = FALSE, resolution = resolution) |>
      mutate(resolution=resolution)) |> 
  bind_rows() |> 
  ggplot() + 
  geom_sf() + 
  facet_wrap(~resolution) + 
  coord_sf(datum=NA)

[Image]

The new version of the package also allows for recalling census data, emitting a warning when recalled data has been cached or is being used, and a convenience method to remove recalled locally cached data. I will recall the geographies mid next week after doing some more testing, and (hopefully) having the new version of the package up on CRAN.

dshkol commented 2 years ago

This is very cool!

Would it be possible to calculate or estimate the size of the download at the time of the API call for different resolution levels? If someone pulls every DA in Canada at high resolution, we should warn them first. If this is harder to implement, then we can think about how to set a healthy default and steer users to safety in the documentation.

mountainMath commented 2 years ago

The default is still the simplified geographies at each level. At the DA level there is no (or very little) difference between simplified and high resolution. One thing I have done server side is slap an API point penalty on getting high-resolution geographies, so if someone requests high-resolution DAs for all of Canada, the server will just send an error saying the user does not have enough API quota. In practice, only users with advanced API privileges will ever have the opportunity to download lots of high-resolution geographies, and those users probably know what they are doing. So I don't think this will be much of an issue in practice, although it might need better documentation.
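
For scripts that should keep running when a high-resolution request is rejected, one possible pattern is to fall back to the simplified default. The sketch below is only an illustration: `fetch_with_fallback` and `stub_fetch` are hypothetical names, not part of cancensus, and the stub stands in for the real API call so the example runs offline. It assumes only that the fetch function accepts a `resolution` argument, as in the `get_census` call shown earlier in this thread.

```r
# Hypothetical helper: try the high-resolution request first and fall back to
# the simplified default if the request fails for any reason (a quota error
# is one example; this sketch does not distinguish between error types).
fetch_with_fallback <- function(fetch, ...) {
  tryCatch(
    fetch(..., resolution = "high"),
    error = function(e) {
      warning("High-resolution request failed, using simplified geographies: ",
              conditionMessage(e), call. = FALSE)
      fetch(..., resolution = "simplified")
    }
  )
}

# Stub standing in for the API call; it always rejects high-resolution requests.
stub_fetch <- function(regions, resolution) {
  if (resolution == "high") stop("not enough API quota")
  paste("geometries for", regions, "at", resolution, "resolution")
}

result <- suppressWarnings(fetch_with_fallback(stub_fetch, regions = "CD 2466"))
result
```

With cancensus itself, the call would presumably look like `fetch_with_fallback(get_census, dataset = "CA21", regions = list(CD = "2466"), geo_format = "sf")`. The warning is deliberate, so that a silent drop in resolution does not go unnoticed.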

cgauvi commented 2 years ago

Looks great! Thanks for taking the time to look into this.