pdil / usmap

🗺 Create US maps including Alaska and Hawaii in R
https://usmap.dev/
GNU General Public License v3.0

Connecticut countypop or countypov data not showing in map #102

Closed: ile-saenz closed this issue 6 months ago

ile-saenz commented 7 months ago

Describe the bug

This snippet generates a plot in which Connecticut is empty, even though the dataset has data for this state:

usmap::plot_usmap(data = usmap::countypop, regions = "counties", values = "pop_2015", color = "black", linewidth = 0.1)

[image]

I think the underlying reason is that the usmapdata package reads an sf dataset that has planning regions instead of counties for Connecticut:

df <- sf::read_sf(system.file("extdata", paste0("us_", regions, ".gpkg"), package = "usmapdata"))

The datasets shipped with the package, on the other hand, use county names and the FIPS codes for those counties.

This is from the underlying sf object returned by usmapdata::us_map:

[image]

And this is from the countypov dataset (the same naming is used in the countypop dataset):

[image]
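The mismatch described above can be checked directly by comparing FIPS codes. This is a minimal sketch, assuming that both `usmap::us_map(regions = "counties")` and `usmap::countypop` expose a character `fips` column (true for the versions discussed here, but worth verifying on your install):

```r
library(usmap)

# FIPS codes that have a geometry on the map vs. FIPS codes present in the data
map_fips  <- unique(us_map(regions = "counties")$fips)
data_fips <- unique(countypop$fips)

# Connecticut's state FIPS prefix is "09"
ct_map  <- sort(map_fips[startsWith(map_fips, "09")])
ct_data <- sort(data_fips[startsWith(data_fips, "09")])

# Codes in the data with no matching geometry (these rows cannot be drawn)
setdiff(ct_data, ct_map)

# Geometries with no matching data (these shapes are left uncolored)
setdiff(ct_map, ct_data)
```

If both `setdiff()` calls return non-empty vectors, the data and the map are keyed on different sets of codes for Connecticut, which would explain the empty state in the plot.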

I think this is the case because I created a rough mapping between planning regions and counties, changed the FIPS codes in those datasets to match the planning regions, and the map then rendered correctly. The problem is that there is no 1:1 relationship (there are 8 counties and 9 planning regions, I believe).

Since the underlying dataset is the issue, I can't really do much more than that to solve it temporarily. It's confusing that the package uses counties for the rest of the US and planning regions for Connecticut.

To Reproduce

usmap::plot_usmap(data = usmap::countypop, regions = "counties", values = "pop_2015", color = "black", linewidth = 0.1)

I started with this snippet:

usmap::plot_usmap(data = usmap::countypop, regions = "counties", values = "pop_2015", color = "black", linewidth = 0.1)

Then I explored usmap::plot_usmap, followed it to usmap::us_map, and finally to usmapdata::us_map, where I found:

df <- sf::read_sf(system.file("extdata", paste0("us_", regions, ".gpkg"), package = "usmapdata"))

That file has geometries for planning regions, not for counties.

Expected behavior

Connecticut should be colored when mapping the package's own datasets (countypop, countypov).

If the package uses counties for the rest of the US, then it should be consistent and use counties, not planning regions, for a single state. Or at least, if it uses planning regions for a single state, then the package's own datasets and its functions should agree on that.

Environment

Matrix products: default
BLAS/LAPACK: /usr/lib64/libopenblasp-r0.3.3.so; LAPACK version 3.8.0

locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

time zone: America/New_York tzcode source: system (glibc)

attached base packages: stats graphics grDevices utils datasets methods base

other attached packages: lubridate_1.9.3 forcats_1.0.0 stringr_1.5.1
dplyr_1.1.4 purrr_1.0.2 readr_2.1.5
tidyr_1.3.0 tibble_3.2.1 tidyverse_2.0.0
usmap_0.7.0 ggnewscale_0.4.10
viridis_0.6.5 viridisLite_0.4.2 ggrepel_0.9.4
ggforce_0.4.1 DT_0.31 mapproj_1.2.11
maps_3.4.2 ggplot2_3.4.4 shiny_1.8.0

loaded via a namespace (and not attached): gtable_0.3.4 bslib_0.6.1 htmlwidgets_1.6.4 tzdb_0.4.0 crosstalk_1.2.1 vctrs_0.6.5
tools_4.3.1 generics_0.1.3 proxy_0.4-27
fansi_1.0.6 pkgconfig_2.0.3 KernSmooth_2.23-22 lifecycle_1.0.4 compiler_4.3.1 farver_2.1.1
textshaping_0.3.7 munsell_0.5.0 httpuv_1.6.13
class_7.3-22 htmltools_0.5.7 sass_0.4.8
yaml_2.3.8 crayon_1.5.2 later_1.3.2
pillar_1.9.0 jquerylib_0.1.4 MASS_7.3-60
ellipsis_0.3.2 classInt_0.4-10 rsconnect_1.2.0
cachem_1.0.8 mime_0.12 tidyselect_1.2.0
digest_0.6.34 stringi_1.8.3 sf_1.0-15
labeling_0.4.3 polyclip_1.10-6 fastmap_1.1.1
grid_4.3.1 colorspace_2.1-0 cli_3.6.2
magrittr_2.0.3 utf8_1.2.4 e1071_1.7-14
withr_2.5.2 scales_1.3.0 promises_1.2.1
timechange_0.2.0 gridExtra_2.3 ragg_1.2.7
hms_1.1.3 memoise_2.0.1 ggsflabel_0.0.1
rlang_1.1.3 Rcpp_1.0.12 DBI_1.1.3
xtable_1.8-4 glue_1.7.0 tweenr_2.0.2
pkgload_1.3.4 rstudioapi_0.15.0 usmapdata_0.2.1
jsonlite_1.8.8 R6_2.5.1 units_0.8-5
systemfonts_1.0.5

pdil commented 7 months ago

@ile-saenz, thank you for the very detailed report!

Yes, this is something I am aware of. The Census Bureau shapefiles that usmap uses to draw the map switched Connecticut from counties to planning regions in 2022 (it turns out CT hasn't actually had county governments since 1960, so this was long overdue). This is why there's a mismatch between the map boundaries and the sample data included with the package.

I have updated the population and poverty data sets to the latest versions (2022 for population and 2021 for poverty) which should be available in the next release, probably in usmap v0.7.1 coming soon.
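Once a release with the updated population data is installed, the original reproduction can be rerun without hard-coding the year. The sketch below assumes the population column keeps the `pop_<year>` naming seen in `pop_2015`; that naming is an assumption about future releases, not something confirmed in this thread:

```r
library(usmap)

# Pick whichever population column the installed version ships
# (assumed to be named like "pop_2015"; adjust if the naming differs)
value_col <- grep("^pop_", names(countypop), value = TRUE)[1]

p <- plot_usmap(
  data = countypop,
  regions = "counties",
  values = value_col,
  color = "black",
  linewidth = 0.1
)
p
```

If Connecticut is still blank after upgrading, comparing the `fips` values in `countypop` against those in `us_map(regions = "counties")` should show whether the codes still disagree.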

Unfortunately, since the Census Bureau's recognition of the planning regions didn't come into effect until 2022, the poverty data set will still have the CT issue, because the latest version is currently only as of 2021. I hope they release an updated poverty data set soon so that I can update it as soon as possible.

Some more info is also located in this issue: #88

lachanskim commented 7 months ago

Hi,

I am currently working with historical data, and it occurs to me that any maps made using usmap will break whenever the Census Bureau updates county boundaries. Would it be possible to keep legacy shapefiles, e.g. 2020 before the Connecticut changes, in usmapdata?

Thank you,

pdil commented 7 months ago

@lachanskim that's an interesting idea. It should be doable, but I'll have to look into the file sizes involved, since CRAN prefers packages to be less than 5 MB in size.

This issue exists, however, because the county FIPS codes themselves changed, which should be a fairly rare occurrence.

lachanskim commented 7 months ago

Thanks. I suppose the rarer the change the better, as the data requirements will go down.