pdil / usmap

🗺 Create US maps including Alaska and Hawaii in R
https://usmap.dev/
GNU General Public License v3.0

CDC County Mapping #73

Closed USMortality closed 9 months ago

USMortality commented 9 months ago

I am using this great package to plot CDC Wonder county level data (e.g.: https://wonder.cdc.gov/controller/datarequest/D176)

County-level data is available for 1999-2023. Some counties have changed during that time, and CDC does not seem to adhere strictly to the US Census mapping either; instead it uses its own set of counties per year.

Do you have any suggestions on how to handle these differences?

Here's an example, which compares the fips codes of the 2020 CDC Wonder query with the fips codes in this package:

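library(dplyr)  # filter()
library(readr)  # read_csv()
cdc2020 = county_all_imputed |> filter(year == 2020)
df = read_csv(system.file("extdata", "county_fips.csv", package = "usmap"))
# FIPS codes known to usmap but absent from the CDC data
setdiff(df$fips, cdc2020$fips)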

Output:

> setdiff(df$fips, cdc2020$fips)
 [1] "02158" "60030" "60040" "60010" "60020" "60050" "66010" "69085" "69100"
[10] "69110" "69120" "72003" "72005" "72007" "72009" "72011" "72013" "72015"
[19] "72017" "72019" "72021" "72023" "72025" "72027" "72029" "72031" "72033"
[28] "72035" "72037" "72039" "72041" "72043" "72045" "72047" "72049" "72051"
[37] "72053" "72054" "72055" "72057" "72059" "72061" "72063" "72065" "72067"
[46] "72069" "72071" "72073" "72075" "72077" "72079" "72081" "72083" "72085"
[55] "72087" "72089" "72091" "72093" "72095" "72097" "72099" "72101" "72103"
[64] "72105" "72107" "72109" "72111" "72113" "72115" "72117" "72119" "72121"
[73] "72123" "72125" "72127" "72129" "72131" "72133" "72135" "72137" "72139"
[82] "72141" "72143" "72145" "72147" "72149" "72151" "72153" "72001" "46102"
[91] "74300" "51515" "78010" "78020" "78030"

The FIPS codes above are missing from the CDC dataset.
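The comparison above runs in one direction only (codes in usmap that the CDC data lacks). A minimal sketch of checking both directions, using made-up stand-in vectors rather than the real datasets, might look like this; the variable names are hypothetical:

```r
# Toy stand-ins for the two FIPS vectors (illustrative values only,
# not the real usmap or CDC WONDER contents).
usmap_fips <- c("02158", "46102", "01001")
cdc_fips   <- c("02270", "46113", "01001")

# Codes usmap knows about that the CDC extract lacks.
only_in_usmap <- setdiff(usmap_fips, cdc_fips)

# Codes the CDC extract uses that usmap does not recognise;
# these are the candidates for recoding.
only_in_cdc <- setdiff(cdc_fips, usmap_fips)

print(only_in_usmap)
print(only_in_cdc)
```

Looking at the second vector alongside the first can make it easier to spot which retired codes pair up with which replacements.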

I can probably create my own mapping table, just wondering if you had any solutions or if someone had already solved this issue. Thanks!

pdil commented 9 months ago

I can't seem to access that CDC data but based on your code it seems like they are also using FIPS codes but they're just lacking some of them? I think you should be able to plot the data on the map but the counties that correspond to the missing FIPS codes will appear blank (whatever the default color of the map is). Is this what is happening or are you trying to do something other than coloring a map with the data?

USMortality commented 9 months ago

Simple example:

The current CDC dataset linked above still uses Alaska's 02270 FIPS code instead of the reassigned 02158. The fix is to run df$fips[df$fips == "02270"] <- "02158" on the CDC dataset; there are a handful more of these cases.

I was just wondering if anyone has already mapped these out ;) If not, just disregard this, and please close. thx :)
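When there are several such one-to-one renames, a small lookup table may be easier to maintain than repeated assignments. The following is only a sketch: `fips_recode` and `recode_fips` are hypothetical names, the data frame is toy data, and the table lists just the two renames that come up in this thread.

```r
# Hypothetical lookup: retired FIPS code -> current FIPS code.
fips_recode <- c("02270" = "02158",  # Wade Hampton -> Kusilvak Census Area, AK
                 "46113" = "46102")  # Shannon -> Oglala Lakota County, SD

# Replace any code found in the lookup; leave all other codes untouched.
recode_fips <- function(fips, lookup = fips_recode) {
  hit <- fips %in% names(lookup)
  fips[hit] <- unname(lookup[fips[hit]])
  fips
}

# Toy data standing in for the CDC extract.
cdc <- data.frame(fips = c("02270", "01001", "46113"),
                  deaths = c(10, 20, 30),
                  stringsAsFactors = FALSE)
cdc$fips <- recode_fips(cdc$fips)
print(cdc)
```

Further renames could be added as extra entries in the lookup vector without touching the function.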

pdil commented 9 months ago

Unfortunately I'm not aware of anyone that's already done that mapping 🙁

USMortality commented 9 months ago

Cool.

I managed to resolve most of it, except Alaska pre-2003, where some counties were not just split or merged:

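# Resolve mapping issues
# Split retired Alaska census areas into their successors;
# the result is assigned back so the changes are kept
df <- df |>
    split_row("02261", "02063", "02066") |>
    split_row("02232", "02105", "02230") |>
    split_row("02280", "02275", "02195")
# Recode counties whose FIPS code was reassigned
df$fips[df$fips == "02270"] <- "02158"
df$fips[df$fips == "46113"] <- "46102"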

split_row is a custom function which simply splices a row into two...
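The helper itself was not posted, so the following is only a guess at what such a function could look like. It assumes the data frame has a character `fips` column and that every other column is simply duplicated into both new rows; for raw counts, the values would more likely need to be apportioned (for example by population) rather than copied.

```r
# Hypothetical reconstruction of split_row(); not the poster's actual code.
# Assumption: each row carrying `old` is replaced by two copies, one per
# new code, with all other columns duplicated unchanged.
split_row <- function(df, old, new1, new2) {
  hit <- df$fips %in% old
  if (!any(hit)) return(df)          # nothing to split
  a <- df[hit, , drop = FALSE]
  b <- a
  a$fips <- new1
  b$fips <- new2
  out <- rbind(df[!hit, , drop = FALSE], a, b)
  rownames(out) <- NULL              # tidy row names after rbind
  out
}

# Toy data: one ordinary county plus the retired Valdez-Cordova code.
df <- data.frame(fips = c("01001", "02261"), rate = c(1.5, 2.5),
                 stringsAsFactors = FALSE)
df <- df |> split_row("02261", "02063", "02066")
print(df)
```

Duplicating values is reasonable for rates shown on a map, since both successor areas then inherit the colour of the original one.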