ua-snap / epa-justice

US Census and CDC data access via API
MIT License

2010 vs 2020 geography in CDC PLACES dataset #4

Open Joshdpaul opened 4 months ago

Joshdpaul commented 4 months ago

This issue came up when looking at the data pulled for Eagle River and JBER (see attached table, tract_agg.csv). These are the only communities where the PIs want to aggregate the values from multiple census geographies into one value. In both cases, we are using multiple census tracts to define the community.

The PIs chose 2020 census tract GEOIDs to populate tbl/NCRPlaces_Census_04192024.csv. When we request data using those tract GEOIDs, we see data from all datasets (Census data, CDC SDOH data, and CDC PLACES) for 2020 tracts 2.01, 2.02, and 2.04, but the CDC PLACES data is missing for tracts 2.05 and 2.06.

This is because, for some unknown reason, the CDC PLACES data uses 2010 census geographies. The 2010 tract 2.03 was split into two new tracts for 2020 (2.05 and 2.06), so those two 2020 GEOIDs have no match in the PLACES data.
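To illustrate the mismatch, here is a minimal sketch of how the unmatched tracts could be flagged. It is not code from this repo: the function name `find_unmatched`, the short tract labels used in place of full GEOIDs, and the hand-written split mapping are all assumptions made for illustration, based only on the tract numbers mentioned above.

```python
# Hypothetical illustration: tract labels stand in for full GEOIDs.
# 2020 tracts requested for the community (from the issue text).
requested_2020 = ["2.01", "2.02", "2.04", "2.05", "2.06"]

# Tracts assumed to be present in PLACES, which is keyed to 2010 geography.
places_2010 = {"2.01", "2.02", "2.03", "2.04"}

# Hand-written mapping of 2020 tracts to the 2010 tract they were split from.
split_2020_to_2010 = {"2.05": "2.03", "2.06": "2.03"}


def find_unmatched(requested, available):
    """Return requested tracts with no match in the available set, in order."""
    return [tract for tract in requested if tract not in available]


unmatched = find_unmatched(requested_2020, places_2010)

# For each unmatched 2020 tract, look up its 2010 parent (None if unknown).
parents = {tract: split_2020_to_2010.get(tract) for tract in unmatched}

print(unmatched)
print(parents)
```

A check like this could also be run across the full place table to find other communities affected by 2010 to 2020 boundary changes.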

So when we attempt to aggregate the data for each variable, we lose the CDC PLACES variables because of the NA values for tracts 2.05 and 2.06. According to the PIs, "there is a note on their website that they're going to update the boundaries this summer so our hope is that if there were any boundary changes in AK between the censuses, that would be resolved with the update."

For now, we will return NA if there is any missing data in the aggregation formula. But we will need to revisit this if/when the geography is updated, as we may need to regenerate the results table for these and potentially other, non-aggregated communities.
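The NA-on-any-missing rule can be sketched as follows. This is only an illustration under stated assumptions, not the repo's implementation: the function name `aggregate_tracts` is hypothetical, and the population-weighted mean is an assumed stand-in, since the actual aggregation formula is not shown in this issue.

```python
import math
from typing import Optional, Sequence


def is_missing(value: Optional[float]) -> bool:
    """Treat None and NaN as missing (NA)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def aggregate_tracts(
    values: Sequence[Optional[float]],
    populations: Sequence[float],
) -> Optional[float]:
    """Population-weighted mean across tracts (assumed formula).

    Returns None (NA) if any tract value is missing, rather than
    silently aggregating over only the tracts that have data.
    """
    if len(values) != len(populations):
        raise ValueError("values and populations must have the same length")
    if any(is_missing(v) for v in values):
        return None
    total_population = sum(populations)
    if total_population == 0:
        return None
    weighted_sum = sum(v * p for v, p in zip(values, populations))
    return weighted_sum / total_population


# Made-up numbers, for illustration only.
complete = aggregate_tracts([10.0, 20.0], [100, 300])
with_gap = aggregate_tracts([10.0, None, 20.0], [100, 200, 300])
```

With this rule, a community such as Eagle River gets NA for every PLACES variable until all of its tracts have data, which is why the results table would need regenerating after the geography update.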

Joshdpaul commented 4 months ago

Two other CDPs also fall into the situation detailed above. From the PI:

"There were two CDPs -- North Lakes and South Lakes CDPs -- that were missing CDC data but have fairly substantial populations. It looks like it used to be one CDP and was split before the 2020 census, so it might be another one that like JBER and Eagle River to check back on after CDC does their summer update to the boundaries?"

These two CDPs are in the Wasilla area.

Joshdpaul commented 4 months ago

From the PI via email, another CDP to add to this list:

"...add Mill Bay CDP to that (another new CDP in 2020 so it won't have data for PLACES measures yet)"