usds / justice40-tool

A tool to identify disadvantaged communities due to environmental, socioeconomic and health burdens
https://screeningtool.geoplatform.gov/
Creative Commons Zero v1.0 Universal

As an agency partner, I want to be able to access the data of % DACs per county and zip code #1803

Open lucasmbrown-usds opened 2 years ago

lucasmbrown-usds commented 2 years ago

Description: Many program officers distribute funding not at the census tract level but by other geographies, such as county or zip code. To use the definition of disadvantaged communities (DACs), they need to translate the census tract data into those other geographic units.

Solution: Modify the ETL pipeline to create two separate files:

  1. Counties, with the percent of each county's population that lives in a disadvantaged census tract
  2. Zip codes, with the percent of each zip code's population that lives in a disadvantaged census tract

Make these files available in S3 with a link from GitHub or another location.
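The county file described above could be sketched roughly as follows. This is a minimal illustration, not the project's ETL code: the function name, the input layout, and the sample tracts are all hypothetical. The one fact it relies on is that the first five digits of an 11-digit census tract GEOID are the state and county FIPS codes, so counties need no crosswalk. Zip codes do, which the rest of this thread discusses.

```python
from collections import defaultdict


def percent_population_in_dacs_by_county(tracts):
    """Return {county_fips: percent of county population living in DAC tracts}.

    `tracts` is an iterable of (tract_geoid, population, is_dac) tuples.
    This input layout is an assumption made for the sketch.
    """
    total = defaultdict(int)
    in_dac = defaultdict(int)
    for geoid, population, is_dac in tracts:
        # 11-digit tract GEOID = 2-digit state + 3-digit county + 6-digit tract
        county = geoid[:5]
        total[county] += population
        if is_dac:
            in_dac[county] += population
    # Skip counties with no recorded population to avoid dividing by zero.
    return {
        county: 100.0 * in_dac[county] / total[county]
        for county in total
        if total[county] > 0
    }


# Made-up tracts, for illustration only.
example_tracts = [
    ("01001020100", 2000, True),
    ("01001020200", 3000, False),
    ("01003010100", 1000, True),
]
county_percents = percent_population_in_dacs_by_county(example_tracts)
```

GEOIDs are kept as strings so that leading zeros survive.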

lucasmbrown-usds commented 2 years ago

@BethMattern - can you confirm that these files should be created as both CSV and Excel files?

And also, where would you like these files to live? Here are some options:

  1. As standalone files that get linked from the https://screeningtool.geoplatform.gov/en/downloads page. For example:

    • "Zip codes by percent of tracts within the zip code that are identified as disadvantaged.csv"
    • "Zip codes by percent of tracts within the zip code that are identified as disadvantaged.xlsx"
    • "Counties by percent of tracts within the county that are identified as disadvantaged.csv"
    • "Counties by percent of tracts within the county that are identified as disadvantaged.xlsx"
  2. Combined into one big zip file that can be downloaded

    • "Zip codes and counties by percent of tracts that are identified as disadvantaged.zip" (contains 4 files)

lucasmbrown-usds commented 2 years ago

We have a bit of a sticking point here. There's no tool out there to map from 2010 Census Tracts to 2020 Zip Codes by weighted population. Geocorr does not support it.

Ideally, we would use weighted population to represent the % of people inside a zip code who live in DACs. This would be more accurate than using geographic area alone, since a zip code may have most of its population concentrated in a small number of tracts.
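To make the difference concrete, here is a small sketch with made-up numbers. The field names and the sample zip code are hypothetical. It shows how far apart the two weightings can land when a small DAC tract holds most of the people.

```python
def percent_in_dac(pieces, weight_key):
    """Percent of a zip code's total weight (area or population) in DAC tracts."""
    total = sum(piece[weight_key] for piece in pieces)
    in_dac = sum(piece[weight_key] for piece in pieces if piece["is_dac"])
    return 100.0 * in_dac / total


# Made-up zip code covered by two tracts: a small, dense DAC tract
# and a large, sparsely populated non-DAC tract.
zip_pieces = [
    {"tract": "A", "is_dac": True, "area_sq_km": 1.0, "population": 9000},
    {"tract": "B", "is_dac": False, "area_sq_km": 9.0, "population": 1000},
]

by_area = percent_in_dac(zip_pieces, "area_sq_km")
by_population = percent_in_dac(zip_pieces, "population")
```

In this contrived case the area weighting says 10% of the zip code is in a DAC, while the population weighting says 90% of its residents are.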

So we have only a few options I'm aware of:

  1. We could convert 2010 Census Tracts to 2020 ZCTAs (ZIP Code Tabulation Areas) by geographic overlap only, not using population at all. This is already implemented.

  2. We could convert from 2010 Census Tracts to 2010 ZCTA by weighted population using Geocorr, and then convert from 2010 ZCTA to 2020 ZCTA using geographic overlap only.

  3. We could convert from 2010 Census Tracts to 2010 Census Blocks by weighted population using Geocorr, then use geographic overlap to convert from 2010 Census Blocks to 2020 Census Blocks using census relationship files, and then convert 2020 Census Blocks to 2020 ZCTA by weighted population using Geocorr.

  4. Very similar to Option 3: We could convert from 2010 Census Tracts to 2010 Census Blocks by weighted population using Geocorr, then use NHGIS's crosswalk files to convert from 2010 Census Blocks to 2020 Census Blocks by weighted population, and then convert 2020 Census Blocks to 2020 ZCTA by weighted population using Geocorr.
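Options 2 through 4 all chain two or three crosswalks together. One way to think about that, sketched below, is to treat each crosswalk as a table of allocation weights (the share of a source unit assigned to a target unit) and multiply the weights along each path. The dictionary layout, the `compose` helper, and the unit IDs are assumptions made for illustration. They are not the format of the Geocorr, Census relationship, or NHGIS files.

```python
from collections import defaultdict


def compose(first, second):
    """Chain two crosswalks given as {(source_id, target_id): weight}.

    The combined weight from a to c is the sum over every intermediate
    unit b of weight(a -> b) * weight(b -> c).
    """
    targets_by_source = defaultdict(list)
    for (mid, target), weight in second.items():
        targets_by_source[mid].append((target, weight))

    combined = defaultdict(float)
    for (source, mid), first_weight in first.items():
        for target, second_weight in targets_by_source[mid]:
            combined[(source, target)] += first_weight * second_weight
    return dict(combined)


# Made-up weights in the shape of Option 2:
# 2010 tract -> 2010 ZCTA by population, then 2010 ZCTA -> 2020 ZCTA by area.
tract_to_zcta_2010 = {("T1", "Z1"): 0.75, ("T1", "Z2"): 0.25}
zcta_2010_to_2020 = {("Z1", "Y1"): 1.0, ("Z2", "Y1"): 0.5, ("Z2", "Y2"): 0.5}

tract_to_zcta_2020 = compose(tract_to_zcta_2010, zcta_2010_to_2020)
```

Any step that uses area weights passes its approximation on to the final result, which is the practical difference between Options 3 and 4.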

lucasmbrown-usds commented 2 years ago

Finally, after all the above is completed, we need to map 2020 ZCTAs to 2020 zip codes, which are not the same.

ZCTAs can include one or more zip codes. See explanation here.

To convert from 2020 ZCTA to 2020 Zips, we have a couple of options:

https://github.com/censusreporter/acs-aggregate/blob/master/crosswalks/zip_to_zcta/ZIP_ZCTA_README.md

and

https://udsmapper.org/zip-code-to-zcta-crosswalk/

The latter seems to be more actively maintained.
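Once a crosswalk is chosen, applying it could look like the sketch below. It assumes the crosswalk assigns each zip code to a single ZCTA and that a zip code simply inherits its ZCTA's percentage. Both are simplifications, and the function name, the dictionary layout, and the codes shown are made up rather than taken from either crosswalk file.

```python
def zcta_percents_to_zips(zip_to_zcta, percent_by_zcta):
    """Give each zip code the DAC percentage of the ZCTA it maps to.

    Returns (percent_by_zip, unmatched_zips). Codes are kept as strings
    so that leading zeros are preserved.
    """
    percent_by_zip = {}
    unmatched = []
    for zip_code, zcta in zip_to_zcta.items():
        if zcta in percent_by_zcta:
            percent_by_zip[zip_code] = percent_by_zcta[zcta]
        else:
            # Report zips whose ZCTA has no value instead of guessing one.
            unmatched.append(zip_code)
    return percent_by_zip, unmatched


# Made-up codes, for illustration only.
example_zip_to_zcta = {"00001": "00010", "00002": "00010", "00003": "00020"}
example_percent_by_zcta = {"00010": 40.0}

zip_percents, unmatched_zips = zcta_percents_to_zips(
    example_zip_to_zcta, example_percent_by_zcta
)
```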

Many thanks to @JoeGermuska for his help assembling this information! Joe also recommends posting to https://acsdatacommunity.prb.org/discussion-forum/ with this question.

lucasmbrown-usds commented 2 years ago

Idea: report ranges instead of a single number.

Another option:

Produce spreadsheet with:

  1. 2020(ish) Zip
  2. % of Zip geographically within a tract
  3. 2019 tract population that would fall inside that zip if population were evenly distributed geographically within tracts
  4. % of Zip population within that tract (calculated from field 3)
  5. % of zip geographically within DAC
  6. % of zip population* within DAC
  7. Range of the two estimates (smallest to largest)

Fields 1-2 are already implemented.
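As a rough sketch of how fields 3 through 7 could be derived for one zip code, assuming one record per tract that overlaps the zip: the key names below are hypothetical, and the sketch assumes the overlap data gives both the share of the zip's area inside each tract (field 2) and the share of each tract's area inside the zip, which field 3 needs.

```python
def summarize_zip(overlaps):
    """Compute fields 3-7 for one zip code from its tract overlap records."""
    # Field 3: tract population in the zip if people were spread evenly by area.
    est_pop = [o["tract_population"] * o["share_of_tract_in_zip"] for o in overlaps]
    zip_pop = sum(est_pop)
    if zip_pop == 0:
        return None  # no estimated population, so the shares are undefined

    # Field 4: share of the zip's estimated population contributed by each tract.
    pop_share = [p / zip_pop for p in est_pop]

    # Field 5: percent of the zip's area that lies in DAC tracts.
    pct_area = 100.0 * sum(o["share_of_zip_in_tract"] for o in overlaps if o["is_dac"])
    # Field 6: percent of the zip's estimated population that lies in DAC tracts.
    pct_pop = 100.0 * sum(s for s, o in zip(pop_share, overlaps) if o["is_dac"])

    return {
        "pct_area_in_dac": pct_area,
        "pct_pop_in_dac": pct_pop,
        # Field 7: range of the two estimates, smallest to largest.
        "range": (min(pct_area, pct_pop), max(pct_area, pct_pop)),
    }


# Made-up overlap records for a single zip code.
example_overlaps = [
    {"is_dac": True, "tract_population": 4000,
     "share_of_tract_in_zip": 0.5, "share_of_zip_in_tract": 0.25},
    {"is_dac": False, "tract_population": 2000,
     "share_of_tract_in_zip": 1.0, "share_of_zip_in_tract": 0.75},
]
zip_summary = summarize_zip(example_overlaps)
```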

An idea we've eliminated: ~Randomly sample dividing 2010 tracts into 2010 blocks and getting ranges of the distribution. Generate a parameter q*x where the population of a block can be divided into tracts within the range q and (1-q).~

Field 3 can be implemented by:

  1. Assume that, where a tract crosses a ZCTA boundary, its population is distributed with one of three weights: 0.5 per unit of area, evenly per unit of area (1?), or 1.5 per unit of area.
  2. Load the 2019 tract population.
  3. Take % of the tract in the ZCTA and multiply it by population to get estimate A of tract population in ZCTA.
  4. Calculate lower and upper estimates of the population in a DAC: for every DAC tract that spans the border, multiply its population by its % spatially in the ZCTA and then by 0.5 to get the lower estimate, and repeat with 1.5 to get the upper estimate. Then calculate the DAC estimated population as a share of the zip estimated population.
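One possible reading of steps 1-4, sketched under several assumptions that the steps above do not settle: the multiplier is applied only to DAC tracts that are partly outside the ZCTA, a tract's contribution is capped at its full population, the denominator is the evenly distributed estimate (the sum of estimate A), and the result is capped at 100%. All names and numbers are hypothetical.

```python
def dac_share_bounds(overlaps, low=0.5, high=1.5):
    """Return (lower, even, upper) percent of ZCTA population in DAC tracts."""

    def dac_population(factor):
        total = 0.0
        for o in overlaps:
            if not o["is_dac"]:
                continue
            # Estimate A: tract population times the tract's area share in the ZCTA.
            estimate = o["tract_population"] * o["share_of_tract_in_zcta"]
            if o["share_of_tract_in_zcta"] < 1.0:
                # Assumption: scale only border-spanning tracts, and never
                # count more people than the tract actually has.
                estimate = min(estimate * factor, o["tract_population"])
            total += estimate
        return total

    # Assumption: the denominator is the evenly distributed ZCTA estimate.
    zcta_population = sum(
        o["tract_population"] * o["share_of_tract_in_zcta"] for o in overlaps
    )
    if zcta_population == 0:
        return None
    return tuple(
        min(100.0, 100.0 * dac_population(factor) / zcta_population)
        for factor in (low, 1.0, high)
    )


# Made-up overlaps: a DAC tract half inside the ZCTA, a non-DAC tract fully inside.
example_zcta_overlaps = [
    {"is_dac": True, "tract_population": 4000, "share_of_tract_in_zcta": 0.5},
    {"is_dac": False, "tract_population": 2000, "share_of_tract_in_zcta": 1.0},
]
bounds = dac_share_bounds(example_zcta_overlaps)
```

With these numbers, the DAC share of the ZCTA's population ranges from 25% to 75% around an even-distribution estimate of 50%.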