usds / justice40-tool

A tool to identify disadvantaged communities due to environmental, socioeconomic and health burdens
https://screeningtool.geoplatform.gov/
Creative Commons Zero v1.0 Universal

Update EJSCREEN data with newly refreshed data set & generate comparison report #1262

Closed BethMattern closed 2 years ago

BethMattern commented 2 years ago

EJSCREEN is scheduled to release data updates on February 18th. The new EJSCREEN data should be imported.

The comparison tool outputs should be run so that we can report on how the EJSCREEN refresh impacted the DACs by state, region, etc.

Updated data at the tract level can be found here: https://gaftp.epa.gov/EJSCREEN/2021
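
As a rough illustration of the by-state report requested above, here is a minimal sketch. It assumes DACs are available as sets of 11-character tract GEOID strings whose first two characters are the state FIPS code; the function name `dac_changes_by_state` and the input shape are hypothetical and are not taken from the comparison tool itself.

```python
from collections import Counter
from typing import Dict, Set


def dac_changes_by_state(
    old_dacs: Set[str], new_dacs: Set[str]
) -> Dict[str, Dict[str, int]]:
    """Count tracts added, dropped, and unchanged per state.

    Assumes each tract ID is a GEOID string (leading zeros preserved)
    whose first two characters are the state FIPS code.
    """
    added = Counter(geoid[:2] for geoid in new_dacs - old_dacs)
    dropped = Counter(geoid[:2] for geoid in old_dacs - new_dacs)
    unchanged = Counter(geoid[:2] for geoid in old_dacs & new_dacs)

    states = sorted(set(added) | set(dropped) | set(unchanged))
    return {
        state: {
            "added": added[state],
            "dropped": dropped[state],
            "unchanged": unchanged[state],
        }
        for state in states
    }


if __name__ == "__main__":
    # Made-up tract IDs, for illustration only.
    old = {"01001020100", "01001020200", "06037101110"}
    new = {"01001020100", "06037101110", "06037101122"}
    for state, counts in dac_changes_by_state(old, new).items():
        print(state, counts)
```

Grouping by region would work the same way, given a state-to-region lookup.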

BethMattern commented 2 years ago

The EJSCREEN team has not yet compiled the data at the tract level. It should be available in a few weeks. When it is, I think it will be here: https://www.epa.gov/ejscreen/download-ejscreen-data

emma-nechamkin commented 2 years ago

We need to crosswalk the new data to 2010 tracts; Census releases the crosswalk. Make sure to post it to S3 and store it in the census folder.

emma-nechamkin commented 2 years ago

A few notes here:

Data is pulled from here: https://gaftp.epa.gov/EJSCREEN/2021/. Many of the sources aren't newly updated.

Here is the full summary information. For each indicator, I show how many tracts would be flagged as 90th percentile or above under the old and new data, as well as the Jaccard similarity between the old and new sets of flagged tracts. I also show 5 random values for each column. I included demographic data to be thorough. I'm particularly worried by the air toxics (the numbers seem quite different in format), but anything under a Jaccard of 0.5 seems not great to me. In particular, I am a bit worried by wastewater, PM2.5, and respiratory hazard (and am continuing to look into it).

I'm also confused because some of these data sources haven't been updated. Why, for example, would the air toxics risk be different when the year of the data (2017, according to the codebook) is so old?
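
For reference, the flag-and-compare step can be sketched roughly as below. This is a standard-library illustration, not the code that produced the numbers in this thread. The percentile-rank definition (share of tracts with a value at or below the tract's own), the handling of NaN, and the names `flagged_tracts` and `jaccard` are all assumptions.

```python
import bisect
import math
from typing import Dict, Set


def flagged_tracts(values: Dict[str, float], percentile: float = 90.0) -> Set[str]:
    """Return tract IDs whose percentile rank is at or above `percentile`.

    Assumption: the percentile rank of a tract is the share of non-NaN
    tracts whose value is less than or equal to that tract's value.
    Tracts with NaN values are never flagged.
    """
    valid = {tract: v for tract, v in values.items() if not math.isnan(v)}
    if not valid:
        return set()
    ordered = sorted(valid.values())
    n = len(ordered)
    flagged = set()
    for tract, v in valid.items():
        rank = 100 * bisect.bisect_right(ordered, v) / n
        if rank >= percentile:
            flagged.add(tract)
    return flagged


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B| of two sets of tract IDs."""
    union = a | b
    if not union:
        # Convention chosen here: two empty sets count as identical.
        return 1.0
    return len(a & b) / len(union)


if __name__ == "__main__":
    # Made-up values for ten hypothetical tracts.
    old = {f"t{i}": float(i) for i in range(1, 11)}
    new = dict(old, t1=11.0, t9=0.0)
    print(jaccard(flagged_tracts(old), flagged_tracts(new)))
```

A Jaccard of 1.0 means the old and new data flag exactly the same tracts; values near 0 mean the flagged sets barely overlap.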

Details below:

Total population flag
    Old tracts length: 7401
    New tracts length: 7401
    Jaccard similarity: 0.8668180098373061
    Some old sample values: 6350, 3496, 4713, 2999, 2139
    Some new sample values: 7406, 4678, 5090, 3412, 2434
Air toxics cancer risk flag
    Old tracts length: 7341
    New tracts length: 1571
    Jaccard similarity: 0.09916132215096202
    Some old sample values: 35.3813030333, 25.3684167478, 28.7141235589, 14.6733538858, 24.3026240618
    Some new sample values: 20.0, 40.0, 20.0, 20.0, 30.0
Respiratory hazard index flag
    Old tracts length: 7341
    New tracts length: 3585
    Jaccard similarity: 0.3152762730227519
    Some old sample values: 0.275981612324, 0.585702211296, 0.288215052846, 0.67141804196, 0.323137075634
    Some new sample values: 0.2, 0.6, 0.4, 0.2, 0.2
Diesel particulate matter exposure flag
    Old tracts length: 7341
    New tracts length: 7343
    Jaccard similarity: 0.628298957640275
    Some old sample values: 0.634440538, 0.32676043522, 0.858861319, 0.3569607453, 0.161583861
    Some new sample values: 0.757447405572739, 0.161591064399115, nan, 0.104112644022358, 0.110426696675686
PM2.5 in the air flag
    Old tracts length: 7229
    New tracts length: 7229
    Jaccard similarity: 0.5749455337690632
    Some old sample values: 9.42114383562, 6.77319150685, 7.34096356164, 8.28206082192, 7.88000383562
    Some new sample values: 7.25130397260274, 9.26356328767123, 13.7640109589041, 9.29786657534247, 7.69686383561644
Ozone flag
    Old tracts length: 7229
    New tracts length: 7229
    Jaccard similarity: 0.8481400997059951
    Some old sample values: 46.8314581699, 54.5876705882, 31.9913320261, 38.3019875817, nan
    Some new sample values: 43.058670588235294, 34.6138594771242, 48.2984947712418, 35.152485620915, 55.9624320261438
Traffic proximity and volume flag
    Old tracts length: 7131
    New tracts length: 7133
    Jaccard similarity: 0.6359674274572772
    Some old sample values: 614.699392626, 189.636568307, 132.167500463, 4255.81870029, 3054.54581389
    Some new sample values: 266.09979832735894, 18.5967535381246, 634.0767897135289, 1895.80059063682, 120.885171808842
Proximity to Risk Management Plan (RMP) facilities flag
    Old tracts length: 7401
    New tracts length: 7401
    Jaccard similarity: 0.9509687623566627
    Some old sample values: 1.6542134194, 1.73792633621, 0.736821407521, 0.645033802464, 0.0141643946934
    Some new sample values: 0.192158542939031, 0.042607726878983, 0.066131613817973, 0.129044261488958, 0.130404811513743
Proximity to hazardous waste sites flag
    Old tracts length: 7401
    New tracts length: 7401
    Jaccard similarity: 0.6483296213808464
    Some old sample values: 2.49811541613, 0.378659184592, 0.175678712236, 3.13717342484, 0.578480852402
    Some new sample values: 0.25364136883734, 0.625559943408489, 4.1019853100848, 2.99056286722427, 1.00162672565321
Proximity to NPL sites flag
    Old tracts length: 7401
    New tracts length: 7401
    Jaccard similarity: 0.9911218724778047
    Some old sample values: 0.014585877207, 0.0631919377664, 0.1982174667, 0.0384528482939, 0.0790641061058
    Some new sample values: 0.07704736204817, 0.127227830366887, 0.059991724245078, 0.042769179019239, 0.066035753448489
Wastewater discharge flag
    Old tracts length: 5341
    New tracts length: 5403
    Jaccard similarity: 0.5259196136912371
    Some old sample values: nan, nan, 6.23276001813e-05, 0.0446368906659, 1.84829582971e-06
    Some new sample values: 0.000198243652825, 0.003680314198123, 0.000146311061205, 6.9826265182e-05, 0.001225818573475
Percent of households in linguistic isolation flag
    Old tracts length: 7401
    New tracts length: 7401
    Jaccard similarity: 0.8195451751690227
    Some old sample values: 0.0184254606365, 0.11706629055, 0.00688515560452, 0.0362239297475, 0.042042042042
    Some new sample values: 0.0, 0.080519480519481, 0.018255578093306, 0.0, 0.013182674199623
Poverty (Less than 200% of federal poverty line) flag
    Old tracts length: 7401
    New tracts length: 7401
    Jaccard similarity: 0.7625625148844963
    Some old sample values: 0.26125554851, 0.387816646562, 0.39463850528, 0.345983554712, 0.686848958333
    Some new sample values: 0.694397853069439, 0.106005459508644, 0.437935133299371, 0.475369458128078, 0.484296648192135
Individuals over 64 years old flag
    Old tracts length: 7401
    New tracts length: 7401
    Jaccard similarity: 0.7498522283957915
    Some old sample values: 0.119673088149, 0.198640260124, 0.10472972973, 0.161154855643, 0.0588756086764
    Some new sample values: 0.043714785311832, 0.071450569756698, 0.070710696338837, 0.107413010590015, 0.059259259259259
Individuals under 5 years old flag
    Old tracts length: 7401
    New tracts length: 7401
    Jaccard similarity: 0.5300806284887327
    Some old sample values: 0.0741700023546, 0.049715370019, 0.0376971931981, 0.0668352030337, 0.0280062236052
    Some new sample values: 0.084160552438498, 0.13182799269467, 0.034823731728289, 0.06568516421291, 0.073155687145324
Percent pre-1960s housing (lead paint indicator) flag
    Old tracts length: 7401
    New tracts length: 7401
    Jaccard similarity: 0.8346554288547348
    Some old sample values: 0.285588364918, 0.257297748123, 0.253637245393, 0.839449541284, 0.0355750487329
    Some new sample values: 0.011410788381743, 0.487096774193548, 0.057889822595705, 0.042607428987618, 0.040295748613678
emma-nechamkin commented 2 years ago

Distribution of PM2.5 (regional) is very different:

Screen Shot 2022-04-25 at 5.36.37 PM.png

emma-nechamkin commented 2 years ago

Scale of "any flag" isn't quite as huge, but most tracts appear to be caught by EJScreen?

Screen Shot 2022-04-25 at 5.38.37 PM.png

emma-nechamkin commented 2 years ago

It appears the technical documentation is also old, so it's hard to work out what might have shifted. The codebook only includes a year for the 2017 air toxics, diesel, etc. indicators. Can we ask EPA?

emma-nechamkin commented 2 years ago

The scale of this issue in terms of FLAGGING TRACTS is small -- only about 1000-1500 "low income" tracts don't get included from EJScreen alone... but I'm a bit concerned by the shifts, especially for such old data

emma-nechamkin commented 2 years ago

In sum:

Lucas suggests reaching out to our friends at the EPA.

(to discuss in data team mtg)