Closed by BethMattern 2 years ago
The EJSCREEN team has not yet compiled the data at the tract level. It should be available in a few weeks. When it is, I think it will be here: https://www.epa.gov/ejscreen/download-ejscreen-data
We need to crosswalk the data to 2010 tracts; Census releases the crosswalk. Make sure to post it to S3 and store it in the census folder.
A few notes here:
Data is pulled from here: https://gaftp.epa.gov/EJSCREEN/2021/. Many of the sources aren't newly updated.
Here is the summary information in full. For each indicator, I show how many tracts would be flagged as 90th percentile or above in the old and new data, the Jaccard similarity between the old and new flagged sets, and 5 random values from each column. I included demographic data to be thorough. I'm particularly worried by the air toxics (the numbers seem quite different in format), but anything under a Jaccard of 0.5 seems not great to me. I am also a bit worried by wastewater, PM2.5, and respiratory hazard, and am continuing to look into them.
I'm also confused because some of these data sources haven't been updated. Why, for example, would the air toxics risk be different when the year of the data (2017, according to the codebook) is so old?
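One way to put a number on the "different in format" observation is to check what share of values sit on a coarse grid (for example, multiples of 10). This is only a sketch: the function name `share_on_grid` and the choice of step are assumptions for illustration, and the inputs are the five sample values for air toxics cancer risk listed in the details of this comment, not the full columns.

```python
import math


def share_on_grid(values, step, tol=1e-9):
    """Fraction of non-NaN values that are (numerically) multiples of `step`."""
    clean = [v for v in values if not math.isnan(v)]
    if not clean:
        return float("nan")
    on_grid = sum(1 for v in clean if abs(v / step - round(v / step)) < tol)
    return on_grid / len(clean)


# The five sample values per column reported in this comment for air toxics cancer risk.
old_cancer = [35.3813030333, 25.3684167478, 28.7141235589, 14.6733538858, 24.3026240618]
new_cancer = [20.0, 40.0, 20.0, 20.0, 30.0]

print(share_on_grid(old_cancer, step=10))  # 0.0
print(share_on_grid(new_cancer, step=10))  # 1.0
```

If the full new column turns out to be mostly on such a grid, the values were probably rounded or binned upstream. That would create many ties around the 90th percentile, which may be part of why the flagged set changes so much.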
Details below:
Total population flag
Old tracts length: 7401
New tracts length: 7401
Jaccard similarity: 0.8668180098373061
Some old sample values: 6350, 3496, 4713, 2999, 2139
Some new sample values: 7406, 4678, 5090, 3412, 2434
Air toxics cancer risk flag
Old tracts length: 7341
New tracts length: 1571
Jaccard similarity: 0.09916132215096202
Some old sample values: 35.3813030333, 25.3684167478, 28.7141235589, 14.6733538858, 24.3026240618
Some new sample values: 20.0, 40.0, 20.0, 20.0, 30.0
Respiratory hazard index flag
Old tracts length: 7341
New tracts length: 3585
Jaccard similarity: 0.3152762730227519
Some old sample values: 0.275981612324, 0.585702211296, 0.288215052846, 0.67141804196, 0.323137075634
Some new sample values: 0.2, 0.6, 0.4, 0.2, 0.2
Diesel particulate matter exposure flag
Old tracts length: 7341
New tracts length: 7343
Jaccard similarity: 0.628298957640275
Some old sample values: 0.634440538, 0.32676043522, 0.858861319, 0.3569607453, 0.161583861
Some new sample values: 0.757447405572739, 0.161591064399115, nan, 0.104112644022358, 0.110426696675686
PM2.5 in the air flag
Old tracts length: 7229
New tracts length: 7229
Jaccard similarity: 0.5749455337690632
Some old sample values: 9.42114383562, 6.77319150685, 7.34096356164, 8.28206082192, 7.88000383562
Some new sample values: 7.25130397260274, 9.26356328767123, 13.7640109589041, 9.29786657534247, 7.69686383561644
Ozone flag
Old tracts length: 7229
New tracts length: 7229
Jaccard similarity: 0.8481400997059951
Some old sample values: 46.8314581699, 54.5876705882, 31.9913320261, 38.3019875817, nan
Some new sample values: 43.058670588235294, 34.6138594771242, 48.2984947712418, 35.152485620915, 55.9624320261438
Traffic proximity and volume flag
Old tracts length: 7131
New tracts length: 7133
Jaccard similarity: 0.6359674274572772
Some old sample values: 614.699392626, 189.636568307, 132.167500463, 4255.81870029, 3054.54581389
Some new sample values: 266.09979832735894, 18.5967535381246, 634.0767897135289, 1895.80059063682, 120.885171808842
Proximity to Risk Management Plan (RMP) facilities flag
Old tracts length: 7401
New tracts length: 7401
Jaccard similarity: 0.9509687623566627
Some old sample values: 1.6542134194, 1.73792633621, 0.736821407521, 0.645033802464, 0.0141643946934
Some new sample values: 0.192158542939031, 0.042607726878983, 0.066131613817973, 0.129044261488958, 0.130404811513743
Proximity to hazardous waste sites flag
Old tracts length: 7401
New tracts length: 7401
Jaccard similarity: 0.6483296213808464
Some old sample values: 2.49811541613, 0.378659184592, 0.175678712236, 3.13717342484, 0.578480852402
Some new sample values: 0.25364136883734, 0.625559943408489, 4.1019853100848, 2.99056286722427, 1.00162672565321
Proximity to NPL sites flag
Old tracts length: 7401
New tracts length: 7401
Jaccard similarity: 0.9911218724778047
Some old sample values: 0.014585877207, 0.0631919377664, 0.1982174667, 0.0384528482939, 0.0790641061058
Some new sample values: 0.07704736204817, 0.127227830366887, 0.059991724245078, 0.042769179019239, 0.066035753448489
Wastewater discharge flag
Old tracts length: 5341
New tracts length: 5403
Jaccard similarity: 0.5259196136912371
Some old sample values: nan, nan, 6.23276001813e-05, 0.0446368906659, 1.84829582971e-06
Some new sample values: 0.000198243652825, 0.003680314198123, 0.000146311061205, 6.9826265182e-05, 0.001225818573475
Percent of households in linguistic isolation flag
Old tracts length: 7401
New tracts length: 7401
Jaccard similarity: 0.8195451751690227
Some old sample values: 0.0184254606365, 0.11706629055, 0.00688515560452, 0.0362239297475, 0.042042042042
Some new sample values: 0.0, 0.080519480519481, 0.018255578093306, 0.0, 0.013182674199623
Poverty (Less than 200% of federal poverty line) flag
Old tracts length: 7401
New tracts length: 7401
Jaccard similarity: 0.7625625148844963
Some old sample values: 0.26125554851, 0.387816646562, 0.39463850528, 0.345983554712, 0.686848958333
Some new sample values: 0.694397853069439, 0.106005459508644, 0.437935133299371, 0.475369458128078, 0.484296648192135
Individuals over 64 years old flag
Old tracts length: 7401
New tracts length: 7401
Jaccard similarity: 0.7498522283957915
Some old sample values: 0.119673088149, 0.198640260124, 0.10472972973, 0.161154855643, 0.0588756086764
Some new sample values: 0.043714785311832, 0.071450569756698, 0.070710696338837, 0.107413010590015, 0.059259259259259
Individuals under 5 years old flag
Old tracts length: 7401
New tracts length: 7401
Jaccard similarity: 0.5300806284887327
Some old sample values: 0.0741700023546, 0.049715370019, 0.0376971931981, 0.0668352030337, 0.0280062236052
Some new sample values: 0.084160552438498, 0.13182799269467, 0.034823731728289, 0.06568516421291, 0.073155687145324
Percent pre-1960s housing (lead paint indicator) flag
Old tracts length: 7401
New tracts length: 7401
Jaccard similarity: 0.8346554288547348
Some old sample values: 0.285588364918, 0.257297748123, 0.253637245393, 0.839449541284, 0.0355750487329
Some new sample values: 0.011410788381743, 0.487096774193548, 0.057889822595705, 0.042607428987618, 0.040295748613678
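For reference, the per-indicator comparison above can be sketched roughly as follows. This is a minimal illustration, not the script that produced the numbers: it assumes the flag is computed with a nearest-rank 90th percentile over the raw indicator values in each file (the actual comparison may have used EJSCREEN's own percentile columns), and the function names and toy tract IDs are hypothetical.

```python
import math


def flagged_tracts(values, percentile=90.0):
    """Return the set of tract IDs whose value is at or above the given percentile.

    `values` maps tract ID -> indicator value; NaN values are ignored.
    The threshold uses the nearest-rank definition (an assumption).
    """
    clean = {t: v for t, v in values.items() if not math.isnan(v)}
    if not clean:
        return set()
    ordered = sorted(clean.values())
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    threshold = ordered[rank - 1]
    return {t for t, v in clean.items() if v >= threshold}


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B|; two empty sets count as identical."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


# Toy data with hypothetical tract IDs: one tract's value changes between releases.
old_values = {f"t{i:02d}": float(i) for i in range(1, 11)}
new_values = dict(old_values)
new_values["t08"] = 20.0

old_flags = flagged_tracts(old_values)
new_flags = flagged_tracts(new_values)
print(len(old_flags), len(new_flags), jaccard(old_flags, new_flags))
```

A Jaccard of 1.0 means the same tracts are flagged in both releases. Values near 0 mean the flagged sets barely overlap, even when the two counts are similar.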
Distribution of pm2.5, regional -- very different
Scale of "any flag" isn't quite as huge, but most tracts appear to be caught by EJScreen?
It appears the technical documentation is also old, so it's kind of challenging to suss out what might have shifted. The codebook only includes a year for the 2017 air toxics, diesel, etc. indicators. Can we ask EPA?
The scale of this issue in terms of FLAGGING TRACTS is small: only about 1000-1500 "low income" tracts don't get included from EJScreen alone. Still, I'm a bit concerned by the shifts, especially for such old data.
In sum:
Lucas suggests reaching out to our friends at the EPA.
(to discuss in data team mtg)
EJSCREEN is scheduled to release data updates on February 18th. The new EJSCREEN data should be imported.
The comparison tool outputs should be run so that we can report on how the EJSCREEN refresh impacted the DACs by state, region, etc.
Updated data at the tract level can be found here: https://gaftp.epa.gov/EJSCREEN/2021
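For the by-state reporting mentioned above, a rough sketch of the summary is below. It is not the comparison tool itself. It assumes the old and new flagged tracts are available as sets of 11-character tract GEOID strings (zero-padded, so the first two characters are the state FIPS code); the function name and the example GEOIDs are illustrative only.

```python
from collections import defaultdict


def change_by_state(old_flagged, new_flagged):
    """Count tracts kept, added, and dropped per state FIPS code.

    Both arguments are sets of tract GEOID strings; the state FIPS code
    is taken from the first two characters of each GEOID.
    """
    summary = defaultdict(lambda: {"kept": 0, "added": 0, "dropped": 0})
    for geoid in old_flagged | new_flagged:
        state = geoid[:2]
        if geoid in old_flagged and geoid in new_flagged:
            summary[state]["kept"] += 1
        elif geoid in new_flagged:
            summary[state]["added"] += 1
        else:
            summary[state]["dropped"] += 1
    return dict(summary)


# Illustrative GEOIDs only (state FIPS "01" and "06").
old = {"01001020100", "01001020200", "06037101110"}
new = {"01001020100", "06037101110", "06037101122"}

for state, counts in sorted(change_by_state(old, new).items()):
    print(state, counts)
```

GEOIDs read from CSV as integers lose their leading zeros, so they should be loaded as strings (or zero-padded to 11 characters) before grouping.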