Closed by BethMattern 2 years ago
Note from @emma-nechamkin: if you are in the 99th percentile for all categories of our environmental indicators, but only the 64th percentile for low income, you WON'T be a DAC. But if you're in the 90th percentile for just one indicator and the 65th percentile for low income, you WILL be a DAC. I think we can look into this a bit more when we analyze how many thresholds a tract exceeds.
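To make the rule in the note concrete, here is a minimal sketch. It assumes the 90th and 65th percentile cutoffs from the note are inclusive, and the names `is_dac`, `thresholds_exceeded`, and the indicator keys are hypothetical, not taken from the tool's code.

```python
# Cutoffs come from the note above; treating them as inclusive is an assumption.
BURDEN_THRESHOLD = 90
LOW_INCOME_THRESHOLD = 65


def thresholds_exceeded(indicator_percentiles):
    """Count how many indicators sit at or above the burden threshold."""
    return sum(p >= BURDEN_THRESHOLD for p in indicator_percentiles.values())


def is_dac(indicator_percentiles, low_income_percentile):
    """A tract qualifies if ANY indicator meets the burden threshold
    AND the tract meets the low-income threshold."""
    return (
        thresholds_exceeded(indicator_percentiles) >= 1
        and low_income_percentile >= LOW_INCOME_THRESHOLD
    )


# Hypothetical tracts mirroring the two cases in the note.
tract_a = {"pm25": 99, "diabetes": 99, "housing_burden": 99}
tract_b = {"pm25": 90, "diabetes": 40, "housing_burden": 30}

print(is_dac(tract_a, 64))  # False: very high burden, but just under the income cutoff
print(is_dac(tract_b, 65))  # True: one indicator at the cutoff plus low income
```

The `thresholds_exceeded` helper is the quantity the note suggests analyzing next: `tract_a` exceeds three thresholds and is excluded, while `tract_b` exceeds one and is included.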
A few high-level notes here:
Categories track some, but not all, metrics of disadvantage. We should discuss in a meeting.
Categories are not strongly correlated (as 1/0 flags), but certain indicators ARE (e.g., income and housing burden are quite correlated), so we might want to look at PCA / LDA.
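As a rough illustration of the comparison above, the sketch below computes a Pearson correlation on two kinds of columns: binary category flags (where Pearson reduces to the phi coefficient) and raw indicator percentiles. All column names and values are made up for illustration; this is not the analysis that was actually run.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences.
    Returns NaN when either sequence has zero variance."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / sqrt(var_x * var_y)


# Hypothetical 1/0 category flags for six tracts.
climate_flag = [1, 0, 0, 1, 0, 1]
health_flag = [0, 1, 0, 0, 1, 1]

# Hypothetical indicator percentiles for the same six tracts.
low_income_pct = [72, 35, 50, 88, 41, 93]
housing_burden_pct = [70, 30, 55, 90, 38, 95]

print("category flags:", round(pearson(climate_flag, health_flag), 3))
print("indicators:", round(pearson(low_income_pct, housing_burden_pct), 3))
```

If many indicator pairs turn out to be highly correlated, that is the situation where a dimensionality reduction such as PCA would be worth trying.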
Note that none of the analysis below includes donut hole DACs.
Another question that we had pertained to "just below the threshold" tracts. I think (preliminarily) that this issue is overblown in our collective imagination.
There are always going to be tracts that fall just outside of whatever boundary we set, by nature of setting any boundary. With that in mind, we can look at (1) the number of tracts that fall between the 80th and 90th percentiles for our indicators, (2) the number of those tracts that are also low income, and (3) the number of those tracts that are low income AND are not already identified by the tool (narwhal). TL;DR -- most of these boundary tracts are already included.
We can also look at a few distributions, which adds a wrinkle here. For the "would be" inclusions, the distribution of "exceed threshold count" is shifted ever so slightly to the right. However, at least preliminarily, I'm not sure how big of an impact this has.
These distributions will be in a research notebook posted soon.
In re "the next 10%" -- there are a few indicators for which there appear to be disadvantaged tracts in the next 10% (e.g., diabetes, PM2.5), but the point remains that MOST tracts are already included.
We can also look at "share already flagged", and it is quite high across all indicators.
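The boundary-tract counts and the "share already flagged" described above can be sketched per indicator as follows. The record layout (`indicators`, `low_income`, `is_dac`), the half-open 80 to 90 band, and the 65th percentile low-income cutoff are assumptions for illustration, and the data is invented.

```python
def boundary_counts(tracts, indicator, lower=80, upper=90, low_income_cutoff=65):
    """Count tracts in the 'next 10%' band for one indicator, how many of
    those are low income, and how many of those are not already flagged."""
    in_band = [t for t in tracts if lower <= t["indicators"][indicator] < upper]
    low_income = [t for t in in_band if t["low_income"] >= low_income_cutoff]
    not_flagged = [t for t in low_income if not t["is_dac"]]
    share = None  # undefined when no low-income tract falls in the band
    if low_income:
        share = (len(low_income) - len(not_flagged)) / len(low_income)
    return {
        "in_band": len(in_band),
        "in_band_low_income": len(low_income),
        "not_already_flagged": len(not_flagged),
        "share_already_flagged": share,
    }


# Hypothetical tracts; "is_dac" stands in for the tool's existing designation.
tracts = [
    {"id": "A", "indicators": {"pm25": 85, "diabetes": 50}, "low_income": 70, "is_dac": False},
    {"id": "B", "indicators": {"pm25": 85, "diabetes": 95}, "low_income": 70, "is_dac": True},
    {"id": "C", "indicators": {"pm25": 82, "diabetes": 30}, "low_income": 40, "is_dac": False},
    {"id": "D", "indicators": {"pm25": 50, "diabetes": 20}, "low_income": 80, "is_dac": False},
]

for name in ("pm25", "diabetes"):
    print(name, boundary_counts(tracts, name))
```

On this toy data, three tracts sit in the PM2.5 band, two of them are low income, and one of those two is already flagged through another indicator, giving a share already flagged of 0.5.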
Even the most rudimentary scales for totaling categories, on average, better reflect metrics of underlying disadvantage. Consider the graph below: the blue line is adjusted as sum(category in territory / count positive for category) * max(new sum) / max(straight sum).
This regression suggests that even this basic new sum may better represent underlying burden.
We are going to consider this in a few ways:
In addition, we might want to consider how the "next" group would be included. How would these correlations shift under other thresholds? How does Jaccard similarity change over different cuts of the data?
CEQ's intern Gianna has started looking at correlations between individual indicators. I am interested in continuing this work.
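One way to frame the Jaccard question is to compare the set of tracts flagged at the current threshold with the set flagged at alternative thresholds. The sketch below assumes a 90th percentile baseline, a fixed 65th percentile low-income cutoff, and a hypothetical record layout; the tracts are invented.

```python
def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B|; two empty sets count as identical."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def dac_set(tracts, burden_threshold, low_income_cutoff=65):
    """IDs of tracts that are low income and meet the burden threshold
    on at least one indicator."""
    return {
        t["id"]
        for t in tracts
        if t["low_income"] >= low_income_cutoff
        and any(p >= burden_threshold for p in t["indicators"].values())
    }


# Hypothetical tracts for illustration only.
tracts = [
    {"id": "A", "indicators": {"pm25": 85, "diabetes": 50}, "low_income": 70},
    {"id": "B", "indicators": {"pm25": 85, "diabetes": 95}, "low_income": 70},
    {"id": "C", "indicators": {"pm25": 82, "diabetes": 30}, "low_income": 40},
    {"id": "D", "indicators": {"pm25": 50, "diabetes": 20}, "low_income": 80},
]

baseline = dac_set(tracts, 90)
for threshold in (80, 85, 90, 95):
    similarity = jaccard(baseline, dac_set(tracts, threshold))
    print(threshold, similarity)
```

A similarity close to 1 at a lower threshold would mean the "next" group adds few tracts beyond those already flagged, which is consistent with the observation that most boundary tracts are already included.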