usds / justice40-tool

A tool to identify disadvantaged communities due to environmental, socioeconomic and health burdens
https://screeningtool.geoplatform.gov/
Creative Commons Zero v1.0 Universal

Analyze higher education enrollment data to verify that the cut point is in the right place #1273

Closed (BethMattern closed this issue 2 years ago)

BethMattern commented 2 years ago

The results of this analysis could also help us properly describe the characteristics of the census tracts that this threshold excludes, so that users can better understand its effects.

emma-nechamkin commented 2 years ago

This is tricky to walk through: the most college-heavy college towns have tracts that are mostly students (more than 80%), but the vast majority of census tracts, even those with some college students, have few. A 20% student share is above the 90th percentile both for all census tracts and for the subset of census tracts with more than 2% college students.

There's a bigger question here, though -- do we want to screen out all college tracts, or only those that are particularly well-resourced?

A few things here give me pause:

  1. We do not adjust low income to account for college students, which leads to some weirdness. The 200FPL share at our low-income cutoff is around 39% (the smallest share I found among tracts where the low-income bool is true). If a tract has 20% college students but is in the 97th percentile for low income (share: 77%), then even if every single student were counted below 200FPL, at least (77 - 20) / 80, or about 71%, of the remaining residents would also be below 200FPL, which is still well above the 65th percentile for low income. That is very different from a tract with, say, 90% college students: there, few residents would likely be living below 200FPL for reasons other than temporarily being a student! This makes me think we might consider adjusting income in some way to account for college students.
  2. The cut point might be too low. There are only about 4,000 colleges in the US. If we assume that about half of them have a high rate of resident students (i.e., they are not commuter colleges -- and this would far exceed the number of schools USNW ranks, so it doesn't seem crazy), then a cut point that looks high across all tracts might actually be low for what we are trying to measure. The 2,000th tract (assuming 1:1 tract:college), ranked by percent of college students, has 25% college students. I'd argue that's a conservative value as well! That's a lot of assumptions, but I think we are less concerned about including some colleges than others (e.g., we probably don't want to include UC Berkeley, but we might be open to including a tract near another institution that has 20% college students and is in the 99th percentile for low income). Another way to think this through: there are 11.9M full-time college students (about 3,000 tracts if tracts were ONLY students). Right now, we are identifying ~3,300 tracts... so we are getting (I suspect) a plurality of college students (I haven't fully done the math). Do we want to be?
  3. Tracts with hardship rarely stand alone: because we know (unfortunately) that disadvantage is often clustered, some nearness / neighbor analysis could allow us to better capture disadvantage.
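
The arithmetic behind points 1 and 2 can be sketched in Python. This is only a back-of-envelope illustration, not code from the project: both function names are hypothetical, and the average tract population of 4,000 is an assumption chosen to match the "about 3000 tracts" figure above.

```python
def min_non_student_low_income_share(low_income_share: float,
                                     student_share: float) -> float:
    """Lower bound on the share of non-students below 200FPL.

    Worst case for the bound: assume every student is counted as below
    200FPL, so students absorb as much of the low-income share as possible.
    Both arguments are fractions of the tract's total population.
    """
    if not 0.0 <= student_share < 1.0:
        raise ValueError("student_share must be in [0, 1)")
    # Low-income residents who cannot be students, as a share of the tract.
    low_income_non_students = max(low_income_share - student_share, 0.0)
    # Re-express as a share of the non-student population only.
    return low_income_non_students / (1.0 - student_share)


def students_only_tract_equivalent(total_students: int,
                                   avg_tract_population: int = 4000) -> float:
    """Number of tracts the students would fill if tracts held ONLY students.

    avg_tract_population is an assumed value, not a figure from the thread.
    """
    return total_students / avg_tract_population


# Point 1: 20% students, 77% of residents below 200FPL.
print(round(min_non_student_low_income_share(0.77, 0.20), 4))
# Point 1, contrast case: 90% students, so the bound says nothing.
print(min_non_student_low_income_share(0.77, 0.90))
# Point 2: 11.9M full-time students.
print(students_only_tract_equivalent(11_900_000))
```

With 20% students the bound stays far above the ~39% share mentioned in point 1, while with 90% students it drops to zero, which is the contrast point 1 is drawing.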

There are a lot of nuances here, and I think some of it also depends on how we update the higher ed threshold more generally. But I think either adjusting for income (e.g., requiring that a tract have at least 1,000 residents who are not college students AND that those residents be at or above the 65th percentile for 200FPL) or moving the cutoff value could be fruitful.
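
As a rough illustration of the income-adjusted rule in the e.g. above, here is a minimal Python sketch. Everything in it is hypothetical: the `Tract` fields, the function name, and in particular `non_student_low_income_percentile`, which assumes we could compute a 200FPL percentile for non-student residents only. How the result would feed into the higher ed screen is left open here.

```python
from dataclasses import dataclass


@dataclass
class Tract:
    # All field names are hypothetical, for illustration only.
    population: int
    college_students: int
    # Percentile (0-100) of the 200FPL share among non-student residents.
    # Assumes such a student-adjusted figure can be computed.
    non_student_low_income_percentile: float


def passes_income_adjustment(tract: Tract,
                             min_non_students: int = 1000,
                             min_percentile: float = 65.0) -> bool:
    """True if the tract has enough non-student residents AND those
    residents are at or above the low-income percentile cutoff."""
    non_students = tract.population - tract.college_students
    return (non_students >= min_non_students
            and tract.non_student_low_income_percentile >= min_percentile)


# A tract with 20% students and very low-income non-student residents.
mixed = Tract(population=4000, college_students=800,
              non_student_low_income_percentile=97.0)
# A tract that is 90% students, with only 400 other residents.
campus = Tract(population=4000, college_students=3600,
               non_student_low_income_percentile=97.0)

print(passes_income_adjustment(mixed))
print(passes_income_adjustment(campus))
```

Using resident counts rather than shares keeps the 1,000-resident condition an exact integer comparison.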

emma-nechamkin commented 2 years ago

Moved to review -- will discuss in meeting