usds / justice40-tool

A tool to identify disadvantaged communities due to environmental, socioeconomic and health burdens
https://screeningtool.geoplatform.gov/
Creative Commons Zero v1.0 Universal

As a developer, I want to have population density in the data, so that I can understand the difference between urban and rural communities. #355

Closed lucasmbrown-usds closed 3 years ago

lucasmbrown-usds commented 3 years ago

Description

We currently have scores that rank-order each community (Census Block Group). Each community can be mapped to either Rural or Urban. We want to know how the rank ordering is distributed across urban and rural communities. Some starting questions might be:

  1. Of the communities identified as "disadvantaged", what percent are urban vs. rural, and how does that compare to the overall percentage of urban vs. rural CBGs?
  2. Of urban areas, how many are identified as "disadvantaged"?
  3. Of rural areas, how many are identified as "disadvantaged"?

Note that this applies directly only to scores that generate a classification (1 or 0) of disadvantaged, which admittedly are most of the scores. For scores that produce a rank ordering, we will need a different definition, such as the mean or some other way to measure the distribution.
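As a rough illustration of the three questions above, here is a minimal sketch for a binary (1 or 0) score. It assumes each CBG has already been reduced to a pair of flags, `(is_urban, is_disadvantaged)`. The function name and the output keys are hypothetical and are not part of the existing pipeline.

```python
from collections import Counter


def urban_rural_summary(records):
    """Summarize a binary score across urban and rural CBGs.

    `records` is an iterable of (is_urban, is_disadvantaged) boolean
    pairs, one per Census Block Group (an assumed input shape).
    """
    counts = Counter(records)
    total = sum(counts.values())
    urban_total = counts[(True, True)] + counts[(True, False)]
    rural_total = counts[(False, True)] + counts[(False, False)]
    disadvantaged_total = counts[(True, True)] + counts[(False, True)]

    def share(numerator, denominator):
        # Avoid ZeroDivisionError when a group is empty.
        return numerator / denominator if denominator else float("nan")

    return {
        # Question 1: urban share among disadvantaged vs. among all CBGs.
        "urban_share_of_disadvantaged": share(counts[(True, True)], disadvantaged_total),
        "urban_share_of_all": share(urban_total, total),
        # Question 2: share of urban CBGs flagged as disadvantaged.
        "disadvantaged_share_of_urban": share(counts[(True, True)], urban_total),
        # Question 3: share of rural CBGs flagged as disadvantaged.
        "disadvantaged_share_of_rural": share(counts[(False, True)], rural_total),
    }
```

For a rank-ordering score, the same grouping could be used with a mean or percentile of the score per group instead of a count.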

Links to user research or other resources

Tasks

Definition of "Done"

Relevant Links

  1. See this relevant Slack Thread for Lucas's PR
  2. See this relevant Slack Thread for finding files on AWS.

VincentLaUSDS commented 3 years ago

The UI will look something like this for Urban/Rural: [image]

Look at https://github.com/usds/justice40-tool/pull/660 to see a similar example of how to incorporate into the ETL scripts.

The actual CSV you get is attached here

geocorr2014_2125804280.csv

Basically, the data set is unique at the (census block group, Urban/Rural indicator) level. A census block group that has both urban and rural components will show up as two rows, one for "U" and one for "R", with the weighting given by population.
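One way to collapse those rows to a single value per block group is to sum the population weight of the "U" rows. The sketch below is only an illustration: the column names `bg_id`, `ur`, and `afact` are assumptions and should be checked against the header of the actual CSV, and the 0.5 cutoff for calling a block group urban is an arbitrary choice, not a decision made in this issue.

```python
import csv
import io
from collections import defaultdict


def urban_share_by_block_group(csv_text):
    """Return {block group id: share of population in the urban part}.

    Assumed columns (hypothetical names): `bg_id` identifies the block
    group, `ur` is "U" or "R", and `afact` is the population-weighted
    share of the block group that falls in that row's part.
    """
    shares = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        weight = float(row["afact"])
        # Adding 0.0 for rural rows ensures rural-only block groups
        # still appear in the result with an urban share of 0.
        shares[row["bg_id"]] += weight if row["ur"] == "U" else 0.0
    return dict(shares)


def is_urban(urban_share, threshold=0.5):
    """Classify a block group as urban; the threshold is an assumption."""
    return urban_share >= threshold
```

A block group with rows ("U", 0.75) and ("R", 0.25) would get an urban share of 0.75 and be classified as urban under this cutoff.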

The query string should look something like this: https://github.com/usds/justice40-tool/blob/47df35b77e840b2b2e303853db751d58c355d874/data/data-pipeline/data_pipeline/etl/sources/census_acs_median_income/etl.py#L178

In our case it is: https://mcdc.missouri.edu/cgi-bin/broker?_PROGRAM=apps.geocorr2014.sas&_SERVICE=MCDC_long&_debug=0&state=Mo29&state=Al01&state=Ak02&state=Az04&state=Ar05&state=Ca06&state=Co08&state=Ct09&state=De10&state=Dc11&state=Fl12&state=Ga13&state=Hi15&state=Id16&state=Il17&state=In18&state=Ia19&state=Ks20&state=Ky21&state=La22&state=Me23&state=Md24&state=Ma25&state=Mi26&state=Mn27&state=Ms28&state=Mt30&state=Ne31&state=Nv32&state=Nh33&state=Nj34&state=Nm35&state=Ny36&state=Nc37&state=Nd38&state=Oh39&state=Ok40&state=Or41&state=Pa42&state=Ri44&state=Sc45&state=Sd46&state=Tn47&state=Tx48&state=Ut49&state=Vt50&state=Va51&state=Wa53&state=Wv54&state=Wi55&state=Wy56&g1_=bg&g2_=ur&g2_=ua&wtvar=pop10&nozerob=1&title=&csvout=1&namoptf=b&listout=1&lstfmt=html&namoptr=b&oropt=&counties=&metros=&places=&latitude=&longitude=&locname=&distance=&kiloms=0&nrings=&r1=&r2=&r3=&r4=&r5=&r6=&r7=&r8=&r9=&r10=&lathi=&latlo=&longhi=&longlo=
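Rather than hard-coding the full string, the URL could be assembled from its parts. This is a sketch using only the parameters visible in the URL above. It leaves out the second `g2_` value and the empty parameters, and whether the server accepts the request without them is an untested assumption. The function name is hypothetical.

```python
from urllib.parse import urlencode

GEOCORR_BASE_URL = "https://mcdc.missouri.edu/cgi-bin/broker"


def build_geocorr_url(state_codes):
    """Build a geocorr2014 query URL for the given state codes.

    `state_codes` uses the format seen in the URL above, e.g. "Mo29".
    Only a subset of the original parameters is included (assumption).
    """
    params = [
        ("_PROGRAM", "apps.geocorr2014.sas"),
        ("_SERVICE", "MCDC_long"),
        ("_debug", "0"),
    ]
    # The `state` parameter is repeated once per state.
    params += [("state", code) for code in state_codes]
    params += [
        ("g1_", "bg"),       # source geography: block group
        ("g2_", "ur"),       # target geography: urban/rural portion
        ("wtvar", "pop10"),  # weight by 2010 population
        ("nozerob", "1"),
        ("csvout", "1"),     # request CSV output
    ]
    return GEOCORR_BASE_URL + "?" + urlencode(params)
```

Calling `build_geocorr_url(["Mo29", "Al01"])` yields a URL whose query string begins with `_PROGRAM=apps.geocorr2014.sas&_SERVICE=MCDC_long&_debug=0&state=Mo29&state=Al01`.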

I've put the data set on my local laptop at:

/c/github/justice40-tool/data/data-pipeline/data_pipeline/data/geocorr

just to play around with it.

VincentLaUSDS commented 3 years ago

Slack thread on running code: https://usds.slack.com/archives/C0222FBGQ65/p1631808402174600