As a developer, I want to have population density in the data, so that I can understand the difference between urban and rural communities.

lucasmbrown-usds commented 3 years ago

Description

We currently have scores that rank-order each community (Census Block Group, or CBG). Each community can be mapped to either Rural or Urban. We want to know how the rank ordering is distributed across urban and rural communities. Some starting questions might be:

Of the communities identified as "disadvantaged", what percent are urban vs. rural, and how does that compare to the overall percentage of urban vs. rural CBGs?
Of urban areas, how many are identified as "disadvantaged"?
Of rural areas, how many are identified as "disadvantaged"?

Note that these questions apply only to scores that produce a binary classification (1 or 0) of "disadvantaged", which admittedly covers most of the scores. For scores that produce a rank ordering, we will need a different measure, such as the mean or some other summary of the distribution.
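A minimal sketch of the analysis in plain Python, assuming each CBG has already been assigned a single "U" or "R" label plus a score. The function names and the `(cbg_id, urban_rural, value)` tuple layout are hypothetical, not part of the justice40-tool code. `urban_rural_summary` answers the three questions for a binary score; `mean_score_by_group` uses the mean as one possible summary for a rank-order score.

```python
from collections import Counter, defaultdict


def urban_rural_summary(rows):
    """Answer the starting questions for a binary (1/0) score.

    rows: iterable of (cbg_id, urban_rural, is_disadvantaged), where
    urban_rural is "U" or "R" and is_disadvantaged is truthy for 1.
    """
    total = Counter()          # number of CBGs per category
    disadvantaged = Counter()  # number of disadvantaged CBGs per category
    for _cbg_id, urban_rural, is_disadvantaged in rows:
        total[urban_rural] += 1
        if is_disadvantaged:
            disadvantaged[urban_rural] += 1

    n_total = sum(total.values())
    n_disadvantaged = sum(disadvantaged.values())
    summary = {}
    for category in ("U", "R"):
        summary[category] = {
            # Baseline: share of all CBGs that fall in this category.
            "share_of_all_cbgs": total[category] / n_total if n_total else 0.0,
            # Share of disadvantaged CBGs that fall in this category.
            "share_of_disadvantaged": (
                disadvantaged[category] / n_disadvantaged if n_disadvantaged else 0.0
            ),
            # Share of this category's CBGs that are disadvantaged.
            "disadvantaged_rate": (
                disadvantaged[category] / total[category] if total[category] else 0.0
            ),
        }
    return summary


def mean_score_by_group(rows):
    """For rank-order scores: mean score (e.g. a percentile) per category.

    rows: iterable of (cbg_id, urban_rural, score).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for _cbg_id, urban_rural, score in rows:
        sums[urban_rural] += score
        counts[urban_rural] += 1
    return {category: sums[category] / counts[category] for category in counts}
```

Comparing `share_of_disadvantaged` against `share_of_all_cbgs` shows whether one category is over-represented among disadvantaged CBGs. In a notebook, the same aggregation could equally be done with pandas `crosstab` or `groupby`.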

Links to user research or other resources

Tasks

[x] Query map of CBGs to Urban vs Rural from Geocorr. Save download link to add officially to ETL process later
[x] Manually save the CSV file that maps CBGs to Urban / Rural to the local computer
[x] Manually generate scores across all CBGs (does the scoring_comparison notebook generate these flat files?)
[ ] Load files into a Jupyter Notebook; Map CBGs to Urban vs Rural
[ ] Do the analysis!
[ ] Check in with Shelby to make sure this actually makes sense
[ ] Add officially to the ETL scripts to make sure it's reproducible moving forward.
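The "Load files; Map CBGs to Urban vs Rural" step amounts to a join on the CBG identifier. A standard-library sketch, assuming the score file is a CSV with a CBG ID column (named `GEOID10_BG` here purely as a placeholder) and that the Geocorr data has already been reduced to one "U"/"R" label per CBG. Unmatched IDs are returned instead of being dropped silently, because a mismatch can point to an ID-formatting problem (for example, leading zeros lost when IDs are read as numbers).

```python
import csv


def join_scores_to_urban_rural(lines, ur_lookup, key="GEOID10_BG"):
    """Attach an urban/rural label to each scored CBG.

    lines: any iterable of CSV text lines with a header row, e.g. an open file.
    ur_lookup: dict mapping CBG id -> "U" or "R".
    key: name of the CBG id column (a hypothetical placeholder).
    Returns (joined_rows, unmatched_ids).
    """
    joined, unmatched = [], []
    for row in csv.DictReader(lines):
        cbg_id = row[key]
        label = ur_lookup.get(cbg_id)
        if label is None:
            # Keep track of CBGs with no urban/rural match for later inspection.
            unmatched.append(cbg_id)
            continue
        joined.append({**row, "urban_rural": label})
    return joined, unmatched
```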

Definition of "Done"

Relevant Links

See this relevant Slack Thread for Lucas's PR
See this relevant Slack Thread for finding files on AWS.

VincentLaUSDS commented 3 years ago

The UI will look something like this for Urban/Rural

Look at https://github.com/usds/justice40-tool/pull/660 to see a similar example of how to incorporate into the ETL scripts.

The actual CSV you get is attached here

geocorr2014_2125804280.csv

Basically, the data set is unique at the (census block group, Urban/Rural indicator) level. A census block group that has both urban and rural components shows up as two rows, one for "U" and one for "R", with the weighting given by population.
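Because a CBG can appear as both a "U" row and an "R" row, the rows need to be collapsed to one value per CBG before joining to the scores. One possible rule, sketched below, computes the urban share of each CBG's population and labels the CBG urban when that share is at least a threshold. The 0.5 threshold, the function name and the `(cbg_id, ur, pop10)` input layout are assumptions for illustration, not decisions made in this issue. Keeping the share itself also leaves room for a continuous measure instead of a hard label.

```python
from collections import defaultdict


def collapse_urban_rural(rows, threshold=0.5):
    """Collapse Geocorr-style rows to one entry per CBG.

    rows: iterable of (cbg_id, ur, population) with ur in {"U", "R"};
          the tuple layout is a hypothetical stand-in for the CSV columns.
    threshold: assumed urban-share cutoff for labelling a CBG "U".
    Returns {cbg_id: (urban_share, label)}; both are None for zero population.
    """
    urban = defaultdict(float)  # urban population per CBG
    total = defaultdict(float)  # total population per CBG
    for cbg_id, ur, population in rows:
        total[cbg_id] += population
        if ur == "U":
            urban[cbg_id] += population

    result = {}
    for cbg_id, population in total.items():
        if population > 0:
            share = urban[cbg_id] / population
            result[cbg_id] = (share, "U" if share >= threshold else "R")
        else:
            result[cbg_id] = (None, None)
    return result
```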

The query string should look something like this: https://github.com/usds/justice40-tool/blob/47df35b77e840b2b2e303853db751d58c355d874/data/data-pipeline/data_pipeline/etl/sources/census_acs_median_income/etl.py#L178

In our case it is: https://mcdc.missouri.edu/cgi-bin/broker?_PROGRAM=apps.geocorr2014.sas&_SERVICE=MCDC_long&_debug=0&state=Mo29&state=Al01&state=Ak02&state=Az04&state=Ar05&state=Ca06&state=Co08&state=Ct09&state=De10&state=Dc11&state=Fl12&state=Ga13&state=Hi15&state=Id16&state=Il17&state=In18&state=Ia19&state=Ks20&state=Ky21&state=La22&state=Me23&state=Md24&state=Ma25&state=Mi26&state=Mn27&state=Ms28&state=Mt30&state=Ne31&state=Nv32&state=Nh33&state=Nj34&state=Nm35&state=Ny36&state=Nc37&state=Nd38&state=Oh39&state=Ok40&state=Or41&state=Pa42&state=Ri44&state=Sc45&state=Sd46&state=Tn47&state=Tx48&state=Ut49&state=Vt50&state=Va51&state=Wa53&state=Wv54&state=Wi55&state=Wy56&g1_=bg&g2_=ur&g2_=ua&wtvar=pop10&nozerob=1&title=&csvout=1&namoptf=b&listout=1&lstfmt=html&namoptr=b&oropt=&counties=&metros=&places=&latitude=&longitude=&locname=&distance=&kiloms=0&nrings=&r1=&r2=&r3=&r4=&r5=&r6=&r7=&r8=&r9=&r10=&lathi=&latlo=&longhi=&longlo=
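The long URL above is mostly repeated `state=` parameters plus a handful of options. A sketch that rebuilds it from a list of state codes with `urllib.parse.urlencode`, which accepts a sequence of key/value pairs and therefore preserves repeated keys such as `state` and `g2_`. The function name is hypothetical, and the empty parameters in the original URL (`title=`, `counties=`, and so on) are left out; whether the Geocorr service requires them has not been checked.

```python
from urllib.parse import urlencode

GEOCORR_BASE_URL = "https://mcdc.missouri.edu/cgi-bin/broker"


def build_geocorr_url(state_codes):
    """Build a Geocorr 2014 CBG -> urban/rural query (hypothetical helper).

    state_codes: Geocorr state codes as they appear in the URL above,
    e.g. "Mo29" (state abbreviation followed by the state FIPS code).
    """
    params = [
        ("_PROGRAM", "apps.geocorr2014.sas"),
        ("_SERVICE", "MCDC_long"),
        ("_debug", "0"),
    ]
    # One repeated "state" parameter per requested state.
    params += [("state", code) for code in state_codes]
    params += [
        ("g1_", "bg"),       # source geography: census block group
        ("g2_", "ur"),       # target geography: urban/rural
        ("g2_", "ua"),       # target geography: urban area
        ("wtvar", "pop10"),  # weight by 2010 population
        ("nozerob", "1"),
        ("csvout", "1"),
        ("namoptf", "b"),
        ("listout", "1"),
        ("lstfmt", "html"),
        ("namoptr", "b"),
        ("kiloms", "0"),
    ]
    return f"{GEOCORR_BASE_URL}?{urlencode(params)}"
```

Keeping the state list as data rather than a hard-coded string would make it easier to add this download to the ETL scripts later.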

I've saved the data set on my laptop at:

/c/github/justice40-tool/data/data-pipeline/data_pipeline/data/geocorr

just to experiment with it.

VincentLaUSDS commented 3 years ago

Slack thread on running code: https://usds.slack.com/archives/C0222FBGQ65/p1631808402174600

usds/justice40-tool issue #355