open-city / chicago-river-sewage

Is there sewage in the Chicago River?
http://istheresewageinthechicagoriver.com

Geocode CSO outfall locations #7

Closed by derekeder 10 years ago

derekeder commented 10 years ago

The South Suburban Mayors and Managers Association (SSMMA) has a file of CSO outflow locations, with latitude and longitude, on the IL data portal.

Here is a map of these locations:

[screenshot: map of the CSO outflow locations, 2013-12-18]

The MWRD has a list of all their outflow locations, but they are named slightly differently. I've converted this PDF into a CSV with Tabula.

We will use dedupe to merge these two datasets.

derekeder commented 10 years ago

@fgregg csvdedupe isn't quite yielding the right results yet. It's linking items from the same file that have the same street name but a different cardinal direction. I think this will be better once we get the csvlink functionality (https://github.com/datamade/csvdedupe/issues/26) implemented.

9,mwrd,6,Devon Ave (W)
9,mwrd,7,Devon Ave (E)
10,mwrd,8,Peterson Ave (E)
10,mwrd,9,Peterson Ave (W)
19,mwrd,11,Ardmore Ave (W)
19,mwrd,12,Ardmore Ave (E)
37,mwrd,13,Bryn Mawr Ave (E)
37,mwrd,14,Bryn Mawr Ave (W)
66,mwrd,16,Berwyn Ave (W)
66,mwrd,17,Berwyn Ave (E)
fgregg commented 10 years ago

@derek, in the output file I uploaded, https://github.com/open-city/chicago-river-sewage/blob/71f5cc0cb572b6d95ce0e0b744785683fe5a93df/data/output.csv, I didn't see any clusters that only included one source. The output of csvdedupe actually seems pretty good to me, it just needs more post-processing to get something useful (i.e. only link records within a cluster that are from different sources).
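
A rough sketch of that post-processing step, in case it helps. It assumes the output columns are cluster id, source, record id, name (inferred from the rows pasted above, not checked against output.csv). The `ssmma` row is made up for illustration, and `cross_source_links` is a hypothetical helper, not part of csvdedupe.

```python
import csv
import io
from collections import defaultdict
from itertools import combinations, product

# Assumed column layout: cluster id, source, record id, name.
# The mwrd rows come from this thread; the ssmma row is hypothetical.
SAMPLE = """\
9,mwrd,6,Devon Ave (W)
9,mwrd,7,Devon Ave (E)
9,ssmma,101,DEVON AVE W
10,mwrd,8,Peterson Ave (E)
10,mwrd,9,Peterson Ave (W)
"""

def cross_source_links(lines):
    """Return (cluster_id, source_a, record_a, source_b, record_b) tuples
    for every pair of records in a cluster that come from different sources."""
    by_cluster = defaultdict(lambda: defaultdict(list))
    for row in csv.reader(lines):
        if not row:  # skip blank lines
            continue
        cluster_id, source, record_id, name = row
        by_cluster[cluster_id][source].append((record_id, name))

    links = []
    for cluster_id, by_source in by_cluster.items():
        # Pair up distinct sources only; same-source pairs are never emitted.
        for src_a, src_b in combinations(sorted(by_source), 2):
            for rec_a, rec_b in product(by_source[src_a], by_source[src_b]):
                links.append((cluster_id, src_a, rec_a, src_b, rec_b))
    return links

links = cross_source_links(io.StringIO(SAMPLE))
for link in links:
    print(link)
```

Clusters that contain only one source (cluster 10 in the sample) produce no links, so the same-file E/W pairs drop out on their own.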

derekeder commented 10 years ago

@fgregg I renamed the file to ssmma_mwrd_linkage_1.csv

The main issue is that it's not sorting out the (E) and (W) records properly. They should be going into different clusters.
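
One possible workaround, sketched under assumptions: split each cluster by the trailing direction marker after csvdedupe has run. This assumes the direction always appears as a parenthesized letter at the end of the MWRD name, as in the rows above; SSMMA names may encode it differently. `split_by_direction` is a hypothetical helper, not a csvdedupe feature.

```python
import re
from collections import defaultdict

# Matches a trailing "(N)", "(S)", "(E)" or "(W)" in a location name.
DIRECTION = re.compile(r"\(([NSEW])\)\s*$")

def split_by_direction(rows):
    """Group (cluster_id, source, record_id, name) rows by
    (cluster_id, direction), so E and W records land in separate groups.
    Rows with no direction marker are grouped under direction None."""
    groups = defaultdict(list)
    for row in rows:
        cluster_id, _source, _record_id, name = row
        match = DIRECTION.search(name)
        direction = match.group(1) if match else None
        groups[(cluster_id, direction)].append(row)
    return dict(groups)

rows = [
    ("9", "mwrd", "6", "Devon Ave (W)"),
    ("9", "mwrd", "7", "Devon Ave (E)"),
    ("10", "mwrd", "8", "Peterson Ave (E)"),
    ("10", "mwrd", "9", "Peterson Ave (W)"),
]
clusters = split_by_direction(rows)
for key, members in clusters.items():
    print(key, members)
```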

derekeder commented 10 years ago

Merging manually with my eyeballs.

Google doc: https://docs.google.com/spreadsheet/ccc?key=0AtbqcVh3dkAqdDZlcE5ZdjV2cDViRktuQjNTNExMNlE#gid=0

derekeder commented 10 years ago

OK, I did a manual merge of the SSMMA and MWRD files. The match rate was pretty good: 189/203 matches.

Google doc: https://docs.google.com/spreadsheet/ccc?key=0AtbqcVh3dkAqdDZlcE5ZdjV2cDViRktuQjNTNExMNlE#gid=0

Then, I merged @sbeslow's scraped data and matched on the TARP connection outflow location name. The match rate was not so good: 72/202. Essentially, the PDF I grabbed from MWRD and the scraped outfall locations do not really match up. Might want to look into it.
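
For reference, a minimal sketch of how a name-based match rate like the two above could be computed. The normalization rules (lowercase, drop periods, collapse whitespace) and the sample names are assumptions for illustration; the actual merge was done by hand in the spreadsheet.

```python
def normalize(name):
    """Lowercase, drop periods, and collapse whitespace (assumed rules)."""
    return " ".join(name.lower().replace(".", "").split())

def match_rate(names_a, names_b):
    """Return (matched, total): how many names in names_a have a
    normalized exact match somewhere in names_b."""
    keys_b = {normalize(n) for n in names_b}
    matched = sum(1 for n in names_a if normalize(n) in keys_b)
    return matched, len(names_a)

# Hypothetical names, only to exercise the helper.
scraped = ["Devon Ave (W)", "Peterson Ave. (E)"]
from_pdf = ["devon  ave (w)", "Bryn Mawr Ave (E)"]
print(match_rate(scraped, from_pdf))

# The two rates reported in this thread, as percentages.
print(f"{189 / 203:.1%}")  # 93.1%
print(f"{72 / 202:.1%}")   # 35.6%
```

Listing the names that fail the exact match after normalizing would show whether the 72/202 gap is mostly formatting differences or genuinely different location lists.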

From that merged dataset, I made this map, which roughly shows how the events are distributed:

[screenshot: map of event distribution from the merged dataset, 2013-12-19]

fgregg commented 10 years ago

This is great! So, now we have some more data http://iaspub.epa.gov/enviro/ICIS_DETAIL_REPORTS_NPDESID.icis_tst?npdesid=IL0045012&npvalue=1&npvalue=13&npvalue=14&npvalue=3&npvalue=4&npvalue=5&npvalue=6&rvalue=13&npvalue=2&npvalue=7&npvalue=8&npvalue=11&npvalue=12

And https://github.com/open-city/chicago-river-sewage/blob/cso-charting/data/processors/river-viewer/cso_chicago_caws_active.geojson

How many of Scott's scraped CSO points have we not geocoded? Can we list them someplace?

andreweskeclarke commented 10 years ago

After seeing Forest's iaspub.epa.gov info, I found http://apps.mwrd.org/MO/csoapp/outfallNSWRP.htm. This appears to map pipe numbers from the EPA permit to TARP Structure names that we have in our CSO outflow events scraped from MWRD.

EDIT: After spot checking, it seems to be only somewhat accurate.

sbeslow commented 10 years ago

So, I got this mass mail from one of the main folks at Sierra Club Illinois, who obviously was not at the meeting that I presented at!!

---------- Forwarded message ----------
From: Jen Hensley jennifer.hensley@sierraclub.org
Date: Thu, Feb 20, 2014 at 5:30 PM
Subject: Chicago River Article / Sewage App
To: Krista Grimm kristamgrimm@gmail.com, Sabolch Horvat sabolch.horvat@gmail.com, Daphne Robinson phne@aol.com, Jen Hensley jennifer.hensley@sierraclub.org, David Martin david.martin34@comcast.net, Brian Larson larson.brian.m@gmail.com, "jsmalachowski@uchicago.edu" jsmalachowski@uchicago.edu, Scott Beslow scott.beslow@gmail.com, Jeff Shelden jeffshelden@yahoo.com, Andrew Novak andrew.d.novak@gmail.com, Tom Schlipmann tslip7@hotmail.com, Colleen Smith colleen.smith@sierraclub.org, Eric Anders eric.adam.anders@gmail.com
Cc: PCulhane828 PCulhane828@cs.com, Donna Hriljac misky272000@yahoo.com, Barbara Hill b.c.hill@comcast.net

I'm not a huge twitter fan, but a few people post interesting stuff. Here is a recent Chicago River piece by Michael Hawthorne on the deep tunnel project.

He also posted a link to an app, which has a bit of an ewwwww.... factor, but I also thought it was kind of cool.

http://istheresewageinthechicagoriver.com

It might be worth reaching out to them to see if they would add in Asian carp eDNA info, flood data or something else.

-Jen

derekeder commented 10 years ago

This work is being done in #24. Closing.