open-city / chicago-river-sewage

Is there sewage in the Chicago River?
http://istheresewageinthechicagoriver.com

Geocode CSO synopses names #24

Closed fgregg closed 8 years ago

fgregg commented 10 years ago

We want to be able to say where there is a CSO event. In order to do that, we need to geocode the names of the CSO locations as they appear in the CSO synopses.

Unfortunately, these synopses names are not all included in the MWRD TARP Connection database. In fact, only 63 of the 226 synopses names seem to match the TARP database.

This is a problem because we need the "Location" field from the MWRD TARP dataset to geocode these points.

However, many (most? all?) of the synopses names that do not appear in the MWRD TARP Connection PDF do appear on one of the "CSO Representative Monitoring and Reporting Plans" pages, along with a "Location" field:

If we can match the "synopses names" to "Locations," we should then be able to geocode the points by matching against the

So, we need to do the following things

Please call dibs.

derekeder commented 10 years ago

So my previous TARP extraction exercise was incomplete as I only extracted locations owned by Chicago. There are about 50 municipalities with CSOs.

So @evz and I extracted the rest and put them in this document: https://github.com/open-city/chicago-river-sewage/blob/master/data/mwrd_tarp_connection_database_all_owners.csv

We now have 377 CSO outfall locations, but are still missing some. @evz will follow up with the list of missing locations between this updated TARP file and @sbeslow's scraped data.

derekeder commented 10 years ago

Did a csvlink (https://github.com/datamade/csvdedupe#csvlink-usage) of the MWRD TARP Connection Database and @sbeslow's scraped CSO outfall locations: https://github.com/open-city/chicago-river-sewage/blob/master/data/mwrd_tarp_connection_cso_outfall_linked.csv

Results are better:

  • 149 matches
  • 77 unmatched in the scraped CSO outfall locations
  • 225 unmatched in the TARP Connection Database

fgregg commented 10 years ago

Good. I don't think we really care about failing to match the TARP Connection Database. All we care about is the 77 unmatched CSO outfall locations, right?
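As a first pass at seeing which scraped names still lack a TARP counterpart, a plain set comparison may be enough. The sketch below assumes exact string equality; it is not what csvlink does (csvlink performs fuzzy record linkage), and the function name and sample names are purely illustrative.

```python
def tally_matches(scraped_names, tarp_names):
    """Compare two collections of outfall names by exact string equality."""
    scraped = set(scraped_names)
    tarp = set(tarp_names)
    return {
        "matches": sorted(scraped & tarp),
        "unmatched_scraped": sorted(scraped - tarp),
        "unmatched_tarp": sorted(tarp - scraped),
    }


# Illustrative sample only, not the real data files.
result = tally_matches(
    ["CDS-11", "NB3", "TG13A"],
    ["CDS-11", "TG-13A", "DS-M90"],
)
for bucket, names in result.items():
    print(bucket, len(names), names)
```

Only the "unmatched_scraped" bucket matters for geocoding, since those are the names that still need a location.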

derekeder commented 10 years ago

The unmatched:

002 Lemont WRP 122ND ST PS (CDS-28) 125TH ST PS (CDS-13) 125th St PS 95TH ST PS (CDS-34) 95th St PS CDS-11 CDS-2 CDS-20-1 CDS-20-2 CDS-4 CDS15 CDS20-1 CDS20-2 CDS20-b CRCW D39 D8 DS-D09 "DS-D19,23" DS-M13 DSM106 DSM107 DSM109 DSM114N DSM90 DSN6 K1/UDP1 K11/UDP3 K14/UDP3 K1A/UDP1A K2/UDP1 K22 K25/UDP5 K26/UDP8 K27/UDP8 K3/UDP2 MS104 MS105W MS109N MS15 MS16 MS3 MS81 MS84 MS88 MWRD DS-M114N NB10A NB10B NB11 NB13R NB17 NB18 NB3 NB5 NB8 NBPS NBPS (DS-M90 & DS-M91) Obrien PULASKI RD PS (18E-PS) Pulaski Road(TARP relief) RAPS "RAPS (DS-M27, DS-M28, DS-M29)" TG-13A TG132 TG13A TGI12 TGI8 TGI9 TGM105 TGM5 TGM71 TGNASH UDP-DS1 (K2-1&K2-2) WCPS WCPS (DS-D34-AI) Wilmette Wilmette DS-M114N-2

PS stands for pumping station. These could be identified in the MWRD wastewater structures geojson file.
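Several of the unmatched names above differ only in punctuation or spacing (for example "CDS-20-1" and "CDS20-1", or "TG-13A" and "TG13A"). A rough sketch of how such variants could be grouped before matching by hand follows. It assumes that stripping everything except letters and digits is a safe normalization, which would need checking against the real data, and the helper names are hypothetical.

```python
import re
from collections import defaultdict


def normalize(name):
    """Uppercase the name and drop everything except letters and digits."""
    return re.sub(r"[^A-Z0-9]", "", name.upper())


def group_variants(names):
    """Return only the normalized keys that more than one spelling maps to."""
    groups = defaultdict(list)
    for name in names:
        groups[normalize(name)].append(name)
    return {key: spellings for key, spellings in groups.items() if len(spellings) > 1}


# A few names taken from the unmatched list above.
names = ["CDS-20-1", "CDS20-1", "TG-13A", "TG13A", "NB3"]
variants = group_variants(names)
for key, spellings in variants.items():
    print(key, spellings)
```

Groups found this way are only candidates; two spellings that normalize to the same key may still refer to different structures.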

fgregg commented 10 years ago

Many of these unmatched locations are listed in the "CSO Representative Monitoring and Reporting Plans." @evz did you make a start on scraping those pages?

andreweskeclarke commented 10 years ago

@evz @fgregg Going to work on scraping "CSO Representative Monitoring and Reporting Plans." unless someone has started.

evz commented 10 years ago

I got a start on it. I guess I forgot to push that up yesterday. If you'd like, I can push up my script when I get home. Prolly won't be for another couple hours.

Eric

andreweskeclarke commented 10 years ago

Ah, dang, my bad. Duplicated effort: 4d5f1b9fed248386a0b8d450868c64c5821c3283

derekeder commented 10 years ago

Started a google doc to complete the rest of the CSO outfall identification by hand:

https://docs.google.com/spreadsheet/ccc?key=0AtbqcVh3dkAqdFozZ0FCZnlwSW1YMHBqelB1OHR3S0E#gid=0

attn! @sbeslow @andreweskeclarke

sbeslow commented 10 years ago

http://crystal.isgs.uiuc.edu/nsdihome/webdocs/st-hydro.html

derekeder commented 10 years ago

@sbeslow @fgregg what's the status of this Google doc? Looks like there's still some significant cleanup to do on it, starting around line 161.

derekeder commented 10 years ago

Here's what's left on this spreadsheet:

Starting on line 497, we need to match the PIPE_DESC (col A) to Merged Locations (col O). There are fields AKA_1, AKA_2 ... AKA_6 because some pipes have multiple entries for TARP Connection Observed.
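If the sheet were exported to CSV, the alias matching described above could be sketched roughly as follows. This assumes each location row carries its "Merged Locations" value plus up to six alternate names in AKA_1 through AKA_6, and that comparing names case-insensitively is acceptable; the actual layout of the spreadsheet may differ, and the function names and sample row are hypothetical.

```python
def build_alias_index(location_rows):
    """Map every known spelling (merged name or AKA) to its merged location."""
    index = {}
    for row in location_rows:
        merged = row["Merged Locations"]
        candidates = [merged] + [row.get(f"AKA_{i}", "") for i in range(1, 7)]
        for alias in candidates:
            alias = alias.strip()
            if alias:  # skip empty AKA cells
                index[alias.upper()] = merged
    return index


def match_pipe(pipe_desc, index):
    """Return the merged location for a PIPE_DESC value, or None if unknown."""
    return index.get(pipe_desc.strip().upper())


# Made-up row for illustration; real rows would come from the exported sheet.
rows = [{"Merged Locations": "EXAMPLE-1", "AKA_1": "Example 1", "AKA_2": ""}]
index = build_alias_index(rows)
print(match_pipe("example 1", index))
print(match_pipe("unknown pipe", index))
```

Anything that comes back as None would still need to be resolved by hand.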

DavidGinzberg commented 10 years ago

Updated rows 213 and 371 on the Google doc. Could @derekeder or someone else more familiar with the data than I am double-check them before I go modify the other ~55 rows remaining?

derekeder commented 8 years ago

no longer needed with #39