propublica / il-tickets-notebooks

Explore Chicago ticket data.
MIT License

Geocoding methods without restrictions #1

Open MattTriano opened 6 years ago

MattTriano commented 6 years ago

As David mentioned on Tuesday, ProPublica couldn't include the latitude and longitude (lat and lon) values in the data sets they've made publicly available, because the geocoding service (Google) places that restriction on its data. These lat and lon values are necessary if you want to map features of the data (for example, all locations where a specific vehicle make was ticketed, or all locations where a specific officer issued tickets).

I've been exploring some techniques for geocoding the addresses in the sample dataset. Using Python's Geocoder library, I've put together a notebook that can geocode (address to lat/lon) about 150 addresses per minute without an API key. It took about 2 hours to get through all 20k unique addresses in the sample dataset. This ran in the background while I did other things, and computationally it was a very light operation, but at this rate it may not scale well to a much larger number of addresses.
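For anyone who wants to try the general approach, here is a minimal sketch, not the notebook itself. It assumes the OpenStreetMap provider in the Geocoder library (`geocoder.osm`), since that one needs no API key; the thread doesn't say which provider the notebook actually used. The helper names (`osm_lookup`, `geocode_unique`, `estimated_hours`) and the one-second default delay are placeholders of mine, so check the provider's usage policy before picking a delay.

```python
import time
from typing import Callable, Dict, Iterable, Optional, Tuple

LatLng = Tuple[float, float]


def osm_lookup(address: str) -> Optional[LatLng]:
    """Look up one address via geocoder's OSM provider (an assumed choice)."""
    import geocoder  # third-party: pip install geocoder

    result = geocoder.osm(address)
    # result.ok is False when the provider found no match.
    return tuple(result.latlng) if result.ok else None


def geocode_unique(
    addresses: Iterable[str],
    lookup: Callable[[str], Optional[LatLng]],
    delay_seconds: float = 1.0,  # placeholder; follow the provider's limits
) -> Dict[str, Optional[LatLng]]:
    """Geocode each distinct address once, pausing between requests."""
    coords: Dict[str, Optional[LatLng]] = {}
    for raw in addresses:
        address = raw.strip().upper()  # normalize so duplicates collapse
        if not address or address in coords:
            continue  # skip blanks and addresses already looked up
        coords[address] = lookup(address)
        time.sleep(delay_seconds)
    return coords


def estimated_hours(n_addresses: int, per_minute: float) -> float:
    """Rough wall-clock estimate at a fixed lookup rate."""
    return n_addresses / per_minute / 60
```

Calling `geocode_unique(addresses, osm_lookup)` would run the real lookups. With the figures above, `estimated_hours(20_000, 150)` comes out to a bit over 2 hours, and the estimate grows linearly with the number of unique addresses, which is the scaling concern.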

Does anyone know of other geocoding implementations that are faster while still being free (as in beer and as in speech)?

eads commented 6 years ago

The problem is that the Google geocoder terms of service are pretty clear that you can't use it for stuff like this. I think Texas A&M might be an option! I realize I should have probably been a little slower on the draw with your PR by including the output file.