petewarden / dstk

A collection of the best open data sets and open-source tools for data science
http://www.datasciencetoolkit.org/

Unexpected Google-style geocoding results #28

Closed by yholkamp 11 years ago

yholkamp commented 11 years ago

During some testing I noticed that, for some strings, the results produced by the Google-style geocoder of the dstk API are unexpected.

For example, take the following string:

28 2nd St, San Francisco, CA 94105, USA

It results in a response that places the location somewhere on the border between Turkey and Syria, while Google geocodes it correctly.

As a workaround, I noticed that removing the ', USA' part restores the expected output, but it would be great if the geocoder also worked with a more international format.

yholkamp commented 11 years ago

Another example that appears to be broken under certain circumstances: the query '1600 Amphitheatre Parkway, Mountain View, CA, United States' returns the info for some place in NC rather than CA. On the other hand, the query '1600 Amphitheatre Parkway, Mountain View, CA' produces the correct result.

petewarden commented 11 years ago

Thanks for the report, it's much appreciated! I've put in a fix for both of these cases. The code wasn't stripping out the country suffix, and that was causing the street-level geocoder to get confused.
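To illustrate the idea of stripping the country suffix before street-level geocoding, here is a minimal sketch in Python. It is not the actual dstk code: the function name `strip_country_suffix` and the `COUNTRY_SUFFIXES` list are hypothetical, and the suffix list only covers the spellings seen in this thread plus a couple of obvious variants.

```python
import re

# Hypothetical list of trailing country names; not taken from the dstk source.
COUNTRY_SUFFIXES = ("United States of America", "United States", "USA", "US")

# Require a comma before the suffix so that words which merely end in "US"
# (for example "Columbus") are left untouched. Longest names are tried first.
_SUFFIX_RE = re.compile(
    r",\s*(?:"
    + "|".join(re.escape(s) for s in sorted(COUNTRY_SUFFIXES, key=len, reverse=True))
    + r")\.?\s*$",
    re.IGNORECASE,
)


def strip_country_suffix(address: str) -> str:
    """Return the address without a trailing ', <country>' part, if present."""
    return _SUFFIX_RE.sub("", address).strip()


if __name__ == "__main__":
    for query in (
        "28 2nd St, San Francisco, CA 94105, USA",
        "1600 Amphitheatre Parkway, Mountain View, CA, United States",
    ):
        print(strip_country_suffix(query))
```

With this sketch, both failing queries from the report reduce to the forms that were already geocoding correctly, and an address without a recognised suffix passes through unchanged.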

This does highlight a couple of examples where the geocoder is being over-eager: in both cases it would be preferable to return no results rather than ones as erroneous as North Carolina or Turkey!
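One way to approximate "no result is better than a wildly wrong one" is a coarse plausibility check on the returned coordinates. The sketch below is only an illustration and is not something dstk is known to do: `plausible_or_none` and `US_BOUNDS` are hypothetical names, and the bounding box is a rough approximation of the contiguous United States. A box this coarse would reject the Turkey result but would not catch the North Carolina one, which needs a finer check (for example, against the state named in the query).

```python
from typing import Optional, Tuple

# Approximate bounding box for the contiguous United States, for illustration
# only: (min_lat, max_lat, min_lon, max_lon).
US_BOUNDS = (24.0, 50.0, -125.0, -66.0)


def plausible_or_none(
    lat: float,
    lon: float,
    bounds: Tuple[float, float, float, float] = US_BOUNDS,
) -> Optional[Tuple[float, float]]:
    """Return (lat, lon) if it lies inside the bounds, otherwise None."""
    min_lat, max_lat, min_lon, max_lon = bounds
    if min_lat <= lat <= max_lat and min_lon <= lon <= max_lon:
        return (lat, lon)
    return None


if __name__ == "__main__":
    # Roughly downtown San Francisco: kept.
    print(plausible_or_none(37.79, -122.40))
    # Roughly the Turkey/Syria border region: dropped.
    print(plausible_or_none(36.9, 38.0))
```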

yholkamp commented 11 years ago

Thanks for the quick fix and your great work @petewarden, I much appreciate it!