petewarden / dstk

A collection of the best open data sets and open-source tools for data science
http://www.datasciencetoolkit.org/

Google style geocoder returning inconsistent results to same query #31

Open estiens opened 10 years ago

estiens commented 10 years ago

When querying cities in Canada, the Google-style geocoder occasionally returns results in Europe, seemingly at random. For example, when I query "100 Duncan St Toronto ON Canada" through the web interface and press refresh, the response toggles back and forth between the following two results:

[{"address_components":[{"short_name":"20","types":["administrative_area_level_1","political"],"long_name":"20"},{"short_name":"tr","types":["country","political"],"long_name":"Turkey"}],"types":["administrative_area_level_1","political"],"geometry":{"location_type":"APPROXIMATE","location":{"lat":38.9167,"lng":40.3},"viewport":{"southwest":{"lat":37.9167,"lng":39.3},"northeast":{"lat":39.9167,"lng":41.3}}}}]

[{"geometry":{"location_type":"APPROXIMATE","location":{"lng":-79.4163,"lat":43.70011},"viewport":{"southwest":{"lng":-79.6427230835,"lat":43.5466194153},"northeast":{"lng":-79.2320251465,"lat":43.8083610535}}},"types":["locality","political"],"address_components":[{"short_name":"Toronto","long_name":"Toronto, ON, CA","types":["locality","political"]},{"short_name":"CA","long_name":"Canada","types":["country","political"]}]}]

estiens commented 10 years ago

This appears to happen only with cities in Canada, and it is not limited to Toronto: it also occurs when geocoding some addresses in Vancouver.

petewarden commented 10 years ago

Thanks Eric. The geocoder isn't extended to street-level addresses in Canada yet, but I would expect it to pick up Toronto in that address, and the Turkey result is clearly wrong. To help me reproduce it, is this the right URL for the API call? http://www.datasciencetoolkit.org/maps/api/geocode/json?address=100%20Duncan%20St%20Toronto%20ON%20Canada&callback=jQuery1506013870353344828_1381279072437&_=1381279083648

If you're using the web interface in Chrome, you'll see this if you open View->Developer->Developer Tools and then select the Network tab before sending a query. I appreciate your help tracking this down!

estiens commented 10 years ago

http://www.datasciencetoolkit.org/maps/api/geocode/json?address=100%20Duncan%20St%20Toronto%20ON%20Canada&callback=jQuery1508765704047400504_1381282583344&_=1381282780552 returned a correct result. Trying to reproduce the incorrect result now. The only thing I can offer so far is that it happens intermittently with any Canadian address we send to the API.

estiens commented 10 years ago

This resulted in a Turkish location just now

http://www.datasciencetoolkit.org/maps/api/geocode/json?address=20%20Duncan%20St%20Toronto%20ON%20Canada&callback=jQuery1508765704047400504_1381282583345&_=1381282897790

estiens commented 10 years ago

And then correctly located it in Canada (same query)

http://www.datasciencetoolkit.org/maps/api/geocode/json?address=20%20Duncan%20St%20Toronto%20ON%20Canada&callback=jQuery1508765704047400504_1381282583349&_=1381282954221
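For anyone trying to reproduce this without refreshing the browser, the following is a minimal sketch that sends the same query repeatedly and counts which country each answer lands in. It is not from the thread: the helper names (`fetch_geocode`, `country_code`, `tally_countries`) are made up for illustration, the request omits the jQuery `callback` parameter on the assumption that the endpoint then returns plain JSON, and it accepts either a bare list of results (as pasted above) or a Google-style object with a `results` key, since the exact envelope is an assumption too.

```python
import json
from collections import Counter
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the URLs quoted in this thread.
BASE_URL = "http://www.datasciencetoolkit.org/maps/api/geocode/json"


def fetch_geocode(address):
    """Fetch the raw geocoder payload for one address (no JSONP callback)."""
    url = BASE_URL + "?" + urlencode({"address": address})
    with urlopen(url, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def country_code(payload):
    """Return the upper-cased country short_name of the first result, or None."""
    # Assumption: the payload is either a bare list of results or a
    # Google-style dict that holds the list under "results".
    results = payload.get("results", []) if isinstance(payload, dict) else payload
    if not results:
        return None
    for component in results[0].get("address_components", []):
        if "country" in component.get("types", []):
            # The Turkey result above uses "tr" while the Canada result
            # uses "CA", so normalise the case before comparing.
            return component.get("short_name", "").upper()
    return None


def tally_countries(address, attempts=20, fetch=fetch_geocode):
    """Send the same query `attempts` times and count the country of each answer."""
    counts = Counter()
    for _ in range(attempts):
        counts[country_code(fetch(address))] += 1
    return counts
```

Calling `tally_countries("20 Duncan St Toronto ON Canada")` would return a `Counter` keyed by country code. More than one key for the same address indicates the inconsistency described here.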

estiens commented 10 years ago

Just wondering whether any workarounds exist for this? We are still getting very inconsistent geocoding for all of our Canada locations (mostly ending up in different countries) for queries of the form "Street" "City" "Province" "Canada".
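One possible client-side stopgap, sketched here as an assumption rather than anything the maintainer recommended, is to check the country of each result and retry when it is not the expected one. This relies on the observation above that repeating the same query sometimes yields the correct answer. It does nothing about the server-side cause. The function names, the retry count, and the delay are all hypothetical, and the response envelope is again assumed to be either a bare list or a dict with a `results` key.

```python
import json
import time
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the URLs quoted in this thread.
BASE_URL = "http://www.datasciencetoolkit.org/maps/api/geocode/json"


def fetch_results(address):
    """Return the list of geocoder results for one address."""
    url = BASE_URL + "?" + urlencode({"address": address})
    with urlopen(url, timeout=10) as response:
        payload = json.loads(response.read().decode("utf-8"))
    # Assumption: bare list, or Google-style dict with a "results" key.
    return payload.get("results", []) if isinstance(payload, dict) else payload


def in_expected_country(result, expected):
    """True if the result has a country component matching `expected` (case-insensitive)."""
    for component in result.get("address_components", []):
        if "country" in component.get("types", []):
            return component.get("short_name", "").upper() == expected.upper()
    return False


def geocode_checked(address, expected_country="CA", retries=5, delay=0.5,
                    fetch=fetch_results):
    """Retry the query until a result falls in the expected country, else return None."""
    for attempt in range(retries):
        for result in fetch(address):
            if in_expected_country(result, expected_country):
                return result
        if attempt < retries - 1:
            time.sleep(delay)  # brief pause before asking again
    return None
```

A `None` return means no attempt produced a result in the expected country, so the caller can flag the address for manual review instead of storing a wrong location.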

petewarden commented 10 years ago

I've had a chance to dig into this, and here's what appears to be happening:

I don't have a fix for the underlying problem of the TwoFishes server process failing, but I have added a new Pingdom alert for the TwoFishes endpoint. I've restarted TF on all the servers, and now I should be able to catch problems soon after they happen, and hopefully get a clearer idea of what's going on.

I suspect it might be the separate processes fighting over available memory, in which case I might need to look into something like Linux Control Groups to ensure there's enough memory reserved to restart the TwoFishes process if it does ever fail.