thampiman / reverse-geocoder

A fast, offline reverse geocoder in Python
GNU Lesser General Public License v2.1

Incorrect Results in ```search``` #57

Open UGuntupalli opened 3 years ago

UGuntupalli commented 3 years ago

Hello, and thank you for the great work in publishing this package. I just came across it and was trying it out when I ran into the following problem. I was hoping you could shed some light on it.

   import reverse_geocoder as rg

   coordinates = (33, -115) 

   results = rg.search(coordinates) # default mode = 2

   print(results)

The result looks something like this: image

If I am not wrong, this indicates that the location is in Mexico. However, when you actually look up these coordinates in Google Maps, they are within the USA.

image

csorgod commented 3 years ago

Hello all,

Thanks for the awesome work that you guys did here!

I'm facing the same issue. I tested with a large number of coordinates, and some of them return wrong results. A possibly important correlation: the data I analysed is also from California. I'll show it below:

Plotting every coordinate using plotly:

import plotly.graph_objects as go

fig = go.Figure(
    data = go.Scattergeo(
        lon = dataset['longitude'],
        lat = dataset['latitude'],
        # text = dataset['description'],
        mode = 'markers',
    )
)

fig.update_layout(
    title = 'Housing by latitude and longitude',
    geo_scope = 'usa'
).show()

Result: image

Creating a single column with both latitude and longitude, then looking up the state with search:


import reverse_geocoder as rg

coordinates = [*zip(dataset['latitude'], dataset['longitude'])]
dataset.insert(0, 'coordinates', coordinates)

geo_infos = rg.search(tuple(dataset['coordinates']))

locations = list()

for item in geo_infos:
    locations.append(item['admin1'])

dataset.insert(3, 'location', locations)

dataset['location'].unique()

Result:

array(['California', 'Nevada', 'Oregon', 'Baja California', 'Arizona'], dtype=object)

Filtering the 'Baja California' occurrences:


dataset[dataset['location'] == 'Baja California']

image

The location according to Google Maps:

image

Some info regarding my setup:

Python 3.7.4
reverse-geocoder 1.5.1

csorgod commented 3 years ago

As additional information, I plotted in Google Maps all the points that were incorrectly classified.


import os
import gmaps

gmaps.configure(api_key = os.environ['GOOGLE_MAPS_API_KEY'])

coord = dataset[dataset['location'] != 'California']['coordinates']

layer = gmaps.symbol_layer(coord, fill_color="green", stroke_color="green", scale=2)

fig = gmaps.figure(zoom_level = 5, center = [37.5706805, -117.6669101])
fig.add_layer(layer)
fig

The result:

image

The coordinates:

(38.69, -119.78) (41.95, -124.14) (41.88, -123.83) (38.96, -119.94) (38.95, -119.94) (38.94, -119.93) (38.91, -119.92) (32.7, -115.4) (33.07, -114.98) (32.79, -114.65) (32.8, -114.55) (32.76, -114.63) (32.74, -114.66) (36.4, -117.02) (36.0, -116.22) (39.92, -120.09) (41.79, -120.08) (38.51, -119.54) (38.53, -119.44) (39.8, -120.15) (35.55, -115.93) (35.23, -115.75) (34.91, -115.53) (34.4, -114.47) (34.19, -114.31) (34.89, -114.65) (32.56, -116.97) (32.55, -117.04) (32.54, -117.04) (32.69, -116.58) (32.64, -116.2) (32.61, -116.79) (39.61, -120.08) (39.67, -120.24) (39.67, -120.24) (41.86, -121.93)

BoZenKhaa commented 3 years ago

The package labels coordinates by doing a nearest-neighbor search against a set of labelled points. Since the points you provide are close to borders, it is likely that the closest reference point was in another state. This happens at the city level as well, but it is harder to notice there. This is an intrinsic feature of this reverse geocoder. Using a different distance metric might help a bit, but some errors will always happen.
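To make the border effect concrete, here is a minimal sketch of nearest-neighbor labelling. It is not the package's implementation: the two reference points use approximate coordinates chosen for illustration (they are not read from the package's data file), the great-circle (haversine) distance is an assumption, and `haversine_km` and `nearest_label` are hypothetical helper names.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


# Illustrative reference points (name, admin1, approximate (lat, lon)).
# These are assumptions for the sketch, not the package's actual dataset.
REFERENCE_POINTS = [
    ("El Centro", "California", (32.792, -115.563)),
    ("Mexicali", "Baja California", (32.628, -115.454)),
]


def nearest_label(query, reference_points):
    """Return the reference point closest to the query coordinate."""
    return min(reference_points, key=lambda p: haversine_km(query, p[2]))


# One of the coordinates reported earlier in this thread.
query = (32.7, -115.4)
name, admin1, _ = nearest_label(query, REFERENCE_POINTS)
print(name, admin1)
```

With these two illustrative points, the query is labelled with the Mexicali entry simply because that point is closer than El Centro, regardless of which side of the border the query falls on.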

tsfraser commented 3 years ago

Hello all, I'm facing a similar issue too!

However, it seems to occur even with latitudes and longitudes that are nowhere near any international border. I run the following script in a Jupyter Notebook on Python 3.7.4:

import numpy as np
import reverse_geocoder as rg
map_centers = np.array([[ 48.85929631,  56.34151663,  43.6497284,   44.64082733],
 [  2.310121,   -2.81211269, -79.4309882,    0.386007  ]])

This creates a 2x4 array of latitudes and longitudes, which I then try to search with:

locations = []
num_centers = map_centers.shape[1]
for i in range(num_centers):

    res1 = rg.search(tuple(map_centers[:,i]),mode=2)
    print('Input Lat',map_centers[0,i],'Input Long',map_centers[1,i])
    print('Output Lat',res1[0]['lat'],'Output Long',res1[0]['lon'])
    loc_title = res1[0]['name']+', '+res1[0]['admin1']

    locations.append(loc_title)
print(locations)

Gives me the following output:

Input Lat 48.85929631395349 Input Long 2.310121
Output Lat 48.85341 Output Long 2.3488
Input Lat 56.341516625000004 Input Long -2.8121126875
Output Lat 56.33871 Output Long -2.79902
Input Lat 43.6497284 Input Long -79.4309882
Output Lat 43.70011 Output Long -79.4163
Input Lat 44.640827333333334 Input Long 0.386007
Output Lat 42.57952 Output Long 1.65362
['Paris, Ile-de-France', 'Saint Andrews, Scotland', 'Toronto, Ontario', 'El Tarter, Canillo']

Rerunning this just gives 'El Tarter, Canillo' as the corresponding location for all four map_centers. I'm not sure why rg.search() breaks after three successful initial calls, but it does. None of these locations is near a border; rather, it seems that rg.search isn't reading the latitudes and longitudes the way I expected.

Many thanks in advance!

UPDATE: This error does not appear when running the above script in an editor (e.g. Spyder), so maybe it's a Jupyter-related issue?
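Since the script behaves differently inside Jupyter, one thing that may be worth ruling out is the default `mode=2`. The sketch below batches all coordinates into a single call and uses `mode=1`, the single-threaded mode described in the package README. Whether this avoids the problem is an assumption, not a confirmed fix. `to_coordinate_tuples` and `search_single_process` are hypothetical helper names, and plain Python floats are used in place of NumPy values.

```python
def to_coordinate_tuples(lats, lons):
    """Pair latitudes and longitudes into a tuple of (lat, lon) float tuples."""
    coords = []
    for lat, lon in zip(lats, lons):
        lat, lon = float(lat), float(lon)
        # Catch swapped or malformed inputs before they reach the geocoder.
        if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
            raise ValueError(f"coordinate out of range: ({lat}, {lon})")
        coords.append((lat, lon))
    return tuple(coords)


def search_single_process(coords):
    """Run one batched lookup with mode=1 instead of the default mode=2."""
    # Imported lazily so the helper above can be used without the package installed.
    import reverse_geocoder as rg
    return rg.search(coords, mode=1)


# The same values as the 2x4 map_centers array above, as plain lists.
lats = [48.85929631, 56.34151663, 43.6497284, 44.64082733]
lons = [2.310121, -2.81211269, -79.4309882, 0.386007]
coords = to_coordinate_tuples(lats, lons)

# Uncomment to query the geocoder (requires reverse_geocoder to be installed):
# for res in search_single_process(coords):
#     print(res['name'], res['admin1'], res['lat'], res['lon'])
```

If the results are stable with `mode=1` in the notebook, that would point towards the multi-process path rather than the coordinate handling.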