jlacko / gmapsdistance

Interface Between R and Google Maps
GNU General Public License v3.0

Results were way too conservative compared to results from googlemaps package in Python or Google Maps #4

Open tle4336 opened 1 year ago

tle4336 commented 1 year ago

Hi Lacko, I am noticing a discrepancy between the results from your package and those from the googlemaps package written in Python (https://github.com/googlemaps). From what I have inspected, quite a number of origin-destination pairs come out anywhere from 50% to 100% shorter. For example, with the same set of inputs: origin = 303+Lexington+Ave+New+York+NY, destinations = list("3+E+40th+St+New+York+NY", "136+W+42nd+St+New+York+NY", "49+W+32nd+St+New+York+NY"),

the results given by gmapsdistance() package with distance_mode=bicycling (no change in other default parameters) are actually half the distances given by inputting these addresses directly into Google Maps, as well as the results given by the googlemaps package in Python (I used the same bicycling travel mode). In particular, the distances between the origin and 3 destinations above by gmapsdistance are 0.30633, 0.1036066, 0.46005, which are half as large as the actual distance returned by googlemaps package or Google Maps.

Could you please help "fix" this issue with an updated version of your package, or shed some light on why this could be the case? Furthermore, should I place more trust in the results from the googlemaps package in Python, since it was developed by the Google Maps team?

Thank you very much for your help and time. I look forward to hearing from you as soon as you can.

jlacko commented 1 year ago

This is an interesting problem.

Let us consider only the first origin-destination pair: origin at 303+Lexington+Ave+New+York+NY, destination at 3+E+40th+St+New+York+NY.

In actual Google Maps I get this result: https://www.google.com/maps/dir/310+Lexington+Ave+%238b,+New+York,+NY+10016,+USA/3+E+40th+St,+New+York,+NY+10016,+USA/@40.7509393,-73.9874093,16z/data=!3m1!4b1!4m14!4m13!1m5!1m1!1s0x89c25906b997ef31:0xd005da3ddb36d20b!2m2!1d-73.978298!2d40.7487556!1m5!1m1!1s0x89c25900f6b9a55d:0xdcd7b9a7ce19acb0!2m2!1d-73.9810861!2d40.7520756!3e1

Note that we are offered 3 alternative distances, with the first being double the distance of the second (1.4 vs. 0.7 kilometers).

Only the first result gets served via the XML version of the Distance Matrix API that the gmapsdistance package interfaces with.

You can double check that by running https://maps.googleapis.com/maps/api/distancematrix/xml?origins=303%20Lexington%20Ave%20New%20York%20NY&destinations=3%20E%2040th%20St%20New%20York%20NY&mode=bicycling&units=metric&key=your+key+goes+here (please use your own API key at the end, as it is not practical to share mine in a public setting).
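For anyone who wants to assemble the same request programmatically, here is a minimal sketch using only the Python standard library. The function name `build_distance_matrix_url` and the `YOUR_API_KEY` placeholder are made up for illustration; this is not how gmapsdistance or the googlemaps package builds its calls internally, and no network request is made.

```python
from urllib.parse import urlencode, quote

# Endpoint taken from the URL quoted in this thread.
BASE_URL = "https://maps.googleapis.com/maps/api/distancematrix"


def build_distance_matrix_url(origin, destination, mode="bicycling",
                              output="xml", key="YOUR_API_KEY"):
    """Assemble a Distance Matrix request URL (hypothetical helper)."""
    params = {
        "origins": origin,
        "destinations": destination,
        "mode": mode,
        "units": "metric",
        "key": key,  # placeholder, replace with your own API key
    }
    # quote_via=quote encodes spaces as %20, matching the URL in the thread.
    return f"{BASE_URL}/{output}?{urlencode(params, quote_via=quote)}"


url = build_distance_matrix_url("303 Lexington Ave New York NY",
                                "3 E 40th St New York NY")
print(url)
```

Passing `output="json"` yields the JSON flavor of the same request, which makes it easy to compare the two response formats side by side.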

So while I am reasonably certain that the package gives correct results in the narrow sense - it constructs the API request in a formally correct way, and returns to the R user exactly what the API said - I fully agree that a reasonable user would expect the result number two (the one that goes via Madison Avenue instead of 6th Avenue).

I will have a look at the Python implementation for inspiration and report back.

jlacko commented 1 year ago

The Python implementation is built on the JSON version of the API, not the XML version like the R one - see https://github.com/googlemaps/google-maps-services-python/blob/master/googlemaps/distance_matrix.py

While not overly complicated (how could it be) it takes a different route. Rewriting the R implementation from XML to JSON would be a major undertaking with possibly breaking changes.

A possible complication is that the raw JSON seems to follow the logic of the raw XML output - see https://maps.googleapis.com/maps/api/distancematrix/json?origins=303%20Lexington%20Ave%20New%20York%20NY&destinations=3%20E%2040th%20St%20New%20York%20NY&mode=bicycling&units=metric&key=your+key+goes+here+again, or directly

{
   "destination_addresses" : [ "3 E 40th St, New York, NY 10016, USA" ],
   "origin_addresses" : [ "303 Lexington Ave, New York, NY 10016, USA" ],
   "rows" : [
      {
         "elements" : [
            {
               "distance" : {
                  "text" : "1.4 km",
                  "value" : 1413
               },
               "duration" : {
                  "text" : "6 mins",
                  "value" : 378
               },
               "status" : "OK"
            }
         ]
      }
   ],
   "status" : "OK"
}

In other words, I am unable to reproduce the desired behavior supposedly shown by the Python package: the 1413 meters is the same as the R value.
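To make the comparison concrete, the sketch below pulls the distance and duration out of the JSON payload quoted above, using only the standard library. The helper name `first_element` is hypothetical, and the payload is the one pasted in this comment rather than a live API response.

```python
import json

# The raw JSON response quoted above, copied verbatim.
RAW = """
{
   "destination_addresses" : [ "3 E 40th St, New York, NY 10016, USA" ],
   "origin_addresses" : [ "303 Lexington Ave, New York, NY 10016, USA" ],
   "rows" : [
      {
         "elements" : [
            {
               "distance" : { "text" : "1.4 km", "value" : 1413 },
               "duration" : { "text" : "6 mins", "value" : 378 },
               "status" : "OK"
            }
         ]
      }
   ],
   "status" : "OK"
}
"""


def first_element(payload):
    """Return (meters, seconds) for the first origin-destination pair."""
    data = json.loads(payload)
    if data.get("status") != "OK":
        raise ValueError(f"request failed: {data.get('status')}")
    element = data["rows"][0]["elements"][0]
    if element.get("status") != "OK":
        raise ValueError(f"element failed: {element.get('status')}")
    return element["distance"]["value"], element["duration"]["value"]


meters, seconds = first_element(RAW)
print(f"{meters} m, {seconds} s")
```

Each origin-destination pair carries exactly one `distance` and one `duration` in this payload, so there is no second or third route to choose from at this level.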

tle4336 commented 1 year ago

@jlacko Thank you so much for your thorough investigation. From your second post, does it mean the gmapsdistance package in R gives a more accurate result than the googlemaps package in Python? That is surprising to me, since my investigations over some other addresses reflected the opposite.

jlacko commented 1 year ago

No, what I meant was that both the Python and R implementations of the API wrappers produced the same result (for me). Neither was more or less accurate - but both seemed to me to be consistent in 1) parsing the input strings into a URL call and 2) interpreting the output of the API itself.

There is a slight variation in implementation, as the Python version calls the JSON version of the API and R calls the XML version - but the output was the same 1413 meters distance.

When you look at the link to Google Maps directly you will be offered 3 options; when I ran the Python and R calls I was consistently given option number one. Only.

I confess I have next to no "ground feeling" for NY transport, but the choice of option number one (via 6th Avenue) feels strange, especially when compared to the second option (via Madison), which should imho be much preferred to the first one based on both time and distance.

But that is a matter with the API itself, not with the R or Python API wrapper.

jlacko commented 1 year ago

@tle4336 if you could give me some other address pairs that do not feel quite right I will be happy to investigate further, because the problem is real and really strange... There may be something that I am missing, but I agree that the results are strange (just look at the route - it makes no sense!)

Google, what were you thinking?

tle4336 commented 1 year ago

@jlacko Thank you very much for your thoughts. My apologies for taking so long to get back to you. I thought the issue was just on my end, but I have found another couple of addresses where the distances returned by the gmapsdistance package in R differ from the distances returned by distance_matrix.py of the googlemaps package in Python. Can you help look into this issue by checking these addresses? Even though the differences are small, they impact some of my end results.

Origin: ['1021+Wilkinson+Trce+Bowling+Green+KY'] Destination: ['675+Kennedy+Ln+Clarksville+TN', '251+Holiday+Dr+Clarksville+TN'] Travel Mode in both R and Python: bike (R) and bicycling (Python)

The distances returned from R for the two origin-destination pairs above were 77.3735 and 77.4222 miles, while those from Python were 77.2312 and 77.27968 miles. I hope you can reproduce these results, but please let me know if you get something completely different.
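To put a size on the gap, here is a small sketch that computes the absolute and relative differences between the four figures quoted above. The variable names are illustrative only, and the numbers are the ones reported in this comment, not fresh API results.

```python
# Distances in miles as reported in this comment.
r_miles = [77.3735, 77.4222]        # gmapsdistance (R)
py_miles = [77.2312, 77.27968]      # googlemaps (Python)

diffs = []
for r, p in zip(r_miles, py_miles):
    diff = r - p                    # absolute gap in miles
    rel = diff / p * 100            # gap relative to the Python figure, in percent
    diffs.append((diff, rel))
    print(f"R {r} vs Python {p}: {diff:.4f} mi ({rel:.2f}%)")
```

Both gaps come out at roughly 0.14 miles, or under 0.2 percent of the trip length.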

tle4336 commented 1 year ago

@jlacko Hi Mr. Lacko, did you have a chance to look into the two origin-destination pairs above? If you need more pairs, please let me know.