ropensci-archive / rtweet

šŸ¦ R client for interacting with Twitter's [stream and REST] APIs
https://docs.ropensci.org/rtweet

Implement alternative geocoder #365

Closed katrinleinweber closed 3 weeks ago

katrinleinweber commented 5 years ago

Problem

lookup_coords currently relies on Google Maps and a few hard-coded coordinates (#261).

Expected behavior

Since there are several other options, would it be feasible to implement at least one of them, so that people can choose not to depend on Google?

hadley commented 3 years ago

What's the best R package for geocoding?

dieghernan commented 3 years ago

I would say https://cran.r-project.org/web/packages/tidygeocoder/index.html, although I haven't used it extensively.

EDIT

I am not sure whether it returns the bounding box, which is required in the coords object and probably by the Twitter API.

EDIT2

It can, with geo(full_results = TRUE), so it seems to be a good option. It imports tibble, dplyr, httr and jsonlite.
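To illustrate why the bounding box matters downstream, here is a small sketch that serialises a box using the same `sw.lng`/`sw.lat`/`ne.lng`/`ne.lat` names that rtweet's coords objects use into a comma-separated string. It assumes the Twitter v1.1 streaming `locations` parameter format (south-west corner first, longitude before latitude); `box_to_locations()` is a hypothetical name, not an rtweet function.

```r
# Hypothetical helper (not part of rtweet): serialise a coords-style box
# into the string format assumed for the Twitter v1.1 streaming
# `locations` parameter, i.e. "sw.lng,sw.lat,ne.lng,ne.lat".
box_to_locations <- function(box) {
  corners <- c("sw.lng", "sw.lat", "ne.lng", "ne.lat")
  stopifnot(all(corners %in% names(box)))
  # Reorder by name so the output order does not depend on the input order
  paste(box[corners], collapse = ",")
}

box_to_locations(c(sw.lng = -180, sw.lat = -90, ne.lng = 180, ne.lat = 90))
# [1] "-180,-90,180,90"
```

Without a box there is nothing to pass here, which is why a geocoder that cannot return one is of limited use for streaming.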

dieghernan commented 3 years ago

This looks like a good candidate

https://docs.ropensci.org/opencage/

EDIT: It has no Google Maps support; it is dedicated to a single provider (OpenCage).

dieghernan commented 3 years ago

OK, so I have been doing some research and found some interesting things that may affect some of the issues related to lookup_coords:

1. The Google API does not always return bounds

This is described in the API docs: https://developers.google.com/maps/documentation/geocoding/overview#results

bounds (optionally returned) stores the bounding box which can fully contain the returned result.

This is the variable used in lookup_coords:

https://github.com/ropensci/rtweet/blob/f45b9b3e20275aef6171f6f109ab6e2dba89aa7c/R/coords.R#L122-L126

Potential alternative: using the viewport variable. From the API docs:

viewport contains the recommended viewport for displaying the returned result, specified as two latitude,longitude values defining the southwest and northeast corner of the viewport bounding box. Generally the viewport is used to frame a result when displaying it to a user.

See an example that returns both bounds and viewport: https://developers-dot-devsite-v2-prod.appspot.com/maps/documentation/utils/geocoder#q%3DUnited%2520States%2520of%2520America

Note the difference between the viewport (mainland USA) and the bounds (which also include Hawaii, Alaska, etc.); that is exactly what these lines try to handle:

https://github.com/ropensci/rtweet/blob/f45b9b3e20275aef6171f6f109ab6e2dba89aa7c/R/coords.R#L61-L72

Here is an example of a query that does not return bounds; this seems to be the case for narrower searches (zoom out to see the viewport, the blue line): https://developers-dot-devsite-v2-prod.appspot.com/maps/documentation/utils/geocoder#q%3DTimes%2520Square%2520NY
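A minimal sketch of the viewport fallback described above, assuming the Google response has already been parsed into nested R lists following the documented `geometry` layout (`bounds` and `viewport`, each with `northeast`/`southwest` corners holding `lat`/`lng`). `google_box()` is a hypothetical helper, not part of rtweet, and the example coordinates are made up for illustration.

```r
# Hypothetical helper (not part of rtweet): pick a bounding box from one
# parsed Google Geocoding result, preferring `bounds` and falling back to
# `viewport` when `bounds` is absent.
google_box <- function(geometry) {
  # `bounds` is optional in the API response, so fall back to `viewport`
  frame <- if (!is.null(geometry$bounds)) geometry$bounds else geometry$viewport
  if (is.null(frame)) stop("result has neither bounds nor viewport", call. = FALSE)
  c(
    sw.lng = frame$southwest$lng,
    sw.lat = frame$southwest$lat,
    ne.lng = frame$northeast$lng,
    ne.lat = frame$northeast$lat
  )
}

# Illustrative geometry shaped like a narrow result without `bounds`
geom <- list(
  location = list(lat = 40.758, lng = -73.9855),
  viewport = list(
    northeast = list(lat = 40.7594, lng = -73.9842),
    southwest = list(lat = 40.7567, lng = -73.9869)
  )
)
google_box(geom)
```

Preferring `bounds` keeps the current behaviour wherever Google returns it (e.g. the USA including Alaska and Hawaii); the viewport only fills the gap for narrower queries.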

2. Alternatives

a. Custom function (as fallback/replacement)

In #391 I added an alternative for geocoding using Nominatim, which does not require an API key and seems to be reliable enough:

```r
lookup_coords_nominatim <- function(address, ...) {
  if (missing(address)) stop("must supply address", call. = FALSE)
  # a single address string; `if()` below needs a length-one condition
  stopifnot(is.atomic(address), length(address) == 1)
  place <- address
  if (grepl("^us$|^usa$|^united states$|^u\\.s", address, ignore.case = TRUE)) {
    ## hard-coded box for the contiguous United States
    boxp <- c(sw.lng = -124.848974, sw.lat = 24.396308,
              ne.lng = -66.885444, ne.lat = 49.384358)
    point <- c(lat = 36.89, lng = -95.867)
  } else if (grepl("^world$|^all$|^globe$|^earth$", address, ignore.case = TRUE)) {
    ## hard-coded box for the whole globe
    boxp <- c(sw.lng = -180, sw.lat = -90, ne.lng = 180, ne.lat = 90)
    point <- c(lat = 0, lng = 0)
  } else {
    ## compose query; URL-encode the address so spaces, commas, "&" etc. are safe
    params <- list(q = utils::URLencode(address, reserved = TRUE),
                   format = "json", limit = 1)
    params <- paste0(names(params), "=", unlist(params), collapse = "&")
    ## build URL - final name in English
    geourl <- paste0("https://nominatim.openstreetmap.org/search?",
                     params, "&accept-language=en")
    ## read and convert to list obj
    r <- jsonlite::fromJSON(geourl)
    if (length(r) == 0) stop("no results found for address", call. = FALSE)
    ## Nominatim boundingbox order: south lat, north lat, west lng, east lng
    bbox <- as.double(unlist(r$boundingbox))
    boxp <- c(sw.lng = bbox[3], sw.lat = bbox[1],
              ne.lng = bbox[4], ne.lat = bbox[2])
    point <- c(lat = as.double(r$lat), lng = as.double(r$lon))
    # Full name from Nominatim
    place <- r$display_name
  }
  rtweet:::as.coords(place = place, box = boxp, point = point) # call an internal function
}
```
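Since the Nominatim function is framed as a fallback, here is a minimal sketch of how such a chain could be wired: try each geocoder in order and return the first success. `geocode_with_fallback()` is hypothetical; each geocoder is assumed to be a function taking an address and either returning a coords object or signalling an error.

```r
# Hypothetical helper (not part of rtweet): try each geocoder in turn and
# return the first result that does not signal an error.
# `geocoders` must be a named list of functions of one argument (address).
geocode_with_fallback <- function(address, geocoders) {
  stopifnot(is.list(geocoders), !is.null(names(geocoders)))
  errors <- character(0)
  for (name in names(geocoders)) {
    result <- tryCatch(geocoders[[name]](address), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    # remember why this geocoder failed, then move on to the next one
    errors <- c(errors, paste0(name, ": ", conditionMessage(result)))
  }
  stop("all geocoders failed:\n", paste(errors, collapse = "\n"), call. = FALSE)
}

# Possible usage (needs network access and, for Google, an API key):
# geocode_with_fallback(
#   "Times Square NY",
#   list(google = rtweet::lookup_coords, nominatim = lookup_coords_nominatim)
# )
```

Collecting the per-geocoder error messages keeps the final error informative when every backend fails.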

b. Using a geocoding package

Following @hadley's suggestion, I did some research (and put out a call to the rspatial community on Twitter, https://twitter.com/dhernangomez/status/1365676793299148803?s=20), and so far it seems to me that https://github.com/jessecambon/tidygeocoder could be the best alternative for the {rtweet} package, if this is the preferred way forward.

Its geo() function lets the user choose among several geocoders (including Google and Nominatim) and would be easy to integrate. Some adjustments to the environment variables of both packages would be necessary.

Update: {tidygeocoder} v1.0.3 now supports 12 geocoding services, including all the major ones (see https://jessecambon.github.io/tidygeocoder/articles/geocoder_services.html). At least OSM and ArcGIS have global coverage without the need for an API key. Ping @jessecambon.
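A minimal sketch of what that integration could look like for the keyless OSM backend. `osm_row_to_coords()` is a hypothetical adapter, and it assumes that `tidygeocoder::geo(..., method = "osm", full_results = TRUE)` returns the default `lat`/`long` columns plus Nominatim's `display_name` and `boundingbox` fields, with `boundingbox` ordered south lat, north lat, west lng, east lng (the same order the custom Nominatim function relies on). The network call is left commented out; the adapter itself only reshapes data.

```r
# Hypothetical adapter (not part of rtweet or tidygeocoder): reshape the
# first row of an OSM full result into the place/box/point pieces that
# rtweet's coords objects hold. Column names are assumptions (see above).
osm_row_to_coords <- function(res) {
  # boundingbox is assumed to be a list column of 4 strings:
  # south lat, north lat, west lng, east lng
  bbox <- as.double(unlist(res$boundingbox[1]))
  list(
    place = res$display_name[1],
    box = c(sw.lng = bbox[3], sw.lat = bbox[1], ne.lng = bbox[4], ne.lat = bbox[2]),
    point = c(lat = res$lat[1], lng = res$long[1])
  )
}

# Possible usage (requires network access):
# res <- tidygeocoder::geo(address = "Times Square NY", method = "osm",
#                          full_results = TRUE)
# osm_row_to_coords(res)
```

Keeping the adapter separate from the HTTP call would let rtweet swap geocoding backends without touching the coords-building code.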

3. Bottom line

I think there are ways to improve this function (using viewport, moving to other free geocoders, adding fallbacks, using other packages...), but I am not sure whether this is a priority for {rtweet} right now.

I would be happy to help if needed, but it seems to me that it would require some work, so for now I would leave it as is. If you want me to help, just ping me!

llrs commented 3 years ago

Yeah, it is not a priority, so I will leave it as is for a while. I lean towards Nominatim, the one from OpenStreetMap; I am not sure which package would be better, but we'll discuss it when we settle on this.

llrs commented 3 weeks ago

This package has been archived; if you want to resume maintenance, you can request that it be unarchived by contacting rOpenSci.