jessecambon / tidygeocoder

Geocoding Made Easy
https://jessecambon.github.io/tidygeocoder

Create custom functions with geocode #197

Closed: daranzolin closed this issue 5 months ago

daranzolin commented 6 months ago

Description

I have a fairly complex call to geocode() that includes a custom URL, custom API arguments, etc., and I'd like to generalize it to handle data frames in which the column holding the street address or city name varies.

To that end, I'd like to be able to write custom wrapper functions around tidygeocoder::geocode(). Currently, I can't pass column names through to geocode()'s arguments via tidy eval (or any other method I've tried). Consider the reprex below:

library(dplyr, warn.conflicts = FALSE)
#> Warning: package 'dplyr' was built under R version 4.2.2
library(tidygeocoder)
#> Warning: package 'tidygeocoder' was built under R version 4.2.2

some_addresses <- tribble(
  ~name,                  ~addr,
  "White House",          "1600 Pennsylvania Ave NW, Washington, DC",
  "Transamerica Pyramid", "600 Montgomery St, San Francisco, CA 94111",     
  "Willis Tower",         "233 S Wacker Dr, Chicago, IL 60606"                                  
)

# Attempt 1: tidy eval with {{ }}
my_geocode1 <- function(data, address) {
  data %>% 
    geocode(
      address = {{address}},
      method = 'osm'
    )
}

some_addresses %>% my_geocode1(addr)
#> Error in if (!(all_args[[var]] %in% colnames(.tbl))) {: the condition has length > 1

# Attempt 2: pass the column name as a string
my_geocode2 <- function(data, address) {

  rm_quote <- function(string) gsub("\"", "", string)
  address_str <- rm_quote(deparse(substitute(address)))

  data %>% 
    geocode(
      address = address_str,
      method = 'osm'
    )
}

some_addresses %>% my_geocode2(addr)
#> Error: "address_str" is not a column name in the input dataset.

Created on 2024-03-19 with reprex v2.0.2

jessecambon commented 5 months ago

Hi @daranzolin, I got a chance to debug this but unfortunately ran into the same errors you did. Because geocode() captures column names with rm_quote(deparse(substitute())), it seems difficult or impossible to pass them through a wrapper function this way: deparse() sees the literal text {{address}} or address_str rather than the column name. Given that dplyr doesn't seem to have this issue, I'm guessing that using rlang for tidy eval within geocode() instead of rm_quote(deparse(substitute())) may fix it.
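To illustrate the diagnosis above, here is a minimal, self-contained sketch. It assumes geocode() reads a column argument roughly as rm_quote(deparse(substitute(address))), as described in this comment; capture_col_name(), the wrap_*() functions and my_geocode3() are hypothetical names, not tidygeocoder functions. It reproduces both failures from the reprex and shows one base-R way around them: rebuild the call with bquote() so the caller's bare column name is spliced in as a symbol, then eval() it.

```r
# Same helper as in the reprex: strip double quotes from deparsed text.
rm_quote <- function(string) gsub("\"", "", string)

# Hypothetical stand-in for how geocode() is assumed to read a column argument.
capture_col_name <- function(address) {
  rm_quote(deparse(substitute(address)))
}

# Called directly, the bare symbol is captured as intended: "addr".
direct <- capture_col_name(addr)

# Attempt 1: substitute() sees the literal expression {{address}}, which
# deparse() renders as several lines, so the result has length > 1.
wrap_embrace <- function(address) capture_col_name({{address}})
embraced <- wrap_embrace(addr)

# Attempt 2: a variable holding the string is captured by its own name.
wrap_string <- function(address) {
  address_str <- rm_quote(deparse(substitute(address)))  # "addr"
  capture_col_name(address_str)                          # "address_str"
}
stringed <- wrap_string(addr)

# Workaround: grab the caller's expression (e.g. the symbol addr), splice it
# into a fresh call with bquote(), and evaluate that call in this frame.
wrap_fixed <- function(address) {
  col_expr <- substitute(address)
  call <- bquote(capture_col_name(.(col_expr)))
  eval(call)
}
fixed <- wrap_fixed(addr)

# Untested sketch applying the same idea to geocode(); it is only defined
# here, not called, because calling it would query the OSM service.
my_geocode3 <- function(data, address) {
  col_expr <- substitute(address)
  eval(bquote(tidygeocoder::geocode(data, address = .(col_expr), method = "osm")))
}
```

my_geocode3() relies on an assumption about geocode()'s internals rather than a documented API, so it may break if the package changes how it captures arguments. With rlang, rlang::inject() combined with !!rlang::ensym(address) should build an equivalent call.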

jessecambon commented 5 months ago

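@daranzolin if you're looking for a workaround for the time being, you could use the geo() function instead of geocode(). geo() takes a vector of addresses rather than a data frame, so you can pull the column out with tidy eval first. It would be something like this: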

library(dplyr, warn.conflicts = FALSE)
library(tidygeocoder)
library(rlang)

some_addresses <- tribble(
  ~name,                  ~addr,
  "White House",          "1600 Pennsylvania Ave NW, Washington, DC",
  "Transamerica Pyramid", "600 Montgomery St, San Francisco, CA 94111",     
  "Willis Tower",         "233 S Wacker Dr, Chicago, IL 60606"                                  
)

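my_geocode <- function(input_data, addr_col) {
  # {{ }} works here because pull() supports tidy eval
  address_data <- input_data %>% 
    pull({{addr_col}})

  # geo() geocodes a character vector of addresses
  output <- geo(address = address_data)
  return(output)
}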

some_addresses %>% my_geocode(addr)
#> Passing 3 addresses to the Nominatim single address geocoder
#> Query completed in: 4.1 seconds
#> # A tibble: 3 × 3
#>   address                                      lat   long
#>   <chr>                                      <dbl>  <dbl>
#> 1 1600 Pennsylvania Ave NW, Washington, DC    38.9  -77.0
#> 2 600 Montgomery St, San Francisco, CA 94111  37.8 -122. 
#> 3 233 S Wacker Dr, Chicago, IL 60606          41.9  -87.6

Created on 2024-04-09 with reprex v2.1.0

elipousson commented 5 months ago

I had previously reported this as a bug last spring and made a couple of attempts at a refactor to address this issue: https://github.com/jessecambon/tidygeocoder/issues/189

Unfortunately, it seemed like it would require a somewhat involved rewrite of the existing function, so a workaround using geo() may be the better bet. I can dig up the old code if you want to work on it, @daranzolin.

daranzolin commented 5 months ago

@jessecambon @elipousson thanks, everyone! geo() suits my needs perfectly, so I'll go ahead and close this issue.