ropensci / opencage

:globe_with_meridians: R package for the OpenCage API -- both forward and reverse geocoding :globe_with_meridians:
https://docs.ropensci.org/opencage

new annotation in OpenCage API results "UN_M49" #78

Closed freyfogle closed 5 years ago

freyfogle commented 5 years ago

First up, it is wonderful to see all the progress on the new version. Please let me know if there is any way I can help get it across the finish line.

Secondly, there is now a new annotation in the OpenCage results that might be particularly relevant to users of R. We now return UN M49 codes (blog post announcing this) (full docs), which are often used in economic data analysis. This can tell you whether a result is in a UN-defined region like "sub-Saharan Africa" or in a "Least Developed Country (LDC)".

I don't know whether a new annotation requires any code changes in the R module. I would hope not, but I thought I should make you aware. If not, please just close this issue.

Forward!

maelle commented 5 years ago

Thanks! It seems to already be included (using the dev version, and probably using master as well).

sarzeau <- opencage::oc_forward("Sarzeau", no_annotations = FALSE)
sarzeau[[1]]$UN_M49_regions_WORLD
#> [1] "001" "001"
sarzeau[[1]]$UN_M49_statistical_groupings
#> [[1]]
#> [1] "MEDC"
#> 
#> [[2]]
#> [1] "MEDC"
sarzeau[[1]]$UN_M49_regions_WESTERN_EUROPE
#> [1] "155" "155"
sarzeau[[1]]$UN_M49_regions_FR
#> [1] "250" "250"
sarzeau[[1]]$UN_M49_regions_EUROPE
#> [1] "150" "150"

Created on 2019-05-20 by the reprex package (v0.2.1)

@dpprdan @jessesadler anything we want to do to process the output differently?
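As one possible way to process this output, the statistical groupings can be turned into a simple flag per place. This is only a sketch in base R: the `groupings` list below is hypothetical data shaped like `UN_M49_statistical_groupings`, not a live API response, and `is_ldc` is an illustrative name.

```r
# Hypothetical groupings, shaped like the UN_M49_statistical_groupings
# element of a result (one character vector per place).
groupings <- list(
  Sarzeau = c("MEDC"),
  Bangladesh = c("LDC", "LEDC")
)

# Flag each place that carries the "LDC" (Least Developed Country) grouping.
is_ldc <- vapply(groupings, function(g) "LDC" %in% g, logical(1))

print(is_ldc)
```

The same pattern would apply to any other grouping code by swapping the string that is tested.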

dpprdan commented 5 years ago

Indeed, it seems like we are already all set for this one. 🎉 IMO the output is also fine as-is.

When I tried to get the M49 values for a couple of countries ("Bolivia", "Bangladesh", "Belgium", "Botswana") I got the following error.

library("dplyr", warn.conflicts = FALSE)
library("opencage")

tibble(p = c("Bolivia", "Bangladesh", "Belgium", "Botswana")) %>% 
  oc_forward_df(p, no_annotations = FALSE, output = "all") %>% 
  select(p, starts_with("UN")) %>% 
  knitr::kable()
#> Error: Column `callingcode` can't be converted from character to integer

I don't fully understand yet why character cannot be converted to integer (dplyr:::bind_rows_(x, .id) is throwing an Rcpp::exception).

However, the API is indeed returning callingcode as an integer for e.g. Bolivia ("callingcode":591) but as a character for e.g. Belgium ("callingcode":"32"). Is this something that could and should be fixed on the OpenCage API side, @freyfogle?

freyfogle commented 5 years ago

hmm, that's odd. will have a look, thanks for making me aware. More soon

freyfogle commented 5 years ago

OK, thanks for spotting this, very nice bug. It is now fixed: callingcode is now always a number. Feel free to close this issue.

dpprdan commented 5 years ago

@freyfogle Sorry, I missed another one the first time around: "qibla":72.76 (Bolivia) vs "qibla":"123.65" (Belgium). I have now checked the types of all returned results, so this should be it (for the examples mentioned above at least).
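A type check of this kind could look roughly like the base R sketch below. It assumes parsed results that mimic the two responses as originally reported in this thread. The `results` list and the `harmonise()` helper are hypothetical and not part of opencage, and coercing on the client side is only a workaround, not the fix that was applied to the API.

```r
# Hypothetical parsed results mimicking the inconsistent JSON types
# reported in this thread (number for Bolivia, string for Belgium).
results <- list(
  Bolivia = list(callingcode = 591L, qibla = 72.76),
  Belgium = list(callingcode = "32", qibla = "123.65")
)

# Record the class of every field in every result
# (rows are fields, columns are results).
field_types <- sapply(results, function(res) {
  vapply(res, function(x) class(x)[1], character(1))
})

# Fields whose class is not the same across all results.
mismatched <- rownames(field_types)[
  apply(field_types, 1, function(types) length(unique(types)) > 1)
]

# Workaround: coerce the mismatched fields to numeric before combining.
harmonise <- function(res, fields) {
  res[fields] <- lapply(res[fields], as.numeric)
  res
}
clean <- lapply(results, harmonise, fields = mismatched)
combined <- do.call(rbind, lapply(clean, as.data.frame))

print(mismatched)
print(combined)
```

Once the fields share one type, combining the per-result rows no longer depends on how strictly the binding function treats mixed column types.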

freyfogle commented 5 years ago

hmm, thanks. Bizarre, as I am not sure what about Belgium is causing this. Fix will be live in a few minutes.

dpprdan commented 5 years ago

Bolivia is the outlier in this case compared to the other three.

freyfogle commented 5 years ago

ok, fix is now live, qibla should always be a number. Thanks again, very helpful.

dpprdan commented 5 years ago

🎉

library("dplyr", warn.conflicts = FALSE)
library("opencage")
options(knitr.kable.NA = '')

tibble(p = c("Bolivia", "Bangladesh", "Belgium", "Botswana")) %>% 
  oc_forward_df(p, no_annotations = FALSE, output = "all") %>% 
  select(p, starts_with("UN")) %>% 
  knitr::kable()
| p | UN_M49_statistical_groupings | UN_M49_regions_AMERICAS | UN_M49_regions_BO | UN_M49_regions_LATIN_AMERICA | UN_M49_regions_SOUTH_AMERICA | UN_M49_regions_WORLD | UN_M49_regions_ASIA | UN_M49_regions_BD | UN_M49_regions_SOUTHERN_ASIA | unknown | UN_M49_regions_BE | UN_M49_regions_EUROPE | UN_M49_regions_WESTERN_EUROPE | UN_M49_regions_AFRICA | UN_M49_regions_BW | UN_M49_regions_SOUTHERN_AFRICA | UN_M49_regions_SUB_SAHARAN AFRICA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Bolivia | c("LEDC", "LLDC") | 019 | 068 | 419 | 005 | 001 | | | | | | | | | | | |
| Bangladesh | c("LDC", "LEDC") | | | | | 001 | 142 | 050 | 034 | Bangladesh | | | | | | | |
| Belgium | MEDC | | | | | 001 | | | | | 056 | 150 | 155 | | | | |
| Botswana | c("LEDC", "LLDC") | | | | | 001 | | | | | | | | 002 | 072 | 018 | 202 |

Maybe not the most useful table/column structure for this particular type of query, but I don't have a better idea for a general solution ATM.
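One alternative layout for this kind of query is a long table with one row per place and region. The base R sketch below only illustrates that idea and is not a proposal for the package output. The `wide` data frame is a hand-built subset of the values shown above, and the `region` and `code` column names are made up for the example.

```r
# Hand-built subset of the wide output (hypothetical, for illustration only).
wide <- data.frame(
  p = c("Bolivia", "Belgium"),
  UN_M49_regions_WORLD = c("001", "001"),
  UN_M49_regions_BO = c("068", NA),
  UN_M49_regions_BE = c(NA, "056"),
  UN_M49_regions_EUROPE = c(NA, "150"),
  stringsAsFactors = FALSE
)

# All region columns share this prefix.
region_cols <- grep("^UN_M49_regions_", names(wide), value = TRUE)

# Stack every region column into (p, region, code) rows.
long <- do.call(rbind, lapply(region_cols, function(col) {
  data.frame(
    p = wide$p,
    region = sub("^UN_M49_regions_", "", col),
    code = wide[[col]],
    stringsAsFactors = FALSE
  )
}))

# Drop the empty cells and sort by place.
long <- long[!is.na(long$code), ]
long <- long[order(long$p), ]
rownames(long) <- NULL

print(long)
```

The long form avoids the mostly empty country-specific columns, at the cost of repeating the place name on several rows.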