Closed espinielli closed 4 years ago
Probably it is best to retrieve available keys and tags via the taginfo
API:
keys
responses are paged with a max of 999 with a total
attribute indicating the overall amount of entries.
The sequence of the remaining page numbers is 2:(total %% 999)
and each page can be retrieved after the first and combined together.
The following experimental code could be used to retrieve all key
s:
library(stringr)
library(jsonlite)
library(httr)
library(dplyr)
pagenum <- 1
url <- "https://taginfo.openstreetmap.org/api/4/keys/all?page={pagenum}&rp=999"
u <- str_glue(url)
g <- httr::GET(u)
if (httr::status_code(g) == 200) {
c <- httr::content(g, as = "text") %>% jsonlite::fromJSON()
p <- c$data %>% as_tibble()
total <- c$total
pages <- 2:(total %% 999)
str_glue(url, pagenum = pages) %>%
purrr::map_dfr(.f = function(u) {
httr::GET(u) %>%
httr::content(as = "text") %>%
jsonlite::fromJSON() %>%
`[[`("data") %>%
as_tibble()
}) %>%
bind_rows(p)
}
status_code
for pages from 2 onwardquery
string to search for a keyThanks @espinielli, that's a very good catch there. The available_...()
hasn't been updated for way too long, and you've offered a brilliant extension. If you are up to a PR, I'd happily incorporate your suggestions, but noting that the original intention of those functions was to reflect the "official" lists on https://wiki.openstreetmap.org/wiki/Map_Features, rather than the actual complete lists as recorded in the entire database, and as provided by the taginfo
service. I still think this is a sensible approach according to a working hypothesis that these lists of tags are most commonly used to re-map any "rogue" tags into standard values. That said, the ability to extract all tags used in the entire database is also obviously cool and potentially important in some cases, so how about keeping that original functionality by default, but allowing for an additional argument to both functions, taginfo = FALSE
, which enables full values when TRUE
? You could then do a PR for the TRUE
case.
That would then need the following:
taginfo
parameter means and doesstringr
and purrr
. The former is easy to remove, and the latter would have the advantage that you'd be able to insert a utils::txtProgressBar
within a standard loop, which would be really helpful because, yes, it really does take quite some time.url <- sprintf ("https://taginfo.openstreetmap.org/api/4/key/values?key=%s&page={pagenum}&rp=999", value)
Your initial issue can then be solved by re-writing the code for taginfo = FALSE
to something like the following:
library (magrittr)
url_ftrs <- "https://wiki.openstreetmap.org/wiki/Map_Features"
pg <- xml2::read_html (httr::GET (url_ftrs))
taglists <- rvest::html_nodes (pg, "div[class='taglist']") %>%
rvest::html_attr ("data-taginfo-taglist-tags")
taglists <- lapply (taglists, function (i) {
temp <- strsplit (i, "=") [[1]]
res <- NULL
if (length (temp) == 2) {
res <- strsplit (temp [2], ",")
names (res) <- temp [1]
}
return (res) })
taglists [vapply (taglists, is.null, logical (1))] <- NULL
head (taglists)
#> [[1]]
#> [[1]]$aeroway
#> [1] "aerodrome" "apron" "gate" "hangar"
#> [5] "helipad" "heliport" "navigationaid" "runway"
#> [9] "spaceport" "taxiway" "terminal" "windsock"
#>
#>
#> [[2]]
#> [[2]]$barrier
#> [1] "cable_barrier" "city_wall" "ditch" "fence"
#> [5] "guard_rail" "handrail" "hedge" "kerb"
#> [9] "retaining_wall" "wall"
#>
#>
#> [[3]]
#> [[3]]$barrier
#> [1] "block" "bollard" "border_control"
#> [4] "bump_gate" "bus_trap" "cattle_grid"
#> [7] "chain" "cycle_barrier" "debris"
#> [10] "entrance" "full-height_turnstile" "gate"
#> [13] "hampshire_gate" "height_restrictor" "horse_stile"
#> [16] "jersey_barrier" "kissing_gate" "lift_gate"
#> [19] "log" "motorcycle_barrier" "rope"
#> [22] "sally_port" "spikes" "stile"
#> [25] "sump_buster" "swing_gate" "toll_booth"
#> [28] "turnstile" "yes"
#>
#>
#> [[4]]
#> [[4]]$craft
#> [1] "agricultural_engines" "atelier"
#> [3] "bakery" "basket_maker"
#> [5] "beekeeper" "blacksmith"
#> [7] "boatbuilder" "bookbinder"
#> [9] "brewery" "builder"
#> [11] "cabinet_maker" "carpenter"
#> [13] "carpet_layer" "caterer"
#> [15] "chimney_sweeper" "clockmaker"
#> [17] "confectionery" "cooper"
#> [19] "dental_technician" "distillery"
#> [21] "door_construction" "dressmaker"
#> [23] "electronics_repair" "embroiderer"
#> [25] "electrician" "engraver"
#> [27] "floorer" "gardener"
#> [29] "glaziery" "grinding_mill"
#> [31] "handicraft" "hvac"
#> [33] "insulation" "jeweller"
#> [35] "joiner" "key_cutter"
#> [37] "locksmith" "metal_construction"
#> [39] "mint" "musical_instrument"
#> [41] "oil_mill" "optician"
#> [43] "organ_builder" "painter"
#> [45] "parquet_layer" "photographer"
#> [47] "photographic_laboratory" "piano_tuner"
#> [49] "plasterer" "plumber"
#> [51] "pottery" "printer"
#> [53] "printmaker" "rigger"
#> [55] "roofer" "saddler"
#> [57] "sailmaker" "sawmill"
#> [59] "scaffolder" "sculptor"
#> [61] "shoemaker" "signmaker"
#> [63] "stand_builder" "stonemason"
#> [65] "sun_protection" "tailor"
#> [67] "tiler" "tinsmith"
#> [69] "toolmaker" "turner"
#> [71] "upholsterer" "watchmaker"
#> [73] "water_well_drilling" "window_construction"
#> [75] "winery"
#>
#>
#> [[5]]
#> [[5]]$emergency
#> [1] "ambulance_station" "defibrillator"
#> [3] "landing_site" "emergency_ward_entrance"
#>
#>
#> [[6]]
#> [[6]]$emergency
#> [1] "dry_riser_inlet" "fire_alarm_box" "fire_extinguisher"
#> [4] "fire_hose" "fire_hydrant" "water_tank"
#> [7] "suction_point"
Created on 2020-02-13 by the reprex package (v0.3.0)
The whole list can then be returned by default from available_tags
by simply changing
https://github.com/ropensci/osmdata/blob/d0f30d3df0b2eac4b5e5243e8df5a79ab4db3115/R/features.R#L43
to
available_tags = function (feature = NULL)
and only sub-selecting for non-NULL
values of feature
.
If you do a PR, then I'd be more than happy to add you as an official package contributor, because this definitely adds entirely new functionality. Thanks very much for opening the issue, and for already thinking so deeply about an excellent solution :+1: :rocket:
@mpadge I like and subscribe to your proposal to extend the available_()
functions with the extra taginfo
argument.
So I would say the work has two parts:
available_()
functions so as to return what is the official lists of key
s and tag
a as per https://wiki.openstreetmap.org/wiki/Map_Features, along the lines of the code you proposed aboveavailable_()
function with the extra taginfo
as you proposedIf ok with you I would create a separate issue for the extension and keep this one for the fix (or you can do it 😄).
If ok, I would give a try for the relevant PRs for both.
Thanks for all the great input and the PRs @espinielli. Just an FYI that i've broken an :muscle: and will be out of action for a little while. I'll get back onto this as soon as I return to normal capabilities.
no rush, look after your health first and good luck!
On Mon, Feb 17, 2020 at 7:42 PM mark padgham notifications@github.com wrote:
Thanks for all the great input and the PRs @espinielli https://github.com/espinielli. Just an FYI that i've broken an 💪 and will be out of action for a little while. I'll get back onto this as soon as I return to normal capabilities.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ropensci/osmdata/issues/207?email_source=notifications&email_token=AAGZWLCOWXNQQGGCN6OTG6LRDLLA3A5CNFSM4KT5SQI2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEL7LEZA#issuecomment-587117156, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAGZWLCXHLH4CPBPEVSWC7TRDLLA3ANCNFSM4KT5SQIQ .
@espinielli update: slowly getting there and hoping to be on to your issues & PRs next week :smile:
Due to the way the feature tables in the OSM Features page are constructed the following line in
available_tags()
does not always workIn fact the tables for
aeroway
,barrier
... are constructed differently (they are autogenerated) compared to the others and hence thea[title^='Tag:%s']
fails to retrieve all the tags.The tags for aeroway are 12 in the Feature Page, but only 1 is returned:
Created on 2020-02-12 by the reprex package (v0.3.0)
available_tags
fails on:aeroway
barrier
craft
emergency
geological
historic
leisure
man_made
/man made
military
natural
office
place
public_transport
/public transport
sport
telecom
tourism
waterway