ropensci / osmdata

R package for downloading OpenStreetMap data
https://docs.ropensci.org/osmdata

[documentation] mention how to filter for "has tag x" #342

Closed joostschouppe closed 1 month ago

joostschouppe commented 2 months ago

I could not find in the documentation how you can query for the existence of a tag (regardless of the value). This is a relatively common way to avoid downloading irrelevant data. I first tried


osm_query <- opq(bbox = bbox, timeout = 60) %>%
  add_osm_feature(key = "leisure", value = "pitch") %>%
  add_osm_feature(key = "name", value = "*")

But that gets interpreted literally. It took me a bit of time to realize that it's as simple as

osm_query <- opq(bbox = bbox, timeout = 60) %>%
  add_osm_feature(key = "leisure", value = "pitch") %>%
  add_osm_feature(key = "name")

mpadge commented 2 months ago

Thanks @joostschouppe, and point well made! I use key without value all the time, but ... yeah, all the documentation examples are key-value pairs (with one or two deeply-buried exceptions like https://docs.ropensci.org/osmdata/articles/osmdata.html#extracting-osm-data-from-a-query). We'll make sure to update docs.
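
For readers unfamiliar with the underlying query language, here is a minimal base-R sketch of how the filter forms that come up in this thread are written as Overpass QL tag filters. The helper `tag_filter()` is hypothetical and is not part of osmdata; it only illustrates the Overpass QL syntax, and osmdata may format its own queries differently.

```r
# Hypothetical helper (not part of osmdata): build one Overpass QL tag filter.
tag_filter <- function(key, value = NULL, negate = FALSE) {
  if (is.null(value)) {
    # Key only: match any value, or require that the key is absent
    if (negate) sprintf('[!"%s"]', key) else sprintf('["%s"]', key)
  } else {
    # Key and value: exact match, or "not equal" when negated
    op <- if (negate) "!=" else "="
    sprintf('["%s"%s"%s"]', key, op, value)
  }
}

filters <- c(
  tag_filter("leisure", "pitch"),                  # key equals value
  tag_filter("name"),                              # key exists, any value
  tag_filter("access", "private", negate = TRUE),  # key does not equal value
  tag_filter("leisure", negate = TRUE)             # key does not exist
)
cat(filters, sep = "\n")
```

In osmdata itself, the query actually sent to the server can be inspected with `opq_string()`.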

joostschouppe commented 2 months ago

This is probably horribly inefficient, but it seems to work for dealing with all the common situations:

library(osmdata)
library(purrr) # provides reduce()

features_list <- list(
  list(key = "building", value = "sports_centre"),
  list(key = "name"),
  list(key = "access", value = "private", negate = TRUE),
  list(key = "leisure", key_does_not_exist = TRUE)
)

# bbox is assumed to be defined earlier
osm_query <- opq(bbox = bbox, timeout = 60)

# Reduce function to add features to the query one at a time
osm_query <- reduce(features_list, function(opq_obj, feature) {
  if ("key_does_not_exist" %in% names(feature) && feature$key_does_not_exist) {
    # Handle requests for "this key does not exist" on the object
    opq_obj <- add_osm_feature(opq_obj, key = paste0("!", feature$key))
  } else if (!("value" %in% names(feature))) {
    # Handle the case where only the key is specified
    opq_obj <- add_osm_feature(opq_obj, key = feature$key)
  } else {
    # Handle the case where both key and value are specified
    if ("negate" %in% names(feature) && feature$negate) {
      # And the key should NOT equal the value
      opq_obj <- add_osm_feature(opq_obj, key = feature$key, value = paste0("!", feature$value))
    } else {
      # And the key should equal the value
      opq_obj <- add_osm_feature(opq_obj, key = feature$key, value = feature$value)
    }
  }
  opq_obj
}, .init = osm_query)

mpadge commented 2 months ago

@joostschouppe That's not too inefficient, but there's an important caveat to this package: all design decisions are intended to minimise load placed on the overpass server. Requests with a key but no value generally produce very large requests, and so should be avoided. I'll definitely update the docs as I suggested above, but that will also include a very strong advisory note to avoid any such calls unless absolutely necessary, and unless calls are for very small areas only.

Beyond that, any modifications to code to make it easier to pass sequences of calls including key-only are generally not strongly encouraged here. I hope that makes sense! I'll ping you here when a PR is ready with documentation updates to ask your opinion then. @jmaspons Any thoughts from you on this issue?

joostschouppe commented 1 month ago

Note that this is an AND request, so I'm actually limiting server load: only give me some things I like that also are not unnamed. BTW, as a heavy OSM contributor I do appreciate those efforts! And thanks a lot for the improvements.

jmaspons commented 1 month ago

Note that negating keys and values is also possible. For your example, @joostschouppe, I would write:

q <- opq(bbox = c(1:4), timeout = 60, osm_types = "nwr") |>
    add_osm_feature(key = "building", value = c("sports_centre", "sports_club")) |> # adding sports_club to show vector
    add_osm_feature(key = "access", value = "!private") |>
    add_osm_feature(key = c("name", "!leisure"))

cat(opq_string(q))

For the heavy queries part, I believe the main factor is the bbox area. IMO, anything that makes it easier to add filters will result in lighter queries. The alternative is to make a heavier, more generic query and filter locally.
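
To make the "filter locally" alternative concrete, here is a minimal base-R sketch. It assumes the downloaded features arrive as a table with one column per tag and `NA` where a tag is absent; the `pitches` data frame below is a made-up stand-in for such a result (for example the attribute columns returned by `osmdata_sf()`), not real OSM data.

```r
# Mock attribute table standing in for a downloaded result.
# All values are invented for illustration only.
pitches <- data.frame(
  osm_id = c("1", "2", "3"),
  name   = c("North Field", NA, "Club Court"),
  access = c(NA, "private", "private"),
  stringsAsFactors = FALSE
)

# Keep features that have a name and are not access = "private".
# A missing access tag is kept on purpose: absent is not "private".
has_name    <- !is.na(pitches$name)
not_private <- is.na(pitches$access) | pitches$access != "private"

kept <- pitches[has_name & not_private, ]
print(kept)
```

The trade-off is the one described above: the server returns every feature matching the generic query, and the narrowing happens on the client after the download.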