ropensci / osmdata

R package for downloading OpenStreetMap data
https://docs.ropensci.org/osmdata
314 stars 45 forks source link

`key` parameter in add_osm_feature can be negated #297

Closed jmaspons closed 1 year ago

jmaspons commented 1 year ago

Same behaviour as the value parameter

jmaspons commented 1 year ago

Broken tests fixed with #295

jmaspons commented 1 year ago

Ready for review @mpadge

mpadge commented 1 year ago

Thanks as always @jmaspons, but I have to admit that i'm unsure about this one. One of the abiding aims of this package has always been to place as little demand on the overpass server as possible. The simple ability to replace key="this" with a negated version could accidentally lead to much larger queries being sent. I trialled with a very small data set, and negating the key delivered over 4 times the amount of data. These are of course areal data, so that will scale at least by a power of 2 with area size. I think if we're to incorporate this PR, I'd first like to see:

  1. An example of a use case for the ability to request negated keys; and
  2. A clear warning against doing that in anything but specific cases of the kind illustrated by (1).

Does that make sense?

mpadge commented 1 year ago

And this is really interesting:

f1 <- function (key) {
    ifelse (substring (key, 1, 1) == "!",
            sprintf ('[!"%s"]', substring (key, 2, nchar (key))),
            sprintf ('["%s"]', key))
}
f2 <- function (key) {
    ifelse (grepl ("^\\!", key),
            sprintf ('[!"%s"]', substring (key, 2, nchar (key))),
            sprintf ('["%s"]', key))
}
f3 <- function (key) {
    ifelse (grepl ("^\\!", key),
            sprintf ('[!"%s"]', gsub ("^\\!", "", key)),
            sprintf ('["%s"]', key))
}
key <- "!key"
bench::mark (f1 (key), f2 (key), f3 (key)) [, 1:3]
#> # A tibble: 3 × 3
#>   expression      min   median
#>   <bch:expr> <bch:tm> <bch:tm>
#> 1 f1(key)      12.2µs   14.6µs
#> 2 f2(key)      15.9µs   19.1µs
#> 3 f3(key)        20µs   24.2µs

Created on 2023-01-19 with reprex v2.0.2

I probably would have been tempted to use regexes, but your approach is obviously superior. Nice, and good (for me at least) to keep in mind.

jmaspons commented 1 year ago

The key negation should always be accompanied by some other filters. At least that's the use key I'm thinking in.

1. An example of a use case for the ability to request negated keys; and

Which OSM objects with name don't have Catalan translation !"name:ca"? Which peaks (natural=peak) lack the elevation tag (!ele)?

2. A clear warning against doing that in anything but specific cases of the kind illustrated by (1).

Is a warning in the docs enough? Otherwise, I think the best place is during the osmdata_* calls when the query is completed to check if a negated key is the only filter and maybe asking the user if she wants to carry on with the query

jmaspons commented 1 year ago

For the grep vs substring thing, I just replicate the same pattern as for negate values. So blame on you or whoever implement that part (applause in this case :)). I tend to go for grep as a first option as well

mpadge commented 1 year ago

The key negation should always be accompanied by some other filters. At least that's the use key I'm thinking in.

1. An example of a use case for the ability to request negated keys; and

Which OSM objects with name don't have Catalan translation !"name:ca"? Which peaks (natural=peak) lack the elevation tag (!ele)?

These are great - could you please add them to the opq examples? :heavy_check_mark:

2. A clear warning against doing that in anything but specific cases of the kind illustrated by (1).

Is a warning in the docs enough? Otherwise, I think the best place is during the osmdata_* calls when the query is completed to check if a negated key is the only filter and maybe asking the user if she wants to carry on with the query

That's also a good idea, but the package currently does not have any interactive components, and it'd be kind of nice to keep it that way. (I commonly perform huge batch jobs which call various osmdata functions, sometimes interactive, sometimes not, and having any readline() instances can potentially stop such jobs, which is annoying.) Maybe just a warning? That's also pretty annoying, and non-trivial to suppress, so i think that would suffice.

jmaspons commented 1 year ago

Warning and examples added

jmaspons commented 1 year ago

Can you merge this, @mpadge ?

mpadge commented 1 year ago

Sorry, been super-busy doing other things. Yes, of course I'll merge. Thank you so much for all of these continuing contributions!