ropensci / taxadb

📦 Taxonomic Database
https://docs.ropensci.org/taxadb

Fuzzy filter #50

Closed: cboettig closed this 5 years ago

cboettig commented 5 years ago

@karinorman Can you take a quick look at this? It adds the function fuzzy_filter, which works like so:

# match any common name containing:
name <- c("woodpecker", "monkey")
fuzzy_filter(name, "vernacularName", "itis")
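Passing a vector like c("woodpecker", "monkey") implies "match rows containing any of these terms". Below is a minimal sketch of that idea on a local data frame, not taxadb's actual implementation. The toy rows and the helper contains_any() are hypothetical. Case-insensitive matching is also an assumption, since the comment doesn't say how fuzzy_filter treats case.

```r
# Toy stand-in for a provider table (hypothetical rows, for illustration only)
taxa <- data.frame(
  scientificName = c("Picoides villosus", "Alouatta palliata", "Homo sapiens"),
  vernacularName = c("Hairy Woodpecker", "Mantled Howler Monkey", "human"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE for each element of x that contains ANY of the
# search terms. Both sides are lowercased (an assumption) and each term is
# matched as a literal string (fixed = TRUE), so "." or "(" are not regex.
contains_any <- function(x, terms) {
  hits <- lapply(terms, function(t) grepl(tolower(t), tolower(x), fixed = TRUE))
  Reduce(`|`, hits)  # OR the per-term logical vectors together
}

name <- c("woodpecker", "monkey")
taxa[contains_any(taxa$vernacularName, name), ]
```

Matching each term literally and OR-ing the results avoids building one combined regex, which would otherwise require escaping user input.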

Currently the second argument is by, so you can fuzzy-filter on either vernacular names, as above, or scientific names:

fuzzy_filter("Homo ", "scientificName", "itis",
              match = "starts_with")

You can also toggle the "fuzzy" match between "starts_with" and "contains".
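The two match modes can be sketched with base R, where grepl(fixed = TRUE) handles "contains" and startsWith() handles "starts_with". The helper match_names() and the toy vector are hypothetical and stand in for whatever fuzzy_filter does internally. The sketch also shows why the trailing space in "Homo " matters when matching on the start of a name.

```r
# Hypothetical stand-in for the `match` argument of fuzzy_filter():
# returns a logical vector marking which names match the pattern.
match_names <- function(x, pattern, match = c("contains", "starts_with")) {
  match <- match.arg(match)
  switch(match,
    contains    = grepl(pattern, x, fixed = TRUE),  # substring anywhere
    starts_with = startsWith(x, pattern)            # prefix only
  )
}

# Toy vector of scientific names (illustrative only)
sci <- c("Homo sapiens", "Homo erectus", "Homonota darwinii", "Pan troglodytes")

match_names(sci, "Homo ", match = "starts_with")  # genus Homo only
match_names(sci, "Homo", match = "starts_with")   # also catches Homonota
match_names(sci, "troglo", match = "contains")
```

Without the trailing space, a "starts_with" search for a genus also returns any genus whose name begins with the same letters.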

One thing I'm wondering is whether the interface would be more intuitive with wrapper functions for these options, rather than leaving it to possibly opaque argument names, so that users could write something like:

common_contains("Woodpecker")
common_starts_with("Johnson")

name_starts_with("Homo")

This is probably much more semantically obvious than the fuzzy_filter notation. We already do something similar with the by_* functions, since each is just a thin wrapper around filter_by(). (In the by_* functions, I think we use common to refer to vernacularName and name to refer to scientificName.)
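Here is a minimal sketch of how such wrappers could look as one-line functions that fix by and match, following the common/name convention of the by_* functions. It filters a toy local data frame instead of querying a provider such as "itis". The names fuzzy_filter_local(), common_contains(), common_starts_with() and name_starts_with() are all hypothetical proposals, not existing taxadb functions, and the sketch handles a single search term only.

```r
# Toy stand-in for a provider table (hypothetical rows)
taxa <- data.frame(
  scientificName = c("Homo sapiens", "Picoides villosus", "Sylvilagus audubonii"),
  vernacularName = c("human", "Hairy Woodpecker", "Desert Cottontail"),
  stringsAsFactors = FALSE
)

# Hypothetical, simplified fuzzy_filter(): filters a local data frame
# on one column, using a single search term.
fuzzy_filter_local <- function(df, name, by,
                               match = c("contains", "starts_with")) {
  match <- match.arg(match)
  x <- df[[by]]
  hit <- switch(match,
    contains    = grepl(name, x, fixed = TRUE),
    starts_with = startsWith(x, name)
  )
  df[hit, , drop = FALSE]
}

# Thin wrappers that fix `by` and `match` (common -> vernacularName,
# name -> scientificName), mirroring how by_* wraps filter_by()
common_contains <- function(name, df = taxa)
  fuzzy_filter_local(df, name, "vernacularName", "contains")
common_starts_with <- function(name, df = taxa)
  fuzzy_filter_local(df, name, "vernacularName", "starts_with")
name_starts_with <- function(name, df = taxa)
  fuzzy_filter_local(df, name, "scientificName", "starts_with")

common_contains("Woodpecker")
name_starts_with("Homo")
```

Each wrapper is one line that delegates everything to the general function, so the maintenance cost is small. The main cost is the extra names in the exported API.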

But I'm not sure whether that would introduce too many function names and make the whole API feel overly complicated. Or maybe we would also want to revisit the by_* names to make them more intuitive? Anyway, thoughts appreciated.

(Also, the vernacular-name version currently uses just the one vernacular name; that's something we might revisit once we fix by_common to use the full common-name tables.)