ropensci / taxadb

📦 Taxonomic Database
https://docs.ropensci.org/taxadb

Handling multiple matching synonyms #26

Closed: cboettig closed this issue 4 years ago

cboettig commented 5 years ago

It would be nice for the id table to return exactly one row for each name in the input query vector. Unfortunately, some recognized synonyms are synonyms of two different accepted names, and so cannot be resolved automatically. For instance, in ITIS, 'Trochalopteron henrici gucenense' is a synonym of both 'Trochalopteron elliotii' and 'Trochalopteron henrici'.
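
A minimal Python sketch of why a plain lookup breaks the one-row-per-name goal. It assumes a toy synonym table holding only the ITIS example above; `SYNONYMS` and `naive_match` are hypothetical names, not part of taxadb. Matching each query against every synonym row, like an inner join, yields two rows for the ambiguous name.

```python
# Hypothetical toy synonym table: (input name, accepted name) pairs.
# Only the ITIS example from this issue is modeled.
SYNONYMS = [
    ("Trochalopteron henrici gucenense", "Trochalopteron elliotii"),
    ("Trochalopteron henrici gucenense", "Trochalopteron henrici"),
    ("Trochalopteron elliotii", "Trochalopteron elliotii"),
]


def naive_match(names):
    """Return every (query, accepted) pair, like an inner join on the name column."""
    return [(q, acc) for q in names for (syn, acc) in SYNONYMS if syn == q]


query = ["Trochalopteron elliotii", "Trochalopteron henrici gucenense"]
rows = naive_match(query)
print(len(query), len(rows))  # prints "2 3": one extra row from the ambiguous synonym
```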

What should the function do in this case? User input is clearly needed to resolve these names, and the user should be notified, but it is still unclear what the best return structure would be. It should favor automated pipelines and reasoning: not an interactive prompt, and not just a warning whose text has to be parsed. It is better to capture all possible cases natively in the returned data structure. Perhaps one or more additional columns indicating the multiple matches?
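
One possible shape for the "additional columns" idea, sketched in Python under stated assumptions: the `Match` record and its `n_matches` and `candidates` fields are hypothetical, not taxadb's actual columns. Each query name gets exactly one row. The accepted name is filled in only when the match is unique, and the extra columns record how many candidates were found and what they are, so a pipeline can filter on them without parsing messages.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical toy synonym table: (input name, accepted name) pairs.
SYNONYMS = [
    ("Trochalopteron henrici gucenense", "Trochalopteron elliotii"),
    ("Trochalopteron henrici gucenense", "Trochalopteron henrici"),
    ("Trochalopteron elliotii", "Trochalopteron elliotii"),
]


@dataclass
class Match:
    input_name: str
    accepted_name: Optional[str]  # None when there are zero or several candidates
    n_matches: int  # hypothetical extra column: number of candidate accepted names
    candidates: List[str] = field(default_factory=list)  # hypothetical extra column


def match_one_row_per_name(names):
    """Return exactly one Match per input name, in input order."""
    out = []
    for q in names:
        cands = sorted({acc for syn, acc in SYNONYMS if syn == q})
        accepted = cands[0] if len(cands) == 1 else None
        out.append(Match(q, accepted, len(cands), cands))
    return out


query = ["Trochalopteron elliotii", "Trochalopteron henrici gucenense", "Unknown name"]
result = match_one_row_per_name(query)
ambiguous = [m.input_name for m in result if m.n_matches > 1]
```

Downstream code can then select rows with `n_matches > 1` for manual review, while rows with `n_matches == 0` mark names that were not found at all.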

cboettig commented 5 years ago

get_ids now treats multiple matches as unresolved, ensuring that the id vector it returns has the same length (and order) as the queried names. (Previously, multiple matches would produce a longer id vector, breaking the alignment between each name and its id.) ids() simply returns the longer table, which lists every match, including the multiple ones.
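
The contrast between the two return shapes can be sketched in Python. This is not taxadb's R implementation: `ids_table` and `get_ids_aligned` are hypothetical analogues of ids() and get_ids, and the ids such as `"ID:A"` are placeholders, not real ITIS identifiers. The long table may have several rows per name, while the aligned vector has one entry per query and marks ambiguous or missing names as `None`.

```python
# Hypothetical toy table: (input name, accepted id). IDs are placeholders.
TABLE = [
    ("Trochalopteron henrici gucenense", "ID:A"),  # -> Trochalopteron elliotii
    ("Trochalopteron henrici gucenense", "ID:B"),  # -> Trochalopteron henrici
    ("Trochalopteron elliotii", "ID:A"),
]


def ids_table(names):
    """Long form (analogous to ids()): every match, possibly several rows per name."""
    return [(q, i) for q in names for (n, i) in TABLE if n == q]


def get_ids_aligned(names):
    """One id per input name, same order (analogous to get_ids).

    Names with zero or multiple matching ids are treated as unresolved (None).
    """
    out = []
    for q in names:
        matches = {i for (n, i) in TABLE if n == q}
        out.append(matches.pop() if len(matches) == 1 else None)
    return out


query = ["Trochalopteron elliotii", "Trochalopteron henrici gucenense", "Unknown name"]
long_rows = ids_table(query)     # 3 rows for 3 names, but the unknown name has none
aligned = get_ids_aligned(query)  # ["ID:A", None, None]
```

Because `aligned[k]` always corresponds to `query[k]`, the result can be zipped back onto the input without a join.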

Still, get_ids might also need to notify the user of this issue in some way.
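
One way such a notification could work without forcing anyone to parse message text, sketched in Python: raise a dedicated warning category alongside the aligned result. `AmbiguousNameWarning` and `get_ids_with_notice` are hypothetical names, and the ids are placeholders. A pipeline can then record, silence, or escalate the condition by category (for example, `warnings.simplefilter("error", AmbiguousNameWarning)` turns it into an exception) instead of matching on the message string.

```python
import warnings

# Hypothetical toy table: (input name, accepted id). IDs are placeholders.
TABLE = [
    ("Trochalopteron henrici gucenense", "ID:A"),
    ("Trochalopteron henrici gucenense", "ID:B"),
    ("Trochalopteron elliotii", "ID:A"),
]


class AmbiguousNameWarning(UserWarning):
    """Hypothetical category for names that match more than one accepted id."""


def get_ids_with_notice(names):
    """Return one id per name (None if unresolved) and warn about ambiguous names."""
    ids, ambiguous = [], []
    for q in names:
        matches = {i for (n, i) in TABLE if n == q}
        if len(matches) > 1:
            ambiguous.append(q)
        ids.append(matches.pop() if len(matches) == 1 else None)
    if ambiguous:
        warnings.warn(
            f"{len(ambiguous)} name(s) matched multiple accepted ids: {ambiguous}",
            AmbiguousNameWarning,
            stacklevel=2,
        )
    return ids


query = ["Trochalopteron elliotii", "Trochalopteron henrici gucenense"]
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning raised in this block
    ids = get_ids_with_notice(query)
```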