ojalaquellueva / TNRSbatch

TNRS core application code

Orthographic variants of same name both returned as Accepted when using two or more taxonomic sources #11

Closed ojalaquellueva closed 6 months ago

ojalaquellueva commented 9 months ago

As reported by @bmaitner (& originally by Erica Newman):

So, when I check the names via the TNRS, both actually show as accepted, but pointing to different sources.

Andropogon gerardii is accepted by WFO, who in turn cite the WCSP (https://www.worldfloraonline.org/taxon/wfo-0000846629).

Andropogon gerardi is accepted by WCVP/POWO (https://powo.science.kew.org/taxon/urn:lsid:ipni.org:names:392462 ... which is dead).

Under the WFO, Andropogon gerardii is accepted but Andropogon gerardi is no opinion. Under WCVP, Andropogon gerardii is a synonym and Andropogon gerardi is accepted.

Unfortunately, this means that both names show up in the database, so anyone querying one will get an incomplete set of records.

ojalaquellueva commented 9 months ago

Results replicated in TNRSweb, and also via a direct call to the production API from the shell:

Input file (orth_var_test_names.csv):

ID,Name_submitted
1,Andropogon gerardii
2,Andropogon gerardi 

Command (using this script):

tnrs_api.sh -f ~/bien/tnrs/admin/bugs/orth_var_acceptance_bug/orth_var_test_names.csv -s wfo,wcvp -u http://vegbiendev.nceas.ucsb.edu:9975/tnrs_api.php

Response:

Names submitted:
| ID | Name_submitted      |
| -- | ------------------- |
|  1 | Andropogon gerardii |
|  2 | Andropogon gerardi  |

Processing with TNRS API @ 'http://vegbiendev.nceas.ucsb.edu:9975/tnrs_api.php'

Name resolution results:
| Name_submitted      | Name_matched        | Overall_score | Taxonomic_status | Accepted_name       | Accepted_name_author | Source |
| ------------------- | ------------------- | ------------- | ---------------- | ------------------- | -------------------- | ------ |
| Andropogon gerardii | Andropogon gerardii |          True | Accepted         | Andropogon gerardii | Vitman               | wfo    |
| Andropogon gerardi  | Andropogon gerardi  |          True | Accepted         | Andropogon gerardi  | Vitman               | wcvp   |

ojalaquellueva commented 9 months ago

@bmaitner: as usual, Tropicos tells the full story. Andropogon gerardi Vitman, as published, is the correct name, "gerardi" being the correctly latinized form of Gerard. Andropogon gerardii Vitman is a later orthographic variant, unpublished and therefore invalid, but commonly used. Strictly speaking, an invalid name is not a synonym of anything and should not be used, but for most orthographic variants the intended "correct name" is readily apparent. Here, Andropogon gerardi Vitman should be the accepted name in both cases.

ojalaquellueva commented 9 months ago

@bmaitner: This is an exceedingly tricky issue, as the TNRS resolves each name independently, and does not pay attention to the fact that two names returned in the same batch are almost certainly orthographic variants. If the TNRS were somehow able to compare similarity among the accepted names returned within a single batch, it would be able to flag these two names as suspect. By the rules of nomenclature alone, both names cannot be correct, as names differing by only a single character are prohibited. If I can refactor the core batch application (TNRSbatch) to run a final check of the entire batch, it should be possible to at least flag such issues so they can be manually corrected. I'm afraid there is no way to decide automatically which is correct by consulting these sources. You need to dig deeper on Tropicos. Of course, it would be possible if we still had Tropicos in the TNRS...sigh!
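A minimal sketch of the kind of batch-level check described above, written in Python purely for illustration. It is not TNRSbatch code: the function names (`edit_distance`, `flag_suspect_variants`), the dictionary keys, and the one-edit threshold are all assumptions, with the keys chosen to mirror the columns of the response table. It only flags pairs of accepted names for manual review and makes no attempt to decide which spelling is correct.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum inserts, deletes, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def flag_suspect_variants(rows, max_distance=1):
    """Return pairs of distinct accepted names within max_distance edits.

    `rows` is a list of dicts with the hypothetical keys "Accepted_name"
    and "Source". The threshold of 1 is an assumption, not a TNRS setting.
    """
    # Collect the sources that accepted each distinct name in the batch.
    sources_by_name = {}
    for row in rows:
        sources_by_name.setdefault(row["Accepted_name"], set()).add(row["Source"])

    flagged = []
    # Compare every pair of distinct accepted names once.
    for a, b in combinations(sorted(sources_by_name), 2):
        if edit_distance(a.lower(), b.lower()) <= max_distance:
            sources = sorted(sources_by_name[a] | sources_by_name[b])
            flagged.append((a, b, sources))
    return flagged


# The two rows from the response above, reduced to the relevant columns.
batch = [
    {"Name_submitted": "Andropogon gerardii",
     "Accepted_name": "Andropogon gerardii", "Source": "wfo"},
    {"Name_submitted": "Andropogon gerardi",
     "Accepted_name": "Andropogon gerardi", "Source": "wcvp"},
]

suspects = flag_suspect_variants(batch)
print(suspects)
# [('Andropogon gerardi', 'Andropogon gerardii', ['wcvp', 'wfo'])]
```

The pairwise comparison is quadratic in the number of distinct accepted names, so a real post-processing step on large batches would probably want to group names first (for example by genus) before comparing.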

ojalaquellueva commented 9 months ago

I'll see if I can add a post-processing step to TNRSbatch to flag potential orthographic variants. I'll tag this as a feature request.

ojalaquellueva commented 6 months ago

Same as issue #12, which describes the needed changes more succinctly.

ojalaquellueva commented 6 months ago

Closing as duplicate of #12.