ojalaquellueva / TNRSapi

API wrapper for TNRS batch application

exact match returns extra match #8

Open nlkinlock opened 2 years ago

nlkinlock commented 2 years ago

I'm running into an issue when resolving large numbers of names using the TNRS API via the R package or the web application.

Periodically (~1 name per 1,000), two names will be returned for a name with a single exact match in the backbone. The second name is often taxonomically unrelated to the first, and both names are included in the input list. For both returned names, the output always shows an Overall_score of exactly 0.9 and the Genus_submitted as the genus of the extra name. Below is an example of this output:

     ID      Name_submitted Overall_score Name_matched_id        Name_matched Name_score Name_matched_rank Author_submitted Author_matched Author_score Canonical_author Name_matched_accepted_family
779 729 Acronychia octandra           0.9         1684851 Acronychia octandra          1           species NA             NA           NA (F.Muell.) T.G.Hartley                     Rutaceae
780 729 Acronychia octandra           0.9         1679170 Actinidia persicina          1           species NA             NA           NA          R.G.Li & L.Mo                Actinidiaceae
    Genus_submitted Genus_matched Genus_score Specific_epithet_submitted Specific_epithet_matched Specific_epithet_score Family_submitted Family_matched Family_score Infraspecific_rank
779       Actinidia    Acronychia           1 persicina                 octandra 1               NA             NA           NA NA
780       Actinidia     Actinidia           1 persicina                persicina 1               NA             NA           NA NA
    Infraspecific_epithet_matched Infraspecific_epithet_score Infraspecific_rank_2 Infraspecific_epithet_2_matched Infraspecific_epithet_2_score Annotations     Unmatched_terms
779                            NA NA                   NA NA                            NA          NA Acronychia octandra
780                            NA NA                   NA NA                            NA          NA Acronychia octandra
                                        Name_matched_url Name_matched_lsid Phonetic Taxonomic_status Accepted_name    Accepted_species   Accepted_name_author Accepted_name_id Accepted_name_rank
779 http://www.worldfloraonline.org/taxon/wfo-0000518852 NA        Y         Accepted Acronychia octandra Acronychia octandra (F.Muell.) T.G.Hartley          1684851 species
780 http://www.worldfloraonline.org/taxon/wfo-0000508289 NA        Y         Accepted Actinidia persicina Actinidia persicina          R.G.Li & L.Mo          1679170 species
                                       Accepted_name_url Accepted_name_lsid Accepted_family Overall_score_order Highertaxa_score_order Source Warnings selected unique_id
779 http://www.worldfloraonline.org/taxon/wfo-0000518852 NA        Rutaceae                   1                      1 wfo              true       778
780 http://www.worldfloraonline.org/taxon/wfo-0000508289 NA   Actinidiaceae                   2                      2 wfo             false       779

Unfortunately, I can't consistently replicate this issue. When resolving the same taxon list repeatedly, it will occur for different names. I'm standardizing using the WFO backbone, but the same issue occurs using the WCVP and Tropicos backbones. I've attached a text file with taxon names that will trigger this issue the majority of the time.

test_taxa.txt
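For anyone trying to spot affected rows in a large result set, here is a minimal Python sketch of one possible check. It assumes the results have been exported to a list of dictionaries keyed by the column names shown above (for example via `csv.DictReader`). The helper `find_suspect_rows` is hypothetical and not part of TNRS or the R package. It flags any row where `Genus_submitted` does not match the first word of `Name_submitted`, which is the pattern in the example output. The third row is a made-up control row for illustration only.

```python
def find_suspect_rows(rows):
    """Return rows whose parsed genus disagrees with the submitted name.

    Hypothetical helper: assumes each row is a dict with the TNRS output
    columns 'Name_submitted' and 'Genus_submitted' as plain strings.
    """
    suspects = []
    for row in rows:
        tokens = row["Name_submitted"].split()
        genus = row["Genus_submitted"]
        # Flag the row when the genus TNRS reports as submitted is not
        # the first word of the name that was actually submitted.
        if tokens and genus and tokens[0] != genus:
            suspects.append(row)
    return suspects


rows = [
    # The two rows from the example output above (subset of columns).
    {"ID": 729, "Name_submitted": "Acronychia octandra",
     "Name_matched": "Acronychia octandra", "Genus_submitted": "Actinidia"},
    {"ID": 729, "Name_submitted": "Acronychia octandra",
     "Name_matched": "Actinidia persicina", "Genus_submitted": "Actinidia"},
    # Illustrative control row (not from the report): a consistent result.
    {"ID": 1, "Name_submitted": "Quercus alba",
     "Name_matched": "Quercus alba", "Genus_submitted": "Quercus"},
]

suspects = find_suspect_rows(rows)
suspect_ids = {row["ID"] for row in suspects}
print(suspect_ids)
```

This is a simple string comparison, so it would probably need adjusting for inputs where the first token is not the genus (family-prefixed names, hybrid markers) or where capitalization differs between the two columns.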

ojalaquellueva commented 1 year ago

@nlkinlock Sorry I missed this issue report. I was traveling at the time and the notification got buried in my inbox.

This is very strange. At a glance, it looks like Name_submitted="Acronychia octandra" is being repeated and assigned to the results for Actinidia persicina. This is reminiscent of EnquistLab/RTNRS#14, which I believe is related to a bug in the Perl parallel controller code. But Overall_score=0.9 shouldn't apply to any name in your example batch, and I have no idea where that value comes from.

Unfortunately I can't replicate the issue, even after trying a couple of times with your names. But I'm about to start working on EnquistLab/RTNRS#14. Refactoring the controller may solve several issues. I'll keep an eye out for this one as well.