Closed kamapu closed 7 years ago
thanks for this issue @kamapu
We don't control the web service behind the tnrs() function, but I can pass along the feedback to the maintainers. I've thought about taking it over, but it's PHP, which I don't know.
Here's the JSON behind that request: http://taxosaurus.org/retrieve/e7c69a0a08604652e390c28b328405f0 You can see that there's a missing name in the acceptedName slot, thus the empty slot in the data.frame returned.
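To make the empty slot concrete, here is a minimal Python sketch of how an empty acceptedName in the JSON ends up as an empty cell once the response is flattened into rows. The payload is a hypothetical stand-in, not the actual response: the field names acceptedName, matchedName and score follow this thread, while the nesting under "names" and "matches" and the flatten() helper are assumptions made for illustration.

```python
import json

# Hypothetical payload: field names follow this thread, the nesting is assumed.
payload = json.loads("""
{
  "names": [
    {
      "submittedName": "Guizotia schultzii",
      "matches": [
        {"matchedName": "Guizotia schultzii", "acceptedName": "", "score": "1"}
      ]
    }
  ]
}
""")


def flatten(payload):
    """Turn the nested response into flat rows, one row per match."""
    rows = []
    for name in payload.get("names", []):
        for match in name.get("matches", []):
            rows.append({
                "submittedname": name.get("submittedName", ""),
                "matchedname": match.get("matchedName", ""),
                # An empty string in the JSON becomes None (an empty cell).
                "acceptedname": match.get("acceptedName") or None,
                "score": float(match.get("score", 0)),
            })
    return rows


rows = flatten(payload)
# Names that matched but came back without an accepted name.
missing = [row["submittedname"] for row in rows if row["acceptedname"] is None]
print(missing)
```

Listing the names with no accepted name up front makes it easier to decide which ones to re-check against another source.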
The output looks different when using the iPlant TNRS website http://tnrs.iplantcollaborative.org/TNRSapp.html , where I get:
Name matched: Guizotia schultzii
Name matched source(s): GCC
Name matched rank: species
Name score: 1.00
Author matched:
Author score:
Overall score: 1.00
Family matched:
Family score:
Name matched accepted family: Asteraceae
Genus matched: Guizotia
Genus score: 1.00
Specific epithet matched: schultzii
Specific epithet score: 1.00
Infraspecific rank :
Infraspecific epithet matched:
Infraspecific epithet score:
Infraspecific rank 2:
Infraspecific epithet 2 matched:
Infraspecific epithet 2 score:
Annotations:
Unmatched terms:
Taxoxnomic status: Illegitimate
Accepted name: Guizotia scabra
Name matched source(s): GCC
Accepted Name author: (Vis.) Chiov.
Accepted Name Species: Guizotia scabra
Accepted Name Family: Asteraceae
Warnings: Ambiguous match
I'll ask the maintainer about this.
Thank you again. I'll wait for the outcome of the discussion. By the way, in the displayed summary there is a typo in "Taxoxnomic status".
in the displayed summary there is a typo in Taxoxnomic status
what summary?
Sorry, I was talking about the popup list, the one attached at the end of your previous message (the list beginning with "Name matched: Guizotia schultzii" and containing the line "Taxoxnomic status: Illegitimate").
ah, that's from their website, not from taxize
email to maintainer sent
There are at least two issues that came into play. I'll try to unpack them:
As Scott pointed out, when one searches for Guizotia schultzii via the TNRS web interface, the result is different from the taxosaurus call. Namely, the web interface finds an accepted synonym whereas the taxosaurus API call returns an empty accepted name. The reason for this discrepancy is that the taxosaurus API only queries the Tropicos list, not all the sources. The match retrieved via the web interface is from GCC.
The TNRS follows a two-step process: name scrubbing followed by taxonomic resolution. In the first step, the input string is parsed into its components (family, genus, species, subspecies, authors, etc.) and each component is then matched to a list of known strings within the appropriate category (aka "known names"). For every component, a score based on the string similarity (Levenshtein distance) is reported. The component scores are then summarized into a single "Overall score" that indicates how close the submitted name is to a "known name" (matched name). The taxonomic status of the matching name has no impact on the score. Only at that point does the TNRS run the second stage and query various taxonomic sources to assess the taxonomic status of a "matched name" and, if appropriate, return whatever name the source considers to be accepted. In this case, you are seeing a score of 1 for the match, indicating that Guizotia schultzii is the best match in the database. However, because the taxonomic status is "No opinion" (according to that particular Tropicos snapshot), the "Accepted name" field is empty.
The TNRS relies on imported versions of the underlying databases and unfortunately those tend to get (badly) out of sync with the corresponding live versions, which makes it very difficult to track the source of the problem.
I'm painfully aware that this is far from an ideal situation and honestly, the TNRS could use a rewrite from the ground up in order to address these and other issues. Unfortunately, I don't have the resources to do so.
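For readers who want to see the two stages described above side by side, here is a rough Python sketch. It is not the TNRS code: the normalization of the Levenshtein distance to a 0..1 score, the plain mean used for the overall score, and the KNOWN_NAMES and STATUS tables are all assumptions made for illustration, and only genus and specific epithet are parsed.

```python
def levenshtein(a, b):
    """Edit distance between two strings (dynamic programming, two rows)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Map the edit distance to a 0..1 score (assumed normalization)."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a.lower(), b.lower()) / max(len(a), len(b))


# Hypothetical reference data, for illustration only.
KNOWN_NAMES = [("Guizotia", "schultzii"), ("Guizotia", "scabra")]
STATUS = {"Guizotia schultzii": {"status": "No opinion", "accepted_name": ""}}


def scrub(submitted):
    """Stage 1: match each component against known names and score it."""
    parts = submitted.split()
    genus = parts[0] if parts else ""
    epithet = parts[1] if len(parts) > 1 else ""
    best = None
    for known_genus, known_epithet in KNOWN_NAMES:
        genus_score = similarity(genus, known_genus)
        epithet_score = similarity(epithet, known_epithet)
        overall = (genus_score + epithet_score) / 2  # assumed: plain mean
        if best is None or overall > best["overall_score"]:
            best = {"matched_name": f"{known_genus} {known_epithet}",
                    "overall_score": overall}
    return best


def resolve(matched_name):
    """Stage 2: look up the taxonomic status; it never changes the score."""
    return STATUS.get(matched_name, {"status": "No opinion", "accepted_name": ""})


match = scrub("Guizotia schultzii")
resolution = resolve(match["matched_name"])
print(match, resolution)
```

The point of keeping the two functions separate is the same as in the explanation: a perfect score from scrub() says nothing about whether resolve() will find an accepted name.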
@nmatasci thanks for your input!
The match retrieved via the web interface is from GCC
what is GCC?
@kamapu does the answer above clear things up for you?
It's one of the curated sources the TNRS uses to resolve valid names, the Global Compositae Checklist. One of the ideas behind the TNRS was to let users rank its sources according to their (subjective) preference, so that "lower quality" sources are only used to resolve names that are not found in the better sources. That said, we had to use a default ranking and we prioritized manually curated, clade-specific lists.
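The ranking idea can be sketched in a few lines of Python. This is only an illustration of "first source that knows the name wins", not the TNRS implementation: the SOURCES table and the resolve_ranked() function are hypothetical, with entries limited to what this thread reports for Guizotia schultzii.

```python
# Hypothetical source contents, limited to what is reported in this thread:
# GCC gives Guizotia scabra as accepted name, the Tropicos snapshot gives none.
SOURCES = {
    "GCC": {"Guizotia schultzii": "Guizotia scabra"},
    "tropicos": {"Guizotia schultzii": ""},
}


def resolve_ranked(name, ranking, sources=SOURCES):
    """Return (source, accepted name) from the first ranked source that has the name."""
    for source_id in ranking:
        names = sources.get(source_id, {})
        if name in names:
            return source_id, names[name]
    # The name was not found in any of the ranked sources.
    return None, ""


# A ranking that includes GCC first, as a user might configure it.
print(resolve_ranked("Guizotia schultzii", ["GCC", "tropicos"]))
# A Tropicos-only query, comparable to the API call discussed above.
print(resolve_ranked("Guizotia schultzii", ["tropicos"]))
```

With these assumed tables, the first call yields an accepted name and the second does not, which mirrors the difference between the web interface and the API call.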
thanks for clarification @nmatasci
@kamapu does the answer above clear things up for you?
I know it may be just a small detail, but when submitting the following query I apparently got no match in the database for Guizotia schultzii:
In such a case I would expect an empty cell in the column matchedname (as in the case of acceptedname) and a value of 0 in the column score. The latter could be very helpful when deciding whether to accept or reject the suggested names using the score value as a criterion.
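As a sketch of what I mean by using the score as a criterion, here is how the filtering could look in Python once unmatched names carry an empty matchedname and a score of 0. The rows, the normalise() and split_by_score() helpers, and the 0.9 threshold are all hypothetical; they only illustrate the suggestion and do not describe current taxize behaviour.

```python
def normalise(row):
    """Give unmatched rows an empty matchedname and a score of 0."""
    matched = row.get("matchedname") or ""
    score = float(row.get("score") or 0) if matched else 0.0
    return {**row, "matchedname": matched, "score": score}


def split_by_score(rows, min_score=0.9):
    """Separate rows to keep from rows needing review (threshold is arbitrary)."""
    keep, review = [], []
    for row in map(normalise, rows):
        (keep if row["score"] >= min_score else review).append(row)
    return keep, review


# Hypothetical rows: one matched name and one placeholder with no match at all.
rows = [
    {"submittedname": "Guizotia schultzii", "matchedname": "Guizotia schultzii",
     "acceptedname": "", "score": "1"},
    {"submittedname": "Unmatched name", "matchedname": None,
     "acceptedname": None, "score": None},
]

keep, review = split_by_score(rows)
print([row["submittedname"] for row in keep])
print([(row["submittedname"], row["score"]) for row in review])
```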