ojalaquellueva / TNRSbatch

TNRS core application code

Hyphenated names identical to the correct name return match scores <1 #1

Open ojalaquellueva opened 4 years ago

ojalaquellueva commented 4 years ago

For example, "Aegilotriticum sancti-andreae" returns a Name_matched which is exactly the same as the submitted name, yet scores 0.99.

A user reported this issue for the online TNRS, but the behavior originates in the core code in this repository. It is also related to how the database is populated; see tnrs_db.

ojalaquellueva commented 4 years ago

This issue is related to the hyphen in the specific epithet. Canonical versions of taxon names are stored in the database without internal hyphens. "Aegilotriticum sancti-andreae" is thus stored internally as "Aegilotriticum sanctiandreae", which leads to a very small match score penalty, even though the hyphenated spelling is used by all taxonomic sources.

ojalaquellueva commented 1 year ago

This could be a heavy lift to fix, for only a very minor change in the match score. But it would be more consistent with the botanical code if the hyphenated version of the name is treated as exactly equivalent ("equally correct") to the non-hyphenated version. Low-priority bugfix.
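
To illustrate the idea, here is a minimal Python sketch, not the actual TNRS matching code. It assumes a simple normalized edit-distance score as a stand-in for the real scoring, and the function names (`score_raw`, `score_hyphen_insensitive`) are hypothetical. It shows how comparing the submitted name directly against the hyphen-free stored form costs one edit, and how stripping hyphens from both sides before scoring would make the two spellings score as an exact match.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Score in [0, 1]; 1.0 only when the two strings are identical.

    Assumption: a stand-in for the real TNRS score, which is computed differently.
    """
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def strip_hyphens(name: str) -> str:
    """Remove internal hyphens, mirroring how canonical names are stored."""
    return name.replace("-", "")


def score_raw(submitted: str, stored: str) -> float:
    """Hypothetical: compare the submitted name as typed with the stored form."""
    return similarity(submitted.lower(), stored.lower())


def score_hyphen_insensitive(submitted: str, stored: str) -> float:
    """Hypothetical: treat hyphenated and non-hyphenated spellings as equivalent."""
    return similarity(strip_hyphens(submitted).lower(),
                      strip_hyphens(stored).lower())


submitted = "Aegilotriticum sancti-andreae"
stored = "Aegilotriticum sanctiandreae"  # canonical form without the hyphen

print(score_raw(submitted, stored) < 1.0)            # the hyphen costs one edit
print(score_hyphen_insensitive(submitted, stored))   # both spellings compare equal
```

The hyphenated spelling could still be returned to the user as Name_matched; only the comparison step would ignore the hyphen.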