nbnuk / nbnatlas-issues

Issue tracking for NBN Atlases
https://nbnatlas.org

Name matching routine has matched on synonym rather than recommended name #754

Closed: sophiathirza closed this issue 4 years ago

sophiathirza commented 5 years ago

The following record has a supplied scientific name of Rorippa palustris. The Atlas has (incorrectly) matched the name to Rorippa islandica: https://records.nbnatlas.org/occurrences/a07aab28-a2a1-4032-8382-b9dd75d7ac03

In the UKSI, Rorippa palustris auct., non (L.) Besser is a synonym of Rorippa islandica (Oeder ex Gunnerus) Borbás: http://nbn-sd-dev.nhm.ac.uk/taxonbrowser.php?txtSearch=Rorippa+islandica

It looks like the name matching routine has matched Rorippa palustris to Rorippa islandica rather than to the recommended name Rorippa palustris (L.) Besser (http://nbn-sd-dev.nhm.ac.uk/taxon.php?linkKey=NBNSYS0000002897). In other words, it has preferred the synonym match over the recommended name.

Apparently there is another example of this mistake; I am waiting for the details.

sophiathirza commented 5 years ago

Before making any changes to this, it would be good to investigate how the name matching works in this case.

reupost commented 5 years ago

In our database, Rorippa palustris (L.) Besser is an accepted name: https://species.nbnatlas.org/species/NBNSYS0000002897. So is Rorippa islandica (Oeder ex Gunnerus) Borbás: https://species.nbnatlas.org/species/NHMSYS0000462432

But as you noted, 'Rorippa palustris' (without authority) is a synonym of Rorippa islandica (Oeder ex Gunnerus) Borbás (as per https://species.nbnatlas.org/species/NHMSYS0000462432#names).

Because the naked name (without authority) was supplied with the record, it was matched to 'Rorippa palustris', i.e. the synonym, rather than to the accepted name that carries the authority. I can see the value of preferentially matching a name to an accepted name whether or not the authority was supplied, but this could have some unexpected side effects.
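To make the difference concrete, here is a minimal sketch contrasting the two behaviours discussed above. It assumes, hypothetically, that the current routine compares the supplied string with the stored name plus authority, so a naked name only hits entries stored without an authority. `NameRecord`, `INDEX`, `match_exact_full_name` and `match_prefer_accepted` are illustrative names, not Atlas code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NameRecord:
    name: str            # naked scientific name, no authority
    authority: str       # empty string if none is stored
    is_accepted: bool
    accepted_taxon: str  # accepted taxon this name resolves to


# Hypothetical index entries mirroring the two names discussed in this issue.
INDEX = [
    NameRecord("Rorippa palustris", "(L.) Besser", True,
               "Rorippa palustris (L.) Besser"),
    NameRecord("Rorippa palustris", "", False,
               "Rorippa islandica (Oeder ex Gunnerus) Borbás"),
]


def full_name(r: NameRecord) -> str:
    return f"{r.name} {r.authority}".strip()


def match_exact_full_name(supplied: str) -> Optional[str]:
    """Assumed current behaviour: compare against name + authority."""
    for r in INDEX:
        if full_name(r) == supplied:
            return r.accepted_taxon
    return None


def match_prefer_accepted(supplied: str) -> Optional[str]:
    """Alternative: compare naked names, then take accepted names first."""
    candidates = [r for r in INDEX if r.name == supplied]
    if not candidates:
        return None
    # False sorts before True, so accepted names come first.
    candidates.sort(key=lambda r: not r.is_accepted)
    return candidates[0].accepted_taxon


print(match_exact_full_name("Rorippa palustris"))
print(match_prefer_accepted("Rorippa palustris"))
```

Under these assumptions the first function lands on the synonym (and so on Rorippa islandica), while the second returns Rorippa palustris (L.) Besser. The second approach also shows where the side effects come from: every naked synonym that shares its name with an accepted name would be redirected.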

For interest, we have about 19,900 species synonyms without authorities. Records supplied with any of these names will currently be matched to the accepted taxon of that synonym, even where there is also an accepted name that matches the supplied name once its authority is added.

sophiathirza commented 5 years ago

This is misleading. The names tab (https://species.nbnatlas.org/species/NHMSYS0000462432#names) shows the synonym of Rorippa islandica (Oeder ex Gunnerus) Borbás as 'Rorippa palustris' without an authority. However, although that synonym has no authority, it does have a name qualifier (in the UKSI it appears as Rorippa palustris auct., non (L.) Besser), and the qualifier is missing from the names tab.

sophiathirza commented 4 years ago

I have added a message to the forum (https://forums.nbn.org.uk/viewtopic.php?id=7424) asking for feedback on changing the name matching routine so that it does not match ambiguous names.
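One possible reading of "not match ambiguous names" is sketched below: a naked name that resolves to more than one distinct accepted taxon is left unmatched for review instead of being assigned to either. This is an assumption about what the change might look like, not the proposal itself; `ENTRIES`, `build_lookup` and `match_unambiguous` are hypothetical, and the Rorippa islandica row is an illustrative addition.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple

# Hypothetical (naked name, accepted taxon) pairs.
ENTRIES = [
    ("Rorippa palustris", "Rorippa palustris (L.) Besser"),
    ("Rorippa palustris", "Rorippa islandica (Oeder ex Gunnerus) Borbás"),
    ("Rorippa islandica", "Rorippa islandica (Oeder ex Gunnerus) Borbás"),
]


def build_lookup(entries: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Group the accepted taxa reachable from each naked name."""
    lookup: Dict[str, Set[str]] = defaultdict(set)
    for naked, accepted in entries:
        lookup[naked].add(accepted)
    return lookup


def match_unambiguous(supplied: str, lookup: Dict[str, Set[str]]) -> Optional[str]:
    targets = lookup.get(supplied, set())
    if len(targets) == 1:
        return next(iter(targets))
    # No candidate, or candidates disagree: leave unmatched for review.
    return None


lookup = build_lookup(ENTRIES)
print(match_unambiguous("Rorippa palustris", lookup))
print(match_unambiguous("Rorippa islandica", lookup))
```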

sophiathirza commented 4 years ago

Note on how the UKSI bulk name matching tool works:

When multiple results are found they are prioritised in the following way:

Recommended names before synonyms
Names without attributes over attributed names
Other variations of a recommended name above junior synonyms
Well-formed names over ill-formed names

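
A minimal sketch of how these four rules could be expressed as a single sort key, assuming they are applied in the order listed, with each rule only breaking ties left by the one before it. The `Candidate` fields are hypothetical; `has_attributes` is assumed to cover qualifiers such as "auct., non (L.) Besser", and the flag values below are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    label: str
    recommended: bool     # rule 1
    has_attributes: bool  # rule 2 (assumed: qualifiers and similar)
    junior_synonym: bool  # rule 3: False for other variations of a recommended name
    well_formed: bool     # rule 4


def priority_key(c: Candidate) -> tuple:
    # Tuples compare element by element, and False sorts before True,
    # so each element puts the preferred case first.
    return (not c.recommended, c.has_attributes, c.junior_synonym, not c.well_formed)


candidates = [
    Candidate("Rorippa palustris auct., non (L.) Besser",
              recommended=False, has_attributes=True,
              junior_synonym=False, well_formed=True),
    Candidate("Rorippa palustris (L.) Besser",
              recommended=True, has_attributes=False,
              junior_synonym=False, well_formed=True),
]
ranked = sorted(candidates, key=priority_key)
print([c.label for c in ranked])
```

A tuple key keeps the precedence explicit: reordering the rules means reordering the tuple, not rewriting comparison logic.
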
sophiathirza commented 4 years ago

Here are the stats on how we match our names:

taxonIdMatch: 221,548,378
exactMatch: 630,915
noMatch: 379,852 (many of these are records where the supplied TVK (taxon version key) was for a common name)
canonicalMatch: 39,819
higherMatch: 7,225
fuzzyMatch: 1,595
vernacularMatch: 25
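To put these counts in proportion, the short script below totals them and prints each metric's share. It assumes the seven categories are exhaustive and mutually exclusive (each record carries exactly one name_match_metric value), which the stats do not state explicitly.

```python
# Counts per name_match_metric, as reported above.
counts = {
    "taxonIdMatch": 221_548_378,
    "exactMatch": 630_915,
    "noMatch": 379_852,
    "canonicalMatch": 39_819,
    "higherMatch": 7_225,
    "fuzzyMatch": 1_595,
    "vernacularMatch": 25,
}

# Assumes every record falls in exactly one category.
total = sum(counts.values())

for metric, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{metric:16} {n:>13,} {100 * n / total:8.4f}%")
print(f"{'total':16} {total:>13,}")
```

On that assumption, taxon ID matches account for the vast majority of records, so the canonical, higher and fuzzy categories are small in share but still tens of thousands of records.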

I don't know what the terms canonical, higher or fuzzy really mean.

sophiathirza commented 4 years ago

Examples for testing later on:

Exact: https://records.nbnatlas.org/occurrences/search?q=name_match_metric%3AexactMatch

Canonical (some of these could be incorrectly matched): https://records.nbnatlas.org/occurrences/search?q=name_match_metric%3AcanonicalMatch

Higher (some of these look a bit suspicious): https://records.nbnatlas.org/occurrences/search?q=name_match_metric%3AhigherMatch

Fuzzy: https://records.nbnatlas.org/occurrences/search?q=name_match_metric%3AfuzzyMatch

Common name: https://records.nbnatlas.org/occurrences/search?q=name_match_metric%3AvernacularMatch
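The test links above all follow one pattern: a q parameter of name_match_metric:<metric>, with the colon percent-encoded as %3A. A small helper like the hypothetical `metric_search_url` below generates them from the metric names in the stats, which avoids copy-paste slips between links.

```python
from urllib.parse import urlencode

BASE = "https://records.nbnatlas.org/occurrences/search"

METRICS = ["exactMatch", "canonicalMatch", "higherMatch",
           "fuzzyMatch", "vernacularMatch"]


def metric_search_url(metric: str) -> str:
    # urlencode percent-encodes ':' as %3A, matching the links above.
    return BASE + "?" + urlencode({"q": f"name_match_metric:{metric}"})


for m in METRICS:
    print(m, metric_search_url(m))
```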