Open klausriede opened 6 months ago
Thanks, fixed it. Again duplicate OTU with missing data. https://orthoptera.speciesfile.org/otus/803223/overview
@MMCigliano this problem should be fixed on a higher level! I come across this problem frequently, without spending too much time searching, just everyday use! It should not be too difficult to design some control routines
If TaxonPages always picked the OTU with the lowest number in case there are two or more coordinate OTUs, this would be solved. But apparently it is much more complicated than it seems.
And a filter for names with multiple OTUs could perhaps be integrated into Filter nomenclature.
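A rough sketch of what such a check could look like, assuming only the otus columns that appear in the queries quoted in this thread (id, name, taxon_name_id). It runs against an in-memory SQLite table with made-up rows, not against TaxonWorks, and the function name is hypothetical:

```python
import sqlite3

# Toy table: only columns that appear in the queries quoted in this thread.
# The rows are invented for illustration and are not real project data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE otus (id INTEGER PRIMARY KEY, name TEXT, taxon_name_id INTEGER)"
)
conn.executemany(
    "INSERT INTO otus (id, name, taxon_name_id) VALUES (?, ?, ?)",
    [
        (1, None, 100),     # OTU with blank name
        (2, None, 100),     # second blank-name OTU on the same taxon name
        (3, "sp. 1", 100),  # named OTU on the same taxon name
        (4, None, 200),     # the only OTU for its taxon name
    ],
)


def taxon_names_with_multiple_otus(conn):
    """Return (taxon_name_id, [otu ids]) for every name linked to more than one OTU."""
    rows = conn.execute(
        """
        SELECT taxon_name_id, GROUP_CONCAT(id)
        FROM otus
        WHERE taxon_name_id IS NOT NULL
        GROUP BY taxon_name_id
        HAVING COUNT(*) > 1
        ORDER BY taxon_name_id
        """
    ).fetchall()
    # GROUP_CONCAT returns a comma-separated string; turn it into sorted ints.
    return [(tn, sorted(int(i) for i in ids.split(","))) for tn, ids in rows]


duplicates = taxon_names_with_multiple_otus(conn)
print(duplicates)
```

Adding a condition on blank name to the WHERE clause would narrow the report to the duplicates that actually compete in TP searches.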
The duplicated OTU in question is deleted by now? I thought this problem was solved when TP searches started to exclude OTUs having a non-blank name field, and those were the only kind of duplicated OTUs that existed (in SFs projects).
Please hold off on editing data the next time an error like this appears so I can attempt to analyze the problem.
According to the sandcastle data the deleted OTU looked like this:
As Klaus mentions it's easy to find more cases, for example there:
TP shows OTU 926810, which is a duplicate lacking content, like OTU 926811. The correct one is OTU 850457.
Currently the seven sound links are missing. https://orthoptera.speciesfile.org/otus/926810/overview
For the record, we will not implement logic that selects the first OTU by id as a solution; it's semantically particular to SFs, not all TW data.
Nevertheless those morrisi examples should not appear, except for the last one. However, it is surprising that the higher-numbered one is marked valid.
https://sfg.taxonworks.org/api/v1/otus/926810?project_token=3oerVKf82_196cIECvHYNg -> https://orthoptera.speciesfile.org/otus/926810/overview (full name is Typophyllum sp. 3 Typophyllum morrisi)
https://sfg.taxonworks.org/api/v1/otus/850457?project_token=3oerVKf82_196cIECvHYNg -> https://orthoptera.speciesfile.org/otus/850457/overview (hand-made link, autocomplete does not lead here)
The having_taxon_name_only parameter is supposed to remove OTUs with a non-blank name, so likely a regression here? Or some extra logic to show the "valid" OTU?
[edit] Sorry, it is actually extra logic in TP that redirects to the valid OTU using the otu_valid_id field, since autocomplete indeed shows OTUs with blank name only. So the question would be why those temporary names are marked as the valid OTU of another.[/edit]
Debug against sandbox data with a huge grain of salt, best not to go down that rabbit hole. Practice is better.
The links above all point to production
@typophyllum https://sfg-practice.taxonworks.org/
According to the sandcastle data the deleted OTU looked like this:
@LocoDelAssembly right, was referencing ^
And with the previous OTU Klaus found, the problem was again that otu_valid_id points to an OTU that doesn't look valid:
https://sfg-practice.taxonworks.org/api/v1/otus/autocomplete?project_token=3oerVKf82_196cIECvHYNg&having_taxon_name_only=true&term=Tympanophyllum+(Tympanophyllum)+arcufolium (thanks for sfg-practice reminder @mjy :smile:)
How can this be happening?
The origin of otu_valid_id seems to be this query:
SELECT DISTINCT ON (otus.id) otus.id, otus.name, otus.taxon_name_id, COALESCE(o2.id, otus.id) AS otu_valid_id
FROM "otus" LEFT JOIN
taxon_names t1 ON otus.taxon_name_id = t1.id LEFT JOIN
otus o2 ON t1.cached_valid_taxon_name_id = o2.taxon_name_id
WHERE "otus"."id" = :some_id
Would it break something if the query favored otus.id = o2.id over picking one at random when there are several OTUs referencing the same valid TN?
taxonworks_practice=# SELECT DISTINCT ON (otus.id) otus.id, otus.name, otus.taxon_name_id, COALESCE(o2.id, otus.id) AS otu_valid_id
FROM "otus" LEFT JOIN
taxon_names t1 ON otus.taxon_name_id = t1.id LEFT JOIN
otus o2 ON t1.cached_valid_taxon_name_id = o2.taxon_name_id
WHERE "otus"."id" = 803223;
id | name | taxon_name_id | otu_valid_id
--------+------+---------------+--------------
803223 | | 910776 | 929363
(1 row)
taxonworks_practice=# SELECT otus.id, otus.name, otus.taxon_name_id, COALESCE(o2.id, otus.id) AS otu_valid_id
FROM "otus" LEFT JOIN
taxon_names t1 ON otus.taxon_name_id = t1.id LEFT JOIN
otus o2 ON t1.cached_valid_taxon_name_id = o2.taxon_name_id
WHERE "otus"."id" = 803223;
id | name | taxon_name_id | otu_valid_id
--------+------+---------------+--------------
803223 | | 910776 | 929363
803223 | | 910776 | 803223 <<< This one would be the expected result
(2 rows)
@LocoDelAssembly pointer to corresponding code?
Would it break something if the query favored otus.id = o2.id over picking one at random when there are several OTUs referencing the same valid TN?
Probably not, but we can't assume there is anything special about the match, i.e. if we suddenly switched to the last match + some order then our result should be the same. If aggregating data is an issue then we need to resolve at the aggregation level.
@LocoDelAssembly pointer to corresponding code?
Two places, which if I'm not mistaken are not deciding what to show in autocomplete, only what to set as otu_valid_id. In many cases I think you won't like this redirection. Perhaps even with AntWeb this is a problem, given they have partially identified specimens and because of that there are OTUs like "Genus sp. Genus" that may cause the Genus OTU to have one of those sp. as the valid OTU. (Perhaps it was not sp. exactly, but we or maybe Dash added something in the importer to import partially valid scientific names in this way)
The idea would be not redirecting the user if the selected OTU is a valid candidate already.
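As a minimal sketch of that idea, here is the same join as the query quoted earlier, with an ordering that puts the self-match first. It runs on an in-memory SQLite copy of the two OTUs from the psql output, so it is not the TaxonWorks code. Assumptions: taxon name 910776 is its own valid name, the name given to OTU 929363 is a placeholder, and ORDER BY plus LIMIT 1 stands in for DISTINCT ON, which SQLite lacks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE taxon_names (id INTEGER PRIMARY KEY, cached_valid_taxon_name_id INTEGER);
    CREATE TABLE otus (id INTEGER PRIMARY KEY, name TEXT, taxon_name_id INTEGER);
    """
)
# Assumption: taxon name 910776 is valid, so it points at itself.
conn.execute("INSERT INTO taxon_names VALUES (910776, 910776)")
# OTU ids and taxon_name_id come from the psql output in this thread;
# the name of OTU 929363 is a placeholder, its real name is not shown here.
conn.executemany(
    "INSERT INTO otus VALUES (?, ?, ?)",
    [(803223, None, 910776), (929363, "placeholder name", 910776)],
)

# Same joins as the quoted query. The ordering puts the row where the
# candidate is the OTU itself first; o2.id only makes the remaining order
# deterministic and carries no meaning of its own.
PREFER_SELF = """
SELECT otus.id, COALESCE(o2.id, otus.id) AS otu_valid_id
FROM otus
LEFT JOIN taxon_names t1 ON otus.taxon_name_id = t1.id
LEFT JOIN otus o2 ON t1.cached_valid_taxon_name_id = o2.taxon_name_id
WHERE otus.id = ?
ORDER BY (o2.id = otus.id) DESC, o2.id
LIMIT 1
"""


def otu_valid_id(conn, otu_id):
    """Return the otu_valid_id for otu_id, or None if the OTU does not exist."""
    row = conn.execute(PREFER_SELF, (otu_id,)).fetchone()
    return row[1] if row else None


result = otu_valid_id(conn, 803223)
print(result)
```

In PostgreSQL the equivalent would presumably keep DISTINCT ON (otus.id) and add ORDER BY otus.id, (o2.id = otus.id) DESC NULLS LAST. Note that under this ordering OTU 929363 would also resolve to itself, since it is a candidate for the same valid taxon name.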
http://orthoptera.archive.speciesfile.org/Common/basic/Taxa.aspx?TaxonNameID=1140042