sfg-taxonpages / orthoptera

song links missing for T arcufolium #90

Open klausriede opened 6 months ago

klausriede commented 6 months ago

http://orthoptera.archive.speciesfile.org/Common/basic/Taxa.aspx?TaxonNameID=1140042

typophyllum commented 6 months ago

Thanks, fixed it. Again a duplicate OTU with missing data. https://orthoptera.speciesfile.org/otus/803223/overview

klausriede commented 6 months ago

@MMCigliano this problem should be fixed at a higher level! I come across it frequently in everyday use, without spending much time searching. It should not be too difficult to design some control routines.

typophyllum commented 6 months ago

If TaxonPages always picked the OTU with the lowest number whenever there are two or more coordinate OTUs, this would be solved. But apparently it is much more complicated than it seems.

And Filter nomenclature could perhaps include a filter for names with multiple OTUs.
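
A filter like this could be approximated by counting OTUs per taxon name. The sketch below is only an illustration, not TaxonWorks code: it uses Python's sqlite3 with a toy otus table reduced to the three columns that appear in the query later in this thread (id, name, taxon_name_id). The function name names_with_multiple_otus and the sample rows are made up.

```python
import sqlite3


def names_with_multiple_otus(conn):
    """Return (taxon_name_id, otu_count) pairs for taxon names used by more than one OTU."""
    return conn.execute(
        """
        SELECT taxon_name_id, COUNT(*) AS otu_count
        FROM otus
        WHERE taxon_name_id IS NOT NULL
        GROUP BY taxon_name_id
        HAVING COUNT(*) > 1
        ORDER BY taxon_name_id
        """
    ).fetchall()


# Toy data (hypothetical ids): taxon name 10 has two OTUs, taxon name 20 has one.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE otus (id INTEGER PRIMARY KEY, name TEXT, taxon_name_id INTEGER)"
)
conn.executemany(
    "INSERT INTO otus (id, name, taxon_name_id) VALUES (?, ?, ?)",
    [(1, None, 10), (2, None, 10), (3, None, 20)],
)
print(names_with_multiple_otus(conn))  # [(10, 2)]
```

Whether such a check should also look at the OTU name field (blank versus non-blank) is left open here; the grouping itself would stay the same.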

LocoDelAssembly commented 6 months ago

Has the duplicated OTU in question been deleted by now? I thought this problem was solved when TP searches started to exclude OTUs with a non-blank name field, since those were the only kind of duplicated OTUs that existed (in SF projects).

Please hold off on editing data the next time an error like this appears so I can try to analyze the problem.

typophyllum commented 6 months ago

According to the sandcastle data the deleted OTU looked like this:

[image]

typophyllum commented 6 months ago

As Klaus mentions, it's easy to find more cases, for example here:

[image]

TP shows OTU 926810, which is a duplicate lacking content, like OTU 926811. The correct one is OTU 850457.

Currently the seven sound links are missing. https://orthoptera.speciesfile.org/otus/926810/overview

mjy commented 6 months ago

For the record, we will not implement logic that selects the first OTU by id as a solution; it's semantically particular to SFs, not to all TW data.

LocoDelAssembly commented 6 months ago

Nevertheless, those morrisi examples should not appear, except for the last one. However, it is surprising that the higher-numbered one is marked valid.

https://sfg.taxonworks.org/api/v1/otus/926810?project_token=3oerVKf82_196cIECvHYNg -> https://orthoptera.speciesfile.org/otus/926810/overview (full name is Typophyllum sp. 3 Typophyllum morrisi)

https://sfg.taxonworks.org/api/v1/otus/850457?project_token=3oerVKf82_196cIECvHYNg -> https://orthoptera.speciesfile.org/otus/850457/overview (hand-made link, autocomplete does not lead here)

https://sfg.taxonworks.org/api/v1/otus/autocomplete?project_token=3oerVKf82_196cIECvHYNg&having_taxon_name_only=true&term=Typophyllum+morrisi

The having_taxon_name_only parameter is supposed to remove OTUs with a non-blank name, so is this likely a regression? Or some extra logic to show the "valid" OTU?

[edit] Sorry, it is actually extra logic in TP that redirects to the valid OTU using the otu_valid_id field, since autocomplete indeed shows only OTUs with a blank name. So the question is why those temporary names are marked as the valid OTU of another.[/edit]

mjy commented 6 months ago

Debug against sandbox data with a huge grain of salt; best not to go down that rabbit hole. Practice is better.

LocoDelAssembly commented 6 months ago

The links above all point to production.

LocoDelAssembly commented 6 months ago

@typophyllum https://sfg-practice.taxonworks.org/

mjy commented 6 months ago

> According to the sandcastle data the deleted OTU looked like this:

@LocoDelAssembly right, was referencing ^

LocoDelAssembly commented 6 months ago

And with the previous OTU Klaus found, the problem was again that otu_valid_id points to an OTU that doesn't look valid:

https://sfg-practice.taxonworks.org/api/v1/otus/autocomplete?project_token=3oerVKf82_196cIECvHYNg&having_taxon_name_only=true&term=Tympanophyllum+(Tympanophyllum)+arcufolium (thanks for sfg-practice reminder @mjy :smile:)

How can this be happening?

LocoDelAssembly commented 5 months ago

otu_valid_id origin seems to be this query:

SELECT DISTINCT ON (otus.id) otus.id, otus.name, otus.taxon_name_id, COALESCE(o2.id, otus.id) AS otu_valid_id
FROM "otus" LEFT JOIN
    taxon_names t1 ON otus.taxon_name_id = t1.id LEFT JOIN
    otus o2 ON t1.cached_valid_taxon_name_id = o2.taxon_name_id
WHERE "otus"."id" = :some_id

Would it break something if the query favored otus.id = o2.id over picking one at random when there are several OTUs referencing the same valid TN?

taxonworks_practice=# SELECT DISTINCT ON (otus.id) otus.id, otus.name, otus.taxon_name_id, COALESCE(o2.id, otus.id) AS otu_valid_id
FROM "otus" LEFT JOIN
    taxon_names t1 ON otus.taxon_name_id = t1.id LEFT JOIN
    otus o2 ON t1.cached_valid_taxon_name_id = o2.taxon_name_id
WHERE "otus"."id" = 803223;
   id   | name | taxon_name_id | otu_valid_id 
--------+------+---------------+--------------
 803223 |      |        910776 |       929363
(1 row)

taxonworks_practice=# SELECT otus.id, otus.name, otus.taxon_name_id, COALESCE(o2.id, otus.id) AS otu_valid_id
FROM "otus" LEFT JOIN
    taxon_names t1 ON otus.taxon_name_id = t1.id LEFT JOIN
    otus o2 ON t1.cached_valid_taxon_name_id = o2.taxon_name_id
WHERE "otus"."id" = 803223;
   id   | name | taxon_name_id | otu_valid_id 
--------+------+---------------+--------------
 803223 |      |        910776 |       929363
 803223 |      |        910776 |       803223 <<< This one would be the expected result
(2 rows)

mjy commented 5 months ago

@LocoDelAssembly pointer to corresponding code?

mjy commented 5 months ago

> Would it break something if the query favored otus.id = o2.id over picking one at random when there are several OTUs referencing the same valid TN?

Probably not, but we can't assume there is anything special about the match, i.e. if we suddenly switched to the last match + some order then our result should be the same. If aggregating data is an issue then we need to resolve at the aggregation level.

LocoDelAssembly commented 5 months ago

> @LocoDelAssembly pointer to corresponding code?

Two places, which if I'm not mistaken do not decide what to show in autocomplete, only what to set as otu_valid_id. In many cases I think you won't like this redirection. Perhaps this is a problem even with AntWeb, given that they have partially identified specimens and therefore OTUs like "Genus sp. Genus", which may cause the Genus OTU to have one of those sp. OTUs as its valid OTU. (Perhaps it was not sp. exactly, but we, or maybe Dash, added something to the importer to import partially valid scientific names in this way.)

The idea would be to not redirect the user if the selected OTU is already a valid candidate.
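
A minimal sketch of that idea, assuming the query shape quoted earlier in this thread. It is not the TaxonWorks implementation: it runs on an in-memory SQLite database through Python's sqlite3, so PostgreSQL's DISTINCT ON is replaced with ORDER BY plus LIMIT 1. The helper name valid_otu_id is hypothetical, and the sample rows are reconstructed from the practice output above under the assumption that taxon name 910776 is its own cached valid name and that OTU 929363 points at it.

```python
import sqlite3

# Same joins as the query quoted in this thread, but rows where the selected
# OTU matches itself (o2.id = otus.id) sort first, so it is kept when it is
# already one of the candidates for the valid taxon name.
PREFER_SELF_SQL = """
SELECT COALESCE(o2.id, otus.id) AS otu_valid_id
FROM otus
LEFT JOIN taxon_names t1 ON otus.taxon_name_id = t1.id
LEFT JOIN otus o2 ON t1.cached_valid_taxon_name_id = o2.taxon_name_id
WHERE otus.id = ?
ORDER BY (o2.id = otus.id) DESC
LIMIT 1
"""


def valid_otu_id(conn, otu_id):
    """Return the resolved valid OTU id, or None if the OTU does not exist."""
    row = conn.execute(PREFER_SELF_SQL, (otu_id,)).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE taxon_names (id INTEGER PRIMARY KEY, cached_valid_taxon_name_id INTEGER);
    CREATE TABLE otus (id INTEGER PRIMARY KEY, name TEXT, taxon_name_id INTEGER);
    -- Assumed rows: both OTUs reference the same valid taxon name.
    INSERT INTO taxon_names VALUES (910776, 910776);
    INSERT INTO otus VALUES (803223, NULL, 910776), (929363, NULL, 910776);
    """
)
print(valid_otu_id(conn, 803223))  # 803223, no redirect to 929363
```

When the selected OTU is not itself a candidate (for example, an OTU on a synonym) and several OTUs share the valid name, this ordering still leaves the pick among them arbitrary, so it does not depend on selecting the first OTU by id.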