paleobiodb / bug_reports

Description of recent enhancements to the Paleobiology Database and project management

Linnaean classes not assigned when sub classes used #25

Open cambro opened 5 years ago

cambro commented 5 years ago

The Linnaean ranks are of limited utility, but there is an API route that aims to return canonical ranks for taxa. Many taxa are not classified in this scheme because sub-ranks are ignored. A good example is Decapoda < Malacostraca < Crustacea.

In the PBDB representation, no crab is classified in any Linnaean sense, as Malacostraca is assigned "Subclass" status and Crustacea is assigned "Subphylum" status and its parent is a clade... with no Phylum rank in the mix.

I recommend elevating "sub" ranks when there is no exact rank for the Linnaean case. In every sense of the word except PBDB, Malacostraca is a Class. Seeing crabs as "unclassified" in the API response makes the data look bad (when in fact they are better).
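A minimal sketch of the fallback I have in mind, not the actual PBDB implementation. The lineage list, the rank labels, and the `linnaean_rank` function are all hypothetical and only illustrate the rule "use the sub-rank when no exact rank exists":

```python
# Hypothetical lineage for a crab, ordered from the taxon up to its ancestors.
# The rank labels follow the PBDB situation described above; the data
# structure itself is an assumption made for illustration.
LINEAGE = [
    ("Decapoda", "order"),
    ("Malacostraca", "subclass"),
    ("Crustacea", "subphylum"),
    ("(unranked parent clade)", "unranked clade"),
]


def linnaean_rank(lineage, rank, elevate_sub=False):
    """Return the name of the ancestor holding `rank`, or None.

    With elevate_sub=True, fall back to the matching "sub" rank
    (e.g. "subclass" for "class") only when no exact match exists.
    """
    for name, taxon_rank in lineage:
        if taxon_rank == rank:
            return name
    if elevate_sub:
        for name, taxon_rank in lineage:
            if taxon_rank == "sub" + rank:
                return name
    return None


print(linnaean_rank(LINEAGE, "class"))                    # current behaviour
print(linnaean_rank(LINEAGE, "class", elevate_sub=True))  # proposed fallback
```

An exact rank always wins in this sketch, so a taxon that already has a true Class would be unaffected.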

dwbapst commented 5 years ago

But which way of looking at the taxonomy? The most detailed view in the PBDB is always tracing parent-child (which I admit isn't easy, but can be done), and that seems to work reliably in the newest version of the API. Calling specific Linnaean levels through the API is always a bit unpredictable. You seem to be suggesting that it would be preferable for something to be listed, rather than leaving the returned value for that level empty.

My issue with the suggested fix has multiple points:

1) Unclassified taxa often have parent higher taxa, but many times these aren't a sub-group; they might also be a supra-group or an unranked taxon. Why are we choosing to elevate sub-groups specifically?

2) It seems a little like lying. I've argued this before, but the API should be the straight truth, even if it's messy. If the PBDB taxonomy for a higher taxon is wrong (why is Malacostraca a subclass in the PBDB? What opinion places it as a subclass rather than a class?), then it seems like the solution should be to add an opinion that Malacostraca is a class. Sure, Malacostraca should probably be properly ranked as a class given current taxonomy in the PBDB, and yet someone has decided it isn't a class. But what about the other cases of subclasses where no class exists? Then the class entry will contain something that no one considers to be an actual Linnaean class. There are also plenty of things with indet. class/order/family/etc. that belong to what might be ranked as a subclass/suborder/subfamily. Will their lack of a taxonomic opinion be overwritten in the API output by things that aren't real classes/orders/families?

A real solution to this issue would be to implement some way of getting parent-child data that isn't iterative. (What parent genus is this species in? Okay, what taxon is parent to this genus? Oh, that subfamily. Okay, what taxon is this subfamily a child of? Oh, that supra-class...) Then that parent-child structure could be searched for notable clusters (regardless of what superficial rank label that group has), which would likely be a more universal way of trying to organize the patchy taxonomy of the PBDB.
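To make that concrete, here is a rough sketch of resolving a whole ancestry in one call once the parent-child table is available locally. The `TAXA` table and the `ancestry` function are hypothetical; nothing here reflects how the PBDB actually stores or serves this data:

```python
# Hypothetical parent-child table: taxon name -> its rank and parent name.
# A parent of None marks the top of this toy tree.
TAXA = {
    "Decapoda": {"rank": "order", "parent": "Malacostraca"},
    "Malacostraca": {"rank": "subclass", "parent": "Crustacea"},
    "Crustacea": {"rank": "subphylum", "parent": "(unranked parent clade)"},
    "(unranked parent clade)": {"rank": "unranked clade", "parent": None},
}


def ancestry(taxa, name):
    """Return [(name, rank), ...] from `name` up to the root of the table."""
    chain = []
    seen = set()
    while name is not None:
        if name in seen:
            # Guard against a circular parent assignment in the table.
            raise ValueError(f"cycle in parent-child table at {name!r}")
        seen.add(name)
        record = taxa[name]
        chain.append((name, record["rank"]))
        name = record["parent"]
    return chain


for taxon, rank in ancestry(TAXA, "Decapoda"):
    print(f"{taxon}: {rank}")
```

The full chain, with the rank labels exactly as stored, is what one would then search for notable clusters, without rewriting any rank along the way.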

cambro commented 5 years ago

The issue is that there is an explicit parameter in the API for this purpose: the option 'class', which returns the classification of a taxon as phylum, class, order, family, genus. https://paleobiodb.org/data1.2/taxa/list.json?name=Decapoda&show=class
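For anyone who wants to reproduce this, a small sketch that builds the same request with the Python standard library. The helper names `classification_url` and `fetch_classification` are mine, and the sketch deliberately makes no assumption about the field names in the JSON response:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the URL quoted above.
BASE_URL = "https://paleobiodb.org/data1.2/taxa/list.json"


def classification_url(taxon_name):
    """Build the taxa/list request that asks for the 'class' output block."""
    return BASE_URL + "?" + urlencode({"name": taxon_name, "show": "class"})


def fetch_classification(taxon_name, timeout=30):
    """Fetch and parse the JSON response (needs network access)."""
    with urlopen(classification_url(taxon_name), timeout=timeout) as response:
        return json.load(response)


print(classification_url("Decapoda"))
```

Calling `fetch_classification("Decapoda")` returns the parsed response, in which the missing class for Decapoda can be inspected directly.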

The "Linnaean" ranks are inherently problematic and nothing will change that. Malacostraca exists as both a class and a subclass in PBDB. In the dynamic way that taxonomy is created from opinions, the subclass rank currently wins. A return of "NO_CLASS_SPECIFIED" is misleading at the very best. A Class HAS been specified; it has just been trumped by an opinion that happens to place the taxon not as a class but as a subclass...

dwbapst commented 5 years ago

I'm well aware of the explicit parameter in the API, having studied the taxonomy output from the PBDB at length, but if I call 'class', I want the PBDB to tell me what the database says the Class is for that taxon, even if that's an NA, rather than guess what I might want just to avoid returning an NA.

And, yes, the PBDB's taxonomy is mercurial. That's a feature, not a bug!