jackba / arctos

Automatically exported from code.google.com/p/arctos

scientific_name with and w/o subgenus #511

GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Purpose of code changes on this branch:
Errors and duplication stem from the inconsistent inclusion of data in the 
subgenus column of taxonomy.  For example, we now have two valid records for 
the same taxon:
Sorex (Otisorex) cinereus Kerr, 1792
Sorex cinereus Kerr, 1792

Subgenera are becoming increasingly critical as Arctos incorporates more and 
more taxonomic complexity in collections like insects, parasites, and 
paleontology. Eliminating subgenus is a poor option.

One option might be to support alternative formats for scientific names. A 
search on scientific_name = Sorex cinereus should find the same row whether or 
not a value is present for subgenus. Collections might need the choice of 
whether to display subgenus when it is present.
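
A minimal sketch of the search side of that idea, in Python rather than in the database. It assumes the subgenus always appears in parentheses directly after the genus; `search_key` is a hypothetical helper, not anything that exists in Arctos.

```python
import re

# A parenthesised word directly after the first word (the genus) is
# treated as the subgenus. Parenthesised authorship later in the name,
# e.g. "(Kerr, 1792)", is left alone.
_GENUS_SUBGENUS = re.compile(r"^(\S+)\s*\([^)]*\)")


def search_key(scientific_name: str) -> str:
    """Return the name with any subgenus removed and whitespace collapsed."""
    without_subgenus = _GENUS_SUBGENUS.sub(r"\1", scientific_name.strip())
    return " ".join(without_subgenus.split())
```

Comparing `search_key` values rather than raw strings would let a search for the genus+species form match a row stored with a subgenus.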

Original issue reported on code.google.com by gordon.jarrell on 4 Jan 2012 at 9:16

GoogleCodeExporter commented 9 years ago
Maybe this should be high priority.  Gabor apparently added over a thousand 
records that were already in there with subgenus because he couldn't find them 
as genus+species. He could not grasp my attempts to explain the issue.  The 
longer we wait, the bigger the clean-up...

Original comment by gordon.jarrell on 30 Jan 2012 at 10:32

GoogleCodeExporter commented 9 years ago
To review:

Taxonomy exists to facilitate communication.

"Diptera" (the animal Order) and "Diptera" (the plant Genus) are different 
things.

"Sorex (Otisorex) cinereus Kerr, 1792" and "Sorex cinereus Kerr, 1792" are the 
same things.

Right....

Original comment by dust...@gmail.com on 30 Jan 2012 at 10:53

GoogleCodeExporter commented 9 years ago
Correct.

Original comment by gordon.jarrell on 30 Jan 2012 at 10:59

GoogleCodeExporter commented 9 years ago
I don't think we can represent that in our current model.

Original comment by dust...@gmail.com on 30 Jan 2012 at 11:08

GoogleCodeExporter commented 9 years ago
Gotta get there somehow, even if it takes a new model.  This may be unrelated
to how the data are stored (e.g., hierarchical versus long rows), and so we
might be able to use somebody else's solution, if anybody has done it.  A
compromise might be that we only concatenate subgenus into scientific_name
where species is null.  You could still search all records by subgenus, and
it would only show up in scientific_name where it was really needed.

Original comment by gordon.jarrell on 30 Jan 2012 at 11:49

GoogleCodeExporter commented 9 years ago
This has everything to do with how the data are stored. Neither a "long row" 
nor a hierarchical model will do what you want, at least not in any way that 
I've been able to recognize.

Doesn't the ICZN provide guidelines for how names are formed?

Original comment by dust...@gmail.com on 31 Jan 2012 at 12:07

GoogleCodeExporter commented 9 years ago
So, maybe just an IF clause in the trigger that builds scientific_name?

Original comment by gordon.jarrell on 31 Jan 2012 at 12:08

GoogleCodeExporter commented 9 years ago
IF what?

Original comment by dust...@gmail.com on 31 Jan 2012 at 12:12

GoogleCodeExporter commented 9 years ago
IF species is NOT null
THEN concatenate genus + " " + species
ELSE IF subgenus is NOT null
THEN concatenate genus + " (" + subgenus + ")"

or words to that effect...
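
The rule above could be sketched like this in Python, reading it as: species wins when present, otherwise genus with the subgenus in parentheses. The function name and arguments are made up for illustration; the real logic would live in the trigger that builds scientific_name, and the author string is left out here.

```python
from typing import Optional


def display_name(genus: str,
                 subgenus: Optional[str] = None,
                 species: Optional[str] = None) -> str:
    """Build a display name; subgenus is shown only when species is missing."""
    if species:
        # Species present: subgenus is dropped from the name.
        return f"{genus} {species}"
    if subgenus:
        # No species: subgenus is the only way to identify the taxon.
        return f"{genus} ({subgenus})"
    return genus
```

With this reading, a record with genus Sorex, subgenus Otisorex and species cinereus gets the same name as the record without the subgenus, which is exactly where the non-unique names would come from.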

Original comment by gordon.jarrell on 31 Jan 2012 at 12:21

GoogleCodeExporter commented 9 years ago
Are you suggesting we ignore ICZN guidelines? Are there such things?

Original comment by dust...@gmail.com on 31 Jan 2012 at 12:37

GoogleCodeExporter commented 9 years ago
Not sure what the applicable guidelines might be.  Sorex cinereus, Sorex
(Otisorex), and Sorex (Otisorex) cinereus are all valid constructions, I
assume.

Original comment by gordon.jarrell on 31 Jan 2012 at 1:03

GoogleCodeExporter commented 9 years ago
Gordon would like to remove subgenus from display when species is given. So 
both "genus=Sorex + species=cinereus" and "genus=Sorex + subgenus=Otisorex + 
species=cinereus" would display as "Sorex cinereus". 

If species is not given, "genus=Sorex + subgenus=Otisorex" would display as 
"Sorex (Otisorex)".

So, when Taxonomy is re-concatenated under this logic, there will likely be a 
few thousand non-unique scientific_names. Can you temporarily delete anything 
Gabor added to taxonomy in the past three months?  Or can you script something 
to delete the record with NULL subgenus when two records share the same 
scientific_name?
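
Something along these lines might find the candidates for that second option. It is only a sketch: the row layout (dicts with `id`, `genus`, `subgenus`, `species`) is an assumption, the author string is ignored, and it only reports ids rather than deleting anything, since whatever references those rows would presumably need repointing first.

```python
from collections import defaultdict
from typing import Dict, List


def rebuilt_name(row: Dict) -> str:
    """Re-concatenate the name: subgenus appears only when species is null."""
    if row.get("species"):
        return f"{row['genus']} {row['species']}"
    if row.get("subgenus"):
        return f"{row['genus']} ({row['subgenus']})"
    return row["genus"]


def redundant_ids(rows: List[Dict]) -> List[int]:
    """Return ids of NULL-subgenus rows that duplicate a row with a subgenus."""
    groups = defaultdict(list)
    for row in rows:
        groups[rebuilt_name(row)].append(row)

    redundant = []
    for same_name in groups.values():
        has_subgenus = any(r.get("subgenus") for r in same_name)
        if len(same_name) > 1 and has_subgenus:
            # Keep the row that carries the subgenus; flag the others.
            redundant.extend(r["id"] for r in same_name if not r.get("subgenus"))
    return redundant
```

Two rows that share a name but both lack a subgenus are left alone here, because that would be a different kind of duplicate from the one described above.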

Original comment by gordon.jarrell on 1 Mar 2012 at 1:41

GoogleCodeExporter commented 9 years ago

Original comment by dust...@gmail.com on 15 Mar 2012 at 4:26