jackba / arctos

Automatically exported from code.google.com/p/arctos

Taxonomy: Changes #144

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
---------- Structural changes ----------
* Remove accepted_fg from taxonomy (use it to create accepted relationships first)
* Remove unique index on taxonomy.scientific_name
* Add collection_ID to taxon_relations
* Create unique key on taxon_relations (taxon_id, related_taxonid,
relationship, collection_id)
* Add table: collection_taxonomy (taxon_id, accepted_taxon_id, collection_id)
--Rule: any taxon NOT listed in the taxon_id column (by collection) IS
available for cataloging new specimens.
---------- Interface changes ----------
* Creating/altering relationships adds to collection_taxonomy. There must be a
way to force or override this (e.g., to deal with historical data).
* Revise bulkloader/taxa picks to use collection_taxonomy in place of
accepted_fg.
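
The availability rule above can be sketched as a query. This is only an illustration, assuming an in-memory SQLite database stands in for the Arctos schema; the column types, the sample rows, and the helper name available_taxa are hypothetical and not part of the proposal.

```python
import sqlite3

# In-memory stand-in for the proposed tables (types are assumptions).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE taxonomy (
    taxon_id INTEGER PRIMARY KEY,
    scientific_name TEXT NOT NULL   -- no unique index, per the proposal
);
CREATE TABLE collection_taxonomy (
    taxon_id INTEGER NOT NULL REFERENCES taxonomy(taxon_id),
    accepted_taxon_id INTEGER NOT NULL REFERENCES taxonomy(taxon_id),
    collection_id INTEGER NOT NULL
);
""")

# Hypothetical sample data: two taxa share the scientific name "b".
conn.executemany("INSERT INTO taxonomy VALUES (?, ?)",
                 [(1, "a"), (2, "b"), (3, "b")])
# Collection 1 treats taxon 3 as unaccepted in favor of taxon 2.
conn.execute("INSERT INTO collection_taxonomy VALUES (3, 2, 1)")


def available_taxa(conn, collection_id):
    """Return taxon_ids NOT listed in collection_taxonomy.taxon_id
    for the given collection, i.e. those available for cataloging."""
    rows = conn.execute(
        """SELECT taxon_id FROM taxonomy
           WHERE taxon_id NOT IN (
               SELECT taxon_id FROM collection_taxonomy
               WHERE collection_id = ?)
           ORDER BY taxon_id""",
        (collection_id,),
    )
    return [row[0] for row in rows]
```

With this sample data, collection 1 can catalog against taxa 1 and 2, while a collection that has made no claims can still use all three.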

Original issue reported on code.google.com by dust...@gmail.com on 18 Sep 2008 at 6:15

GoogleCodeExporter commented 9 years ago
Add table taxon_resources (taxon_id, resource_name varchar, resource_uri varchar)
to allow users to point a taxon at external resources such as uBio, IPNI, etc.

Original comment by dust...@gmail.com on 18 Sep 2008 at 6:46

GoogleCodeExporter commented 9 years ago
Further explanation by Dusty in email to Carla 13 Nov 2008:

The proposed taxonomy solution removes the concept of an accepted or unaccepted
record from the taxonomy table and creates collection/taxonomy tables by which
collections can "claim" taxonomy by establishing collection-specific
relationships among taxon terms.

So, given the following data in table taxonomy:

SomeHigherTaxonTerm    ScientificName
z                      a
y                      b
x                      b

Term a is available for use by any collection. It is unique; there are no
conflicts. Terms b(x) and b(y) are also "available," but bulk Arctos
applications may not be able to use them because they are not unique, and
therefore not distinguishable by ScientificName. (This constraint does not apply
to single-record updates, where a person is available to choose from the
possibilities.)

So we would also add collection-specific relationships:

GoodScientificName    BadScientificName    Collection
b(x)                  b(y)                 1
b(y)                  b(x)                 2

Now, Collection 1 can find the "good" (in their opinion) version of b (b(x))
through a lookup on the scientific name b.
Collection 2 will, with the same lookup, get a different "taxon concept," b(y),
as "their" name when querying on "b".
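
A minimal sketch of that lookup, assuming the three taxonomy rows and two collection-specific relationships from the tables above. The numeric taxon IDs, the Taxon class, and the resolve function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taxon:
    taxon_id: int          # hypothetical surrogate key
    higher_term: str       # SomeHigherTaxonTerm
    scientific_name: str   # ScientificName


TAXA = [
    Taxon(1, "z", "a"),
    Taxon(2, "y", "b"),
    Taxon(3, "x", "b"),
]

# (good_taxon_id, bad_taxon_id, collection_id)
COLLECTION_RELATIONS = [
    (3, 2, 1),  # Collection 1: b(x) is good, b(y) is bad
    (2, 3, 2),  # Collection 2: b(y) is good, b(x) is bad
]


def resolve(scientific_name, collection_id):
    """Return the single taxon a collection gets for a scientific name.

    Raises LookupError when the name is unknown or still ambiguous,
    which is the situation a bulk application cannot handle.
    """
    candidates = [t for t in TAXA if t.scientific_name == scientific_name]
    rejected = {bad for _good, bad, coll in COLLECTION_RELATIONS
                if coll == collection_id}
    accepted = [t for t in candidates if t.taxon_id not in rejected]
    if len(accepted) != 1:
        raise LookupError(
            f"{scientific_name!r} resolves to {len(accepted)} taxa "
            f"for collection {collection_id}"
        )
    return accepted[0]
```

Here resolve("b", 1) yields the b(x) row and resolve("b", 2) yields the b(y) row, while a collection with no relationships on "b" gets a LookupError because the name stays ambiguous.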

A record of taxa will be maintained through relationships.

All that said, the structure will be mostly unimportant to users. Users will be
able to:

Create taxonomy, including "alternate opinions" that share scientific name
"Edit" taxonomy
"Claim" taxonomy as valid for a specific collection
Locate specimens by current, historical, previously applied, or related names
Share taxonomy with anyone

Original comment by carla...@gmail.com on 20 Nov 2008 at 11:02

GoogleCodeExporter commented 9 years ago
Further point for consideration: Can we simply continue to use what we have? 

--
Gordon sez: I'm losing interest in supporting conflicting hierarchies: Hanner
has me convinced that this sport is going to fade. Aside from cases of
reticulate evolution (which are largely at the tips of the branches), the tree
of life has to have one true topology, and the BarCode of life plus the 10K
Genome Project may reveal most of that as a simple by-product of trying to do
something useful. We need to do something that non-specialists can use, and
having GenBank, BoLD, EOL, Arctos, etc., all un-synched is seriously confusing.
As far as I'm concerned, source/authority applies to scientific_name, and the
rest is a band-aid to get us by until we (or somebody) can do something truly
authoritative. Right now, we're trying to get some kind of complete higher
taxonomy with nomenclatural codes so we can make shtuff work. And even that is
taking us years.
--

Under this model, we could also alter table taxon_relations, changing
related_taxon_name_id to related_taxon_uri. That would allow us to relate
records to both taxa within Arctos and to external resources (which might record
alternate opinions about higher taxonomy).

Original comment by dust...@gmail.com on 21 Apr 2009 at 8:36

GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
Summary from Arctos pow-wow@MVZ:

Comment 3 probably won't work; we can't all just agree to get along.

A simple tagging system, where Arctos maintains a list of unique scientific
names and a "cluster of assertions," probably won't work either, as MVZ (1) wants
to assert taxon concepts, and (2) can't find their specimens without doing so.

That brings us back to something like the initial proposal. 

Clarified partial Functional Requirements:

1) Locate specimens by any relevant Higher Taxonomy assertion. So, searching for
"Muridae" returns those records that are currently in "Cricetidae" but that
anyone ever thought were in Muridae.

2) Find stuff in those collections that are organized by higher taxonomy, which
means asserting concepts along with names.

3) Use taxon concepts. Functionally, this means that rows can never change. In
the following example, all rows are equally valid and we cannot "fill in the
blanks" in rows 1 and 2.

Row   Scientific_name   Family   Order   Suborder
1     A                 X                Z
2     A                          Y       Z
3     A                 X        Y       Z

Immediately practical questions: 

1) Does this mean we should stop trying to fill in the blanks for our current
data? Given the current constraint of maintaining one globally unique scientific
name, can we?

2) How will this affect other applications? We can't properly generate formatted
names without a nomenclatural_code (see Issue 242), and some external
applications (like Ornis) demand Class. Neither of these fields is provided by
AOU, for example. Can we cheat? If so, how do we formalize what's available for
interpretation and what's locked into the Source? Or do we need some more
elaborate structure to separate the things provided by a Source from the
functional things demanded by various applications? How far do we wish to take
this idea? AOU provides only 4 terms (Order, Family, Genus, Species). That, I
believe, is bordering on not enough information to be useful.

I believe we are still lacking workable functional requirements, and those are
absolutely necessary before this discussion can conclude.

Further considerations: 

1) Should we invest the time to publish this as a webservice?
2) If we proceed with (1), should we open up editing (to the extent we allow
editing) to the broader community?

(DLM votes "yes" on both.)
-----------------------------------------------------------------------
Somewhat related, the format for taxon_relations should be:
ID (NOT NULL: FKEY taxon_name_id)
Related_ID (NOT NULL: FKEY taxon_name_id)
Relationship (NOT NULL: FKEY, cttaxon_relations)
Whodunit (NOT NULL: FKEY AGENT_ID)
WhenDunit (NOT NULL: DATE)
Authority (NULL, text, hopefully a citation)
Authority_type (NULL, FKEY ctNewCodeTable_TaxonRelationsType)

Authority_Type is an attempt to further quantify the validity or value of a
relationship. Possible values include misspelling, alternate spelling, code
revision, checklist update, publication, and personal assertion.
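
For discussion, the proposed row format could be modeled roughly as below. This is a sketch only: the Python field names mirror the list above, the allowed Authority_Type values come from the sentence above, and the class name, the integer key types, and the sample relationship value are assumptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Possible Authority_Type values named in this comment; the real code
# table (ctNewCodeTable_TaxonRelationsType) does not exist yet.
AUTHORITY_TYPES = {
    "misspelling",
    "alternate spelling",
    "code revision",
    "checklist update",
    "publication",
    "personal assertion",
}


@dataclass(frozen=True)
class TaxonRelation:
    taxon_name_id: int                    # ID, NOT NULL
    related_taxon_name_id: int            # Related_ID, NOT NULL
    relationship: str                     # NOT NULL, from cttaxon_relations
    whodunit: int                         # NOT NULL, AGENT_ID
    whendunit: date                       # NOT NULL
    authority: Optional[str] = None       # NULL allowed, ideally a citation
    authority_type: Optional[str] = None  # NULL allowed

    def __post_init__(self):
        # Enforce the NOT NULL columns.
        for name in ("taxon_name_id", "related_taxon_name_id",
                     "relationship", "whodunit", "whendunit"):
            if getattr(self, name) is None:
                raise ValueError(f"{name} may not be NULL")
        # Enforce the code-table constraint on Authority_type.
        if (self.authority_type is not None
                and self.authority_type not in AUTHORITY_TYPES):
            raise ValueError(f"unknown authority_type: {self.authority_type!r}")
```

A row such as TaxonRelation(1, 2, "synonym of", 42, date(2009, 5, 4), authority_type="publication") passes both checks; "synonym of" and the agent ID 42 are made-up values.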

Original comment by dust...@gmail.com on 4 May 2009 at 8:01

GoogleCodeExporter commented 9 years ago

Original comment by dust...@gmail.com on 24 Sep 2009 at 12:20

GoogleCodeExporter commented 9 years ago
For searching, we need to also be able to crawl "nodes." Given:

Specimen-->TaxonA
TaxonA-->SomeRelationship-->TaxonB
TaxonA-->SomeRelationship-->TaxonC
TaxonB-->SomeRelationship-->TaxonD
TaxonD-->SomeRelationship-->TaxonE
TaxonE-->SomeRelationship-->TaxonF
...
we need to be able to find the specimen by any attribute of any of the involved
taxa. Furthermore, we need to be able to crawl relationships "backwards" - we
need to find specimens attached to TaxonF by TaxonA attributes.

We may also need to prioritize results by distance and relationship type.
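
One way to picture the crawl is a breadth-first search that treats every relationship as traversable in both directions and records how many hops away each taxon is. This is a sketch under assumptions: the relationships are the ones listed above with their types dropped, and build_graph, related_taxa, and find_specimens are hypothetical helper names. Prioritizing by relationship type is not covered.

```python
from collections import defaultdict, deque

# Relationships from the example above (relationship type omitted).
RELATIONS = [
    ("TaxonA", "TaxonB"),
    ("TaxonA", "TaxonC"),
    ("TaxonB", "TaxonD"),
    ("TaxonD", "TaxonE"),
    ("TaxonE", "TaxonF"),
]
# Specimen -> taxon it is attached to.
SPECIMENS = {"Specimen": "TaxonA"}


def build_graph(relations):
    """Store each relationship in both directions so it can be
    crawled forwards and backwards."""
    graph = defaultdict(set)
    for source, target in relations:
        graph[source].add(target)
        graph[target].add(source)
    return graph


def related_taxa(graph, start):
    """Breadth-first search: map each reachable taxon to its distance
    (number of relationships) from the starting taxon."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in graph[current]:
            if neighbor not in distances:
                distances[neighbor] = distances[current] + 1
                queue.append(neighbor)
    return distances


def find_specimens(graph, specimens, taxon):
    """Return (distance, specimen) pairs for specimens attached to any
    taxon reachable from the searched taxon, nearest first."""
    distances = related_taxa(graph, taxon)
    hits = [(distances[attached], specimen)
            for specimen, attached in specimens.items()
            if attached in distances]
    return sorted(hits)
```

Searching from TaxonF finds the specimen attached to TaxonA four relationships away, and the distance gives a simple key for ranking results.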

Original comment by dust...@gmail.com on 13 May 2010 at 6:58