phyloref / clade-ontology

Ontology of Phylogenetic Clade Definitions
MIT License

Determine how to represent specimens as OWL restrictions #61

Open gaurav opened 5 years ago

gaurav commented 5 years ago

In Model 2.0, we represent scientific name-based taxonomic units as an OWL restriction in the form:

phyloref:includes_TU some (tc:hasName some (ICZN_Name and dwc:scientificName value "scientific name"))

I think a TU that is represented by a single specimen counts as a single dwc:Organism. In that case, a phyloreference that includes such a TU could be written as:

phyloref:includes_TU some (dwc:organismID value "specimen voucher number")

Unfortunately, we don't have many examples of clade definitions that use specimen identifiers as taxonomic units. The best ones we've seen are in Fisher et al., 2007, which provides a few clade definitions that use specimens as specifiers:

| Phyloreference | Specifier | Identified as | Note |
| --- | --- | --- | --- |
| Leucophanes | Wall 2527, Fiji (uc) | Exodictyon incrassatum (Mitt.) Cardot | Only external specifier for this definition |
| Exostratum | Mishler 7/24/98(3), Queensland, Australia (uc) | Exostratum blumii (Nees ex Hampe) L.T. Ellis | The entire genus of Exostratum is an additional internal specifier |
| Arthrocormus | Mishler 7/24/98 (5) Queensland, Australia (UC) | Arthrocormus schimperi Dozy & Molk | The species A. schimperi is listed separately as an internal specifier |

Note that the specimen-based specifiers are completely redundant with scientific-name-based specifiers in two of the three cases, and none of these specifiers use globally unique identifiers.

I propose that we use dwc:organismID for now, but possibly re-evaluate this once we have more phyloreferences with specimen identifiers to look at.
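To make the two forms above concrete, here is a minimal Python sketch that renders them as strings. The helper names (`quote`, `name_based_tu`, `specimen_based_tu`) are hypothetical and not part of any Phyloref tooling; the output only mirrors the expressions written out in this comment and is not validated against an OWL parser.

```python
def quote(value: str) -> str:
    """Render a string as a double-quoted literal, escaping backslashes and quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def name_based_tu(scientific_name: str, name_class: str = "ICZN_Name") -> str:
    """Model 2.0 form for a taxonomic unit identified by a scientific name."""
    return (
        "phyloref:includes_TU some (tc:hasName some "
        f"({name_class} and dwc:scientificName value {quote(scientific_name)}))"
    )


def specimen_based_tu(voucher: str) -> str:
    """Proposed form for a taxonomic unit identified by a specimen voucher."""
    return f"phyloref:includes_TU some (dwc:organismID value {quote(voucher)})"


# The voucher string is taken from the Leucophanes row of the table above.
print(name_based_tu("scientific name"))
print(specimen_based_tu("Wall 2527, Fiji (uc)"))
```

Because the voucher strings in Fisher et al. are free text rather than globally unique identifiers, the sketch passes them through unchanged apart from quoting.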

gaurav commented 5 years ago

A related term is TaxonConcept:circumscribedBy, which is used to indicate a "specimen that forms part of the circumscription of this taxon". We could therefore instead express a specimen-based phyloreference as:

phyloref:includes_TU some (TaxonConcept:circumscribedBy some (dwc:organismID value "specimen voucher number"))

I'm not sure we gain anything with this additional complexity, however.

gaurav commented 5 years ago

I did a quick survey of how other RDF resources record specimen identifiers. Most use separate fields for dwc:collectionID and dwc:catalogNumber rather than a single field that combines both pieces of information. Specimens either have an rdf:type of dsw:Specimen (e.g. Phenoscape) or of dwc:Occurrence with a dwc:basisOfRecord (e.g. GBIF's Beginner's Guide to Persistent Identifiers, BiSciCol Triplifier with example).

When combining these fields into a single field for a specimen identifier, dwc:occurrenceID appears to be the correct place to put the Darwin Core Triple rather than dwc:organismID -- for example, iDigBio uses dwc:organismID to store a dataset-specific identifier in this example, while VertNet uses occurrenceID but not organismID.

It therefore looks like we should choose between:

phyloref:includes_TU some (dwc:occurrenceID value "urn:catalog:[institutionID]:[collectionID]:[catalogNumber]" and dwc:basisOfRecord value https://terms.tdwg.org/wiki/dwc:PreservedSpecimen)

or

phyloref:includes_TU some (dwc:institutionID value "[institutionID]" and dwc:collectionID value "[collectionID]" and dwc:catalogNumber value "[catalogNumber]" and dwc:basisOfRecord value https://terms.tdwg.org/wiki/dwc:PreservedSpecimen)

I prefer the dwc:occurrenceID approach since it is more compact and easier to read. We could also add support for using a URI in that field.
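As a rough illustration of the trade-off, the following Python sketch builds both candidate expressions from the same specimen record and shows that the combined identifier can be split back into its parts. The `Specimen` type, the function names, and the values `INST`, `COLL` and `12345` are all hypothetical; the `urn:catalog:` layout and the basisOfRecord URL are copied from the expressions above. It assumes that institution and collection identifiers contain no colons and that no field contains a double quote.

```python
from typing import NamedTuple

PRESERVED_SPECIMEN = "https://terms.tdwg.org/wiki/dwc:PreservedSpecimen"
URN_PREFIX = "urn:catalog:"


class Specimen(NamedTuple):
    institution_id: str
    collection_id: str
    catalog_number: str


def occurrence_id(s: Specimen) -> str:
    """Combine the three fields into a single urn:catalog identifier."""
    return f"{URN_PREFIX}{s.institution_id}:{s.collection_id}:{s.catalog_number}"


def parse_occurrence_id(value: str) -> Specimen:
    """Split a urn:catalog identifier back into its three fields."""
    if not value.startswith(URN_PREFIX):
        raise ValueError(f"not a urn:catalog identifier: {value!r}")
    # Split at most twice so that colons inside the catalog number survive.
    parts = value[len(URN_PREFIX):].split(":", 2)
    if len(parts) != 3:
        raise ValueError(f"expected three colon-separated fields: {value!r}")
    return Specimen(*parts)


def compact_restriction(s: Specimen) -> str:
    """First option: a single dwc:occurrenceID value."""
    return (
        f'phyloref:includes_TU some (dwc:occurrenceID value "{occurrence_id(s)}" '
        f"and dwc:basisOfRecord value {PRESERVED_SPECIMEN})"
    )


def expanded_restriction(s: Specimen) -> str:
    """Second option: one property per field."""
    return (
        f'phyloref:includes_TU some (dwc:institutionID value "{s.institution_id}" '
        f'and dwc:collectionID value "{s.collection_id}" '
        f'and dwc:catalogNumber value "{s.catalog_number}" '
        f"and dwc:basisOfRecord value {PRESERVED_SPECIMEN})"
    )


specimen = Specimen("INST", "COLL", "12345")  # hypothetical values
print(compact_restriction(specimen))
print(expanded_restriction(specimen))
```

Since the combined identifier round-trips through `parse_occurrence_id`, choosing the compact form would not lose the individual fields as long as the colon assumption holds.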

Semantic Darwin Core adds a new type, dsw:Token -- a token is derived from a dwc:Organism and is evidence for a dwc:Occurrence. I don't think we need this extra layer of complexity.

hlapp commented 5 years ago

How does OpenBioDiv do this? I don't recall off the top of my head, but it's certainly worth checking.

gaurav commented 5 years ago

OpenBioDiv appears to defer to Darwin-SW when it comes to encoding occurrences (see https://github.com/pensoft/OpenBiodiv/issues/14 or the OpenBioDiv-O paper). I couldn't find an example of encoded occurrence data in its GitHub repository: the closest I could find was the use of dwcFP:hasOccurrenceID to record a dataset-specific occurrence ID (example).

Looking through the OpenBioDiv repository reminded me that the TDWG Ontology used to have a Specimen OWL class with a specimenID property, but that class (along with its planned successor, TaxonOccurrence) has been deprecated since 2015.

I also found an entry in the Darwin Core RDF Guide that recommends the use of dwc:basisOfRecord/institutionCode/collectionCode/catalogNumber.