Open gaurav opened 5 years ago
A related term is TaxonConcept:circumscribedBy, which is used to indicate a "specimen that forms part of the circumscription of this taxon". Therefore, we could instead state that a phyloreference:
phyloref:includes_TU some (TaxonConcept:circumscribedBy some (dwc:organismID value "specimen voucher number"))
I'm not sure we gain anything with this additional complexity, however.
I did a quick survey of how other RDF resources record specimen identifiers. Most use separate fields for dwc:collectionID and dwc:catalogNumber rather than a single field that combines both pieces of information. Specimens either have an rdf:type of dsw:Specimen (e.g. Phenoscape) or of dwc:Occurrence with a dwc:basisOfRecord (e.g. GBIF's Beginner's Guide to Persistent Identifiers, BiSciCol Triplifier with example).
When combining these fields into a single field for a specimen identifier, dwc:occurrenceID appears to be the correct place to put the Darwin Core Triple rather than dwc:organismID. For example, iDigBio uses the latter to store a dataset-specific identifier in this example while VertNet uses occurrenceID but not organismID.
It therefore looks like we should choose between:
phyloref:includes_TU some (dwc:occurrenceID value "urn:catalog:[institutionID]:[collectionID]:[catalogNumber]" and dwc:basisOfRecord value https://terms.tdwg.org/wiki/dwc:PreservedSpecimen)
or
phyloref:includes_TU some (dwc:institutionID value "[institutionID]" and dwc:collectionID value "[collectionID]" and dwc:catalogNumber value "[catalogNumber]" and dwc:basisOfRecord value https://terms.tdwg.org/wiki/dwc:PreservedSpecimen)
I prefer the dwc:occurrenceID approach since it is more compact and easier to read. We could also add support for using a URI in that field.
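To make the trade-off between the two candidate expressions above concrete, here is a minimal Python sketch that builds both from the same three fields and round-trips the compact identifier. The helper names (`SpecimenID`, `to_occurrence_id`, `parse_occurrence_id`, `compact_restriction`, `expanded_restriction`) and the sample values are hypothetical, not part of any Phyloref tooling, and the output is plain text in the same informal notation used in this thread rather than validated OWL.

```python
from typing import NamedTuple

# Value used for dwc:basisOfRecord, copied from the expressions in this thread.
PRESERVED_SPECIMEN = "https://terms.tdwg.org/wiki/dwc:PreservedSpecimen"


class SpecimenID(NamedTuple):
    """The three parts of a Darwin Core Triple (field names follow this thread)."""
    institution_id: str
    collection_id: str
    catalog_number: str


def to_occurrence_id(spec: SpecimenID) -> str:
    """Combine the three parts into a single 'urn:catalog:' identifier."""
    return f"urn:catalog:{spec.institution_id}:{spec.collection_id}:{spec.catalog_number}"


def parse_occurrence_id(occurrence_id: str) -> SpecimenID:
    """Split a 'urn:catalog:' identifier back into its three parts.

    Assumption: only the first two colons after the prefix are separators,
    so a catalog number may itself contain colons.
    """
    prefix = "urn:catalog:"
    if not occurrence_id.startswith(prefix):
        raise ValueError(f"not a urn:catalog identifier: {occurrence_id!r}")
    parts = occurrence_id[len(prefix):].split(":", 2)
    if len(parts) != 3:
        raise ValueError(f"expected three colon-separated parts: {occurrence_id!r}")
    return SpecimenID(*parts)


def compact_restriction(spec: SpecimenID) -> str:
    """First option: a single dwc:occurrenceID holding the whole triple."""
    return (
        f'phyloref:includes_TU some (dwc:occurrenceID value "{to_occurrence_id(spec)}" '
        f"and dwc:basisOfRecord value {PRESERVED_SPECIMEN})"
    )


def expanded_restriction(spec: SpecimenID) -> str:
    """Second option: one property per part of the triple."""
    return (
        f'phyloref:includes_TU some (dwc:institutionID value "{spec.institution_id}" '
        f'and dwc:collectionID value "{spec.collection_id}" '
        f'and dwc:catalogNumber value "{spec.catalog_number}" '
        f"and dwc:basisOfRecord value {PRESERVED_SPECIMEN})"
    )


if __name__ == "__main__":
    # Hypothetical specimen, for illustration only.
    spec = SpecimenID("MVZ", "Herp", "12345")
    print(compact_restriction(spec))
    print(expanded_restriction(spec))
```

One consequence of the compact form is that anything consuming it has to parse the string to recover the individual fields, whereas the expanded form keeps them separately queryable.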
Semantic Darwin Core adds a new type of dsw:Token: a token is derived from a dwc:Organism and is evidence for a dwc:Occurrence. I don't think we need this extra layer of complexity.
How does OpenBioDiv do this? I don't recall off the top of my head, but it's certainly worth checking.
OpenBioDiv appears to defer to Darwin-SW when it comes to encoding occurrences (see https://github.com/pensoft/OpenBiodiv/issues/14 or the OpenBioDiv-O paper). I couldn't find an example of encoded occurrence data in its GitHub repository: the closest I could find was the use of dwcFP:hasOccurrenceID to record a dataset-specific occurrence ID (example).
Looking through the OpenBioDiv repository reminded me that the TDWG Ontology used to have a Specimen OWL class with a specimenID property, but that class (along with its planned successor, TaxonOccurrence) has been deprecated since 2015.
I also found an entry in the Darwin Core RDF Guide that recommends the use of dwc:basisOfRecord/institutionCode/collectionCode/catalogNumber.
In Model 2.0, we represent scientific name-based taxonomic units as an OWL restriction in the form:
I think a phyloreference that includes a TU represented by a single specimen counts as a single dwc:Organism. In that case, we could say it:
Unfortunately, we don't have a lot of examples of clade definitions that use specimen identifiers as taxonomic units. The best ones we've seen are in Fisher et al., 2007, which gives a few clade definitions that use specimens as specifiers:
Note that the specimen-based specifiers are completely redundant with scientific-name-based specifiers in two out of the three cases, and none of these specifiers use globally unique identifiers.
I propose that we use dwc:organismID for now, but possibly re-evaluate this once we have more phyloreferences with specimen identifiers to look at.