tdwg / gbwg

Genomic Biodiversity Interest Group
Apache License 2.0

Structuring sequence based data based on sampling event core or occurrence core #25

Open ymgan opened 3 years ago

ymgan commented 3 years ago

Recommendation from guidelines to publish DNA-derived data through biodiversity data platforms

The current recommendation is to publish data as Occurrence core (Category I or II) with the DNA derived data extension. This approach compensates for limitations of the DwC star schema, which would not allow any occurrence-level data in extension files (such as processed barcode sequences) to point to records in an event core file. We do, however, recommend including an eventID for each core record, to indicate the association between occurrences derived from the same sampling event.
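To make the star schema limitation concrete, here is a minimal sketch in plain Python. All identifiers, sequences, and variable names are hypothetical, and the dictionaries only stand in for rows of an archive; this is not how any GBIF tool represents them. It shows that with an Event core a sequence row can only carry the eventID, so it cannot be tied to one occurrence, whereas with an Occurrence core the link is unambiguous and eventID still groups the records.

```python
from collections import defaultdict

# Hypothetical records; all identifiers and sequences are made up.

# Layout A: Event core. Extension rows can only point at the core id (eventID).
occurrence_ext_a = [
    {"coreid": "ev1", "occurrenceID": "occ1"},
    {"coreid": "ev1", "occurrenceID": "occ2"},
]
dna_ext_a = [
    {"coreid": "ev1", "DNA_sequence": "ACGT"},
    {"coreid": "ev1", "DNA_sequence": "TTGA"},
]


def occurrences_for_sequence_a(dna_row):
    """Occurrences a sequence could belong to when only eventID links them."""
    return [o["occurrenceID"] for o in occurrence_ext_a
            if o["coreid"] == dna_row["coreid"]]


# Layout B: Occurrence core. eventID is an ordinary column on each core record.
occurrence_core_b = [
    {"occurrenceID": "occ1", "eventID": "ev1"},
    {"occurrenceID": "occ2", "eventID": "ev1"},
]
dna_ext_b = [
    {"coreid": "occ1", "DNA_sequence": "ACGT"},
    {"coreid": "occ2", "DNA_sequence": "TTGA"},
]


def occurrences_for_sequence_b(dna_row):
    """Occurrences a sequence belongs to when the core id is the occurrenceID."""
    return [o["occurrenceID"] for o in occurrence_core_b
            if o["occurrenceID"] == dna_row["coreid"]]


def occurrences_by_event(core_rows):
    """Recover the sampling-event grouping from the eventID column."""
    groups = defaultdict(list)
    for row in core_rows:
        groups[row["eventID"]].append(row["occurrenceID"])
    return dict(groups)


ambiguous = occurrences_for_sequence_a(dna_ext_a[0])  # every occurrence in ev1
resolved = occurrences_for_sequence_b(dna_ext_b[0])   # a single occurrence
by_event = occurrences_by_event(occurrence_core_b)
```

In layout A the first sequence matches every occurrence of the event; in layout B it matches a single occurrence, and the eventID column is enough to regroup occurrences by sampling event.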

Concern raised during meeting on 2021-02-09

For those using the extended Measurement or Fact (eMoF) extension: if another extension is added, they can only use the Occurrence core, no longer the Event core.

This is a challenge because many of the observations include environmental data and image-based data, and aggregating everything into events is what their community wants.

Similarly, I think that many environmental measurements, such as depth-related terms, will have to be replicated for every occurrence record.
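A rough sketch of that replication, using made-up values and hypothetical identifiers: one event with a depth range and three occurrences derived from it. The Darwin Core terms minimumDepthInMeters and maximumDepthInMeters are real; everything else is an assumption for illustration.

```python
# Hypothetical sampling event with event-level depth values (made-up numbers).
events = {
    "ev1": {"minimumDepthInMeters": 10, "maximumDepthInMeters": 20},
}

# Hypothetical occurrences, each derived from the same sampling event.
occurrences = [("occ1", "ev1"), ("occ2", "ev1"), ("occ3", "ev1")]

# With an Occurrence core, the event-level depth values are copied
# onto every occurrence record that belongs to the event.
occurrence_core = [
    {"occurrenceID": occ_id, "eventID": event_id, **events[event_id]}
    for occ_id, event_id in occurrences
]

# Number of times the same depth range is stored.
depth_copies = sum(1 for row in occurrence_core if "minimumDepthInMeters" in row)

# With an Event core, the depth range would be stored once per event.
depth_copies_event_core = len(events)
```

Here the same depth range is stored three times in the Occurrence core layout and once in the Event core layout; the gap grows with the number of occurrences per event.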

tucotuco commented 3 years ago

The XML documents used by GBIF to define "cores" and "extensions" of Darwin Core are implementations providing structure using Darwin Core and other terms, but the XML documents are not part of any standard. The Darwin Core standard itself has as its current scope the definitions of terms, but nothing about the structure or semantic relationships between the terms other than the evolution of term versions.

tucotuco commented 3 years ago

My previous comment was a way of checking the scope of the Task Group. I am happy to help with figuring out the best way to model using the existing implementations, changing them, or making new ones.