tdwg / dwc

Darwin Core standard for sharing of information about biological diversity.
https://dwc.tdwg.org
Creative Commons Attribution 4.0 International

New Term - verbatimMeasurementType #518

Open sformel-usgs opened 1 month ago

sformel-usgs commented 1 month ago

New term

TL;DR: to preserve linkage to a bespoke MOF term when mapping to a controlled MOF term. Very similar to the discussion in #181.

One of the challenges of effectively implementing the MeasurementOrFact and extendedMeasurementOrFact extensions is reducing the noise across datasets by mapping bespoke terms to preferred terms (e.g. the OBIS community's preference for BODC terms). If the preferred terms aren't used in the original data, then the original term is lost in the mapping. That loss can be difficult for data providers to accept, and they will hesitate to adopt the extensions as effective solutions for publishing their data. We think it's logical to ask data providers to map to standardized MOF terms during publishing, like we do for taxonomy and georeferencing. But we also think we should try to preserve the original term, so downstream users have additional metadata that might provide context when examining a published paper or a raw version of the data.

Proposed attributes of the new term:

| verbatimMeasurementType | measurementType | measurementTypeID |
| --- | --- | --- |
| water_temp | Temperature of the water body | http://vocab.nerc.ac.uk/collection/P01/current/TEMPPR01/ |
| Fish biomass | Wet weight biomass of biological entity specified elsewhere per unit area of the bed | http://vocab.nerc.ac.uk/collection/P01/current/SDBIOL05/ |
| sampling net mesh size | Mesh size of sample collector | http://vocab.nerc.ac.uk/collection/P01/current/MSHSIZE1/ |
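
To make the idea concrete, here is a minimal sketch of how a publishing script might map a bespoke term to a preferred term while keeping the original. The names `TERM_MAP`, `MofRecord`, and `map_term` are hypothetical, the lookup table only holds the three example rows from this comment, and leaving the controlled fields empty for unmapped terms is an assumption, not part of the proposal.

```python
from dataclasses import dataclass

# Hypothetical lookup: bespoke provider term -> (preferred label, identifier).
# The entries are the three example rows given in this comment.
TERM_MAP = {
    "water_temp": (
        "Temperature of the water body",
        "http://vocab.nerc.ac.uk/collection/P01/current/TEMPPR01/",
    ),
    "Fish biomass": (
        "Wet weight biomass of biological entity specified elsewhere "
        "per unit area of the bed",
        "http://vocab.nerc.ac.uk/collection/P01/current/SDBIOL05/",
    ),
    "sampling net mesh size": (
        "Mesh size of sample collector",
        "http://vocab.nerc.ac.uk/collection/P01/current/MSHSIZE1/",
    ),
}


@dataclass
class MofRecord:
    verbatimMeasurementType: str  # the proposed new term
    measurementType: str
    measurementTypeID: str


def map_term(original: str) -> MofRecord:
    """Map a bespoke term to its preferred term without losing the original."""
    key = original.strip()
    if key not in TERM_MAP:
        # Assumption: unmapped terms keep only the verbatim value.
        return MofRecord(original, "", "")
    label, term_id = TERM_MAP[key]
    return MofRecord(original, label, term_id)
```

Calling `map_term("water_temp")` would give a record whose `measurementType` and `measurementTypeID` come from the lookup, while `verbatimMeasurementType` still holds `water_temp`.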
jdpye commented 1 month ago

I like this proposal from my data curator perspective! This would be helpful to record original names of biological measurements which have evolved within regional or cultural groups and map well to BODC, but their original names are relevant and might be integral to the information as it was collected.

ymgan commented 1 month ago

Thank you so much Steve! We would like to support this proposal because we want to use a general vocabulary for certain measurements in measurementType and measurementTypeID while keeping the nuance in verbatimMeasurementType.

Our current challenge

At the moment, we (the Antarctic OBIS/GBIF node) are placing the verbatim text under measurementType, which can differ from what is shown at the source of measurementTypeID. This is because the wording our data providers use in their reports/papers can differ slightly from the wording of the same measurementType in the vocabulary, and often contains important details.

| measurementType | measurementTypeID |
| --- | --- |
| The δ13C measured in the considered sample, expressed in per mille and relative to the international reference Vienna Pee Dee Belemnite. | https://vocab.nerc.ac.uk/collection/P01/current/C13BTX01/ |

Cleaner solution

| verbatimMeasurementType | measurementType | measurementTypeID |
| --- | --- | --- |
| The δ13C measured in the tegument of the considered sea star specimen, expressed in per mille and relative to the international reference Vienna Pee Dee Belemnite. | Enrichment with respect to Vienna Pee Dee Belemnite (VPDB) of carbon-13 {13C CAS 14762-74-4} {delta(13)C} in biota {biological entity specified elsewhere} by mass spectrometry | http://vocab.nerc.ac.uk/collection/P01/current/C13BTX01/ |
| The δ13C measured in the adductor muscle of the considered mussel specimen, expressed in per mille and relative to the international reference Vienna Pee Dee Belemnite. | Enrichment with respect to Vienna Pee Dee Belemnite (VPDB) of carbon-13 {13C CAS 14762-74-4} {delta(13)C} in biota {biological entity specified elsewhere} by mass spectrometry | http://vocab.nerc.ac.uk/collection/P01/current/C13BTX01/ |

We think that having verbatimMeasurementType will be cleaner, as it presents consistent information for measurementType and measurementTypeID while allowing the details to be kept under verbatimMeasurementType.
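
As a rough illustration of what "consistent" could mean in practice, the sketch below flags any measurementTypeID that appears with more than one measurementType label. The function name `inconsistent_labels` and the row layout (a list of dicts) are assumptions made for this example. The `current_practice` rows are constructed by placing the two verbatim descriptions under measurementType, as described above.

```python
from collections import defaultdict

C13_ID = "http://vocab.nerc.ac.uk/collection/P01/current/C13BTX01/"
C13_LABEL = (
    "Enrichment with respect to Vienna Pee Dee Belemnite (VPDB) of carbon-13 "
    "{13C CAS 14762-74-4} {delta(13)C} in biota "
    "{biological entity specified elsewhere} by mass spectrometry"
)
SEA_STAR = (
    "The δ13C measured in the tegument of the considered sea star specimen, "
    "expressed in per mille and relative to the international reference "
    "Vienna Pee Dee Belemnite."
)
MUSSEL = (
    "The δ13C measured in the adductor muscle of the considered mussel specimen, "
    "expressed in per mille and relative to the international reference "
    "Vienna Pee Dee Belemnite."
)


def inconsistent_labels(rows):
    """Return {measurementTypeID: sorted labels} for IDs with more than one label."""
    labels = defaultdict(set)
    for row in rows:
        labels[row["measurementTypeID"]].add(row["measurementType"])
    return {tid: sorted(names) for tid, names in labels.items() if len(names) > 1}


# Verbatim text stored under measurementType: one ID, two different labels.
current_practice = [
    {"measurementType": SEA_STAR, "measurementTypeID": C13_ID},
    {"measurementType": MUSSEL, "measurementTypeID": C13_ID},
]

# With verbatimMeasurementType: the label is the same for every row sharing the ID.
cleaner = [
    {"verbatimMeasurementType": SEA_STAR, "measurementType": C13_LABEL,
     "measurementTypeID": C13_ID},
    {"verbatimMeasurementType": MUSSEL, "measurementType": C13_LABEL,
     "measurementTypeID": C13_ID},
]
```

Under these assumptions, `inconsistent_labels(current_practice)` reports the C13BTX01 identifier with two labels, while `inconsistent_labels(cleaner)` returns an empty dict.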

This will help us so much as:

sformel-usgs commented 1 month ago

Just updating to say that the OBIS community is actively discussing this suggestion. We plan to discuss it more formally at the next OBIS Vocabulary meeting on Sept 18th and will update this issue with the conclusions.

rubenpp7 commented 1 month ago

Hi everyone,

I totally understand the need for a place to store the verbatim MeasurementType. Until now in EurOBIS we have also been storing the verbatim data under the measurementType field, allowing it to differ from the "name" of the BODC term used.

Please let me know if my understanding is correct: the proposal is to put the BODC term "name" of a concept, exactly as it appears in BODC, under the measurementType field.

If I got that right and we expect data providers to add this extra value in their submissions, I think it may impose extra work on the data creator, work that really belongs to the creators of data services. I see how it is more convenient for us data managers to have these 3 columns side by side in a table for quality control and data filtering, but I wonder if it's really needed at the data standard level.

An alternative would be to have the data services (QC tools, data portals) extract information from BODC, just as they do with other vocabulary systems (e.g. WoRMS), in order to quality check and filter data. To me, the point of these vocabularies is to use the ID of a concept to extract all the other information (only) when needed.
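
A minimal sketch of that alternative, assuming a data service that resolves the label from the ID only when it is needed and caches the result. `LabelResolver`, `LOCAL_VOCAB`, and `fetch_from_local_vocab` are hypothetical names; the local dictionary stands in for a real vocabulary lookup, whose API is deliberately not shown here.

```python
from typing import Callable, Dict


class LabelResolver:
    """Resolve a measurementTypeID to its label on demand, caching results."""

    def __init__(self, fetch: Callable[[str], str]):
        self._fetch = fetch              # injected lookup (e.g. a vocabulary client)
        self._cache: Dict[str, str] = {}

    def label(self, term_id: str) -> str:
        # Only call the lookup the first time an ID is seen.
        if term_id not in self._cache:
            self._cache[term_id] = self._fetch(term_id)
        return self._cache[term_id]


# Stand-in for a vocabulary service, holding one term from this thread.
LOCAL_VOCAB = {
    "http://vocab.nerc.ac.uk/collection/P01/current/TEMPPR01/":
        "Temperature of the water body",
}


def fetch_from_local_vocab(term_id: str) -> str:
    return LOCAL_VOCAB[term_id]


resolver = LabelResolver(fetch_from_local_vocab)
```

With this shape, a QC tool or portal would call `resolver.label(measurementTypeID)` while filtering, and the dataset itself would only need to carry the ID.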

Sorry for playing devil's advocate here. I just think that if we add the "name", what stops us from adding everything else? For example, the deprecated label is also quite relevant. My reasoning is that we should only add new terms to the standard when there is some information that is not being, or could not be, captured otherwise.

Cheers!