Open sformel-usgs opened 6 months ago
Hi @sformel-usgs,
As you have pointed out, there are two ways to handle this within the Darwin Core format:
If I understand correctly, there are 2-3 stable isotope measurements per body part, which is only ~9 P01 terms if we went with option 1. Option 2 could result in a more complicated event table. Could you let me know which option you would prefer?
Many thanks, Roseanna
@roswri thank you for the thoughtful response, and my apologies for the delayed answer. I would prefer the second option, with the new P01 term for body part. I think this aligns with the response from #28.
@sformel-usgs Option 2 is indeed attractive, and it saves us from creating many combinations for various body parts. However, we discussed this ticket at the OBIS vocab group meeting today and wondered how each body part would be linked to its respective stable isotope measurement value if they all sit under the same occurrence ID. I understand that it is not currently possible to do this. Could you explain how you would see this working?
I found the parentMeasurementID thread we were wondering about yesterday (https://github.com/tdwg/dwc/issues/362) and based on that, I wonder if such data could be formatted like:
Occurrence table

eventID | occurrenceID | occurrenceStatus | basisOfRecord | scientificName |
---|---|---|---|---|
e-1 | occ1 | present | materialSample | Salmo salar |
e-1 | occ2 | present | materialSample | Salmo salar |
eMoF Table

eventID | occurrenceID | parentMeasurementID | measurementID | measurementType | measurementValue |
---|---|---|---|---|---|
e-1 | occ1 | | occ1_gill | body part | gill |
e-1 | occ1 | occ1_gill | occ1_gill_isotope | Concentration of isotope | 10 |
e-1 | occ1 | | occ1_muscle | body part | muscle |
e-1 | occ1 | occ1_muscle | occ1_muscle_isotope | Concentration of isotope | 20 |
e-1 | occ2 | | occ2_gill | body part | gill |
e-1 | occ2 | occ2_gill | occ2_gill_isotope | Concentration of isotope | 10 |
e-1 | occ2 | | occ2_muscle | body part | muscle |
e-1 | occ2 | occ2_muscle | occ2_muscle_isotope | Concentration of isotope | 20 |
But I am not sure whether parentMeasurementID is implemented in the IPT, or whether there is an issue with having multiple measurementType: body part rows linked to the same occurrenceID.
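To make the intended linkage concrete, here is a minimal Python sketch of how a consumer could resolve the body part for each isotope row by following parentMeasurementID. It assumes the parent column is left empty on the body part rows and holds the body part row's measurementID on the isotope rows. The `body_part_for` helper is hypothetical, and nothing here reflects what the IPT or OBIS currently supports.

```python
# Rows follow the eMoF table above (occ1 only), with parentMeasurementID
# left empty on the body part rows (an assumption about the intended layout).
emof = [
    {"occurrenceID": "occ1", "parentMeasurementID": "", "measurementID": "occ1_gill",
     "measurementType": "body part", "measurementValue": "gill"},
    {"occurrenceID": "occ1", "parentMeasurementID": "occ1_gill", "measurementID": "occ1_gill_isotope",
     "measurementType": "Concentration of isotope", "measurementValue": "10"},
    {"occurrenceID": "occ1", "parentMeasurementID": "", "measurementID": "occ1_muscle",
     "measurementType": "body part", "measurementValue": "muscle"},
    {"occurrenceID": "occ1", "parentMeasurementID": "occ1_muscle", "measurementID": "occ1_muscle_isotope",
     "measurementType": "Concentration of isotope", "measurementValue": "20"},
]


def body_part_for(rows):
    """Pair each child measurement with the body part named by its parent row."""
    by_id = {row["measurementID"]: row for row in rows}
    resolved = []
    for row in rows:
        parent = by_id.get(row["parentMeasurementID"])
        # Only rows whose parent is a "body part" row are treated as children.
        if parent is not None and parent["measurementType"] == "body part":
            resolved.append((
                row["occurrenceID"],
                parent["measurementValue"],
                row["measurementType"],
                row["measurementValue"],
            ))
    return resolved


resolved = body_part_for(emof)
for item in resolved:
    print(item)
```

With this layout, several body part rows under one occurrenceID stay unambiguous because each isotope row points at exactly one parent.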
Thank you for the good conversation yesterday at the OBIS vocab meeting. @gwemon, you are correct that the nested MoF won't work in the current eMOF/MOF extension. Even with parentMeasurementID (which would need to be added to eMOF) it is a bit tricky to model this correctly. Here are some updates on our thoughts:
It turns out that we originally misinterpreted how the data was collected. Now there are three subsamples of each body part that were each measured for four isotopes, resulting in 36 structured measurements per organism. So, the data looks like this:
graph TD
Organism -->bp["Body Part x 3"]
bp -->ss["subsample x 3"]
ss --> N_iso["N Isotope"] & C_iso["C Isotope"] & O_iso["O Isotope"] & S_iso["S Isotope"]
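As a quick check on the count, the hierarchy above can be enumerated in a few lines of Python. This is only a sketch: the body part names are the examples from the original request (gill, muscle, shell) and the subsample numbering is made up for illustration.

```python
from itertools import product

body_parts = ["gill", "muscle", "shell"]  # assumed example names
subsamples = [1, 2, 3]                    # hypothetical subsample labels
isotopes = ["N", "C", "O", "S"]

# One record per body part x subsample x isotope combination.
measurements = [
    {"bodyPart": bp, "subsample": ss, "isotope": iso}
    for bp, ss, iso in product(body_parts, subsamples, isotopes)
]

print(len(measurements))  # 3 * 3 * 4 = 36
```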
Yesterday we explored what @EliLawrence suggested above and toyed with abusing the DwC event core. There were moments where it seemed like we were nearing a solution, but after thinking about it for another day, I'm not satisfied with anything we came up with. Here are some challenges we encountered:
parentMeasurementID and nesting MoF would imply multiple simultaneous states of the occurrence. Theoretically we could link them to materialEntityID, organismID, or create child events for the body part and subsampling events, but these either wouldn't work in the current implementation of OBIS/GBIF, or they would be an abuse of the way things are intended to work. I'm confident that this type of structure will be able to be handled in the near future as the GBIF/OBIS data models evolve, but we're not there yet.
Creating extremely specific measurementTypes, like the first suggestion by @gwemon, could work, but I'm not sure how to handle the subsampling aspect of it. Can the semantic model handle the incorporation of the replicate identifiers minted in https://github.com/nvs-vocabs/P01/issues/207?
@kylieh10 and I are going to publish the occurrences (i.e. collected organisms), since that should remain stable. Then we can devote time to finding a good solution for the subsampling and chemistry.
Thank you @EliLawrence and @sformel-usgs. I think that the values/results obtained from replication should ideally be handled in the data model rather than in the parameter code. As I understand it, the P01 codes created in https://github.com/nvs-vocabs/P01/issues/207 allow OBIS users to identify the replicate, but I have insufficient knowledge of the OBIS schema structure to see how they could provide a pointer to the results of an individual replicate. I might need time to sit down with a knowledgeable OBIS schema expert to show me how this could be done. On the other hand, for your combinations, @sformel-usgs, we could do what @roswri suggested in her comment of 4th June: Concentration of [stable isotope] per unit [dry]/[wet] weight of biota {biological entity specified elsewhere [Subcomponent: [gill]/[muscle]/[shell]} I know it might seem complex to have it all in one P01 term, but P01 is backed by a semantic model that is machine-actionable via linked data and a SPARQL endpoint. Software can decompose the elements without overloading the eMOF and DwCA format: all you need is an occurrence_Id (which specifies the biological entity's taxonomic ID) and the event/sub-event_ID. Another advantage is that it would be easily compatible with, or convertible to, the EMODnet Chemistry recommended format for contaminants in biota (I know these are not contaminants, but the pattern of P01 construction is the same).
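The decomposition idea can be sketched roughly as follows in Python. Everything specific here is an assumption for illustration: the `TERM_*` codes are made-up placeholders rather than real P01 identifiers, and the `SEMANTIC_MODEL` dictionary only stands in for whatever the linked-data or SPARQL lookup would actually return.

```python
from collections import defaultdict

# Hypothetical stand-in for the P01 semantic model: code -> components.
# These codes are placeholders, not real P01 identifiers.
SEMANTIC_MODEL = {
    "TERM_N_GILL": {"isotope": "N", "subcomponent": "gill"},
    "TERM_N_MUSCLE": {"isotope": "N", "subcomponent": "muscle"},
}

# Flat eMoF rows: one combined term per measurement, no nesting needed.
emof = [
    {"occurrenceID": "occ1", "measurementTypeID": "TERM_N_GILL", "measurementValue": 10},
    {"occurrenceID": "occ1", "measurementTypeID": "TERM_N_MUSCLE", "measurementValue": 20},
    {"occurrenceID": "occ2", "measurementTypeID": "TERM_N_GILL", "measurementValue": 10},
]


def group_by_subcomponent(rows, model):
    """Group measurement rows by the body part encoded in their term."""
    grouped = defaultdict(list)
    for row in rows:
        parts = model[row["measurementTypeID"]]
        grouped[parts["subcomponent"]].append(
            (row["occurrenceID"], parts["isotope"], row["measurementValue"])
        )
    return dict(grouped)


by_part = group_by_subcomponent(emof, SEMANTIC_MODEL)
print(by_part)
```

Grouping by body part remains possible with combined terms, but only for consumers that resolve each term against the semantic model first.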
One question: could you please confirm the units of these values, and also double-check whether the results are expressed relative to dry/wet weight or something else? Many thanks.
Hi Gwen, we discussed this at the last vocab meeting, and it was more or less agreed by all that the initial approach @sformel-usgs was thinking of was a bit abusive of the standard. In the meantime, @EliLawrence and @sformel-usgs have started exploring a different option, which doesn't try to expand the standard where that is not possible and still allows for the inclusion of replicates in different body parts. Please do check the meeting notes. @sformel-usgs, have I summarised it adequately?
Thanks @JoBeja - Maybe the outcome of the discussion could be summarised here? I'll check the notes.
Problem:
We need to describe stable isotope measurements from various body parts of the same sampled organism. There are 2-3 measurements per body part (e.g. gill, muscle, shell) per organism. The S12 terms could completely meet our needs for MeasurementValue, but I couldn't find a good way to indicate a MeasurementType of bodyPart. I don't think it's useful to incorporate it into many terms as a subcomponent because (1) it would be useful to be able to group this data by body part value and (2) it will result in taxon x body part # of terms, which feels excessive.
Request
Create a new P01 term, bodyPart. This term would be used in eMOF MeasurementType and allow the specification of terms from S12 as MeasurementValue.
Suggested Definition
Any part of an organism, such as an organ or extremity, defined in vocabulary S12.