nvs-vocabs / OBISVocabs

A repository for the management of issues related to vocabularies used by the OBIS community.

request for new P01 term: bodyPart #25

Open sformel-usgs opened 6 months ago

sformel-usgs commented 6 months ago

Problem:

We need to describe stable isotope measurements from various body parts of the same sampled organism. There are 2-3 measurements per body part (e.g. gill, muscle, shell) per organism. The S12 terms could completely meet our needs for MeasurementValue, but I couldn't find a good way to indicate a MeasurementType of bodyPart. I don't think it's useful to incorporate it into many terms as a subcomponent because (1) it would be useful to be able to group this data by body part value and (2) it would result in taxon x body part # of terms, which feels excessive.

Request

Create a new P01 term, bodyPart. This term would be used in eMOF MeasurementType and allow the specification of terms from S12 as MeasurementValue.

Suggested Definition

any part of an organism, such as an organ or extremity defined in vocabulary S12.

roswri commented 5 months ago

Hi @sformel-usgs,

As you have pointed out, there are two ways to handle this within the Darwin Core format:

  1. Create new P01 codes for each combination of stable isotope and body part, e.g. "Concentration of [stable isotope] per unit dry/wet weight of biota {biological entity specified elsewhere [Subcomponent: gill/muscle/shell]}". This would allow you to have one sample event for each organism rather than one event per body part per organism. There would be more parameter codes; they would not be challenging to create, but they could be more complicated to keep track of in your data.
  2. Create new P01 codes for each stable isotope e.g. "Concentration of [stable isotope] per unit dry/wet weight of biota {biological entity specified elsewhere}", and a new P01 for "Body part of biological entity specified elsewhere". I think for this option to work you would need to have one event per body part per organism so you could then link the eMOF for the concentration measurement and the eMOF describing the body part.
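To make the difference in term counts concrete, here is a minimal sketch of the two options above. The isotope names and the "dry" weight basis are placeholders (assumptions, not confirmed in this thread), and the label strings only imitate the pattern quoted above; they are not real P01 entries.

```python
from itertools import product

# Placeholder inputs (assumptions): isotope names are illustrative only.
ISOTOPES = ["isotope A", "isotope B", "isotope C"]
BODY_PARTS = ["gill", "muscle", "shell"]


def option_1_terms(isotopes, body_parts, basis="dry"):
    """Option 1: one label per isotope x body part combination."""
    return [
        f"Concentration of {iso} per unit {basis} weight of biota "
        f"{{biological entity specified elsewhere [Subcomponent: {part}]}}"
        for iso, part in product(isotopes, body_parts)
    ]


def option_2_terms(isotopes, basis="dry"):
    """Option 2: one label per isotope, plus a single body-part term."""
    labels = [
        f"Concentration of {iso} per unit {basis} weight of biota "
        "{biological entity specified elsewhere}"
        for iso in isotopes
    ]
    labels.append("Body part of biological entity specified elsewhere")
    return labels


print(len(option_1_terms(ISOTOPES, BODY_PARTS)))  # isotopes x body parts
print(len(option_2_terms(ISOTOPES)))              # isotopes + 1
```

With three isotopes and three body parts, option 1 yields 9 labels and option 2 yields 4; option 1 grows multiplicatively as body parts are added, while option 2 grows by one term per new isotope.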

If I understand correctly, there are 2-3 stable isotope measurements per body part, which is only ~9 P01 terms if we went with option 1. Option 2 could result in a more complicated event table. Could you let me know which option you would prefer?

Many thanks, Roseanna

sformel-usgs commented 4 months ago

@roswri thank you for the thoughtful response, and my apologies for the delayed answer. I would prefer the second option, with the new P01 term for body part. I think this aligns with the response from #28.

gwemon commented 4 months ago

@sformel-usgs Option 2 is indeed attractive, and it saves us from creating many combinations for various body parts. However, we discussed this ticket at the OBIS vocab group meeting today and wondered how each body part would be linked to its respective stable isotope measurement value if they all sit under the same occurrence ID. I understand that it is not currently possible to do this. Could you explain how you would see this working?

EliLawrence commented 4 months ago

I found the parentMeasurementID thread we were wondering about yesterday (https://github.com/tdwg/dwc/issues/362) and based on that, I wonder if such data could be formatted like:

Occurrence table

| eventID | occurrenceID | occurrenceStatus | basisOfRecord | scientificName |
| --- | --- | --- | --- | --- |
| e-1 | occ1 | present | materialSample | Salmo salar |
| e-1 | occ2 | present | materialSample | Salmo salar |

eMoF table

| eventID | occurrenceID | parentMeasurementID | measurementID | measurementType | measurementValue |
| --- | --- | --- | --- | --- | --- |
| e-1 | occ1 | | occ1_gill | body part | gill |
| e-1 | occ1 | occ1_gill | occ1_gill_isotope | Concentration of isotope | 10 |
| e-1 | occ1 | | occ1_muscle | body part | muscle |
| e-1 | occ1 | occ1_muscle | occ1_muscle_isotope | Concentration of isotope | 20 |
| e-1 | occ2 | | occ2_gill | body part | gill |
| e-1 | occ2 | occ2_gill | occ2_gill_isotope | Concentration of isotope | 10 |
| e-1 | occ2 | | occ2_muscle | body part | muscle |
| e-1 | occ2 | occ2_muscle | occ2_muscle_isotope | Concentration of isotope | 20 |

But I am not sure if parentMeasurementID is implemented in the IPT, or if there is an issue in having multiple measurementType: body part rows linked to the same occurrenceID.
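For illustration, a minimal sketch of how a data consumer might resolve such links, assuming a parentMeasurementID column were available in eMoF. The rows mirror the example table above; the function name `pair_with_body_part` is hypothetical and not part of any OBIS or GBIF tool.

```python
FIELDS = (
    "eventID", "occurrenceID", "parentMeasurementID",
    "measurementID", "measurementType", "measurementValue",
)

# Rows from the example eMoF table; "" marks an empty parentMeasurementID.
ROWS = [
    ("e-1", "occ1", "", "occ1_gill", "body part", "gill"),
    ("e-1", "occ1", "occ1_gill", "occ1_gill_isotope", "Concentration of isotope", "10"),
    ("e-1", "occ1", "", "occ1_muscle", "body part", "muscle"),
    ("e-1", "occ1", "occ1_muscle", "occ1_muscle_isotope", "Concentration of isotope", "20"),
    ("e-1", "occ2", "", "occ2_gill", "body part", "gill"),
    ("e-1", "occ2", "occ2_gill", "occ2_gill_isotope", "Concentration of isotope", "10"),
    ("e-1", "occ2", "", "occ2_muscle", "body part", "muscle"),
    ("e-1", "occ2", "occ2_muscle", "occ2_muscle_isotope", "Concentration of isotope", "20"),
]
emof = [dict(zip(FIELDS, row)) for row in ROWS]


def pair_with_body_part(rows):
    """Join each child measurement to the body part named by its parent row."""
    by_id = {r["measurementID"]: r for r in rows}
    paired = []
    for r in rows:
        parent = by_id.get(r["parentMeasurementID"])
        # Skip rows with no parent, or whose parent is not a body-part row.
        if parent is None or parent["measurementType"] != "body part":
            continue
        paired.append(
            (r["occurrenceID"], parent["measurementValue"],
             r["measurementType"], r["measurementValue"])
        )
    return paired


for record in pair_with_body_part(emof):
    print(record)
```

Each isotope value ends up alongside the body part it came from, even though all rows for one organism share the same occurrenceID.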

sformel-usgs commented 2 months ago

Thank you for the good conversation yesterday at the OBIS vocab meeting. @gwemon, you are correct that the nested MoF won't work in the current eMOF/MOF extension. Even with parentMeasurementID (which would need to be added to eMOF), it is a bit tricky to model this correctly. Here are some updates on our thoughts:

The Plot Thickens

It turns out that we originally misinterpreted how the data was collected. In fact, there are three subsamples of each body part, each measured for four isotopes, resulting in 36 structured measurements per organism (3 body parts x 3 subsamples x 4 isotopes). So, the data looks like this:


graph TD

Organism -->bp["Body Part x 3"]
bp -->ss["subsample x 3"]
ss --> N_iso["N Isotope"] & C_iso["C Isotope"] & O_iso["O Isotope"] & S_iso["S Isotope"]
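As a quick check of the structure in the graph, the nesting can be enumerated in a few lines. This is only a sketch: the body part names reuse the examples given earlier in the thread (gill, muscle, shell) and the subsample numbering is a placeholder.

```python
from itertools import product

# Placeholders (assumptions): body part names come from the examples earlier
# in this thread; subsample identifiers are arbitrary.
body_parts = ["gill", "muscle", "shell"]
subsamples = [1, 2, 3]
isotopes = ["N", "C", "O", "S"]

# One record per body part x subsample x isotope for a single organism.
measurements = [
    {"bodyPart": part, "subsample": sub, "isotope": iso}
    for part, sub, iso in product(body_parts, subsamples, isotopes)
]

print(len(measurements))  # 3 * 3 * 4
```

Any data model for this dataset has to keep all three keys (body part, subsample, isotope) attached to each value.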

Yesterday we explored what @EliLawrence suggested above and toyed with abusing DwC event core. There were moments where it seemed like we were nearing solutions, but after thinking about it for another day, I'm not satisfied with anything we came up with. Here are some challenges we encountered:

  1. parentMeasurementID and nesting MoF would imply multiple simultaneous states of the occurrence. Theoretically we could link them to materialEntityID, organismID, or create child events for the body part and subsampling events, but these either wouldn't work in the current implementation of OBIS/GBIF, or they would be an abuse of the way things are intended to work. I'm confident that this type of structure will be able to be handled in the near future as the GBIF/OBIS data models evolve, but we're not there yet.

  2. Creating extremely specific measurementTypes, like the first suggestion by @roswri, could work, but I'm not sure how to handle the subsampling aspect of it. Can the semantic model handle the incorporation of the replicate identifiers minted in https://github.com/nvs-vocabs/P01/issues/207?

Practical Solution

@kylieh10 and I are going to publish the occurrences (i.e. collected organisms), since that should remain stable. Then we can devote time to finding a good solution for the subsampling and chemistry.

gwemon commented 1 month ago

Thank you @EliLawrence and @sformel-usgs. I think the values obtained from replication should ideally be handled in the data model rather than in the parameter code. As I understand it, the P01 codes created in https://github.com/nvs-vocabs/P01/issues/207 allow OBIS users to identify the replicate, but I don't know the OBIS schema structure well enough to see how they could point to the results of an individual replicate. I might need time to sit down with an OBIS schema expert who can show me how this could be done.

On the other hand, for your combinations, @sformel-usgs, we could do what @roswri suggested in her comment of 4th June:

Concentration of [stable isotope] per unit [dry]/[wet] weight of biota {biological entity specified elsewhere [Subcomponent: [gill]/[muscle]/[shell]]}

I know it might seem complex to have it all in one P01 term, but P01 is backed by a semantic model that is machine-actionable via linked data and a SPARQL endpoint. Software can decompose the elements without overloading the eMOF and DwCA format: all you need is an occurrenceID (which specifies the taxonomic ID of the biological entity) and the event/sub-event ID. A further advantage is that it would be easily convertible to the EMODnet Chemistry recommended format for contaminants in biota (I know these are not contaminants, but the pattern of P01 construction is the same).
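As a rough illustration of what "decomposing the elements" could look like, the sketch below splits a label of the pattern above into its parts with a regular expression. This is only a stand-in: real decomposition would go through the P01 semantic model via linked data or SPARQL, which is not shown here, and the example label and the `decompose` helper are hypothetical.

```python
import re

# Pattern for the label shape quoted above (an assumption, not an NVS API).
PATTERN = re.compile(
    r"^Concentration of (?P<isotope>.+?) per unit (?P<basis>dry|wet) weight of biota "
    r"\{biological entity specified elsewhere "
    r"\[Subcomponent: (?P<subcomponent>[^\]]+)\]\}$"
)


def decompose(label):
    """Return the label's components as a dict, or None if it does not match."""
    match = PATTERN.match(label)
    return match.groupdict() if match else None


# Hypothetical label; "isotope X" and "dry" are placeholders.
example = (
    "Concentration of isotope X per unit dry weight of biota "
    "{biological entity specified elsewhere [Subcomponent: gill]}"
)
print(decompose(example))
```

Once the subcomponent is available as its own field, measurements can be grouped by body part even though the body part is embedded in the measurementType term.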

One question I had: could you please confirm the units of these values, and also double-check whether the results are expressed relative to dry/wet weight or something else? Many thanks.

JoBeja commented 1 month ago

Hi Gwen, we discussed this at the last vocab meeting and it was more or less agreed by all that the initial approach that @sformel-usgs was thinking of was a bit abusive of the standard. In the meantime, @EliLawrence and @sformel-usgs have started exploring a different option, which doesn't try to expand the standard where that is not possible and still allows for the inclusion of replicates in different body parts. Please do check the meeting notes. @sformel-usgs, have I summarised it adequately?

gwemon commented 1 month ago

Thanks @JoBeja - Maybe the outcome of the discussion could be summarised here? I'll check the notes.