iobis / obis-issues

Repository for all OBIS related issues and feature requests

Occurrence extension: subsampling of an individual #170

Closed ymgan closed 3 years ago

ymgan commented 4 years ago

Hey,

I have a question about how to document sub-sampling of an individual in the Occurrence extension (with a sampling Event core). For example:

Station (sampling event core)
|__ amphipod trap (sub-event)
    |__ amphipod, freeze whole individual (occurrence extension)
        |__ pleopod, preserved in ethanol
        |__ pleopod, preserved in DNA/RNA shield reagent

It would be odd for the pleopods to be separate occurrence records, because they are derived from the amphipod occurrence record.

How would you advise structuring this type of information? Thank you so much!

pieterprovoost commented 4 years ago

Would the MaterialSample extension suit your needs? See https://www.idigbio.org/content/darwin-core-hour-making-dna-and-tissue-collections-available-using-ggbn-extensions-ipt

ymgan commented 4 years ago

Hey Pieter,

Sorry for my very late reply. I had a look at the GGBN extensions that you suggested, and I am wondering whether the model below is appropriate. The only information I actually have for the MaterialSample extension is the MaterialSampleType ...

[event core] station
|__ [event core] amphipod trap 
     |__[occ ext] amphipod --- [preservation ext] freeze
         |    |
         |    [resource relationship ext] subsample 
         |    |__[material sample ext] pleopod 1 -- [resource relationship ext] preserved in --  [preservation ext] ethanol
         |
         [resource relationship ext] subsample
         |__[material sample ext] pleopod 2 -- [resource relationship ext] preserved in -- [preservation ext] DNA/RNA shield reagent

There are no sequences associated with these samples (yet). Do I understand correctly that, because of the star schema, I shouldn't add one extension's ID to another extension (here, adding the occurrence ID to its corresponding MaterialSample extension record), and that this relationship therefore has to be described with the ResourceRelationship extension?

I am also considering the following model. It is simpler, but it results in duplicated information: the station and amphipod trap need to be repeated for each occurrence record.

[occ core] station, amphipod trap, amphipod --- [preservation ext] freeze
|__[material sample ext] pleopod 1=tissue -- [resource relationship ext] preserved in -- [preservation ext] ethanol
|__[material sample ext] pleopod 2=tissue --[resource relationship ext] preserved in -- [preservation ext] DNA/RNA shield reagent

Or can I just put everything in the Occurrence extension? In that case the occurrence is duplicated. How would I then document that an occurrence record refers to the pleopod?

[event core] station 
|__ [event core] amphipod trap (sub-event)
    |__ [occurrence ext] amphipod, preparations=freeze whole individual, basisOfRecord=PreservedSpecimen 
    |__ [occurrence ext] pleopod,  preparations=preserved in ethanol, basisOfRecord=MaterialSample
    |__ [occurrence ext] pleopod,  preparations=preserved in DNA/RNA shield reagent, basisOfRecord=MaterialSample

In addition to the information above, there are also pictures of both the whole individual (occurrence record) and its pleopods (MaterialSample extension). Do I need to use the ResourceRelationship extension to link the pleopod images in the Simple Multimedia extension to the corresponding MaterialSample extension records?

How would you model this type of data? Thank you so much!

pieterprovoost commented 4 years ago

I would model this dataset using an Event core and an Occurrence extension, linking all other extensions (MaterialSample, Multimedia, Preservation, possibly MIxS) with ResourceRelationship. Linking the Preservation extension to the Occurrence extension as in your first example is not going to work because of the star schema: an extension record can only point to the core, not to another extension. If you want to simplify a bit, I think you should drop the Preservation extension first.
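A minimal sketch of that layout, to make the star-schema constraint concrete. It is not an official OBIS or GBIF tool. All identifiers and values (`station1`, `occ:amphipod1`, `ms:pleopod1`, the "subsample of" label) are hypothetical, and it assumes that each extension table carries the core `eventID` and that subsample links live only in the ResourceRelationship rows.

```python
# Hypothetical identifiers and values, for illustration only.
# Event core: station and its sub-event (the amphipod trap).
events = [
    {"eventID": "station1", "parentEventID": ""},
    {"eventID": "station1:trap1", "parentEventID": "station1"},
]

# Extension rows point only at the core (eventID), never at each other.
occurrences = [
    {"eventID": "station1:trap1", "occurrenceID": "occ:amphipod1",
     "basisOfRecord": "PreservedSpecimen",
     "preparations": "frozen whole individual"},
]
material_samples = [
    {"eventID": "station1:trap1", "materialSampleID": "ms:pleopod1",
     "materialSampleType": "tissue"},
    {"eventID": "station1:trap1", "materialSampleID": "ms:pleopod2",
     "materialSampleType": "tissue"},
]

# ResourceRelationship rows express the links between extension records:
# resourceID (subject) -> relationshipOfResource -> relatedResourceID (object).
relationships = [
    {"eventID": "station1:trap1", "resourceID": "ms:pleopod1",
     "relatedResourceID": "occ:amphipod1",
     "relationshipOfResource": "subsample of"},
    {"eventID": "station1:trap1", "resourceID": "ms:pleopod2",
     "relatedResourceID": "occ:amphipod1",
     "relationshipOfResource": "subsample of"},
]


def dangling_references(events, occurrences, material_samples, relationships):
    """Return messages for rows whose core or relationship IDs do not resolve."""
    event_ids = {e["eventID"] for e in events}
    known_ids = ({o["occurrenceID"] for o in occurrences}
                 | {m["materialSampleID"] for m in material_samples})
    problems = []
    # Every extension row must reference an existing core event.
    for table in (occurrences, material_samples, relationships):
        for row in table:
            if row["eventID"] not in event_ids:
                problems.append(f"unknown eventID: {row['eventID']}")
    # Both ends of every relationship must be a known record.
    for rel in relationships:
        for key in ("resourceID", "relatedResourceID"):
            if rel[key] not in known_ids:
                problems.append(f"unknown {key}: {rel[key]}")
    return problems


def subsamples_of(occurrence_id, relationships):
    """List the material samples recorded as subsamples of an occurrence."""
    return [rel["resourceID"] for rel in relationships
            if rel["relatedResourceID"] == occurrence_id
            and rel["relationshipOfResource"] == "subsample of"]


print(dangling_references(events, occurrences, material_samples, relationships))
print(subsamples_of("occ:amphipod1", relationships))
```

Images of a pleopod could be linked the same way, with one more relationship row from the multimedia record's identifier to the `materialSampleID`.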

But it wouldn't hurt to get a second opinion on this, possibly through TDWG. The current GBIF recommendation for sequence based data is to use Occurrence with MIxS, but your case is more complex with subsamples and images. There's also some ongoing work to move to a more relational data model: https://docs.gbif-uat.org/advancing-ipt-biocase-toolkits/en/ That should solve some of your problems.