how to represent biological replicates and technical replicates

ymgan commented 3 years ago

Hey,

So the relationship between samples was brought up in the MIxS meeting yesterday. It makes me wonder what the best way is to represent biological replicates and technical replicates of sequence-based data in DwC.

We do, however, recommend including an eventID for each core record, to indicate the association between occurrences derived from the same sampling event.

Am I right that this can be used to address technical replicates, since technical replicates are sub-samples of a sample? What about biological replicates? And what do you think about using the resource relationship extension?

@timrobertson100 Do you think this would be worth including in the guidelines to publish DNA-derived data through biodiversity data platforms?

I am just going to dump some other information that I could find here. Please see this google doc for more info.

If I understand correctly, the attributes mentioned in ENA and INSDC samples are not part of MIxS? (Can someone please correct me if I am wrong? Thank you.) But I think this information is important and should be represented in DwC imho.

ENA

Parent child relationships are established via ‘sample composed of’ and ‘sample derived from’ attributes.

These attributes are captured within the child sample and their value is a list of INSDC sample IDs (comma separated) or a range (no spaces) e.g.:

sample derived from: ERS123456,ERS123654,ERS123123
sample composed of: SAME1234567-SAME1235000

Example use: https://www.ebi.ac.uk/ena/browser/view/SAMEA6150246 (click additional attributes)
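The value format described above (a comma-separated list of sample IDs, or a range with no spaces) can be read with a few lines of Python. This is only a minimal sketch: the function name `parse_sample_ids` is hypothetical, and it assumes that a range is inclusive and that both ends share the same letter prefix followed by digits, which the ENA description above does not spell out.

```python
import re

# Accession pattern assumed for this sketch: letters followed by digits.
_ACCESSION = re.compile(r"([A-Za-z]+)(\d+)")


def parse_sample_ids(value: str) -> list[str]:
    """Expand a 'sample derived from' / 'sample composed of' value.

    Accepts a comma-separated list of IDs and/or ranges such as
    'SAME1234567-SAME1235000'. Ranges are treated as inclusive
    (an assumption, not something the source confirms).
    """
    ids: list[str] = []
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" not in part:
            ids.append(part)
            continue
        start, end = part.split("-", 1)
        m_start = _ACCESSION.fullmatch(start)
        m_end = _ACCESSION.fullmatch(end)
        if not (m_start and m_end) or m_start.group(1) != m_end.group(1):
            raise ValueError(f"cannot expand range: {part!r}")
        prefix = m_start.group(1)
        width = len(m_start.group(2))  # keep zero padding, if any
        low, high = int(m_start.group(2)), int(m_end.group(2))
        if low > high:
            raise ValueError(f"range is reversed: {part!r}")
        ids.extend(f"{prefix}{n:0{width}d}" for n in range(low, high + 1))
    return ids


derived = parse_sample_ids("ERS123456,ERS123654,ERS123123")
composed = parse_sample_ids("SAME1234567-SAME1235000")
```

Expanding the range up front makes each parent sample an explicit ID, which is easier to map onto one relationship record per parent later on.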

GGBN

uses resource relationships from DwC

NCBI

NCBI uses the same relationship attributes that are present in EBI BioSample: (you can view BioSample attribute definitions here: https://www.ncbi.nlm.nih.gov/biosample/docs/attributes/)

'same as': indicates that the same physical sample has multiple BioSample records
https://www.ncbi.nlm.nih.gov/biosample/?term=SAMEA4447240

'derived from': indicates where one BioSample was derived from another BioSample
https://www.ncbi.nlm.nih.gov/biosample/?term=SAMN13192999

'family role': relationships to other samples in the same study; can include multiple relationships
https://www.ncbi.nlm.nih.gov/biosample/?term=SAMN01090941

'child of': indicates parentage; only applicable to sexual organisms, for bacteria use 'derived from'
https://www.ncbi.nlm.nih.gov/biosample/?term=SAMN00690380

Thank you so much!

tucotuco commented 3 years ago

There are multiple ways to make associations in Darwin Core. The richest is through the ResourceRelationship Class (https://dwc.tdwg.org/terms/#resourcerelationship).
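As a rough illustration of how replicates could be expressed with ResourceRelationship, the sketch below builds one relationship row per replicate and writes them as CSV. The column names are Darwin Core terms from the ResourceRelationship class; the sample identifiers, the ID scheme for `resourceRelationshipID`, and the value "technical replicate of" are made up for the example and are not an agreed vocabulary.

```python
import csv
import io

# Darwin Core ResourceRelationship terms used as column headers.
FIELDS = [
    "resourceRelationshipID",
    "resourceID",
    "relatedResourceID",
    "relationshipOfResource",
    "relationshipRemarks",
]


def replicate_relationships(parent_id, replicate_ids, relationship, remarks=""):
    """Return one row per replicate, pointing from the replicate to its parent.

    resourceID is the subject (the replicate) and relatedResourceID is the
    object (the parent sample), so the row reads
    '<replicate> <relationship> <parent>'.
    """
    return [
        {
            # Hypothetical ID scheme, only here to keep rows unique.
            "resourceRelationshipID": f"{parent_id}:rel:{i}",
            "resourceID": replicate_id,
            "relatedResourceID": parent_id,
            "relationshipOfResource": relationship,
            "relationshipRemarks": remarks,
        }
        for i, replicate_id in enumerate(replicate_ids, start=1)
    ]


def to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = replicate_relationships(
    "sample-1",                    # made-up parent sample identifier
    ["sample-1-a", "sample-1-b"],  # made-up replicate identifiers
    "technical replicate of",      # assumed value, not a controlled term
)
print(to_csv(rows))
```

The same function could be called with a different relationship string for biological replicates, which keeps the two kinds of replicate distinguishable in a way that a shared eventID alone does not.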

maxmaronna commented 3 years ago

Hello, I am interested in mapping MIxS IDs onto NCBI datasets. I did not find a single sheet with all MIxS IDs in one tab (version 5), so I created one online: https://docs.google.com/spreadsheets/d/1O8UZ5Myqylpdpk94G_zo2r0LN1hwWvOzRQ0SHDDsQEM/edit?usp=sharing

I also could not find a sheet that maps MIxS IDs onto all NCBI metadata. Do you know if anything like this exists on any website (even outside the NCBI website)? Thanks in advance.

gdadade commented 3 years ago

You can start here with the feature table: http://www.insdc.org/files/feature_table.html I'm not sure if INSDC supports ALL MIxS terms, but most of them are fully supported.

GGBN uses both Darwin Core and ABCD. In ABCD, relationships can be described in multiple ways; for this kind of relationship we use UnitAssociation (similar to the ResourceRelationship class in DwC).

maxmaronna commented 3 years ago

Hello Gabi,

Thank you for your answer and tips. I know that INSDC webpage (very helpful indeed). I will check with INSDC about MIxS support (at least which terms are supported and their respective MIxS codes).

Best maxmaronna

tdwg / gbwg

how to represent biological replicates and technical replicates #24
