Open ymgan opened 3 years ago
There are multiple ways to make associations in Darwin Core. The richest is through the ResourceRelationship Class (https://dwc.tdwg.org/terms/#resourcerelationship).
Hello, I am interested in mapping MIxS IDs onto NCBI datasets. I could not find a single sheet with all MIxS IDs in one tab (version 5), so I created one online: https://docs.google.com/spreadsheets/d/1O8UZ5Myqylpdpk94G_zo2r0LN1hwWvOzRQ0SHDDsQEM/edit?usp=sharing However, I could not find a sheet that maps MIxS IDs onto all NCBI metadata. Do you know if anything like this exists on any website (even outside the NCBI website)? Thanks in advance.
You can start here with the feature table: http://www.insdc.org/files/feature_table.html I'm not sure if INSDC supports ALL MIxS terms, but most of them are fully supported.
GGBN uses both Darwin Core and ABCD. In ABCD, relationships can be described in multiple ways; for this kind of relationship we use UnitAssociation (similar to the ResourceRelationship class in DwC).
Hello Gabi,
Thank you for your answer and tips. I know that INSDC webpage (very helpful indeed). I will try to check with INSDC about MIxS support (at least which ones are supported and their respective MIxS codes).
Best, maxmaronna
Hey,
The relationship between samples was brought up in the MIxS meeting yesterday. It makes me wonder: what is the best way to represent biological replicates and technical replicates of sequence-based data in DwC?
As recommended in the guidelines to publish DNA-derived data through biodiversity data platforms:
Am I right that this can be used to address technical replicates, since technical replicates are sub-samples of a sample? What about biological replicates? And what do you think about using the resource relationship extension?
@timrobertson100 Do you think this would be worth including in the guidelines to publish DNA-derived data through biodiversity data platforms?
I'm just going to dump some other information that I could find here: Please see this google doc for more info.
If I understand correctly, the attributes mentioned below for ENA and INSDC samples are not part of MIxS? (Can someone please correct me if I am wrong? Thank you.) But I think this information is important and should be represented in DwC imho.
ENA
Parent child relationships are established via ‘sample composed of’ and ‘sample derived from’ attributes.
These attributes are captured within the child sample and their value is a list of INSDC sample IDs (comma separated) or a range (no spaces) e.g.:
sample derived from: ERS123456,ERS123654,ERS123123
sample composed of: SAME1234567-SAME1235000
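The value format described above (a comma-separated list of IDs, or a range without spaces) could be read with something like the following. This is only a rough sketch: the function name `parse_sample_refs` and the accession pattern (uppercase letters followed by digits) are my own assumptions, not anything defined by ENA. Ranges are kept as (first, last) pairs rather than expanded, because I am not sure every ID inside a range necessarily exists.

```python
import re

# Assumed accession shape: uppercase prefix followed by digits (e.g. ERS123456).
ACCESSION = re.compile(r"^[A-Z]+\d+$")


def parse_sample_refs(value):
    """Parse a 'sample derived from' / 'sample composed of' value.

    Returns a list of (first, last) pairs. A single ID is returned as a
    pair whose two elements are identical; a range keeps both endpoints.
    """
    refs = []
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
        else:
            first = last = part
        for accession in (first, last):
            if not ACCESSION.match(accession):
                raise ValueError(f"unexpected sample ID: {accession!r}")
        refs.append((first, last))
    return refs


derived = parse_sample_refs("ERS123456,ERS123654,ERS123123")
composed = parse_sample_refs("SAME1234567-SAME1235000")
```

Here `derived` holds three single-ID pairs and `composed` holds one pair with the two range endpoints.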
Example use: https://www.ebi.ac.uk/ena/browser/view/SAMEA6150246 (click additional attributes)
GGBN
uses resource relationships from DwC
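To make the DwC option more concrete, here is a rough sketch of how sub-samples might be written as rows of a ResourceRelationship extension. The column names are Darwin Core ResourceRelationship terms, but the sample IDs, the helper names, and the relationship wording ("derived from") are made up for illustration; as far as I know there is no agreed vocabulary for replicates yet. This is not GGBN's actual implementation.

```python
import csv
import io

# Darwin Core ResourceRelationship terms used as column headers.
FIELDS = [
    "resourceID",
    "relatedResourceID",
    "relationshipOfResource",
    "relationshipRemarks",
]


def relationship_rows(parent_id, child_ids, relationship, remarks=""):
    """Build one row per child, read as: resourceID <relationship> relatedResourceID."""
    return [
        {
            "resourceID": child_id,
            "relatedResourceID": parent_id,
            "relationshipOfResource": relationship,
            "relationshipRemarks": remarks,
        }
        for child_id in child_ids
    ]


def to_tsv(rows):
    """Serialize the rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical IDs: two sub-samples taken from one parent sample.
rows = relationship_rows("sample-1", ["sample-1a", "sample-1b"], "derived from")
table = to_tsv(rows)
```

Each child sample is the subject (`resourceID`) and the parent is the object (`relatedResourceID`), which mirrors how ENA stores the relationship on the child sample.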
NCBI
NCBI uses the same relationship attributes that are present in EBI BioSample: (you can view BioSample attribute definitions here: https://www.ncbi.nlm.nih.gov/biosample/docs/attributes/)
same as
indicates that the same physical sample has multiple BioSample records: https://www.ncbi.nlm.nih.gov/biosample/?term=SAMEA4447240
derived from
indicates where one BioSample was derived from another BioSample: https://www.ncbi.nlm.nih.gov/biosample/?term=SAMN13192999
family role
relationships to other samples in the same study; can include multiple relationships: https://www.ncbi.nlm.nih.gov/biosample/?term=SAMN01090941
child of
indicates parentage; only applicable to sexual organisms, for bacteria use 'derived from': https://www.ncbi.nlm.nih.gov/biosample/?term=SAMN00690380
Related issues:
https://github.com/microbiomedata/nmdc-metadata/issues/287
https://github.com/GenomicsStandardsConsortium/mixs/issues/36
Thank you so much!