phenopackets / phenopacket-schema

Repository for the GA4GH phenopacket schema
https://phenopacket-schema.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Biosamples information - add a preservation method #145

Closed: allisonheath closed this issue 3 years ago

allisonheath commented 5 years ago

Reviewing the biospecimen components of GDC, the information is typically broken up into:

I make no claims about how well this represents biosample information for computational use, but one question will probably be how these map (or don't map) to phenopacket fields.

Just looking at the docs perhaps:

Probably worth a check against the ICGC data dictionary as well: https://docs.icgc.org/dictionary/viewer/#?viewMode=details&dataType=specimen

julesjacobsen commented 5 years ago

Thanks for the links, @allisonheath; that's a really nice resource.

It's up to the users to decide how the calls are created. If it's a standard requirement for the preservation method to be specified, we could easily add this as an OntologyClass.

@pnrobinson any objections?

pnrobinson commented 5 years ago

This could inherit from https://www.ebi.ac.uk/ols/ontologies/ncit/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FNCIT_C49337 (Tissue Preparation or Processing Technique)

julesjacobsen commented 5 years ago

There is no 'frozen' / 'fresh frozen' concept there.

pnrobinson commented 5 years ago

Freezing is under "action" https://www.ebi.ac.uk/ols/ontologies/ncit/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FNCIT_C48160

julesjacobsen commented 5 years ago

freezing != frozen

julesjacobsen commented 5 years ago

Computationally, I'm not sure how useful this will be. I'm going to mark this as a 1.1.0 feature for the time being, as I personally don't have enough knowledge on this to say what is or isn't useful and would want feedback from someone knowledgeable about these matters before adding it. It might be that we also need storage as well as preservation for this to be useful.

allisonheath commented 5 years ago

Typically, the most important thing is whether it's FFPE, which results in noisier genomic data but can be corrected for if you're aware of it. Secondarily, you'll sometimes see batch effects with other preservation methods.
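A minimal sketch of how a downstream consumer might act on a preservation field, assuming each sample carries a free-text preservation label: flag FFPE samples for correction and group samples by method so batch effects can be inspected. The `Sample` type, the `is_ffpe` and `group_by_preservation` helpers, and the substring heuristic in `FFPE_MARKERS` are all hypothetical; the example labels are borrowed from the ICGC ARGO list quoted later in this thread.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Sample:
    # Hypothetical minimal record: a sample ID plus a preservation label.
    sample_id: str
    preservation: str


# Hypothetical heuristic: substrings taken to indicate formalin-fixed,
# paraffin-embedded material.
FFPE_MARKERS = ("ffpe", "paraffin")


def is_ffpe(sample: Sample) -> bool:
    """True if the preservation label looks like FFPE (case-insensitive)."""
    label = sample.preservation.lower()
    return any(marker in label for marker in FFPE_MARKERS)


def group_by_preservation(samples):
    """Group sample IDs by preservation label, e.g. to look for batch effects."""
    groups = defaultdict(list)
    for s in samples:
        groups[s.preservation].append(s.sample_id)
    return dict(groups)


samples = [
    Sample("S1", "Formalin fixed & paraffin embedded"),
    Sample("S2", "Cryopreservation in liquid nitrogen (dead tissue)"),
    Sample("S3", "Fresh"),
]
needs_correction = [s.sample_id for s in samples if is_ffpe(s)]
print(needs_correction)  # ['S1']
```

A controlled ontology term instead of free text would let the FFPE check compare IDs rather than rely on substring matching.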

pnrobinson commented 5 years ago

@allisonheath Are there any examples of how this information is structured that we could consult? It seems there are many possible ways of doing so.

allisonheath commented 5 years ago

Digging through NCIt a bit more, it looks like you may have to go up to Laboratory Procedure to find a common parent for FFPE and Cryopreservation (aka Frozen).

Edit: fixed above links

Many of the others I'm aware of are more oriented toward biobanking operations, e.g. https://github.com/biobanking/biobanking/, or, for a high-level description, https://biospecimens.cancer.gov/bestpractices/to/bcpsrd.asp

pnrobinson commented 5 years ago

@allisonheath We are having difficulty with this item. It seems there are preparation, preservation, storage, etc., and we probably do not want to go into that level of detail? Is there a case to be made for sample_processing, recommending terms such as FFPE and Cryopreservation as above? @julesjacobsen

pnrobinson commented 3 years ago

@allisonheath we are hoping to finalize v2 in the next few months. Can we touch base about this?

julesjacobsen commented 3 years ago

https://docs.icgc-argo.org/dictionary (Specimen)

specimen storage:

- Cut slide
- Frozen in -70 freezer
- Frozen in liquid nitrogen
- Frozen in vapour phase
- Not Applicable
- Other
- Paraffin block
- RNA later frozen

specimen processing:

- Cryopreservation in liquid nitrogen (dead tissue)
- Cryopreservation in dry ice (dead tissue)
- Cryopreservation of live cells in liquid nitrogen
- Cryopreservation - other
- Formalin fixed & paraffin embedded
- Formalin fixed - buffered
- Formalin fixed - unbuffered
- Fresh
- Other
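Keeping storage and processing as two separate permitted-value lists makes it easy to catch a value entered in the wrong field. A minimal sketch, with the values copied from the ICGC ARGO lists above; the `check_specimen` helper and its error messages are hypothetical, not part of any ICGC ARGO tooling.

```python
# Permitted values copied from the ICGC ARGO "Specimen" lists above.
SPECIMEN_STORAGE = {
    "Cut slide",
    "Frozen in -70 freezer",
    "Frozen in liquid nitrogen",
    "Frozen in vapour phase",
    "Not Applicable",
    "Other",
    "Paraffin block",
    "RNA later frozen",
}

SPECIMEN_PROCESSING = {
    "Cryopreservation in liquid nitrogen (dead tissue)",
    "Cryopreservation in dry ice (dead tissue)",
    "Cryopreservation of live cells in liquid nitrogen",
    "Cryopreservation - other",
    "Formalin fixed & paraffin embedded",
    "Formalin fixed - buffered",
    "Formalin fixed - unbuffered",
    "Fresh",
    "Other",
}


def check_specimen(storage: str, processing: str) -> list:
    """Hypothetical validator: return a list of problems (empty means valid).

    Adds a hint when a value belongs to the other list, i.e. when storage
    and processing appear to have been swapped.
    """
    problems = []
    if storage not in SPECIMEN_STORAGE:
        hint = " (looks like a processing value)" if storage in SPECIMEN_PROCESSING else ""
        problems.append(f"unknown storage value: {storage!r}{hint}")
    if processing not in SPECIMEN_PROCESSING:
        hint = " (looks like a storage value)" if processing in SPECIMEN_STORAGE else ""
        problems.append(f"unknown processing value: {processing!r}{hint}")
    return problems


print(check_specimen("Paraffin block", "Formalin fixed & paraffin embedded"))  # []
print(check_specimen("Fresh", "Frozen in liquid nitrogen"))
```

The second call reports both fields, each with a hint that the value belongs to the other list.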

julesjacobsen commented 3 years ago

As @allisonheath pointed out, a potential ontology for this might be OBIB, specifically children of OBI_0001472 (specimen with known storage state) and OBI_0000047 (processed material), but these don't map precisely.

These would be added to Biosample:

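message Biosample {
  // other fields above...
  // Field numbers are illustrative only: they must not reuse any number
  // already taken by the existing Biosample fields.
  OntologyClass biosample_storage = 20;
  OntologyClass biosample_processing = 21;
}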

e.g.

biosample:
    biosampleStorage:
        id: OMIABIS:0000053
        label: flash frozen specimen
    biosampleProcessing:
        id: OBI:0000971
        label: fresh specimen

or, for lyophilised RNA:

biosample:
    biosampleStorage:
        id: OBI:0000922
        label: frozen specimen
    biosampleProcessing:
        id: OBI:0000965
        label: lyophilized specimen

but the storage and processing concepts can get mixed up.
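To see the two YAML examples above as data, here is a minimal Python sketch that mirrors the proposed fields with plain dataclasses (not protobuf-generated classes) and serialises them with lowerCamelCase keys, matching the `biosampleStorage` / `biosampleProcessing` names used in the examples (the proto3 JSON mapping derives lowerCamelCase names from snake_case field names by default). The `OntologyClass` and `Biosample` dataclasses and the `to_dict` method are hypothetical stand-ins; the term IDs and labels are copied from the examples.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class OntologyClass:
    # Mirrors the id/label pair used in the YAML examples.
    id: str
    label: str


@dataclass
class Biosample:
    # Only the two proposed fields; all other Biosample fields are omitted.
    biosample_storage: Optional[OntologyClass] = None
    biosample_processing: Optional[OntologyClass] = None

    def to_dict(self) -> dict:
        """Serialise with lowerCamelCase keys, skipping unset fields."""
        out = {}
        if self.biosample_storage is not None:
            out["biosampleStorage"] = asdict(self.biosample_storage)
        if self.biosample_processing is not None:
            out["biosampleProcessing"] = asdict(self.biosample_processing)
        return out


lyophilised_rna = Biosample(
    biosample_storage=OntologyClass("OBI:0000922", "frozen specimen"),
    biosample_processing=OntologyClass("OBI:0000965", "lyophilized specimen"),
)
print(json.dumps({"biosample": lyophilised_rna.to_dict()}, indent=2))
```

Nothing in this model stops a processing term such as OBI:0000965 from being placed in the storage slot; catching that mix-up needs a separate check, such as restricting each field to its own term list or ontology branch.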

Another possible ontology could be EFO, using subterms of OBI:0100051; note that these are all riffing on / extending OBI.