phenopackets / phenopacket-format

Variant representation #10

Open pnrobinson opened 8 years ago

pnrobinson commented 8 years ago

We should discuss how best to represent variants. We probably need something flexible, like

HGVS NM_123:c.-123C>T, with various types that also work for chromosomes, microdeletions, and other kinds of findings such as protein biomarkers, so that this standard can be used with a wide range of diseases and publications.

cmungall commented 8 years ago

Apologies, the commit above appears to be unrelated

This is what we have as an example:

schema: phenopacket-level-1
comment: This is an example phenopacket containing one variant to phenotype association
ontologies:
  - id: hp
    version: "2016-02-01"
variants:
  - id: _:v1
    positions:
      - type: HGVS
        value: "NM_123:c.-123C>T"
phenotype_profile:
  - entity: _:v1
    evidence:
      type: TAS
      source:
        id: PMID:FAKE1234
        title: Mutations in NM_123 cause multisystem proteinopathy and ALS
    phenotype:
      type:
        id: HP:0003560
        label: Muscular dystrophy
      onset:
        type:
          id: HP:0003584
          label: Late onset
      description: blah blah
    created: 2016-01-14
    contributors:
      - id: ORCID:nnnn-nnnn-nnnn

On the one hand, this is scope creep. On the other hand, it is very useful in practice. The approach is to be modular: the variant part is separable, so it can be represented outside the packet and referenced, or it can be embedded. Same approach for pedigrees (ped).
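
The embedded-versus-referenced idea above can be sketched in a few lines of Python. This is only an illustration, not the reference implementation: `resolve_entities` and the `external_variants` argument are hypothetical names, and the dict layout simply mirrors the YAML example in this comment.

```python
# Hypothetical helper, not part of any phenopacket reference implementation.
def resolve_entities(packet, external_variants=None):
    """Pair each association with the variant its `entity` id points to.

    `external_variants` stands in for variants kept outside the packet and
    referenced by id; variants embedded under `variants` take precedence.
    """
    index = dict(external_variants or {})
    for variant in packet.get("variants", []):
        index[variant["id"]] = variant
    resolved = []
    for assoc in packet.get("phenotype_profile", []):
        # None marks a dangling reference (id not found anywhere).
        resolved.append((assoc, index.get(assoc["entity"])))
    return resolved


# Variant embedded in the packet, as in the YAML example.
embedded = {
    "variants": [
        {"id": "_:v1", "positions": [{"type": "HGVS", "value": "NM_123:c.-123C>T"}]}
    ],
    "phenotype_profile": [
        {"entity": "_:v1", "phenotype": {"type": {"id": "HP:0003560"}}}
    ],
}

# Same association, but the variant is held outside the packet.
referenced = {"phenotype_profile": embedded["phenotype_profile"]}
external = {"_:v1": embedded["variants"][0]}

for assoc, variant in resolve_entities(referenced, external):
    print(assoc["entity"], "->", variant["positions"][0]["value"])
```

Either way the association only ever stores the id, which is what keeps the variant part separable.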

cmungall commented 8 years ago

Can someone take a shot at making some fake examples? We will derive the model from them.

tudorgroza commented 8 years ago

@cmungall @pnrobinson @jmcmurry: Why are we not adapting the MME schema for variants? It is fairly comprehensive and would enable PXF to be aligned with it. If you agree, I can have a first stab at implementing it.

cmungall commented 8 years ago

Can you have a go at a PR on the reference implementation?

There is also the main GA4GH variant representation. But why don't you take a first pass at a PR on the reference implementation?

pnrobinson commented 8 years ago

Tudor and I just discussed this. I would suggest that we design the format to be easily extensible to other lab abnormalities, say a paper about a protein biomarker and some disease, or ISCN, glycomics, and metabolomics. Might be a lot for v1. Cheers, Peter

cmungall commented 8 years ago

On 5 Apr 2016, at 22:27, Peter Robinson wrote:

Tudor and I just discussed this. I would suggest that we design the format to be easily extensible to other lab abnormalities - say a paper about a protein biomarker and some disease. Or ISCN, glycomics, and metabolomics. Might be a lot for v1

I'm not totally following the relevance to this ticket (other than ISCN).

Just a clarifying note about versions and levels. These are in theory orthogonal. Think OWL profiles and OWL versions, or GO-vs-GO-slims and GO versions. Version updates will be about clarifying semantics, improvements not related to expressivity, etc. Should stabilize a bit after v1. Levels are more like profiles or subsets.

Having said that, since we switched to JSON Schema, everything is rolled into the same level. It's actually easier to make the more complete model and then think about the kinds of profiles we would derive from it. It's also likely that we won't be able to capture everything in v1, and some of the higher-level stuff will appear in future versions. But just a cautionary note on equating versions with expressivity and flexibility.

Let's capture some of these requirements e.g. glycomics in separate tickets.
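
To make the versions-versus-levels point concrete, here is a rough Python sketch that treats a level as a profile, i.e. a subset of allowed top-level sections. Everything in it is an assumption made for illustration: the profile names, the `format_version` key, and the `conforms` function do not come from the schema.

```python
# Hypothetical profiles ("levels"): each one names the top-level sections it allows.
PROFILES = {
    "phenotypes-only": {"ontologies", "phenotype_profile"},
    "full": {"ontologies", "phenotype_profile", "variants"},
}


def conforms(packet, profile):
    """Return True if the packet uses only sections allowed by the profile.

    The `format_version` key (an assumed name) is ignored on purpose: the
    version describes the spec release, not how expressive the packet is.
    """
    sections = set(packet) - {"format_version"}
    return sections <= PROFILES[profile]


simple = {"format_version": "1.0", "ontologies": [], "phenotype_profile": []}
rich = {**simple, "variants": [{"id": "_:v1"}]}
newer = {**simple, "format_version": "2.0"}

print(conforms(simple, "phenotypes-only"))  # True
print(conforms(rich, "phenotypes-only"))    # False: uses the variants section
print(conforms(newer, "phenotypes-only"))   # True: a version bump changes nothing here
```

The last line is the point of the sketch: bumping the version leaves the profile check unchanged, so the two axes stay independent.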

tudorgroza commented 8 years ago

@cmungall : Ok. Can you please have a look at the current PR I've put in?

cmungall commented 8 years ago

Thanks!

So Association was originally conceived of as an association between a thing (like a person, disease, or variant) and an ontological description of that thing. Of course it makes perfect sense to genericise this somewhat for person-variant associations, but I'll need to think to make sure that no assumptions are broken. But this can happen later.

tudorgroza commented 8 years ago

Thanks. I'll add it to PA and see what other things are missing.

julesjacobsen commented 8 years ago

I think we should leave out the HGVS description - it only applies to humans and we want this to be more generic than that.

Also I think we should follow the GA4GH variant schema more closely. The MME one is pretty closely aligned to this anyway. We'll only be able to capture SNPs and indels, but that's the current state of things.

We also need the ability to link out to other sources, e.g. VCF files. Probably a simple URI will suffice?
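
As a rough sketch of what that could look like, the Python below models a variant with GA4GH-style coordinate fields plus an optional link-out URI. The class, the snake_case field names, the `source_uri` field, the 0-based half-open coordinate convention, and the example coordinates are all assumptions for illustration, not the actual GA4GH or MME schema.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse


@dataclass(frozen=True)
class Variant:
    """Illustrative variant record; field names are assumptions."""
    reference_name: str
    start: int                 # assumed 0-based, inclusive
    end: int                   # assumed 0-based, exclusive
    reference_bases: str
    alternate_bases: str
    source_uri: Optional[str] = None   # hypothetical link-out, e.g. to a VCF file

    def __post_init__(self):
        if self.end - self.start != len(self.reference_bases):
            raise ValueError("end - start must equal len(reference_bases)")
        if self.source_uri is not None and not urlparse(self.source_uri).scheme:
            raise ValueError("source_uri must be an absolute URI")

    @property
    def kind(self):
        """Coarse classification: 'SNP', 'indel', or 'other'."""
        ref, alt = self.reference_bases, self.alternate_bases
        if len(ref) == 1 and len(alt) == 1:
            return "SNP"
        if len(ref) != len(alt):
            return "indel"
        return "other"


# Made-up coordinates, for illustration only.
snp = Variant("1", 12344, 12345, "C", "T", source_uri="file:///data/sample.vcf.gz")
deletion = Variant("1", 200, 203, "ACG", "A")
print(snp.kind, deletion.kind)
```

Keeping the link-out as a plain string means any scheme (`file:`, `https:`, and so on) works without the format needing to know about VCF specifically.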

cmungall commented 8 years ago

On 6 May 2016, at 3:51, Jules Jacobsen wrote:

Also I think we should follow the GA4GH variant schema more closely. The MME one is pretty closely aligned to this anyway. We'll only be able to capture SNPs and indels, but that's the current state of things.

That's fine - we will use a genotype object for other scenarios