phenopackets / phenopacket-schema

Repository for the GA4GH phenopacket schema
https://phenopacket-schema.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Rare disease use case: proposed new features (analysis results) #160

Closed lmatalonga closed 5 years ago

lmatalonga commented 5 years ago

After moving to v0.4 and exporting phenotypic data from Rare Disease patients (Solve-RD project context), these are the fields, mostly analytical results related to molecular diagnosis, that would be useful for our use case to have included in the schema:

MolecularDiagnosis message:

pnrobinson commented 5 years ago

Difficult to put variant-interpretation here, because there might be multiple variants. Global mode of inheritance?

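message Interpretation {
  bool resolved = 1;
  Disease diagnosis = 2;
  // Gene status; the enum values are numbered in the order listed in the original sketch.
  enum GeneStatus { UNKNOWN = 0; CAUSATIVE = 1; REJECTED = 2; CANDIDATE = 3; }
  GeneStatus gene_status = 3;
}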
julesjacobsen commented 5 years ago

Or this:

message Interpretation {
  bool resolved = 1;
  Disease diagnosis = 2;
  repeated GeneInterpretation geneInterpretations = 3;
}

message GeneInterpretation {
  Gene gene = 1;
  enum Status {
    UNKNOWN = 0;
    REJECTED = 1;
    CANDIDATE = 3;
    CAUSATIVE = 4;
  }
  Status status = 2;  // added so the enum is actually used; field number assumed
}

Variant-level information ought to be part of the GeneInterpretation, but we might want to wait for the GKS stuff?

pnrobinson commented 5 years ago
message Interpretation {
  bool resolved = 1;
  Disease diagnosis = 2;
  repeated InterpretationElement statuses = 3;
}

message InterpretationElement {
  oneof element {  // Gene, Variant, or other; the oneof and field names are placeholders
    Gene gene = 1;
    Variant variant = 2;
  }
  enum Status { UNKNOWN = 0; CAUSATIVE = 1; REJECTED = 2; CANDIDATE = 3; }
  Status status = 3;
}
julesjacobsen commented 5 years ago
message Interpretation {
  bool resolved = 1;
  Disease diagnosis = 2;
  repeated GeneInterpretation geneInterpretations = 3;
}

message GeneInterpretation {
  enum Status {
    UNKNOWN = 0;
    REJECTED = 1;
    CANDIDATE = 3;
    CAUSATIVE = 4;
  }
  oneof interpretation {
    Gene gene = 1;
    Variant variant = 2;
  }
  Status status = 3;
}

As @pnrobinson pointed out, this makes Phenopacket.diseases, Phenopacket.genes and Phenopacket.variants somewhat ambiguous/redundant:

https://github.com/phenopackets/phenopacket-schema/blob/1046be9d41eb54745757657fb00653c7c6c8227c/src/main/proto/phenopackets.proto#L28-L39

julesjacobsen commented 5 years ago

Thinking about this a bit more, I think adding this to the Phenopacket directly is a mistake, as it makes things confusing. It would be much better to store/transmit this information in a separate message that wraps or refers to the phenopacket in some way, either inline or by id. Like this, for example:

message Interpretation {
  bool resolved = 1;
  oneof phenopacketOrFamily {
    Phenopacket phenopacket = 2;
    Family family = 3;
  }
  repeated Diagnosis diagnosis = 4;
}

message Diagnosis {
  Disease disease = 1;
  repeated GenomicInterpretation genomicInterpretations = 2;
}

message GenomicInterpretation {
  enum Status {
    UNKNOWN = 0;
    REJECTED = 1;
    CANDIDATE = 3;
    CAUSATIVE = 4;
  }
  oneof interpretation {
    Gene gene = 1;
    Variant variant = 2;
  }
  Status status = 3;
}
julesjacobsen commented 5 years ago

Output for a populated interpretation object, using the case report from "Exome sequencing identifies the cause of a mendelian disorder", looks like this:

{
  "id": "SOLVERD:0000012456",
  "resolutionStatus": "SOLVED",
  "phenopacket": {
    "id": "kindred 1A",
    "subject": {
      "id": "kindred 1A",
      "dateOfBirth": "1998-01-01T00:00:00Z",
      "sex": "MALE"
    },
    "phenotypicFeatures": [{
      "type": {
        "id": "HP:0001159",
        "label": "Syndactyly"
      },
      "classOfOnset": {
        "id": "HP:0003577",
        "label": "Congenital onset"
      }
    }, {
      "type": {
        "id": "HP:0002090",
        "label": "Pneumonia"
      },
      "classOfOnset": {
        "id": "HP:0011463",
        "label": "Childhood onset"
      }
    }, {
      "type": {
        "id": "HP:0000028",
        "label": "Cryptorchidism"
      },
      "classOfOnset": {
        "id": "HP:0003577",
        "label": "Congenital onset"
      }
    }, {
      "type": {
        "id": "HP:0011109",
        "label": "Chronic sinusitis"
      },
      "severity": {
        "id": "HP:0012828",
        "label": "Severe"
      },
      "classOfOnset": {
        "id": "HP:0003581",
        "label": "Adult onset"
      }
    }],
    "variants": [{
      "hgvsAllele": {
        "hgvs": "NM_001361.4:c.403C\u003eT"
      },
      "zygosity": {
        "id": "GENO:0000135",
        "label": "heterozygous"
      }
    }, {
      "hgvsAllele": {
        "hgvs": "NM_001361.4:c.454G\u003eA"
      },
      "zygosity": {
        "id": "GENO:0000135",
        "label": "heterozygous"
      }
    }, {
      "hgvsAllele": {
        "hgvs": "NM_001369.2:c.12599dupA"
      },
      "zygosity": {
        "id": "GENO:0000136",
        "label": "homozygous"
      }
    }]
  },
  "diagnosis": [{
    "disease": {
      "term": {
        "id": "OMIM:263750",
        "label": "Miller syndrome"
      }
    },
    "genomicInterpretations": [{
      "status": "CAUSATIVE",
      "gene": {
        "id": "HGNC:2867",
        "symbol": "DHODH"
      }
    }]
  }]
}
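As a rough illustration of how a consumer might read a message like the one above, here is a minimal Python sketch that parses the JSON and collects the genes marked CAUSATIVE for each diagnosis. The embedded JSON is a trimmed copy of the example, and the helper name `causative_genes` is hypothetical; neither the function nor this access pattern is part of the schema or its tooling.

```python
import json

# Trimmed copy of the interpretation above; only the fields used below are kept.
INTERPRETATION_JSON = """
{
  "id": "SOLVERD:0000012456",
  "resolutionStatus": "SOLVED",
  "diagnosis": [{
    "disease": {"term": {"id": "OMIM:263750", "label": "Miller syndrome"}},
    "genomicInterpretations": [{
      "status": "CAUSATIVE",
      "gene": {"id": "HGNC:2867", "symbol": "DHODH"}
    }]
  }]
}
"""


def causative_genes(interpretation: dict) -> dict:
    """Map each diagnosed disease id to the gene symbols marked CAUSATIVE.

    Hypothetical helper for illustration; it assumes the JSON layout shown
    in this comment thread.
    """
    result = {}
    for diagnosis in interpretation.get("diagnosis", []):
        disease_id = diagnosis["disease"]["term"]["id"]
        result[disease_id] = [
            gi["gene"]["symbol"]
            for gi in diagnosis.get("genomicInterpretations", [])
            # A genomic interpretation holds either a gene or a variant (oneof).
            if gi.get("status") == "CAUSATIVE" and "gene" in gi
        ]
    return result


interpretation = json.loads(INTERPRETATION_JSON)
print(causative_genes(interpretation))
```

Because the interpretation wraps the phenopacket rather than extending it, a reader like this only needs the `diagnosis` list and can ignore the embedded phenopacket entirely.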
julesjacobsen commented 5 years ago

@lmatalonga please note that this is not in the release candidate yet as we need your feedback first!

pnrobinson commented 5 years ago

@lmatalonga Please see this new element https://phenopackets-schema.readthedocs.io/en/latest/interpretation.html that we have added to the release candidate.

lmatalonga commented 5 years ago

Dear @julesjacobsen and @pnrobinson,

Thank you very much, this will be very useful for our use case. My only concern / question is about listing the status of the variant as UNKNOWN = 0; REJECTED = 1; CANDIDATE = 3; CAUSATIVE = 4; instead of using the ACMG variant interpretation guidelines (pathogenic, likely pathogenic, VUS, etc.).

The information concerning causativeness regarding the primary disorder is included at the gene level. If we keep the same structure for variants, it might at some point be redundant, and we would be losing specific information at the variant level. I can think of several examples: it could be interesting to know that a rejected gene contains a pathogenic variant, or that a candidate gene for an AR disorder contains one pathogenic variant and two VUS (without knowing which one is the second "causative" variant), or to report incidental findings, etc.

I understand that this is a first implementation and that maybe this kind of granularity is not meant to be included in this phenopacket schema, but I do think that at some point it would be necessary to include ACMG variant significance at the variant level - this is the first information we gather from variant interpretation and, in my opinion, thinking of a broader personalised medicine vision, this type of information might be necessary in any individual phenotypic description.

Hope this feedback helps! Thanks again. Best, Leslie

pnrobinson commented 5 years ago

Hi Leslie, the ACMG categories refer to general knowledge. The interpretation refers to a specific case. There may be reasons why these two differ:

  1. e.g., a het variant associated with a recessive disease can be ACMG pathogenic but not causative in a given patient
  2. For a specific case, one might interpret a variant to be causative, even though it might not be ACMG pathogenic according to ClinGen.

So it is on purpose that we are not using the same vocabulary. The GA4GH VA team is working on knowledge standards for variants that in essence are representations of ACMG (and additional aspects).
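To make the distinction between the two vocabularies concrete, here is a small Python sketch that keeps an ACMG-style class and a case-level status as separate fields and flags findings where one would not predict the other. Everything in it is illustrative: `AcmgClass`, `VariantFinding` and `vocabularies_disagree` are hypothetical names, the variant labels are placeholders rather than real variants, and only `CaseStatus` mirrors the Status enum proposed in this thread.

```python
from dataclasses import dataclass
from enum import Enum


class AcmgClass(Enum):
    """General-knowledge classification of a variant (ACMG-style labels)."""
    PATHOGENIC = "pathogenic"
    LIKELY_PATHOGENIC = "likely pathogenic"
    VUS = "uncertain significance"
    LIKELY_BENIGN = "likely benign"
    BENIGN = "benign"


class CaseStatus(Enum):
    """Case-level status, mirroring the Status enum proposed in this thread."""
    UNKNOWN = 0
    REJECTED = 1
    CANDIDATE = 3
    CAUSATIVE = 4


@dataclass(frozen=True)
class VariantFinding:
    label: str          # placeholder label, not a real variant
    acmg: AcmgClass     # what is known about the variant in general
    status: CaseStatus  # what it means for this particular patient


def vocabularies_disagree(finding: VariantFinding) -> bool:
    """True when the ACMG class alone would suggest a different case status."""
    looks_pathogenic = finding.acmg in (
        AcmgClass.PATHOGENIC,
        AcmgClass.LIKELY_PATHOGENIC,
    )
    is_causative = finding.status is CaseStatus.CAUSATIVE
    return looks_pathogenic != is_causative


# Point 1: a heterozygous carrier variant for a recessive disease.
carrier = VariantFinding("carrier-variant", AcmgClass.PATHOGENIC, CaseStatus.REJECTED)
# Point 2: judged causative for this case without an ACMG pathogenic class.
case_specific = VariantFinding("case-variant", AcmgClass.VUS, CaseStatus.CAUSATIVE)

for finding in (carrier, case_specific):
    print(finding.label, vocabularies_disagree(finding))
```

Both example findings are flagged, which is the reason for keeping the two fields separate: collapsing them into one vocabulary would lose exactly these cases.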