monarch-initiative / dipper

Data Ingestion Pipeline for Monarch
https://dipper.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

add pathogenicity models #149

Closed nlwashington closed 6 years ago

nlwashington commented 9 years ago

We need a new class of annotations which allows us to add a quality to variants such as pathogenic/benign, etc., together with the evidence for the call.

We should consider using a 5-level pathogenicity scale such as the IARC scale. Here's some related reading: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3075918/ http://www.ncbi.nlm.nih.gov/pubmed/18951436 and a more recent overview of the ACMG guidelines: http://www.ncbi.nlm.nih.gov/pubmed/25741868

Should these qualities be added to PATO, or integrated into a different ontology such as SO? (SO already has pathogenic/benign qualities.) @cmungall what should the relationship be between the variant and its pathogenicity call: a simple RO:has_quality, or something new and more specific like has_pathogenicity? We should also be able to indicate what the variant is pathogenic for (that is, which phenotype or disease). How might that be linked? Should the pathogenicity call be an attribute of a variant-disease association instead?

So, the point here is to design how to make the association: either add new attributes to the existing variant-phenotype OBAN association, or make a new kind of association that is specific to pathogenicity. Ideally, the evidence for the association might include papers, and allow for the possibility of case reports (if in a protected PHI system).

@mbrush and @pnrobinson add your thoughts.

cmungall commented 9 years ago
  1. datatypeProperty with range xsd:int 1<=x<=5 (expressible in owl)
  2. datatypeProperty for probability or percentage pathogenicity (or log-odds, ...)
  3. bin as classes, optionally with grouping classes

These aren't mutually exclusive, we can infer between these to some extent.

Between 1 and 3 I think I favor 3, although 1 has some advantages (sorting, speed of processing by range).
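To make the "infer between these" point concrete, here is a minimal Python sketch that derives the class bin (option 3) from either an integer level (option 1) or a probability (option 2). The class names and the probability cut-offs are assumptions modelled on the IARC 5-class proposal linked above, and the boundary handling is a guess; check both against the paper before relying on them.

```python
from enum import IntEnum


class PathogenicityClass(IntEnum):
    """Five ordered bins; names are illustrative, not ontology terms."""
    NOT_PATHOGENIC = 1
    LIKELY_NOT_PATHOGENIC = 2
    UNCERTAIN = 3
    LIKELY_PATHOGENIC = 4
    DEFINITELY_PATHOGENIC = 5


def class_from_score(score: int) -> PathogenicityClass:
    """Option 1 -> option 3: an integer in 1..5 maps directly to a bin.

    Raises ValueError for anything outside 1..5.
    """
    return PathogenicityClass(score)


def class_from_probability(p: float) -> PathogenicityClass:
    """Option 2 -> option 3, using ASSUMED cut-offs (0.001, 0.05, 0.95, 0.99)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"probability out of range: {p}")
    if p > 0.99:
        return PathogenicityClass.DEFINITELY_PATHOGENIC
    if p >= 0.95:
        return PathogenicityClass.LIKELY_PATHOGENIC
    if p >= 0.05:
        return PathogenicityClass.UNCERTAIN
    if p >= 0.001:
        return PathogenicityClass.LIKELY_NOT_PATHOGENIC
    return PathogenicityClass.NOT_PATHOGENIC
```

Because the bins are an `IntEnum`, they keep the sorting and range-filtering convenience of option 1 (for example, `cls >= PathogenicityClass.LIKELY_PATHOGENIC`) while still carrying a named class per bin.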

I don't have strong opinions on the ontology. SO would be fine. Currently all they have is

  is_a SO:0001769 ! variant_phenotype *** 
    is_a SO:0001770 ! benign_variant
    is_a SO:0001771 ! disease_associated_variant
    is_a SO:0001772 ! disease_causing_variant
    is_a SO:0001774 ! quantitative_variant

Are the categories human specific? HP might actually be OK.

Ideally the 5 categories would be a DisjointUnion in OWL, but just having IDs is the main requirement for now.

mbrush commented 9 years ago

Looks like the SO classes are sequence_attributes - so if we use this approach we would relate variants to pathogenicity attribute classes with an object property assertion.

e.g. <:variant1 :p :non-pathogenic>

where :p could be something general (has_attribute / has_quality / bearer_of) or something more specific (has_pathogenicity).

@cmungall is this preferable to an approach whereby we type variants as instances of pathogenic variant classes, e.g. <:variant1 rdf:type 'non-pathogenic variant'>?
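For comparison, the two styles carry the same information and can be converted mechanically. The sketch below uses plain Python tuples as stand-in triples; the predicate `:has_pathogenicity`, the `:pathogenic` attribute, and the variant class names are hypothetical placeholders, not terms from SO or any other ontology.

```python
RDF_TYPE = "rdf:type"
HAS_PATHOGENICITY = ":has_pathogenicity"  # hypothetical predicate name

# Hypothetical mapping from attribute classes to variant classes.
ATTRIBUTE_TO_VARIANT_CLASS = {
    ":non-pathogenic": ":non-pathogenic_variant",
    ":pathogenic": ":pathogenic_variant",
}


def as_typed_variants(triples):
    """Rewrite attribute-style triples as rdf:type triples.

    Triples that do not use the pathogenicity predicate, or whose
    attribute has no known variant class, pass through unchanged.
    """
    out = []
    for s, p, o in triples:
        if p == HAS_PATHOGENICITY and o in ATTRIBUTE_TO_VARIANT_CLASS:
            out.append((s, RDF_TYPE, ATTRIBUTE_TO_VARIANT_CLASS[o]))
        else:
            out.append((s, p, o))
    return out


attribute_style = [(":variant1", HAS_PATHOGENICITY, ":non-pathogenic")]
typed_style = as_typed_variants(attribute_style)
```

Either form can be derived from the other, so the choice mostly affects how queries are written rather than what can be expressed.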

mbrush commented 8 years ago

Consider that the pathogenicity of a variant applies only in reference to a disease or condition. A given variant might be asserted to be pathogenic for disease1 and benign for disease2, and thus have multiple, different pathogenicity calls under any of the proposals above (typing it as a 'pathogenic variant', assigning some 'pathogenic' attribute directly to the variant, or linking it to a numeric scale of pathogenicity). These approaches make a blanket statement that a variant is pathogenic for at least one disease, but do not specify the relevant disease. Additional modeling is needed to describe the pathogenicity in relation to a specific disease. Three alternative (preferred) approaches are presented below:

1 Encode pathogenicity in the relationship linking the variant to the disease

 :variant1    is_pathogenic_for/is_benign_for    :disease1

2 Use a single generic relation between the variant and the disease (e.g. has_condition) and then hang some qualifier from the reified association that specifies a pathogenicity call

:association1     has_subject        :variant1
:association1     has_object         :disease1 
:association1     has_predicate      :has_condition
:association1     has_qualifier      [pathogenic / likely pathogenic / likely benign / benign].

3 To refine things further in light of our evolving provenance/evidence model, a pathogenicity qualification may more appropriately hang from a line of evidence for the association, as a given variant-disease association may have numerous lines of evidence that make different pathogenicity calls.

:association1     has_subject         :variant1
:association1     has_object          :disease1 
:association1     has_predicate       :has_condition
:association1     has_evidence        :evidence1
:evidence1        supports_call_of    [pathogenic / likely pathogenic / likely benign / benign].

This last approach has the drawback of separating the variant from its pathogenicity (i.e. one must traverse the association and evidence nodes to find it). But it has the benefit of allowing us to: (1) represent the different and not always compatible scales/systems of pathogenicity assignment that are used by different sources (because we can create qualifier terms for each of them); and (2) represent more clearly scenarios where different sources make different pathogenicity calls for a variant with respect to a given disease (because they disagree about its pathogenicity, or use a different scale/system to describe it).
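The traversal cost and the conflict-detection benefit of approach 3 can both be seen in a small in-memory sketch. It is written in plain Python rather than RDF; the class and field names mirror the triples above, while the `source` field and the example source names are hypothetical additions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Evidence:
    source: str            # hypothetical: who made the call
    supports_call_of: str  # e.g. "pathogenic", "likely benign"


@dataclass
class Association:
    subject: str                      # the variant
    obj: str                          # the disease
    predicate: str = ":has_condition"
    evidence: list = field(default_factory=list)


def calls_for(associations, variant, disease):
    """Collect every pathogenicity call made for one variant-disease pair.

    This is the extra hop the approach requires:
    variant -> association -> evidence -> call.
    """
    calls = set()
    for assoc in associations:
        if assoc.subject == variant and assoc.obj == disease:
            calls.update(ev.supports_call_of for ev in assoc.evidence)
    return calls


def has_conflict(associations, variant, disease):
    """True when the evidence lines disagree about the call."""
    return len(calls_for(associations, variant, disease)) > 1


association1 = Association(
    subject=":variant1",
    obj=":disease1",
    evidence=[
        Evidence(source="sourceA", supports_call_of="pathogenic"),
        Evidence(source="sourceB", supports_call_of="likely benign"),
    ],
)
```

Here `calls_for([association1], ":variant1", ":disease1")` yields both calls, and `has_conflict` reports the disagreement without either source's call overwriting the other.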

kshefchek commented 6 years ago

added with https://github.com/monarch-initiative/GENO-ontology/commit/fe1f65935df5521ea1c591fb4ca65c6e0086d57d

https://github.com/monarch-initiative/SEPIO-ontology/wiki/The-ClinGen-ACMG-Variant-Interpretation-Profile