monarch-initiative / GENO-ontology

Repository for representing genotypes and their association with phenotypes

Representing uniparental disomy #43

Open mbrush opened 5 years ago

mbrush commented 5 years ago

The notion of uniparental disomy can be framed in different ways, to describe different aspects of biology:

We need to decide which of these perspectives is relevant to capture in GENO, and define/re-use appropriately.

pnrobinson commented 5 years ago

The particular types of uniparental disomy are diseases, and the top level term is just for grouping. There is no general type of uniparental disomy disease.

mbrush commented 4 years ago

To support the GA4GH Variant Annotation and ClinVar use case where 'inherited', 'uniparental', and 'biparental' are values used to represent allele origin, I am proposing to add terms reflecting this perspective to the GENO 'allele origin' hierarchy as follows.

_Note: please read the definitions and comments on all allele origin classes carefully before evaluating this proposal, to understand the distinctions we use to classify and define these concepts in the ontology. They can be found here._


Proposed definitions for these new concepts:

Proposed classification:

allele origin
    de novo allele origin
    somatic allele origin    
    inherited allele origin
        germline allele origin
            maternal allele origin
            paternal allele origin
        uniparental allele origin
        biparental allele origin

One area to explore further is how to classify these new concepts relative to germline allele origin - currently germline is a child of inherited, but germline is not a parent of uni- or bi- parental. We should consider if/how 'inherited allele origin' is different from 'germline allele origin'. One scenario that potentially distinguishes them is the case where a de novo mutation occurs in the germ cells of a parent, and is passed to offspring. This does not qualify as 'germline allele origin', as currently defined. But it would qualify as 'inherited'.


Another way to classify these is to keep the 'inherited allele origin' hierarchy completely separate - and treat as a different axis of classification altogether.

allele origin
    de novo allele origin
    somatic allele origin    
    germline allele origin
        maternal allele origin
        paternal allele origin
    inherited allele origin
      uniparental allele origin
      biparental allele origin

Again, please review definitions and comments on all of these classes in the GENO ontology before commenting.

cbizon commented 4 years ago

I have read the definitions, but I don't see any difference between germline and inherited allele origins.

I also think you have a spurious tab at the end there making biparental a child of uniparental.

I think I would do this:

allele origin
    de novo allele origin
    somatic allele origin    
    germline allele origin
        maternal allele origin
        paternal allele origin
        uniparental allele origin
        biparental allele origin

You could even get rid of biparental in favor of saying both maternal and paternal origin.

I think what makes this hard is that the uniparental / biparental distinction is really a statement about the genotype (or allelic complement) instead of the allele.

mbrush commented 4 years ago

Thanks Chris. I fixed that spurious tab. As for your other points:

  1. Good thought about uni/bi parental concepts really being about the allelic complement. The definition in the ontology tries to get around this in the way it defines these concepts:

"Describes an allele that is part of an allelic complement where one allele is maternally inherited and other paternally inherited".

Does the way the definition is crafted address your concern here, such that these terms can remain in the allele origin hierarchy? Or would you prefer if these two classes were captured in a separate 'allelic complement origin' hierarchy next to allele origin? Curious what others think here.

  2. As for inherited vs germline, I agree the definitions right now make them sound essentially the same. The editor note in the ontology expresses the same concern, but suggests that:

"One scenario that potentially distinguishes them is the case where a de novo mutation occurs in the germ cells of a parent, and is passed to offspring. This does not qualify as 'germline allele origin', as currently defined (as it is not truly present in the germline of the parent). But it would qualify as 'inherited'"

Curious if you would agree that this sufficiently warrants the distinction?

FYI the need for 'inherited' came from ClinVar's value set for allele origin, which includes 'germline' and 'inherited'. I didn't see any definitions in their documentation that explain the difference. Maybe @larrybabb can find out, as he originally requested these terms to support the ClinVar data?

cbizon commented 4 years ago

  1. It's really the definition that makes me think that the uniparental stuff is in the wrong place, because it's pointing at an entity other than the subject of this relation. That said, I don't know that it's worth creating allele complement origin. I mean, I think from a logical / cleanliness point of view it's nicer and more accurate, but it might be splitting hairs so finely that it becomes hard to use.

  2. I don't think that the note justifies the distinction. If I understand it correctly, there would be no way experimentally to distinguish between these cases. So while it might be logically consistent, it seems kind of useless. But maybe I'm off on this... Interested to hear how ClinVar distinguishes...

larrybabb commented 4 years ago

Here's the official documentation from ClinVar. ClinVar Submission Instructions : Allele Origin

We can refine and perfect definitions - which is certainly important. But we should keep our eye on getting the big picture completed: "a 'seed' value set that can be extended". I would much prefer to drop any controversial values versus spending valuable time in the weeds. I'm in favor of moving past these issues more quickly considering the amount of work to do to get the VA efforts out to the public in a pilot-able form. I'm also very much against defining new terms or ontologies that may or may not be adoptable by the community. Let's point to the fewest reasonable set of terms that are out in the community and move forward. Provide guidance on mapping and extending. We will NEVER get all the value set terms to a place where all parties agree or treat them with the same level of precision.

cbizon commented 4 years ago

So what's your proposition about what to add @larrybabb ?

mbrush commented 4 years ago

Hi Larry. It is with my GENO hat on that I am looking to improve the allele origin hierarchy in the ontology. With my GA4GH-VA hat on, I agree that we don't need to perfect things, or worry about hierarchy right now.

For VA, what I think we do need is simply to decide if we want to include all of the ClinVar terms in the list you shared. I think we do - as long as we make sure 'inherited' is clearly distinguished from 'germline'. With the link you shared, I think I have all the input I need to do this. I will make it so, unless you or Chris have additional comments/objections.

Also, any further comments relevant to the GA4GH-VA work can go in the VA-repo ticket here: https://github.com/ga4gh/va-spec/issues/64. Thanks!

larrybabb commented 4 years ago

That works. To be clear, I'm fine with a shorter list for the value set. The list doesn't have to be ClinVar's values. I was simply offering examples of what folks are sharing in practice.

mbrush commented 4 years ago

Considering the discussion and use cases above, I am proposing the following GENO 'allele origin' hierarchy. It includes a few new terms to support ClinVar and ClinGen use cases.

allele origin
    somatic allele origin        
    germline allele origin
        de novo germline allele origin
        inherited germline allele origin
            maternal allele origin
            paternal allele origin
            uniparental allele origin
            biparental allele origin
    unspecified allele origin
    unknown allele origin
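
As a rough aid for checking the proposed hierarchy, the outline above can be loaded into a parent map and queried for ancestors. This is only a minimal sketch: the helper names `parse_outline` and `ancestors` are made up for illustration and are not part of GENO or any ontology tooling, and the outline is treated as a plain single-parent tree with four-space indentation.

```python
from typing import Dict, List, Optional

# The proposed hierarchy, copied from the outline above (4-space indentation).
OUTLINE = """\
allele origin
    somatic allele origin
    germline allele origin
        de novo germline allele origin
        inherited germline allele origin
            maternal allele origin
            paternal allele origin
            uniparental allele origin
            biparental allele origin
    unspecified allele origin
    unknown allele origin
"""


def parse_outline(text: str, indent: int = 4) -> Dict[str, Optional[str]]:
    """Map each term to its parent (None for the root), based on indentation depth."""
    parents: Dict[str, Optional[str]] = {}
    stack: List[str] = []  # stack[d] holds the most recent term seen at depth d
    for line in text.splitlines():
        if not line.strip():
            continue
        depth = (len(line) - len(line.lstrip(" "))) // indent
        term = line.strip()
        del stack[depth:]  # drop terms at this depth or deeper
        parents[term] = stack[-1] if stack else None
        stack.append(term)
    return parents


def ancestors(term: str, parents: Dict[str, Optional[str]]) -> List[str]:
    """Return the chain of parents from the nearest one up to the root."""
    chain: List[str] = []
    current = parents[term]
    while current is not None:
        chain.append(current)
        current = parents[current]
    return chain


PARENTS = parse_outline(OUTLINE)
print(ancestors("uniparental allele origin", PARENTS))
```

Run against this outline, the uni- and biparental terms come out as descendants of both 'inherited germline allele origin' and 'germline allele origin', which is the placement question raised earlier in the thread.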

The text below outlines definitions and distinctions in more detail, in support of the updated allele origin hierarchy shown above.


Foundational Concepts

In GENO, allele origin terms describe how a particular allele came to be in the genome of a cell or organism. We distinguish different categories of allele origin such as germline, somatic, and de novo, based on if and how it was inherited from a parent. Two key concepts to understand for distinguishing these terms are the notion of being inherited from a parent, and that of being heritable by offspring.

As defined here, there are two criteria for being inherited: (1) that the allele was passed down to the proband via germ line transmission, and (2) that the allele was constitutional in that parent (i.e. present in every cell of the body, including germ cells).

Being heritable means that the allele can be passed down to future offspring of the proband, in virtue of its being present in the germ line.

We will see why these nuances and distinctions are important in the definitions of different allele origin types below (e.g. for 'de novo allele' in particular)
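
To make the two foundational notions concrete, here is a minimal sketch that encodes them as boolean checks. The class and field names (`AlleleFacts`, `transmitted_via_germ_line`, `constitutional_in_parent`, `in_proband_germ_line`) are hypothetical and chosen only for illustration; they are not GENO properties, and real data would rarely record these facts so cleanly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlleleFacts:
    """Hypothetical facts about one allele observed in a proband."""
    transmitted_via_germ_line: bool  # passed down from a parent via germ line transmission
    constitutional_in_parent: bool   # present in every cell of that parent, including germ cells
    in_proband_germ_line: bool       # present in the proband's own germ line


def is_inherited(facts: AlleleFacts) -> bool:
    """Both criteria stated above must hold for an allele to count as inherited."""
    return facts.transmitted_via_germ_line and facts.constitutional_in_parent


def is_heritable(facts: AlleleFacts) -> bool:
    """Heritable means it can be passed on because it is in the proband's germ line."""
    return facts.in_proband_germ_line


# A mutation that arose de novo in a parent's germ cells and was passed to the
# proband: transmitted via the germ line, but not constitutional in the parent.
parental_germ_cell_mutation = AlleleFacts(
    transmitted_via_germ_line=True,
    constitutional_in_parent=False,
    in_proband_germ_line=True,
)

print(is_inherited(parental_germ_cell_mutation))  # False under the two criteria above
print(is_heritable(parental_germ_cell_mutation))  # True
```

Under these criteria the parental germ cell scenario discussed earlier in the thread fails the second condition, so it is not 'inherited' in this sense even though it reached the proband through the germ line.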

Proposed Classes

Based on these foundational definitions, we can define the following types of alleles:

We also define two flavors of allele origin that are not specified. While these are not strictly valid ontological concepts, in practice they are useful as terms in value sets to support standard and unambiguous data capture. So we provide them for this purpose.

Finally, we include in the allele origin hierarchy terms for uniparental and biparental origin.

These definitions and distinctions are the basis for the allele origin hierarchy as proposed above.

mbrush commented 4 years ago

A couple issues were raised for discussion about the proposal above:

1. Do the bi / uniparental terms really apply to the origin of a single allele - as strictly speaking, they are really about the origin of a pair of alleles at a location - the allelic complement? While this is true, I would be comfortable leaving these terms in the allele origin hierarchy for pragmatic purposes. I think grouping all these terms together mirrors how they are used in the curation and data generation activities they support. And I don’t think the liberties we take here will break any reasoning or analysis use cases. If they do, we can always move these terms later. If people feel this is acceptable, we can mitigate concerns in the ontology in a couple of ways: (1) the definitions of bi / uni parental terms will be framed to be about "an allele that is part of an allelic complement where . . . "; and/or (2) we can define the root 'allele origin' term to cover alleles and allelic complements (e.g. "a quality inhering in an allele (or complement of alleles at a given locus) that describes its genetic origin - i.e. how it came to be part of a cell's genome").

2. Is the distinction between inherited and germline needed/useful? ClinVar's allele origin value set and definitions are the basis for us making this distinction. IMO the labels and definitions we provide make the distinction clear. And there is a real use case for the distinction in the ClinVar data, where there are variants annotated to both concepts. But if we do choose not to make this nuanced distinction, we would want to think about what the resulting term hierarchy would look like, and how to map ClinVar data to it.

pnrobinson commented 4 years ago

biparental allele origin sounds a little strange. In most usages, an allele is on one chromosome and so can either come from mom or dad.

larrybabb commented 4 years ago

Sharing some feedback from Heidi Rehm (this is my translation):

I did have the same reaction that Peter and others did regarding uniparental and biparental but read Matt's answer and think it's fine. I do think his suggestion of adding "an allele that is part of an allelic complement where . ." to the definition would be useful.

mbrush commented 4 years ago

Of relevance - the 'Cellular origin of variant' section on p 3 of the ENIGMA article here: https://jmg.bmj.com/content/56/6/347.full