Open mbrush opened 5 years ago
The particular types of uniparental disomy are diseases, and the top level term is just for grouping. There is no general type of uniparental disomy disease.
To support the GA4GH Variant Annotation and ClinVar use case where 'inherited', 'uniparental', and 'biparental' are values used to represent allele origin, I am proposing to add terms reflecting this perspective to the GENO 'allele origin' hierarchy as follows.
_Note: please read the definitions and comments on all allele origin classes carefully before evaluating this proposal, to understand the distinctions we use to classify and define these concepts in the ontology. They can be found here._
Proposed definitions for these new concepts:
Proposed classification:
allele origin
    de novo allele origin
    somatic allele origin
    inherited allele origin
        germline allele origin
            maternal allele origin
            paternal allele origin
        uniparental allele origin
        biparental allele origin
One area to explore further is how to classify these new concepts relative to 'germline allele origin': currently germline is a child of inherited, but germline is not a parent of uni- or biparental. We should consider if and how 'inherited allele origin' differs from 'germline allele origin'. One scenario that potentially distinguishes them is the case where a de novo mutation occurs in the germ cells of a parent and is passed to offspring. This does not qualify as 'germline allele origin' as currently defined, but it would qualify as 'inherited'.
Another way to classify these is to keep the 'inherited allele origin' hierarchy completely separate and treat it as a different axis of classification altogether.
allele origin
    de novo allele origin
    somatic allele origin
    germline allele origin
        maternal allele origin
        paternal allele origin
    inherited allele origin
        uniparental allele origin
        biparental allele origin
Again, please review definitions and comments on all of these classes in the GENO ontology before commenting.
I have read the definitions, but I don't see any difference between germline and inherited allele origins.
I also think you have a spurious tab at the end there making biparental a child of uniparental.
I think I would do this:
allele origin
de novo allele origin
somatic allele origin
germline allele origin
maternal allele origin
paternal allele origin
uniparental allele origin
biparental allele origin
You could even get rid of biparental in favor of saying both maternal and paternal origin.
I think what makes this hard is that the uniparental / biparental distinction is really a statement about the genotype (or allelic complement) instead of the allele.
Thanks Chris. I fixed that spurious tab. As for your other points:
"Describes an allele that is part of an allelic complement where one allele is maternally inherited and other paternally inherited".
Does the way the definition is crafted address your concern here, such that these terms can remain in the allele origin hierarchy? Or would you prefer if these two classes were captured in a separate 'allelic complement origin' hierarchy next to allele origin? Curious what others think here.
"One scenario that potentially distinguishes them is the case where a de novo mutation occurs in the germ cells of a parent, and is passed to offspring. This does not qualify as 'germline allele origin', as currently defined (as it is not truly present in the germline of the parent). But it would qualify as 'inherited'"
Curious if you would agree that this sufficiently warrants the distinction?
FYI the need for 'inherited' came from ClinVar's value set for allele origin, which includes 'germline' and 'inherited'. I didn't see any definitions in their documentation that explain the difference. Maybe @larrybabb can find out, as he originally requested these terms to support the ClinVar data?
1. It's really the definition that makes me think that the uniparental stuff is in the wrong place, because it's pointing at an entity other than the subject of this relation. That said, I don't know that it's worth creating 'allelic complement origin'. I mean, I think from a logical / cleanliness point of view it's nicer and more accurate, but it might be splitting hairs so finely that it becomes hard to use.
Here's the official documentation from ClinVar. ClinVar Submission Instructions : Allele Origin
We can refine and perfect definitions, which is certainly important. But we should keep our eye on getting the big picture completed: "a 'seed' value set that can be extended". I would much prefer to drop any controversial values than to spend valuable time in the weeds. I'm in favor of moving past these issues more quickly, considering the amount of work to do to get the VA efforts out to the public in a pilot-able form. I'm also very much against defining new terms or ontologies that may or may not be adoptable by the community. Let's point to the smallest reasonable set of terms that are out in the community and move forward. Provide guidance on mapping and extending. We will NEVER get all the value set terms to a place where all parties agree or treat them with the same level of precision.
So what's your proposal for what to add, @larrybabb?
Hi Larry. It is with my GENO hat on that I am looking to improve the allele origin hierarchy in the ontology. With my GA4GH-VA hat on, I agree that we don't need to perfect things, or worry about hierarchy right now.
For VA, what I think we do need is simply to decide if we want to include all of the ClinVar terms in the list you shared. I think we do - as long as we make sure 'inherited' is clearly distinguished from 'germline'. With the link you shared, I think I have all the input I need to do this. I will make it so, unless you or Chris have additional comments/objections.
Also, any further comments relevant to the GA4GH-VA work can go in the VA-repo ticket here: https://github.com/ga4gh/va-spec/issues/64. Thanks!
That works. To be clear, I'm fine with a shorter list for the value set. The list doesn't have to be ClinVar's values. I was simply offering examples of what folks are sharing in practice.
Considering the discussion and use cases above, I am proposing the following GENO 'allele origin' hierarchy. It includes a few new terms to support ClinVar and ClinGen use cases.
allele origin
somatic allele origin
germline allele origin
de novo germline allele origin
inherited germline allele origin
maternal allele origin
paternal allele origin
uniparental allele origin
biparental allele origin
unspecified allele origin
unknown allele origin
The text below outlines definitions and distinctions in more detail, in support of the updated allele origin hierarchy shown above.
In GENO, allele origin terms describe how a particular allele came to be in the genome of a cell or organism. We distinguish different categories of allele origin such as germline, somatic, and de novo, based on if and how it was inherited from a parent. Two key concepts to understand for distinguishing these terms are the notion of being inherited from a parent, and that of being heritable by offspring.
As defined here, there are two criteria for being inherited: (1) that the allele was passed down to the proband via germ line transmission, and (2) that the allele was constitutional in that parent (i.e. present in every cell of the body, including germ cells).
Being heritable means that the allele can be passed down to future offspring of the proband, in virtue of its being present in the germ line.
We will see why these nuances and distinctions are important in the definitions of different allele origin types below (for 'de novo allele' in particular).
Based on these foundational definitions, we can define the following types of alleles:
Germline alleles are necessarily heritable, and typically but not necessarily inherited.
Inherited germline alleles are the typical case of being both inherited and heritable - as they are passed down from a parent in whom they are present constitutionally, and can be passed down to offspring. Inherited germline alleles can be maternal alleles or paternal alleles depending on which parent they are inherited from.
De novo germline alleles are the less common exception - being heritable but not inherited. They arise through a spontaneous mutation in a germ cell of a parent, or in the fertilized egg itself during early embryogenesis. Accordingly, they are heritable - as they can be passed to offspring in virtue of their being present in the individual's germ cells, but not inherited - as they are not present constitutionally in either parent.
Somatic alleles are neither inherited nor heritable - having originated via a spontaneous mutation in a non-germ cell of the proband after fertilization. They are not constitutional (present in every cell in the body), and not passed on to offspring.
We also define two flavors of allele origin for cases where the origin is not specified. While these are strictly not valid ontological concepts, in practice they are useful as terms in value sets to support standard and unambiguous data capture. So we provide them for this purpose.
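The three definitions above amount to a small decision table over the two criteria (inherited, heritable). Here is a minimal Python sketch of that table; the enum and function names are hypothetical and are not part of GENO. The combination "inherited but not heritable" is not covered by the definitions, so the sketch rejects it rather than guessing.

```python
from enum import Enum


class AlleleOrigin(Enum):
    # Labels follow the proposed hierarchy; the enum itself is illustrative only.
    INHERITED_GERMLINE = "inherited germline allele origin"
    DE_NOVO_GERMLINE = "de novo germline allele origin"
    SOMATIC = "somatic allele origin"


def classify_allele_origin(inherited: bool, heritable: bool) -> AlleleOrigin:
    """Map the two criteria onto the three categories defined above."""
    if heritable:
        # Germline alleles are necessarily heritable; whether they were also
        # inherited separates the typical case from the de novo exception.
        if inherited:
            return AlleleOrigin.INHERITED_GERMLINE
        return AlleleOrigin.DE_NOVO_GERMLINE
    if inherited:
        # Not described by the definitions above, so refuse to classify.
        raise ValueError("inherited but not heritable is not a defined case")
    # Neither inherited nor heritable.
    return AlleleOrigin.SOMATIC
```

For example, `classify_allele_origin(inherited=False, heritable=True)` returns the de novo germline member, matching the "heritable but not inherited" description.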
Finally, we include in the allele origin hierarchy terms for uniparental and biparental origin.
These definitions and distinctions are the basis for the allele origin hierarchy as proposed above.
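As a rough illustration, the part of the nesting that the definitions above support can be written as a child-to-parent map with a simple subsumption check. This is a sketch only: the dictionary and function names are hypothetical, and the uniparental, biparental, unspecified, and unknown terms are left out because their placement is not spelled out in the text.

```python
from typing import List

# Child -> parent links supported by the definitions above.
# Terms whose placement is not stated in the text are deliberately omitted.
PARENT = {
    "somatic allele origin": "allele origin",
    "germline allele origin": "allele origin",
    "de novo germline allele origin": "germline allele origin",
    "inherited germline allele origin": "germline allele origin",
    "maternal allele origin": "inherited germline allele origin",
    "paternal allele origin": "inherited germline allele origin",
}


def ancestors(term: str) -> List[str]:
    """Return the chain of parents from `term` up to the root, nearest first."""
    chain = []
    while term in PARENT:
        term = PARENT[term]
        chain.append(term)
    return chain


def is_a(term: str, candidate: str) -> bool:
    """True if `term` is `candidate` or one of its descendants."""
    return term == candidate or candidate in ancestors(term)
```

With this map, data annotated as 'maternal allele origin' would also be retrieved by a query for 'germline allele origin', which is the kind of grouping the hierarchy is meant to support.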
A couple of issues were raised for discussion about the proposal above:
1. Do the bi / uniparental terms really apply to the origin of a single allele? Strictly speaking, they are about the origin of a pair of alleles at a location (the allelic complement). While this is true, I would be comfortable leaving these terms in the allele origin hierarchy for pragmatic purposes. I think grouping all these terms together mirrors how they are used in the curation and data generation activities they support. And I don’t think the liberties we take here will break any reasoning or analysis use cases. If they do, we can always move these terms later. If people feel this is acceptable, we can mitigate concerns in the ontology in a couple of ways: (1) the definitions of the bi / uniparental terms will be framed to be about "an allele that is part of an allelic complement where . . . "; and/or (2) we can define the root 'allele origin' term to cover alleles and allelic complements (e.g. "a quality inhering in an allele (or complement of alleles at a given locus) that describes its genetic origin - i.e. how it came to be part of a cell's genome").
2. Is the distinction between inherited and germline needed/useful? ClinVar's allele origin value set and definitions are the basis for us making this distinction. IMO the labels and definitions we provide make the distinction clear. And there is a real use case for the distinction in the ClinVar data, where there are variants annotated to both concepts. But if we do choose not to make this nuanced distinction, we would want to think about what the resulting term hierarchy would look like, and how to map ClinVar data to it.
'Biparental allele origin' sounds a little strange. In most usages, an allele is on one chromosome and so can come from either mom or dad.
Sharing some feedback from Heidi Rehm (this is my translation):
I did have the same reaction that Peter and others did regarding uniparental and biparental but read Matt's answer and think it's fine. I do think his suggestion of adding "an allele that is part of an allelic complement where . ." to the definition would be useful.
Of relevance - the 'Cellular origin of variant' section on p 3 of the ENIGMA article here: https://jmg.bmj.com/content/56/6/347.full
The notion of uniparental disomy can be framed in different ways, to describe different aspects of biology:
We need to decide which of these perspectives is relevant to capture in GENO, and define/re-use appropriately.