monarch-initiative / dipper

Data Ingestion Pipeline for Monarch
https://dipper.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

RO_0002600 ('causes disease') vs. RO_0002200 ('has phenotype') for G2D associations #195

Closed mbrush closed 6 years ago

mbrush commented 9 years ago

In several data sources (ClinVar, OMIM, Orphanet), we use the RO_0002200 'has phenotype' relation for linking variants to diseases. This doesn’t seem right, and is not in line with the definition of 'has phenotype'. Since our practice is to use 'has phenotype' in this data only when a variant causes a disease, can we switch to RO_0002600 ('capable of upregulating or causing pathological process')? It seems a fit, given its alternative term 'causes disease'.

One potential issue here is the fact that this relation is under the 'causal relation between material entity and a process' branch of RO, which of course holds properties relating material entities and processes. A variant in our modeling is not a material entity, but a generically dependent continuant that is materialized in material DNA molecules (where 'materialized in' is shorthand for 'is concretized as' o 'inheres in'). This raises the more general question of whether we can use properties like this that are defined for material entity subjects to describe both (1) the relationship between material genetic variants and a disease, and (2) the relationship between a GDC sequence that is materialized in such genetic material and a disease. It would obviously be nice to have this flexibility so as to leverage existing formal relationships, and not have to create duplicative ones where needed to apply at the GDC level. Posted a more detailed ticket in the RO tracker here on this topic.

@cmungall and @mellybelly, care to comment?

mellybelly commented 9 years ago

Some thoughts from our discussion:

Domain: genotype, genotypic part, and/or Environment
Range: disease or phenotype

correlates with
causes or contributes to
----causes
----contributes_to
--------contributes to severity/expressivity
--------contributes to penetrance/frequency
----preventative_for

Domain: disease or breed/strain
Range: phenotype

Use has_phenotype for Disease -> phenotype and for breed/strain genotype -> phenotype, but not for G or E -> phenotype.
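The domain/range rules above could be checked mechanically along the following lines. This is only a minimal sketch: the category strings and the function name are hypothetical placeholders, not identifiers from dipper or RO.

```python
# Hypothetical category labels (assumptions for illustration only).
HAS_PHENOTYPE_SUBJECTS = {"disease", "breed_or_strain"}
G_OR_E_SUBJECTS = {"genotype", "genotypic_part", "environment"}


def relation_family(subject_category, object_category):
    """Pick the relation family allowed for a subject/object pair."""
    # Disease or breed/strain -> phenotype keeps has_phenotype.
    if subject_category in HAS_PHENOTYPE_SUBJECTS and object_category == "phenotype":
        return "has_phenotype"
    # G or E -> disease or phenotype uses the correlation/causation relations.
    if subject_category in G_OR_E_SUBJECTS and object_category in {"disease", "phenotype"}:
        return "correlates with / causes or contributes to"
    raise ValueError(
        f"no relation family defined for {subject_category} -> {object_category}"
    )
```

For example, `relation_family("genotype", "disease")` falls into the second family, so has_phenotype would not be used for that pair.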

mbrush commented 9 years ago

Need one clarification here. RO already has a 'correlated with' property (with sub-property 'is marker for'), neither of which is specific to linking to 'conditions'. Did we decide to use these as they are for linking variants to conditions, or did we want to create new properties specifically for linking to conditions, like the causes relations (e.g. 'correlates with condition' and 'marker for condition', as sub-properties of the more general existing ones)? @cmungall and @nlwashington?

nlwashington commented 9 years ago

i have been using those properties thus far for those instances where we don't have strong causal evidence between a variant and a disease/condition.

however, it feels wrong to use these to link whole genotypes (like for strains/breeds), but maybe it's okay.

nlwashington commented 9 years ago

but i think they are under a separate hierarchy than that for has_phenotype, so when doing the scigraph queries, i have to specify different parts of the RO subclasses.
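As a rough illustration of the query issue described here, the sketch below expands each root relation into itself plus its sub-properties, so that one query can cover relations living in separate hierarchies. The table is a toy stand-in, not actual RO content or a SciGraph API; only 'correlated with' and 'is marker for' come from this thread, and the function name is hypothetical.

```python
# Toy sub-property table (assumption). Only the 'correlated with' ->
# 'is marker for' link is mentioned in this thread; has_phenotype is
# given no children here.
SUBPROPERTIES = {
    "correlated with": ["is marker for"],
    "has_phenotype": [],
}


def expand_relations(roots):
    """Return each root plus its transitive sub-properties, in visit order."""
    result = []
    stack = list(reversed(roots))
    while stack:
        rel = stack.pop()
        if rel in result:
            continue  # already visited via another root
        result.append(rel)
        stack.extend(reversed(SUBPROPERTIES.get(rel, [])))
    return result


relations = expand_relations(["has_phenotype", "correlated with"])
```

The resulting list could then be passed to whatever query layer is in use as the set of relation types to match.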

mbrush commented 9 years ago

Also, we really didn't consider the relationship we would assert between a gene and a condition. This is not really one of causation - because the gene itself isn't causative, only specific variants of it. Ideally we'd like to use the same relationship between genes-conditions as we do between variants-conditions which was one of the benefits of having a very general relation like has_phenotype.

Perhaps we create a similarly generic property has_condition, and place the new properties of causation here (in addition to under the causal relationship hierarchy). Then at least the gene-condition relation will be a direct ancestor of the variant-condition causation properties.

Something like:

has condition (new property - very generic relation that can hold between a gene and condition)
---causes or contributes to condition (new property - used for variants but not genes)
------causes condition (new property)
------contributes to condition (new property)
----------contributes to severity of condition (new property)
----------contributes to frequency of condition (new property)
------preventative for condition (new property)

and this property hierarchy would also live under causally related to as originally proposed:

causally related to (exists already)
---causes or contributes to condition (new property)
------causes condition (new property)
------contributes to condition (new property)
----------contributes to severity of condition (new property)
----------contributes to frequency of condition (new property)
------preventative for condition (new property)

_Note that it is entirely possible that I am overthinking this and we can agree to go ahead and use the original causation/contribution relations between genes and conditions, and not worry about the nuances outlined above._
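To make the dual placement concrete, here is a minimal sketch of the proposed hierarchy as a child-to-parents map, where 'causes or contributes to condition' has two parents. The labels are taken from the proposal above; the data structure and function name are illustrative assumptions, not an existing dipper or RO interface.

```python
# child property -> direct parent properties (labels from the proposal above)
PARENTS = {
    "causes or contributes to condition": ["has condition", "causally related to"],
    "causes condition": ["causes or contributes to condition"],
    "contributes to condition": ["causes or contributes to condition"],
    "contributes to severity of condition": ["contributes to condition"],
    "contributes to frequency of condition": ["contributes to condition"],
    "preventative for condition": ["causes or contributes to condition"],
}


def ancestors(prop):
    """Return the set of all super-properties of prop (transitive)."""
    seen = set()
    stack = list(PARENTS.get(prop, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(PARENTS.get(parent, []))
    return seen
```

With this layout, a query on 'has condition' would pick up both gene-condition assertions and every variant-condition causation property, while a query on 'causally related to' would pick up only the latter.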

cmungall commented 9 years ago

I think having a grouping relation is fine

mbrush commented 9 years ago

OK. I think for now I will hold off on implementing the generic/grouping relation because it is not needed immediately. I want to consider further whether we really need it - i.e. perhaps we can just use 'causes or contributes to condition' to link genes to conditions, in which case we may not need this more generic relation. And for now I will also create a 'correlated with condition' sub-property of the existing 'correlated with' property, for consistency and grouping with the causes condition properties.

mbrush commented 8 years ago

The question arose this week of variants that are risk factors that increase susceptibility to a disease. Is a new property needed here ('contributes to susceptibility to condition', or 'increases susceptibility for condition')? Or can we use the existing 'contributes to frequency of condition' property here?

Condition frequency is a population-level concept, while risk factor/susceptibility is an individual-level concept - but the increased susceptibility of an individual would increase the frequency in a population. Still, i would think a new relation is needed to describe the individual-level concept whereby a variant makes an individual more likely to get a disease, but doesn't directly and deterministically cause the disease. Thoughts @mellybelly, @cmungall , @nlwashington

pnrobinson commented 8 years ago

contributes to frequency of condition definitely seems wrong, but on the other hand, we will probably never be asserting that a particular variant in a given individual is causal for a common complex disease, since we will probably never know precisely enough. I.e., I think it would be wrong to say

var 1 contributes to susceptibility in individual A
var 2 contributes to susceptibility in individual A
var 3 contributes to susceptibility in individual A
...
var 257 contributes to susceptibility in individual A

since in the end we do not know if it was a combination of some subset of those variants and some environmental exposure and some stochasticity. Thus maybe we are just annotating these variants on a population level anyway. I do not like the relation "contributes to frequency of condition" at all, and wonder if we cannot come up with something better to say "is a risk factor for".

-Peter


mbrush commented 8 years ago

I agree - I am not a fan of the 'contributes to frequency of condition' relation, and would prefer replacing/redefining this as a relation about susceptibility. In my mind, the notion of contributing to susceptibility to a condition is the same as being a risk factor for it, so my proposal is to rename/replace the 'contributes to frequency of condition' relation with 'contributes to susceptibility to' (and give this the alternative labels 'is risk factor for condition' and 'increases susceptibility to condition'). Any variant that increases susceptibility to a condition in an individual would increase the frequency of the condition in a population, so this relation could potentially be used to describe variant-condition associations at either level.

In any case, I would like to explore some real data use cases around this new hierarchy of properties to see how it works before committing to final decisions here.

kshefchek commented 6 years ago

replaced with https://github.com/monarch-initiative/dipper/issues/254