monarch-initiative / monarch-ui

The previous version of the Monarch Initiative website
https://previous.monarchinitiative.org/

Pathway and phenotype #312

pnrobinson closed 4 years ago

pnrobinson commented 4 years ago

It is a little unclear why there is a link between this pathway and 8 phenotypes

https://beta.monarchinitiative.org/pathway/REACT:R-HSA-381426#phenotype

If we want to use transitive relations (pathway->gene->disease), then this pathway is currently related to over 100 diseases, which in turn have hundreds of phenotypic features, so we should be listing either zero or hundreds, but not 8.

kshefchek commented 4 years ago

This looks like it may be an integration bug: HPO calls OMIM:107680 a disease, while we call it a gene and merge it with HGNC:600. The pathway inference will hop over direct gene-to-phenotype associations, which we typically have only for nonhuman data and not for humans. But in this case there is a direct edge because of the gene/disease node mixup.
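
To illustrate the mixup, here is a minimal sketch of how merging a disease node into a gene node can surface a phenotype on a pathway. It is a toy model, not Monarch's actual inference code: the function name, the `same_as` mapping, and the dictionaries are all hypothetical, and the edges are only the ones mentioned in this thread.

```python
def infer_pathway_phenotypes(pathway_genes, direct_phenotypes, same_as):
    """Collect phenotypes reachable via pathway -> gene -> phenotype.

    same_as maps a node ID to the ID it was merged into (hypothetical).
    """
    def canonical(node):
        return same_as.get(node, node)

    # Re-key every direct phenotype edge onto the merged (canonical) node.
    merged = {}
    for subject, phenotypes in direct_phenotypes.items():
        merged.setdefault(canonical(subject), set()).update(phenotypes)

    # Hop from each pathway gene to whatever phenotypes sit directly on it.
    found = set()
    for gene in pathway_genes:
        found |= merged.get(canonical(gene), set())
    return found


# Toy data taken from this thread (assumed, not a real graph export).
pathway_genes = {"HGNC:600"}
disease_phenotypes = {"OMIM:107680": {"HP:0004398"}}

without_merge = infer_pathway_phenotypes(pathway_genes, disease_phenotypes, same_as={})
with_merge = infer_pathway_phenotypes(
    pathway_genes, disease_phenotypes, same_as={"OMIM:107680": "HGNC:600"}
)
print(without_merge)  # no direct gene-phenotype edge, so nothing is inferred
print(with_merge)     # the disease's phenotype now hangs off the gene
```

With no merge the pathway picks up nothing, and with the merge the disease annotation leaks through as a direct gene edge.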

pnrobinson commented 4 years ago

AFAIK HPO does not list this as a disease (e.g. https://hpo.jax.org/app/browse/gene/335). Where are you seeing this?

kshefchek commented 4 years ago

When I go to https://hpo.jax.org/app/browse/term/HP:0004398, in the disease column I see OMIM:107680, which goes to https://hpo.jax.org/app/browse/disease/OMIM:107680

pnrobinson commented 4 years ago

That is a bug, thanks for pointing it out. @iimpulse, let's touch base about this!

kshefchek commented 4 years ago

Here are 23 IDs for which we have an unexpected direct gene-phenotype association. I'm not certain this comes from our HPO ingest, but it might be worth checking these as well:

OMIM:610271
OMIM:141900
OMIM:109270
OMIM:300897
OMIM:141800
OMIM:107680
OMIM:177400
OMIM:182870
OMIM:159555
OMIM:187395
OMIM:114835
OMIM:600522
OMIM:400048
OMIM:147892
OMIM:138300
OMIM:142000
OMIM:152200
OMIM:151430
OMIM:116790
OMIM:124060
OMIM:132810
OMIM:168820
OMIM:173470

pnrobinson commented 4 years ago

There appears to be a bug on the HPO website as well that affects these items. I will try to investigate next week with MG

pnrobinson commented 4 years ago

See https://github.com/TheJacksonLaboratory/hpo-web/issues/117

kshefchek commented 4 years ago

great! I added a qc check for this as well so I can let you know if it pops up again after it's fixed.
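
A check along these lines could look roughly like the sketch below. This is only an illustration under assumptions: the edge tuples, the `has_phenotype` predicate label, the `node_types` mapping, and the `EX:` identifiers are hypothetical stand-ins, not the actual QC code or data.

```python
def find_direct_gene_phenotype(edges, node_types):
    """Return sorted IDs of nodes typed as gene that have a direct phenotype edge."""
    flagged = {
        subject
        for subject, predicate, _obj in edges
        if predicate == "has_phenotype"
        and "gene" in node_types.get(subject, set())
    }
    return sorted(flagged)


# Toy input: one node from this thread plus a made-up disease node.
edges = [
    ("OMIM:107680", "has_phenotype", "HP:0004398"),
    ("EX:disease", "has_phenotype", "EX:phenotype"),  # hypothetical IDs
]
node_types = {
    "OMIM:107680": {"gene"},    # typed as a gene after the merge
    "EX:disease": {"disease"},  # an ordinary disease annotation
}

flagged = find_direct_gene_phenotype(edges, node_types)
print(flagged)
```

Only the node typed as a gene is reported; the ordinary disease annotation passes.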

pnrobinson commented 4 years ago

@kshefchek Thanks for picking this up. It seems to be related to some changes upstream, with phenotypes from '+' entries migrating to '#' entries. I have started to correct this and it should be taken care of by next week.

https://github.com/monarch-initiative/hpo-annotation-data/issues/421

Of note for the MonarchUI -- it does not seem that we are taking the OMIM '+' entries into account as diseases (the '+' entries describe genes and diseases simultaneously), and these are actually valid (although often difficult) entries. Also, there were two entries that did not seem to be erroneous in the small files; please check.

kshefchek commented 4 years ago

This is good to know and I wonder if dipper is affected by this change as well (@TomConlin)

Looks like I made a mistake on two of them: OMIM:300897 should have been ORPHA:85283, and OMIM:400048 should have been OMIM:400003.

It's been a while since I've looked at OMIM types, but glancing at it I think you're correct: we treat + entries more like genes than diseases.

pnrobinson commented 4 years ago

I have fixed everything that could be easily fixed. There are three entries that might need to go onto our omit list, but they are borderline cases that I will make issues for so people can chime in. The new version of phenotype.hpoa is being made as we speak and should have these corrections.

TomConlin commented 4 years ago

https://github.com/monarch-initiative/dipper/blob/master/dipper/sources/OMIMSource.py#L194

Plus (+) becomes typed as 'has_affected_feature' (GENO:0000418).

If it should be something else, please let me know what that is.

octothorp (#) is typed as Phenotype

https://github.com/monarch-initiative/dipper/blob/master/dipper/sources/OMIMSource.py#L189

We do maintain the list of obsolete/previous omim-numbers so if they split we can return both new types
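
As a rough picture of the typing described above, here is a minimal sketch that maps an OMIM prefix symbol to a type label. It is not the code behind the links above; the names `OMIM_PREFIX_TYPE` and `type_label` are hypothetical, and only the three symbols discussed in this thread are covered.

```python
# Labels as reported in this thread; any other prefix is left untyped here.
OMIM_PREFIX_TYPE = {
    "*": "gene",                  # asterisk: gene
    "+": "has_affected_feature",  # plus: gene and phenotype together
    "#": "Phenotype",             # octothorp: phenotype
}


def type_label(prefixed_entry):
    """Return (type label, bare MIM number) for an entry such as '+151430'."""
    prefix, number = prefixed_entry[:1], prefixed_entry[1:]
    if prefix.isdigit():
        # No prefix symbol at all: leave the entry untyped.
        return None, prefixed_entry
    return OMIM_PREFIX_TYPE.get(prefix), number


print(type_label("+151430"))
print(type_label("151430"))
```

Keeping the prefix lookup in one table means each ingest that shares these designations can decide downstream how to interpret the label.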

kshefchek commented 4 years ago

I think we type these as genes, see https://github.com/monarch-initiative/dipper/pull/725#pullrequestreview-232806448

typing them as 'has_affected_feature' wouldn't make sense because this is a predicate/object property.

TomConlin commented 4 years ago

Will try again:

Which term for a "type", one not already taken by another conditional, would be better than the term 'has_affected_feature'? Please note it should not be 'gene' (here), as that is taken by the more exact designation asterisk (*), and conflating the two is a choice to be made downstream from here.

Recall these type designations are shared in half a dozen ingests which might differ in the flavor of how the designation is interpreted.

kshefchek commented 4 years ago

As OMIM describes it, it's a union type of a phenotype (disease in Monarch land) and a gene: "A plus sign (+) before an entry number indicates that the entry contains the description of a gene of known sequence and a phenotype."

RDF supports this, we just type it as both. However, this gets trickier from an application perspective where we do not want an identifier to represent both a gene and a disease. In the sample we picked these looked more like genes so we type them as genes in our RDF model iirc.

Typing them in the code as 'has_affected_feature' is fine, and I can't think of anything better, as long as this typing doesn't make it into the RDF model, which I'm pretty certain it doesn't.

TomConlin commented 4 years ago

I never liked the term, as it smacks of being weasel wordy, but it must have seemed the least confusing of the preexisting usages.

kshefchek commented 4 years ago

we could make up a new label in the global translation, such as globaltt['omim_phenotype_and_gene'] that resolves to the SO term for a gene.

TomConlin commented 4 years ago

No.

Made up terms which do not resolve somewhere official along with their description, structure and additional information are an abomination.

Get someone to put it in OLS/Ontobee and we can talk about it.

kshefchek commented 4 years ago

How about a made-up label that resolves to a real term, localtt['omim_phenotype_and_gene']: 'gene'

TomConlin commented 4 years ago

That at least is legit, especially if another source had

localtt['omim_phenotype_and_gene']: 'phenotype'
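
A minimal sketch of that two-step lookup, assuming a local table maps a source-specific label to a global label and a global table maps that label to a term ID. The `resolve` function and both source tables are hypothetical, and the phenotype CURIE is an assumed placeholder for a generic phenotype class.

```python
GLOBALTT = {
    "gene": "SO:0000704",
    "phenotype": "UPHENO:0001001",  # assumed CURIE for a generic phenotype class
}


def resolve(label, localtt, globaltt=GLOBALTT):
    """Map a source-local label to a global label, then to a term ID."""
    global_label = localtt.get(label, label)
    try:
        return globaltt[global_label]
    except KeyError:
        # Refuse labels that do not end at a real term.
        raise KeyError(f"{label!r} does not resolve to a global term") from None


# Two hypothetical sources interpreting the same made-up label differently.
SOURCE_A_LOCALTT = {"omim_phenotype_and_gene": "gene"}
SOURCE_B_LOCALTT = {"omim_phenotype_and_gene": "phenotype"}

print(resolve("omim_phenotype_and_gene", SOURCE_A_LOCALTT))
print(resolve("omim_phenotype_and_gene", SOURCE_B_LOCALTT))
```

The made-up label never leaves the source: each ingest decides which real term it stands for, and a label with no global term fails loudly.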

kshefchek commented 4 years ago

We still have some of these discrepancies as of the July 2020 dataset, the issue being that Monarch considers OMIM plus-sign identifiers to be genes, while HPO considers them diseases (OMIM:151430, OMIM:109270, and a handful more).

However, the original issue (a phenotype annotated to https://beta.monarchinitiative.org/pathway/REACT:R-HSA-381426) has been fixed, so I'm going to close this.