Closed: realmarcin closed this issue 5 years ago.
Discussed via Gitter, but to answer here:
Monarch had a major data release on August 29th. This included an update that removed many gene to phenotype associations that are inferred from variant/gene -> disease -> phenotype associations. The production database only includes disease to phenotype associations that have an obligate qualifier from the HPO annotations. I've adjusted the beta inferences to also include the "very frequent" qualifier.
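To make the rule concrete, here is a minimal sketch of frequency-filtered propagation, assuming the gene -> disease pairs are already restricted to causal associations and that each disease -> phenotype record carries an optional HPO frequency term. HP:0040280 (Obligate) and HP:0040281 (Very frequent) are standard HPO frequency terms. The function name and the `gene:*`/`disease:*`/`pheno:*` identifiers are hypothetical, and this is not Monarch's actual pipeline code.

```python
# HPO frequency qualifiers used to gate propagation (standard HPO term IDs).
OBLIGATE = "HP:0040280"       # Obligate (100% of cases)
VERY_FREQUENT = "HP:0040281"  # Very frequent (80-99% of cases)

ALLOWED_FREQUENCIES = {OBLIGATE, VERY_FREQUENT}


def infer_gene_phenotypes(gene_disease, disease_phenotype, allowed=ALLOWED_FREQUENCIES):
    """Hypothetical helper: propagate disease phenotypes to causal genes.

    gene_disease: iterable of (gene, disease) pairs, assumed causal only.
    disease_phenotype: iterable of (disease, phenotype, frequency) triples;
        frequency is an HPO frequency term ID or None if unqualified.
    Returns a dict mapping gene -> set of inferred phenotype IDs.
    """
    # Keep only disease -> phenotype links whose frequency is in the allowed set.
    phenos_by_disease = {}
    for disease, phenotype, frequency in disease_phenotype:
        if frequency in allowed:
            phenos_by_disease.setdefault(disease, set()).add(phenotype)

    # Each causal gene inherits the filtered phenotypes of its diseases.
    inferred = {}
    for gene, disease in gene_disease:
        inferred.setdefault(gene, set()).update(phenos_by_disease.get(disease, set()))
    return inferred
```

Passing `allowed={OBLIGATE}` reproduces the stricter production behavior described above. Unqualified or lower-frequency links are never propagated under either setting.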
The more experimental join can still be done in biolink via multiple queries, e.g.:
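The following is a sketch of such a client-side join, not the original example from this comment. It assumes the legacy biolink endpoints `/bioentity/gene/{id}/diseases` and `/bioentity/disease/{id}/phenotypes` under `https://api.monarchinitiative.org/api`, and a response shaped like `{"associations": [{"object": {"id": ...}}]}`. The base URL, paths, and response shape are all assumptions to verify against the API docs. The fetcher is injected so the join logic can be tested without network access.

```python
import json
import urllib.request
from urllib.parse import quote

# ASSUMPTION: base URL and endpoint paths of the legacy biolink API.
BASE = "https://api.monarchinitiative.org/api"


def gene_diseases_url(gene_id):
    return f"{BASE}/bioentity/gene/{quote(gene_id, safe=':')}/diseases"


def disease_phenotypes_url(disease_id):
    return f"{BASE}/bioentity/disease/{quote(disease_id, safe=':')}/phenotypes"


def http_fetch_json(url):
    """Real fetcher: GET the URL and parse the JSON body."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def object_ids(fetch_json, url):
    # ASSUMPTION: association payload is {"associations": [{"object": {"id": ...}}]}.
    data = fetch_json(url)
    return [assoc["object"]["id"] for assoc in data.get("associations", [])]


def gene_to_phenotypes_via_diseases(gene_id, fetch_json=http_fetch_json):
    """Two-hop join: gene -> diseases, then each disease -> phenotypes."""
    phenotypes = set()
    for disease_id in object_ids(fetch_json, gene_diseases_url(gene_id)):
        phenotypes.update(object_ids(fetch_json, disease_phenotypes_url(disease_id)))
    return phenotypes
```

Note that this naive join takes no account of causality or frequency qualifiers. That is exactly why it is the "more experimental" route.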
Do you have any more context for the decision here?
I think restricting the propagation to causal genes makes sense
However, I don't get the reason to restrict the propagation to Obligates.
UPDATE: this may be less of a problem than I thought (see below); I thought we had lost all annotations for key genes. However, it would still be good to know the justification and have our rules clearly documented. Why are we doing something different from the HPO site? https://hpo.jax.org/app/browse/gene/55215
@realmarcin, I am not sure I follow. You say:
The list of genes with this issue (only phenotype annotation returned are EFO terms) include: ... FANCI ...
However, I get many annotations when I query the Monarch API for FANCI:
I also see that we have phenotypes for FANCI here:
https://beta.monarchinitiative.org/gene/HGNC:25568#phenotype
Therefore there must be something wrong with the workflow, as the API is giving phenotypes.
FANCI was not our example, so it looks like this issue is not affecting all genes.
Here is the EPM2A example, showing the difference between beta and the previous release (5 vs. 46 phenotypes):
Beta: https://beta.monarchinitiative.org/gene/HGNC:3413#phenotype
Previous release: https://monarchinitiative.org/gene/HGNC:3413#phenotypes
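When comparing a gene's phenotype lists across two releases like this, a set difference shows exactly which annotations were dropped or added. The sketch below assumes you have already collected the phenotype IDs from each page or API response. The function name and the `pheno:*` IDs are hypothetical.

```python
def diff_phenotypes(previous, beta):
    """Hypothetical helper: compare one gene's phenotype IDs across two releases."""
    prev, new = set(previous), set(beta)
    return {
        "only_previous": prev - new,  # annotations dropped in beta
        "only_beta": new - prev,      # annotations added in beta
        "shared": prev & new,         # annotations present in both
    }
```

For EPM2A, `len(diff["only_previous"])` would directly count the phenotypes that disappeared between the two releases.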
@realmarcin this should be fixed by tomorrow morning. Apologies, all: it was requested that we remove variant to phenotype inferences. I was thinking in terms of graph distance (variants are closer to phenotypes than genes are), so I incorrectly thought this meant we should tighten the gene to phenotype inferences as well. @TomConlin and @julesjacobsen pointed out why this is flawed, and it all makes sense now.
We are trying to debug an issue with our Translator modules which rely on Monarch data and ontobio semantic similarity.
Here are some notes from Colleen Xu:
Between Monday (8/26) and Thursday (8/29), there seems to have been a change to the phenotype annotations. Specifically, EFO terms are being counted as the only phenotype annotations for some genes in our queries, leading to odd results.
Queries for genes with shared phenotype annotations (context: Fanconi anemia) are returning associations like…
LINC00471 has a Jaccard similarity score of 1 with FANCI. They both have EFO:0004339 ("body height" https://www.ebi.ac.uk/ols/ontologies/efo/terms?iri=http%3A%2F%2Fwww.ebi.ac.uk%2Fefo%2FEFO_0004339). This is odd because FANCI has many other phenotype annotations in Monarch that aren't shared by the LINC gene, so the score shouldn't be 1 (https://monarchinitiative.org/gene/HGNC:25568#phenotypes).
ADIPOR1 has a Jaccard similarity score of 1 with XRCC2. They both have EFO:0004584 ("mean platelet volume" https://www.ebi.ac.uk/ols/ontologies/efo/terms?iri=http%3A%2F%2Fwww.ebi.ac.uk%2Fefo%2FEFO_0004584).
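The score of 1 follows directly from the Jaccard definition |A ∩ B| / |A ∪ B| once both profiles collapse to the same single term. The sketch below uses plain annotation sets. ontobio's semantic similarity may instead compare ancestor closures, so treat this as a simplified illustration, not a reproduction of its scoring. The `pheno:*` IDs are hypothetical stand-ins for FANCI's other annotations.

```python
def jaccard(a, b):
    """Jaccard similarity of two annotation sets: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # convention chosen here; Jaccard is undefined for two empty sets
    return len(a & b) / len(a | b)


# Both genes reduced to the same single EFO term: score is exactly 1.
only_efo = jaccard({"EFO:0004339"}, {"EFO:0004339"})

# With FANCI's other annotations restored, the shared term no longer dominates.
with_others = jaccard({"EFO:0004339", "pheno:1", "pheno:2"}, {"EFO:0004339"})
```

Here `only_efo` is 1.0 and `with_others` is 1/3, which matches the intuition that FANCI should not look identical to LINC00471.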
The list of genes with this issue (the only phenotype annotations returned are EFO terms) includes: BRIP1 ERCC4 FANCB FANCC FANCD2 FANCE FANCI FANCL MAD2L2 RFWD3 SLX4 XRCC2
So it looks like something may have changed in the Monarch phenotype data for these genes this week? I can't figure out whether this change is sensible or whether these EFO terms somehow shouldn't be considered phenotypes.
Regardless, one solution for us for now may be to simply filter the EFO terms out.
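A minimal sketch of that workaround follows. It filters terms by CURIE prefix before computing similarity, assuming each gene's profile is a set of CURIE strings. Genes left with no annotations are dropped rather than kept with an empty profile, so they cannot produce spurious scores. The function name and the `pheno:1` ID are hypothetical.

```python
def filter_profiles(profiles, excluded_prefixes=("EFO:",)):
    """Hypothetical helper: drop excluded-prefix terms, then drop empty profiles.

    profiles: dict mapping gene -> iterable of term CURIEs.
    """
    cleaned = {}
    for gene, terms in profiles.items():
        # str.startswith accepts a tuple, so several prefixes can be excluded at once.
        kept = {t for t in terms if not t.startswith(excluded_prefixes)}
        if kept:
            cleaned[gene] = kept
    return cleaned
```

Dropping genes whose only annotations were EFO terms (e.g., LINC00471 in the example above) avoids the opposite failure mode, where two empty profiles would be compared.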
The other corollary is that the biolink API already seems to be using beta Monarch; I assume there is no way to switch that via parameter settings.