jmcmurry opened this issue 8 years ago.
A good starting point would be figuring out where the gene is being linked to Parkinson's disease, and why this genotype is not linked to the disease with "is model of" (RO:0003301), since that would prevent the phenotypes from being shown here. We don't (intentionally) infer phenotypes through homology.
According to our beta site, where sources should be working, this is coming from CTD (pub: http://www.ncbi.nlm.nih.gov/pubmed/18353766), so this gene is part of a fly model of Parkinson's. We may need to make an anonymous "model" that has the genes in this pub and is linked to Parkinson's with RO:0003301.
Neither NCBIGene:32133 nor HP:0002356 exists directly in the triples produced by my run of the dipper script last night.
Jenkins says the last successful FlyBase ingest was Feb 9, 2016, yet the triples in http://data.monarchinitiative.org/ttl/ are all dated 2015:
flybase.nt 17-Oct-2015 01:47 3057877576
flybase.ttl 25-Aug-2015 05:07 1226398584
flybase_dataset.nt 17-Oct-2015 01:34 1142
flybase_dataset.ttl 17-Oct-2015 00:40 734
flybase_test.ttl 17-Oct-2015 00:41 3594184
Unfortunately, you can't easily debug by looking at the source-specific ttl files in isolation; the value we bring is in merging assertions from multiple sources. But this means we have to pay special attention to provenance.
In this case you can see the equivalence axioms for the gene:
The FlyBase assertions are made about the FlyBase ID (FlyBase:FBgn0001218), not the NCBIGene ID.
@kshefchek is the evidence view you are building helpful here?
We need a better way of exploring scigraph-data; @jnguyenx started looking at this yesterday, but we're both out today.
Here is the inferred edge in golr:
We can work backwards from here to the asserted edges in scigraph-data. Here is what the evidence graph JSON looks like:
{
"nodes": [
{
"id": "DOID:14330",
"lbl": "Parkinson's disease",
"meta": {}
},
{
"id": "PMID:3504239",
"lbl": null,
"meta": {}
},
{
"id": "PMID:18353766",
"lbl": null,
"meta": {}
},
{
"id": "FlyBase:FBgn0001218",
"lbl": "Hsc70-3",
"meta": {}
},
{
"id": "PMID:19232169",
"lbl": null,
"meta": {}
},
{
"id": "PMID:17131231",
"lbl": null,
"meta": {}
},
{
"id": "ECO:0000033",
"lbl": "traceable author statement",
"meta": {}
},
{
"id": "_:genid1902872",
"lbl": "some variant of HSC70-3 that is marker/mechanism for Parkinson Disease",
"meta": {}
},
{
"id": "PMID:9074398",
"lbl": null,
"meta": {}
},
{
"id": "PMID:2296384",
"lbl": null,
"meta": {}
},
{
"id": "ECO:0000246",
"lbl": "computational combinatorial evidence used in automatic assertion",
"meta": {}
},
{
"id": "MONARCH:3ff74c7344d8679dde438ef8db3f55f1",
"lbl": null,
"meta": {}
},
{
"id": "MONARCH:17df78989012e2402d56252855d51c7e",
"lbl": null,
"meta": {}
},
{
"id": "HP:0002356",
"lbl": "Writer's cramp",
"meta": {}
}
],
"edges": [
{
"sub": "MONARCH:3ff74c7344d8679dde438ef8db3f55f1",
"obj": "DOID:14330",
"pred": "http://purl.org/oban/association_has_object",
"meta": {}
},
{
"sub": "MONARCH:3ff74c7344d8679dde438ef8db3f55f1",
"obj": "PMID:18353766",
"pred": "http://purl.org/dc/elements/1.1/source",
"meta": {}
},
{
"sub": "MONARCH:17df78989012e2402d56252855d51c7e",
"obj": "PMID:2296384",
"pred": "http://purl.org/dc/elements/1.1/source",
"meta": {}
},
{
"sub": "MONARCH:17df78989012e2402d56252855d51c7e",
"obj": "DOID:14330",
"pred": "http://purl.org/oban/association_has_subject",
"meta": {}
},
{
"sub": "DOID:14330",
"obj": "HP:0002356",
"pred": "http://purl.obolibrary.org/obo/RO_0002200",
"meta": {}
},
{
"sub": "MONARCH:3ff74c7344d8679dde438ef8db3f55f1",
"obj": "ECO:0000033",
"pred": "http://purl.obolibrary.org/obo/RO_0002558",
"meta": {}
},
{
"sub": "MONARCH:17df78989012e2402d56252855d51c7e",
"obj": "PMID:17131231",
"pred": "http://purl.org/dc/elements/1.1/source",
"meta": {}
},
{
"sub": "_:genid1902872",
"obj": "FlyBase:FBgn0001218",
"pred": "http://purl.obolibrary.org/obo/GENO_0000408",
"meta": {}
},
{
"sub": "_:genid1902872",
"obj": "DOID:14330",
"pred": "http://purl.obolibrary.org/obo/RO_0002607",
"meta": {}
},
{
"sub": "MONARCH:17df78989012e2402d56252855d51c7e",
"obj": "PMID:19232169",
"pred": "http://purl.org/dc/elements/1.1/source",
"meta": {}
},
{
"sub": "MONARCH:17df78989012e2402d56252855d51c7e",
"obj": "HP:0002356",
"pred": "http://purl.org/oban/association_has_object",
"meta": {}
},
{
"sub": "MONARCH:17df78989012e2402d56252855d51c7e",
"obj": "PMID:3504239",
"pred": "http://purl.org/dc/elements/1.1/source",
"meta": {}
},
{
"sub": "MONARCH:3ff74c7344d8679dde438ef8db3f55f1",
"obj": "_:genid1902872",
"pred": "http://purl.org/oban/association_has_subject",
"meta": {}
},
{
"sub": "MONARCH:17df78989012e2402d56252855d51c7e",
"obj": "ECO:0000246",
"pred": "http://purl.obolibrary.org/obo/RO_0002558",
"meta": {}
},
{
"sub": "MONARCH:17df78989012e2402d56252855d51c7e",
"obj": "PMID:9074398",
"pred": "http://purl.org/dc/elements/1.1/source",
"meta": {}
}
]
}
See https://github.com/monarch-initiative/configs/issues/18 for the plan to have better provenance in the evidence graph, which is worth doing in general, not just for debugging our own data.
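To work backwards from the inferred edge to the asserted ones, each OBAN association node in the evidence graph can be flattened into one row: association, subject, object, evidence, sources. This is a minimal sketch, assuming the evidence graph JSON above has been saved locally and that jq is installed; the function name summarize_associations is hypothetical, and matching predicates by suffix (association_has_subject, association_has_object, RO_0002558, /source) is an assumption based only on the JSON shown here.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads an evidence-graph JSON document on stdin and
# prints one tab-separated row per OBAN association node:
#   association  subject  object  evidence  sources
# Predicates are matched by suffix, assuming the IRIs seen in the JSON above.
summarize_associations() {
  jq -r '
    def linked($a; $p):
      [.[] | select(.sub == $a and (.pred | endswith($p))) | .obj] | join(",");
    .edges
    | . as $e
    | [$e[] | select(.pred | endswith("association_has_subject")) | .sub]
    | unique[]
    | . as $a
    | [$a,
       ($e | linked($a; "association_has_subject")),
       ($e | linked($a; "association_has_object")),
       ($e | linked($a; "RO_0002558")),
       ($e | linked($a; "/source"))]
    | @tsv
  '
}
```

Run as `summarize_associations < evidence.json`. On the JSON above, this should separate the two assertions: the disease-to-phenotype association (DOID:14330 to HP:0002356) with ECO:0000246 and five PMIDs, and the variant-to-disease association with ECO:0000033 and PMID:18353766, the CTD publication mentioned at the top of the thread.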
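```bash
# Fetch the 1-hop neighborhood of NCBIGene:32133 from scigraph-data,
# turn each edge into an N-Triples line with jq (no sed escaping needed),
# then render it with rapper (Raptor) and Graphviz dot.
curl -s "https://scigraph-data.monarchinitiative.org/scigraph/graph/neighbors/NCBIGene:32133?depth=1&blankNodes=false&direction=BOTH&project=*" |
  jq -r '.edges[] | "<\(.sub)> <\(.pred)> <\(.obj)> ."' |
  rapper -i ntriples -o dot -I http://example.com - |
  dot -T png -o NCBIGene_32133.png
display NCBIGene_32133.png
```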
Doing the same for FlyBase:FBgn0001218 returns only the edge to NCBIGene:32133. Increasing the path length with "&depth=2" reproduces an identical graph.
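To repeat this check for another CURIE such as FlyBase:FBgn0001218, or at a different depth, the pipeline can be wrapped in a pair of shell functions. This is a sketch, not an existing Monarch tool: the names edges_to_ntriples and neighbors_png are hypothetical, the functions reuse the scigraph-data neighbors endpoint and query parameters from the command above, and they assume bash plus curl, jq, rapper and Graphviz dot on the PATH.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads SciGraph neighbors JSON on stdin and writes
# one N-Triples line per edge, wrapping sub/pred/obj in angle brackets.
edges_to_ntriples() {
  jq -r '.edges[] | "<\(.sub)> <\(.pred)> <\(.obj)> ."'
}

# Hypothetical helper: renders the neighborhood of a CURIE to a PNG.
# Usage: neighbors_png CURIE [DEPTH]   (DEPTH defaults to 1)
neighbors_png() {
  local curie=$1 depth=${2:-1}
  local url="https://scigraph-data.monarchinitiative.org/scigraph/graph/neighbors/${curie}?depth=${depth}&blankNodes=false&direction=BOTH&project=*"
  curl -s "$url" |
    edges_to_ntriples |
    rapper -i ntriples -o dot -I http://example.com - |
    dot -T png -o "${curie/:/_}_depth${depth}.png"   # e.g. FlyBase_FBgn0001218_depth2.png
}
```

For example, `neighbors_png FlyBase:FBgn0001218 2` would write FlyBase_FBgn0001218_depth2.png. Building the triples with jq string interpolation avoids depending on how a given sed treats escapes such as `\<` and `\>`.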
Nice chaining together of apps there, but we're looking at the wrong graph: the source assertions may be more than one hop away. The evidence graph is what you should be looking at:
So we can see that the fly gene Hsc70-3 has a variant that is a marker for Parkinson's disease, which in turn has writer's cramp as a phenotype. Our inference rules are clearly too liberal here.
(Aside: I took a path nearly as tortuous as yours to visualize it. This should not be necessary; we have multiple options for visualizing a bbop-graph, and @kshefchek, @kltm, or @DoctorBud could advise better.)
Regarding the original issue: should we infer phenotypes 1..n of disease x when gene y is a marker/mechanism of disease x? This seems incorrect. I would rather link the genes to an anonymous fly model (or through an anonymous variant in an anonymous model) and use the "is model of" relation (RO:0003301). But I'm not sure whether this can be done for all CTD data where a gene from a model organism is a marker/mechanism for a disease.
@cmungall, I have not had a chance to develop the DAG view of the evidence graph, but I would like to prioritize it.
I need to look at CTD in more detail, but I think your analysis is correct, @kshefchek.
Even if the inferences are wrong we should still be listing the source(s); is this hard to do?
Awesome! Has the PMID:PMID issue already been logged?
Yes, but I'm not sure where we are in fixing the parser: https://github.com/monarch-initiative/dipper/issues/267
The phenotype profile for an HSP fly gene has an inferred association with "writer's cramp"; however, the lack of any transparent basis for the inference makes statements like these look especially weird.
Moreover, the only human homolog listed for the gene is heat shock 70kDa protein 5 (glucose-regulated protein, 78kDa), and this homolog isn't associated with "writer's cramp" either. I realize that HSPs are a bit of a special case, but our results should nevertheless not confuse people. Perhaps this phenotype is inferred from a homology that is not visible on the fly gene's homologs page?
@TomConlin, perhaps you could investigate where this result originates? I'm not sure whether it is dipper, SciGraph, OWLSim, the ontologies, or my own fundamental lack of understanding.