monarch-initiative / monarch-legacy

Monarch web application and API
BSD 3-Clause "New" or "Revised" License

Some inferred phenotypes look weird (Flies with writer's cramp) #1232

Open: jmcmurry opened this issue 8 years ago

jmcmurry commented 8 years ago

The phenotype profile for an HSP fly gene (NCBIGene:32133) has an inferred association with "writer's cramp" (HP:0002356). Because there is no transparent basis for the inference, statements like this look especially weird.

Moreover, the only human homolog listed for the gene is heat shock 70kDa protein 5 (glucose-regulated protein, 78kDa), and this homolog isn't associated with "writer's cramp" either. I realize that HSPs are a bit of a special case, but our results should nevertheless not confuse people. Perhaps this phenotype is inferred from a homology that is not visible on the fly gene's homologs page?

@TomConlin, perhaps you could investigate where this result originates? I'm not sure whether it comes from dipper, SciGraph, OWLSim, the ontologies, or my own fundamental lack of understanding.

kshefchek commented 8 years ago

Figuring out where the gene is being linked to Parkinson's disease would be a good starting point, along with why this genotype is not linked to the disease with "is model of" (RO:0003301), since that link would prevent the phenotypes from being shown here. We don't (intentionally) infer phenotypes through homology.

According to our beta site, where sources should be working, this comes from CTD (pub: http://www.ncbi.nlm.nih.gov/pubmed/18353766), so this gene is part of a fly model of Parkinson's. We may need to create an anonymous "model" that contains the genes in this pub and is linked to Parkinson's with RO:0003301.

TomConlin commented 8 years ago

Neither NCBIGene:32133 nor HP:0002356 exists directly in the triples produced by my run of the dipper script last night.

Jenkins says the last successful FlyBase ingest was Feb 9, 2016.

However, the triples in http://data.monarchinitiative.org/ttl/ are dated:

File                                               Last modified            Size (bytes)
flybase.nt                                         17-Oct-2015 01:47          3057877576
flybase.ttl                                        25-Aug-2015 05:07          1226398584
flybase_dataset.nt                                 17-Oct-2015 01:34                1142
flybase_dataset.ttl                                17-Oct-2015 00:40                 734
flybase_test.ttl                                   17-Oct-2015 00:41             3594184
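A check like this, looking for a term in a dipper N-Triples dump, can be sketched in Python. This is a minimal sketch, assuming the dump writes full IRIs rather than CURIEs; the expansions in PREFIXES are assumptions, not dipper's own prefix map. The file is streamed line by line because flybase.nt is about 3 GB.

```python
from typing import Iterable, Iterator

# Assumed CURIE expansions; verify against the IRIs actually present in the dump.
PREFIXES = {
    "NCBIGene": "http://www.ncbi.nlm.nih.gov/gene/",
    "HP": "http://purl.obolibrary.org/obo/HP_",
}


def expand(curie: str) -> str:
    """Expand a CURIE such as 'HP:0002356' into a full IRI using PREFIXES."""
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local


def lines_mentioning(lines: Iterable[str], curies: Iterable[str]) -> Iterator[str]:
    """Yield N-Triples lines that mention any of the given terms as an IRI.

    Matching on '<IRI>' (with angle brackets) avoids false hits such as
    gene 321330 when looking for gene 32133.
    """
    targets = ["<" + expand(c) + ">" for c in curies]
    for line in lines:
        if any(t in line for t in targets):
            yield line
```

Passing an open file handle for flybase.nt together with ["NCBIGene:32133", "HP:0002356"] yields no lines when neither IRI appears anywhere in the file.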
cmungall commented 8 years ago

Unfortunately, you can't easily debug by looking at the source-specific ttl files in isolation: the value we bring is in merging assertions from multiple sources. But this means we have to pay special attention to provenance.

In this case you can see the equivalence axioms for the gene:

https://scigraph-data.monarchinitiative.org/scigraph/graph/neighbors/NCBIGene:32133?depth=1&blankNodes=false&direction=BOTH&project=*

The FlyBase assertions are made about the FlyBase ID (FlyBase:FBgn0001218), not the NCBI Gene ID.
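To see which identifiers a gene's assertions may actually be attached to, one can follow the equivalence edges in the SciGraph neighbors JSON from NCBIGene:32133 and collect every node reached. A minimal sketch, assuming the response has the same nodes/edges shape as the evidence graph JSON in this thread; the predicate names in EQUIV_PREDS are assumptions, so check the "pred" values in a real response first.

```python
from collections import deque
from typing import AbstractSet, Dict, List

# Assumed predicate names for equivalence edges; not confirmed from SciGraph docs.
EQUIV_PREDS = frozenset({"equivalentClass", "sameAs"})


def equivalence_clique(graph: Dict, start: str,
                       preds: AbstractSet[str] = EQUIV_PREDS) -> set:
    """Return every node reachable from `start` through equivalence edges.

    Equivalence is symmetric, so each edge is followed in both directions;
    edges with any other predicate are ignored.
    """
    adjacency: Dict[str, List[str]] = {}
    for edge in graph.get("edges", []):
        if edge["pred"] in preds:
            adjacency.setdefault(edge["sub"], []).append(edge["obj"])
            adjacency.setdefault(edge["obj"], []).append(edge["sub"])
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen
```

Searching for the gene's assertions under every member of the returned set, rather than only the NCBI Gene ID, is what surfaces the FlyBase-side statements.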

@kshefchek is the evidence view you are building helpful here?

We need a better way of exploring scigraph-data; @jnguyenx started looking at this yesterday, but we're both out today.

cmungall commented 8 years ago

Here is the inferred edge in golr:

https://solr-dev.monarchinitiative.org/solr/golr/select/?q=subject_closure:%22NCBIGene:32133%22%20AND%20object_closure:%22HP:0002356%22&wt=json
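The golr URL above can be rebuilt for any subject/object pair with the standard library. A small sketch: the endpoint and the subject_closure/object_closure fields come from the URL in this comment, while the helper name is hypothetical. It uses quote instead of the default quote_plus so that spaces become %20 and the result matches the hand-written URL exactly.

```python
from urllib.parse import quote, urlencode

GOLR_SELECT = "https://solr-dev.monarchinitiative.org/solr/golr/select/"


def golr_edge_query(subject: str, obj: str) -> str:
    """Build a golr select URL for edges whose closures contain both terms."""
    q = f'subject_closure:"{subject}" AND object_closure:"{obj}"'
    # quote (not quote_plus) encodes spaces as %20; ':' is left unescaped.
    params = urlencode({"q": q, "wt": "json"}, safe=":", quote_via=quote)
    return GOLR_SELECT + "?" + params
```

golr_edge_query("NCBIGene:32133", "HP:0002356") reproduces the URL above; in Solr's standard JSON response, any matching edges are listed under response["response"]["docs"].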

We can work backwards from here to the asserted edges in scigraph-data.

cmungall commented 8 years ago

Here is what the evidence graph JSON looks like:

{
 "nodes": [
  {
   "id": "DOID:14330",
   "lbl": "Parkinson's disease",
   "meta": {}
  },
  {
   "id": "PMID:3504239",
   "lbl": null,
   "meta": {}
  },
  {
   "id": "PMID:18353766",
   "lbl": null,
   "meta": {}
  },
  {
   "id": "FlyBase:FBgn0001218",
   "lbl": "Hsc70-3",
   "meta": {}
  },
  {
   "id": "PMID:19232169",
   "lbl": null,
   "meta": {}
  },
  {
   "id": "PMID:17131231",
   "lbl": null,
   "meta": {}
  },
  {
   "id": "ECO:0000033",
   "lbl": "traceable author statement",
   "meta": {}
  },
  {
   "id": "_:genid1902872",
   "lbl": "some variant of HSC70-3 that is marker/mechanism for Parkinson Disease",
   "meta": {}
  },
  {
   "id": "PMID:9074398",
   "lbl": null,
   "meta": {}
  },
  {
   "id": "PMID:2296384",
   "lbl": null,
   "meta": {}
  },
  {
   "id": "ECO:0000246",
   "lbl": "computational combinatorial evidence used in automatic assertion",
   "meta": {}
  },
  {
   "id": "MONARCH:3ff74c7344d8679dde438ef8db3f55f1",
   "lbl": null,
   "meta": {}
  },
  {
   "id": "MONARCH:17df78989012e2402d56252855d51c7e",
   "lbl": null,
   "meta": {}
  },
  {
   "id": "HP:0002356",
   "lbl": "Writer's cramp",
   "meta": {}
  }
 ],
 "edges": [
  {
   "sub": "MONARCH:3ff74c7344d8679dde438ef8db3f55f1",
   "obj": "DOID:14330",
   "pred": "http://purl.org/oban/association_has_object",
   "meta": {}
  },
  {
   "sub": "MONARCH:3ff74c7344d8679dde438ef8db3f55f1",
   "obj": "PMID:18353766",
   "pred": "http://purl.org/dc/elements/1.1/source",
   "meta": {}
  },
  {
   "sub": "MONARCH:17df78989012e2402d56252855d51c7e",
   "obj": "PMID:2296384",
   "pred": "http://purl.org/dc/elements/1.1/source",
   "meta": {}
  },
  {
   "sub": "MONARCH:17df78989012e2402d56252855d51c7e",
   "obj": "DOID:14330",
   "pred": "http://purl.org/oban/association_has_subject",
   "meta": {}
  },
  {
   "sub": "DOID:14330",
   "obj": "HP:0002356",
   "pred": "http://purl.obolibrary.org/obo/RO_0002200",
   "meta": {}
  },
  {
   "sub": "MONARCH:3ff74c7344d8679dde438ef8db3f55f1",
   "obj": "ECO:0000033",
   "pred": "http://purl.obolibrary.org/obo/RO_0002558",
   "meta": {}
  },
  {
   "sub": "MONARCH:17df78989012e2402d56252855d51c7e",
   "obj": "PMID:17131231",
   "pred": "http://purl.org/dc/elements/1.1/source",
   "meta": {}
  },
  {
   "sub": "_:genid1902872",
   "obj": "FlyBase:FBgn0001218",
   "pred": "http://purl.obolibrary.org/obo/GENO_0000408",
   "meta": {}
  },
  {
   "sub": "_:genid1902872",
   "obj": "DOID:14330",
   "pred": "http://purl.obolibrary.org/obo/RO_0002607",
   "meta": {}
  },
  {
   "sub": "MONARCH:17df78989012e2402d56252855d51c7e",
   "obj": "PMID:19232169",
   "pred": "http://purl.org/dc/elements/1.1/source",
   "meta": {}
  },
  {
   "sub": "MONARCH:17df78989012e2402d56252855d51c7e",
   "obj": "HP:0002356",
   "pred": "http://purl.org/oban/association_has_object",
   "meta": {}
  },
  {
   "sub": "MONARCH:17df78989012e2402d56252855d51c7e",
   "obj": "PMID:3504239",
   "pred": "http://purl.org/dc/elements/1.1/source",
   "meta": {}
  },
  {
   "sub": "MONARCH:3ff74c7344d8679dde438ef8db3f55f1",
   "obj": "_:genid1902872",
   "pred": "http://purl.org/oban/association_has_subject",
   "meta": {}
  },
  {
   "sub": "MONARCH:17df78989012e2402d56252855d51c7e",
   "obj": "ECO:0000246",
   "pred": "http://purl.obolibrary.org/obo/RO_0002558",
   "meta": {}
  },
  {
   "sub": "MONARCH:17df78989012e2402d56252855d51c7e",
   "obj": "PMID:9074398",
   "pred": "http://purl.org/dc/elements/1.1/source",
   "meta": {}
  }
 ]
}
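The evidence graph is easier to read once its edges are grouped by association node. A minimal sketch, assuming the JSON above has been loaded with json.load into a dict; it treats association_has_subject, association_has_object, dc:source and RO_0002558 (the predicate that links each association to its ECO code in this JSON) as the only association roles, which holds for this graph but may not hold in general. The function name is hypothetical.

```python
from typing import Dict, List

OBAN = "http://purl.org/oban/"
DC_SOURCE = "http://purl.org/dc/elements/1.1/source"
EVIDENCE = "http://purl.obolibrary.org/obo/RO_0002558"


def summarize_associations(graph: Dict) -> Dict[str, Dict[str, List[str]]]:
    """Group evidence-graph edges by association node, substituting labels."""
    # Fall back to the ID when a node has no label (e.g. the PMIDs above).
    labels = {n["id"]: n.get("lbl") or n["id"] for n in graph["nodes"]}
    roles = {
        OBAN + "association_has_subject": "subject",
        OBAN + "association_has_object": "object",
        DC_SOURCE: "source",
        EVIDENCE: "evidence",
    }
    summary: Dict[str, Dict[str, List[str]]] = {}
    for edge in graph["edges"]:
        role = roles.get(edge["pred"])
        if role is None:
            continue  # not an association edge, e.g. DOID -> HP via RO_0002200
        entry = summary.setdefault(
            edge["sub"], {"subject": [], "object": [], "source": [], "evidence": []})
        entry[role].append(labels.get(edge["obj"], edge["obj"]))
    return summary
```

Applied to the graph above, this yields two associations: MONARCH:3ff74c7344d8679dde438ef8db3f55f1 links the Hsc70-3 variant to Parkinson's disease (PMID:18353766, traceable author statement), and MONARCH:17df78989012e2402d56252855d51c7e links Parkinson's disease to Writer's cramp (five PMIDs, computational combinatorial evidence used in automatic assertion).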
cmungall commented 8 years ago

See https://github.com/monarch-initiative/configs/issues/18 for the plan to have better provenance in the evidence graph, which is worth doing in general, not just for debugging our own data.

TomConlin commented 8 years ago
curl -s "https://scigraph-data.monarchinitiative.org/scigraph/graph/neighbors/NCBIGene:32133?depth=1&blankNodes=false&direction=BOTH&project=*"|\
jq -c '.edges[]|[.sub,.pred,.obj]'|\
sed 's/\[\"/</g;s/\"\,\"/\> \</g;s/\"\]/\> \./g'|\
rapper -i ntriples -o dot -I http://example.com - |\
dot -T png -o NCBIGene_32133.png
display  NCBIGene_32133.png

[Image: NCBIGene_32133.png, the graph rendered by the pipeline above]

Doing the same for FlyBase:FBgn0001218 returns only the edge to NCBIGene:32133, and increasing the path length with "&depth=2" reproduces an identical graph.
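The jq | sed stage of the pipeline above turns each SciGraph edge into a line that rapper can read as N-Triples. An equivalent sketch in Python, assuming the same nodes/edges JSON; like the sed script, it wraps CURIEs such as NCBIGene:32133 in angle brackets as-is, without expanding them to http IRIs. The function name is hypothetical.

```python
from typing import Dict, Iterator


def edges_to_ntriples(graph: Dict) -> Iterator[str]:
    """Emit '<sub> <pred> <obj> .' for each edge, mirroring the jq | sed stage."""
    for edge in graph["edges"]:
        yield f'<{edge["sub"]}> <{edge["pred"]}> <{edge["obj"]}> .'
```

Writing these lines to a file (one per line) gives rapper the same input the shell pipeline feeds it on stdin.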

cmungall commented 8 years ago

Nice chaining together of apps there, but we're looking at the wrong graph. The source assertions may be more than one hop away. The evidence graph is what you should be looking at:

[Image: rendered evidence graph]

So we can see that the fly gene Hsc70-3 has a variant that is a marker for Parkinson's disease, which in turn has writer's cramp as a phenotype. Our inference rules are clearly too liberal here.
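The point that the source assertions can be several hops away can be made concrete with a breadth-first search over the evidence graph's edges that ignores direction. Direction has to be ignored because the variant node points to the gene via GENO_0000408, so a directed search starting from the gene finds nothing. A minimal sketch; the function name is hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional


def shortest_path(graph: Dict, start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search treating every edge as undirected.

    Returns the node IDs on a shortest path from start to goal, or None.
    """
    adjacency: Dict[str, List[str]] = {}
    for edge in graph["edges"]:
        adjacency.setdefault(edge["sub"], []).append(edge["obj"])
        adjacency.setdefault(edge["obj"], []).append(edge["sub"])
    parent: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the parent links back to start, then reverse.
            path: List[str] = []
            current: Optional[str] = node
            while current is not None:
                path.append(current)
                current = parent[current]
            return path[::-1]
        for neighbor in adjacency.get(node, []):
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return None
```

On the evidence graph edges, a search from FlyBase:FBgn0001218 to HP:0002356 returns the three-hop chain FlyBase:FBgn0001218, _:genid1902872, DOID:14330, HP:0002356.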

(Aside: I took a path nearly as tortuous as yours to visualize it, which should not be necessary. We have multiple options for visualizing a bbop-graph; @kshefchek, @kltm, or @DoctorBud could advise better.)
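As a stopgap that avoids the N-Triples round trip, a bbop-graph can also be written straight to Graphviz DOT, using node labels where they exist. This is a hypothetical helper, not one of the existing visualization options; its output can be piped to dot -T png as in the shell pipeline earlier in the thread.

```python
from typing import Dict


def bbop_to_dot(graph: Dict) -> str:
    """Render a bbop-graph (nodes/edges JSON) as Graphviz DOT text."""
    def q(text: str) -> str:
        # DOT string literal: escape backslashes first, then double quotes.
        return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'

    lines = ["digraph evidence {"]
    for node in graph["nodes"]:
        label = node.get("lbl") or node["id"]  # unlabeled nodes show their ID
        lines.append(f"  {q(node['id'])} [label={q(label)}];")
    for edge in graph["edges"]:
        # Use only the last path segment of the predicate IRI as the edge label.
        pred = edge["pred"].rstrip("/").rsplit("/", 1)[-1]
        lines.append(f"  {q(edge['sub'])} -> {q(edge['obj'])} [label={q(pred)}];")
    lines.append("}")
    return "\n".join(lines)
```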

kshefchek commented 8 years ago

Regarding the original issue: should we infer phenotypes 1..n of disease x when gene y is a marker/mechanism of disease x? This seems incorrect. I would rather link the genes to an anonymous fly model (or through an anonymous variant in an anonymous model) and use the "is model of" relation. But I'm not sure this can be done for all CTD data where a model-organism gene is a marker/mechanism for a disease.
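The anonymous-model proposal can be sketched as the triples that one anonymous model per publication would contribute. This is a minimal sketch under loud assumptions: RO:0003301 ("is model of") and dc:source provenance come from this thread, but BFO:0000051 ("has part") as the model-to-gene relation and the _:model-... blank-node naming are hypothetical placeholders, not dipper's actual modeling.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

IS_MODEL_OF = "RO:0003301"  # "is model of", the relation proposed in this thread
SOURCE = "dc:source"        # provenance predicate, as in the evidence graph
HAS_PART = "BFO:0000051"    # hypothetical placeholder for the model-to-gene link


def anonymous_model(genes: List[str], disease: str, pub: str) -> List[Triple]:
    """Triples for one anonymous model per publication (hypothetical modeling)."""
    # Derive a stable blank-node label from the publication ID.
    model = "_:model-" + pub.replace(":", "-")
    triples: List[Triple] = [(model, IS_MODEL_OF, disease), (model, SOURCE, pub)]
    triples.extend((model, HAS_PART, gene) for gene in genes)
    return triples
```

Keying the blank node on the publication keeps every gene from the same CTD paper attached to a single model, which is what "an anonymous model that has the genes in this pub" asks for.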

@cmungall I have not had a chance to develop the DAG view of the evidence graph, but I would like to prioritize it.

cmungall commented 8 years ago

Need to look in more detail at CTD, but I think your analysis is correct @kshefchek

jmcmurry commented 8 years ago

Even if the inferences are wrong we should still be listing the source(s); is this hard to do?

kshefchek commented 8 years ago

yes, see beta: https://beta.monarchinitiative.org/gene/NCBIGene:32133#phenotypes

jmcmurry commented 8 years ago

awesome! has the PMID:PMID issue already been logged?

kshefchek commented 8 years ago

yes but not sure where we are in fixing the parser: https://github.com/monarch-initiative/dipper/issues/267
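If the PMID:PMID issue is a doubled prefix such as PMID:PMID:18353766 (an assumption; the linked dipper issue has the details), a defensive normalization step could collapse it. A hypothetical sketch; the real fix belongs in the parser.

```python
import re

# Matches a CURIE whose prefix is repeated, e.g. "PMID:PMID:18353766".
_DOUBLED = re.compile(r"^([A-Za-z][\w.-]*):\1:")


def collapse_doubled_prefix(curie: str) -> str:
    """Turn 'PMID:PMID:123' into 'PMID:123'; leave other IDs unchanged."""
    return _DOUBLED.sub(r"\1:", curie)
```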