monarch-initiative / dipper

Data Ingestion Pipeline for Monarch
https://dipper.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

add kegg pathways #89

Closed nlwashington closed 8 years ago

nlwashington commented 9 years ago

need to import the kegg pathways and their gene members. kegg maps its pathways to an uber-gene (kegg ortholog), which we can link to species-specific genes via kegg's orthology maps.

here is the relevant stuff: lists of pathways can be found here (this will get us identifiers and labels): http://rest.genome.jp/list/pathway

ortholog classes: http://rest.genome.jp/list/orthology

orthology mapping to gene ids: http://rest.genome.jp/link/orthology/mmu (where the last part is the species prefix; we should get at minimum hsa, mmu, rno, dme, dre, cel, and more in the future)

gene ids (human): http://rest.genome.jp/list/hsa (unfortunately, every genome has a different prefix, and i don't think the complete list can be obtained in one request)

"reference" pathways, which are annotated with ortholog classes (ko). http://rest.genome.jp/link/pathway/ko

or human genes to pathway map (similar things could be obtained for each species): http://rest.genome.jp/link/pathway/hsa
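A minimal sketch of how the list and link responses above could be read, assuming each response is plain text with one tab-separated pair per line (this shape is an assumption, not confirmed in the thread). The sample rows and gene ids are made up for illustration, and `parse_two_column` is a hypothetical helper, not part of dipper.

```python
from collections import defaultdict


def parse_two_column(text):
    """Parse tab-separated two-column lines into (left, right) pairs."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        left, right = line.split("\t", 1)
        pairs.append((left.strip(), right.strip()))
    return pairs


# Made-up sample rows, shaped the way the responses are ASSUMED to look.
SAMPLE_PATHWAY_LIST = (
    "path:map00020\tCitrate cycle (TCA cycle)\n"
    "path:map00620\tPyruvate metabolism\n"
)
SAMPLE_GENE_TO_KO = (
    "mmu:0001\tko:K00001\n"
    "mmu:0002\tko:K00001\n"
    "mmu:0003\tko:K00002\n"
)

# pathway id -> label
pathway_labels = dict(parse_two_column(SAMPLE_PATHWAY_LIST))

# ortholog class -> set of species-specific gene ids
ko_to_genes = defaultdict(set)
for gene_id, ko_id in parse_two_column(SAMPLE_GENE_TO_KO):
    ko_to_genes[ko_id].add(gene_id)
```

Keeping the parser independent of any one endpoint means the same function could serve the pathway list, the orthology list, and each per-species link file.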

@cmungall shall we take in the uber-genes (ortholog classes) here? or shall we just figure out the species-specific mapping at ingest time, to map a pathway to species-specific genes? (as in, create some kind of monarch-association that has as evidence the inference graph of pathway -> kegg ortholog -> human gene, and do this for any of the species?) happy to keep this as the pathways having the grouping "kegg ortholog" nodes for abstraction.
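To make the second option concrete, here is a rough sketch of inferring pathway-to-gene associations through the ortholog class while keeping the inference path as evidence. The function name, the dictionary layout, and all ids are hypothetical; this is not how dipper represents associations.

```python
def infer_pathway_genes(pathway_to_ko, ko_to_genes):
    """Join pathway -> ortholog class -> gene, recording the path taken."""
    associations = []
    for pathway_id, ko_id in pathway_to_ko:
        # Ortholog classes with no gene in this species yield nothing.
        for gene_id in sorted(ko_to_genes.get(ko_id, ())):
            associations.append({
                "pathway": pathway_id,
                "gene": gene_id,
                # the inference graph: pathway -> kegg ortholog -> gene
                "evidence_path": (pathway_id, ko_id, gene_id),
            })
    return associations


# Made-up ids, for illustration only.
pathway_to_ko = [
    ("path:ko00620", "ko:K00001"),
    ("path:ko00620", "ko:K00002"),
]
ko_to_genes = {"ko:K00001": {"hsa:0001", "hsa:0002"}}

inferred = infer_pathway_genes(pathway_to_ko, ko_to_genes)
```

The alternative (keeping the "kegg ortholog" grouping nodes in the graph) would skip this join at ingest time and leave the traversal to query time.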

nlwashington commented 9 years ago

see related NIF ticket https://support.crbs.ucsd.edu/browse/NIF-11883

bryanlaraway commented 9 years ago

Latest commit should have most of this wrapped up, but requires review.

nlwashington commented 8 years ago

@cmungall what exactly would a "disease pathway" be, ontologically? is the disease pathway equivalent to the disease? or something else?

cmungall commented 8 years ago

I would say that the disease ontology should be focused on causal models and that the two are equivalent. This is an ongoing ontological debate; I am in favor of the river flow model http://content.iospress.com/articles/applied-ontology/ao147

but we should be guided by practicalities. Here I would tend towards equivalence between DO classes, KEGG pathway classes, PW classes...

pnrobinson commented 8 years ago

interesting paper, but what do you mean, equivalence of DO and KEGG classes?

cmungall commented 8 years ago

KEGG IDs for e.g. Parkinson Disease

Dr. med. Peter N. Robinson, MSc. Professor of Medical Genomics Professor in the Bioinformatics Division of the Department of Mathematics and Computer Science of the Freie Universität Berlin Institut für Medizinische Genetik und Humangenetik Charité - Universitätsmedizin Berlin Augustenburger Platz 1 13353 Berlin Germany +4930 450566006 Mobile: 0160 93769872 peter.robinson@charite.de http://compbio.charite.de http://www.human-phenotype-ontology.org Introduction to Bio-Ontologies: http://www.crcpress.com/product/isbn/9781439836651 I have learned from my mistakes, and I am sure I can repeat them exactly ORCID ID:http://orcid.org/0000-0002-0736-9199 Scopus Author ID 7403719646 Appointment request: http://doodle.com/pnrobinson


nlwashington commented 8 years ago

just to be clear, there are kegg disease ids, and kegg pathway ids for those diseases. are you saying that those should be equivalent?

cmungall commented 8 years ago

ah, I see, I forgot they were distinct entities in KEGG, even when the pathway is named for the disease. Yes, in the interest of practicality, equivalence should only be disease<->disease.

For the relationship between the pathway and the disease, doing this the "correct" way will require looking at the different kinds of implicit relationships (note: my knowledge of KEGG may be out of date or incorrect), which I think are

This may be over the top for now; some kind of broad has-phenotype-type relationship may be most practical.

nlwashington commented 8 years ago

when you say "has phenotype relationship", do you mean something like: KEGG:disease_id RO:has_phenotype KEGG:disease_pathway_id? that doesn't make sense to me. or did you mean KEGG:disease_pathway_id RO:has_phenotype KEGG:disease_id? that makes slightly more sense, but is still odd.

also, i don't think that such a pathway is a "normal" pathway; i think that it is in fact showing the progression of a disease, which itself is a deviation from normal. for example, here's the parkinson's "reference" pathway KEGG:map05012 (http://www.kegg.jp/dbget-bin/www_bget?map05012), the same pathway with the human genes highlighted KEGG:hsa05012 (http://www.kegg.jp/kegg-bin/show_pathway?hsa05012), and the disease entry KEGG-ds:H00057 (http://www.kegg.jp/dbget-bin/www_bget?ds:H00057)
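The two pathway ids in this example share a numeric part and differ only in prefix (`map` for the reference pathway, `hsa` for human). A small sketch of mapping a species-specific pathway id back to its reference id, assuming ids are always a short lowercase prefix followed by five digits; that pattern and the helper name `to_reference_pathway` are assumptions based on these two examples.

```python
import re

# ASSUMED id shape: 2-4 lowercase letters, then exactly five digits.
_KEGG_PATHWAY_ID = re.compile(r"^([a-z]{2,4})(\d{5})$")


def to_reference_pathway(pathway_id):
    """Return the 'map'-prefixed reference id for a pathway id."""
    match = _KEGG_PATHWAY_ID.match(pathway_id)
    if match is None:
        raise ValueError("unrecognized pathway id: %r" % pathway_id)
    return "map" + match.group(2)
```

For instance, `to_reference_pathway("hsa05012")` gives `"map05012"`, so species-specific pathways could be grouped under one reference pathway node.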

pnrobinson commented 8 years ago

I don't see these pathways being useful in this way. The pathways are "involved in the pathogenesis of" the disease, but this is extremely complicated, often unknown, and something that is crying out for a more detailed representation. What actually is the intended use case? -Peter

nlwashington commented 8 years ago

here's another example where a disease (Pyruvate carboxylase deficiency, KEGG-ds:H00073) involves two pathways: Pyruvate metabolism (KEGG-path:hsa00620) and Citrate cycle (TCA cycle) (KEGG-path:hsa00020). in this case it's more like the "involved in the pathogenesis of" relationship that @pnrobinson suggests, but i don't think there's a relationship like this in RO. the closest i see is "causally upstream of or within" (RO:0002418), which holds between processes. i can use this for now, since it seems to be the most appropriate.
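As a rough illustration of what that could look like for this example, the sketch below emits one (subject, predicate, object) triple per pathway. The direction (pathway as subject, disease as object) is an assumption, and the function name and plain-tuple representation are for illustration only, not the actual dipper implementation.

```python
# "causally upstream of or within", as cited above
CAUSALLY_UPSTREAM_OF_OR_WITHIN = "RO:0002418"


def pathway_disease_triples(disease_to_pathways):
    """Build (pathway, predicate, disease) triples as CURIE tuples."""
    triples = []
    for disease_id, pathway_ids in disease_to_pathways.items():
        for pathway_id in pathway_ids:
            # ASSUMED direction: the pathway is upstream of the disease
            triples.append(
                (pathway_id, CAUSALLY_UPSTREAM_OF_OR_WITHIN, disease_id)
            )
    return triples


triples = pathway_disease_triples({
    "KEGG-ds:H00073": ["KEGG-path:hsa00620", "KEGG-path:hsa00020"],
})
```

Keeping the predicate in a single constant would make it easy to swap in a more specific relation if one is later added to RO.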

nlwashington commented 8 years ago

this is now implemented. if it is wrong, please reopen.