Closed: nlwashington closed this issue 8 years ago.
Commit 0ae696e: initial commit (just gets files).
I've made an initial commit for this source here: c863aa9. It dumps the id, label, and description, and it marks equivalent (moved) and deprecated (removed) classes.
Some things I noticed but haven't done yet:
Processing this source requires some REST calls to the OMIM server. At the moment we do not cache the raw data (JSON) returned by those calls, but we should (see related ticket #28).
As suggested by @mellybelly, @mbrush can use the output of this file to load definitions of the OMIM diseases into DO. Please update this ticket with any changes you'd like on a file-wide basis.
@cmungall, do you want the pithy disease description to be a dc:description (that's what I have now), an iao:0000115 definition, or a http://www.w3.org/2009/08/skos-reference/skos.html#definition? Right now SciGraph only makes the 'definitions' available in the vocab services, but they ought to be in the graph service. Or should it be both a description and a definition?
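A minimal sketch of how the moved/removed marking could be written out as Turtle. It assumes each parsed entry is a dict with hypothetical `mim`, `status` ("live", "moved", "removed") and `moved_to` keys, and that the `OBO:`, `owl:` and `xsd:` prefixes are declared elsewhere; the field names and status values are illustrative assumptions, not necessarily what c863aa9 does.

```python
from typing import Dict, List, Optional


def status_triples(entry: Dict[str, Optional[str]]) -> List[str]:
    """Return Turtle statements for a moved or removed OMIM entry.

    `entry` is a hypothetical parsed record:
      "mim"      -- the entry's OMIM number
      "status"   -- assumed to be "live", "moved" or "removed"
      "moved_to" -- the OMIM number a moved entry now lives under
    """
    subject = f"OBO:OMIM_{entry['mim']}"
    status = entry.get("status")
    if status == "moved" and entry.get("moved_to"):
        # Moved: the old ID is marked equivalent to its new home.
        return [f"{subject} owl:equivalentClass OBO:OMIM_{entry['moved_to']} ."]
    if status == "removed":
        # Removed: the class is kept but flagged as deprecated.
        return [f'{subject} owl:deprecated "true"^^xsd:boolean .']
    # Live entries (or moved entries with no target) need no extra statement.
    return []
```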
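One way to cache the raw JSON is a small read-through cache keyed on the request URL. This is a sketch, not dipper's implementation: the `fetch_json_cached` helper, the hashed file names, and the injectable `fetch` function are hypothetical. Hashing the URL keeps distinct query strings in distinct files and keeps any credentials carried in the URL out of the file names.

```python
import hashlib
import json
import urllib.request
from pathlib import Path
from typing import Any, Callable


def _download(url: str) -> bytes:
    """Fetch the raw response body over HTTP."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def fetch_json_cached(
    url: str,
    cache_dir: Path,
    fetch: Callable[[str], bytes] = _download,
    refresh: bool = False,
) -> Any:
    """Return the parsed JSON for `url`, keeping the raw bytes on disk.

    The cache file is named after the SHA-256 hash of the URL.
    Pass refresh=True to force a new download.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / (hashlib.sha256(url.encode("utf-8")).hexdigest() + ".json")
    if refresh or not path.exists():
        # Store the payload exactly as received, before any parsing.
        path.write_bytes(fetch(url))
    return json.loads(path.read_bytes())
```

Injecting `fetch` keeps the cache logic testable without network access.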
For example:
OBO:OMIM_105150 a owl:Class ;
    rdfs:label "CEREBRAL AMYLOID ANGIOPATHY, CST3-RELATED" ;
    dc:description "Cerebral amyloid angiopathy (CAA), defined by the deposition of congophilic material in the vessels of the cortex and leptomeninges, is a major cause of intracerebral hemorrhage in the elderly ..." .
Let's go with definition.
Even though it's more of a description than a definition, various points in the consumer chain like definitions (SG, DO editors viewing in Protege).
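A sketch of emitting the OMIM text under IAO:0000115, optionally keeping dc:description as well. It assumes the same `OBO:` prefix as the Turtle example above and a declared `dc:` prefix; `description_triples` is a hypothetical helper. Line breaks are written as `\n` escapes, which Turtle allows inside an ordinary `"..."` string.

```python
from typing import List


def _short_literal(text: str) -> str:
    """Quote `text` as a single-line Turtle string literal."""
    escaped = (
        text.replace("\\", "\\\\")  # escape backslashes first
        .replace('"', '\\"')
        .replace("\r", "\\r")
        .replace("\n", "\\n")
    )
    return f'"{escaped}"'


def description_triples(curie: str, text: str, also_description: bool = False) -> List[str]:
    """Emit the text as an IAO:0000115 definition, optionally also as dc:description."""
    literal = _short_literal(text)
    triples = [f"{curie} OBO:IAO_0000115 {literal} ."]
    if also_description:
        triples.append(f"{curie} dc:description {literal} .")
    return triples
```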
Some of the descriptions seem to have excessive quoting, e.g. OBO:OMIM_101200. Not a big deal, FYI.
And can we reinsert the code to make the labels a bit friendlier?
@cmungall or @pnrobinson: for something that is a susceptibility locus, like http://omim.org/entry/607339, my automated pipeline dumbly creates OMIM:607339 has_phenotype OMIM:607339, because that is what the morbidmap file says.
Should this OMIM identifier just be considered a locus, on the assumption that it will be mapped to the proper HPO phenotype elsewhere, or should it also be considered a "disease" and listed as-is, so that it is defined as both a disease and a genomic locus? If that is confusing, it would be easy for me to filter out the items where gene_id == phenotype_id when processing the morbidmap.
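The filtering option is a one-line check once the morbidmap has been parsed. A sketch, assuming rows are already parsed into pairs with hypothetical `gene_id` and `phenotype_id` fields (the real morbidmap parsing is not shown):

```python
from typing import Iterable, Iterator, NamedTuple


class MorbidmapRow(NamedTuple):
    """A hypothetical pre-parsed morbidmap row (field names are illustrative)."""
    gene_id: str
    phenotype_id: str


def drop_self_associations(rows: Iterable[MorbidmapRow]) -> Iterator[MorbidmapRow]:
    """Skip rows where the locus and the phenotype share one OMIM number.

    Such rows (e.g. susceptibility loci) would otherwise produce a
    self-referential "X has_phenotype X" statement.
    """
    for row in rows:
        if row.gene_id != row.phenotype_id:
            yield row
```

Keeping the rows instead would mean typing the same ID as both a disease and a locus, rather than dropping it here.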
I think we should Skype about this; it is too complicated for email, but I think there is a solution! I am available any time next week, for instance. -Peter
Dr. med. Peter N. Robinson, MSc. Professor of Medical Genomics Professor in the Bioinformatics Division of the Department of Mathematics and Computer Science of the Freie Universität Berlin Institut für Medizinische Genetik und Humangenetik Charité - Universitätsmedizin Berlin Augustenburger Platz 1 13353 Berlin Germany +4930 450566006 Mobile: 0160 93769872 peter.robinson@charite.de http://compbio.charite.de http://www.human-phenotype-ontology.org Introduction to Bio-Ontologies: http://www.crcpress.com/product/isbn/9781439836651 I have learned from my mistakes, and I am sure I can repeat them exactly ORCID ID:http://orcid.org/0000-0002-0736-9199 Scopus Author ID 7403719646 Appointment request: http://doodle.com/pnrobinson
From: Nicole Washington [notifications@github.com] Sent: Saturday, 10 January 2015 00:30 To: monarch-initiative/dipper Cc: Robinson, Peter Subject: Re: [dipper] add omim (#18)
For the labels, I'm making the following modifications: converting to title case, dropping the abbreviation after the semicolon, and converting Roman numerals to Arabic numerals. Here are some examples:
MUCOPOLYSACCHARIDOSIS, TYPE IIIA; MPS3A --> Mucopolysaccharidosis, Type 3A
MUCOPOLYSACCHARIDOSIS, TYPE VII; MPS7 --> Mucopolysaccharidosis, Type 7
MOYAMOYA DISEASE 1; MYMY1 --> Moyamoya Disease 1
MUCOLIPIDOSIS III GAMMA --> Mucolipidosis 3 Gamma
MUCOPOLYSACCHARIDOSES, UNCLASSIFIED TYPES --> Mucopolysaccharidoses, Unclassified Types
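A sketch that reproduces the five examples: drop everything after the first semicolon, turn Roman numerals (with an optional one-letter suffix such as the "A" in "IIIA") into Arabic numerals, and title-case the remaining words. This is not dipper's actual code; `friendly_label` is a hypothetical name, and it only recognises numerals built from I, V and X, so a word such as "VIA" would be misread as "6A".

```python
import re
from typing import Optional

_ROMAN_VALUES = {"I": 1, "V": 5, "X": 10}
# A Roman numeral up to 39 (I, V, X only) plus an optional one-letter suffix, e.g. "IIIA".
_ROMAN_TOKEN = re.compile(r"(X{0,3})(IX|IV|V?I{0,3})([A-Z]?)")


def _roman_to_arabic(token: str) -> Optional[str]:
    """Convert "VII" -> "7" or "IIIA" -> "3A"; return None for anything else."""
    match = _ROMAN_TOKEN.fullmatch(token)
    if match is None or not (match.group(1) or match.group(2)):
        return None
    numeral = match.group(1) + match.group(2)
    total = 0
    for i, ch in enumerate(numeral):
        value = _ROMAN_VALUES[ch]
        # Subtractive notation: a smaller value before a larger one is subtracted.
        if i + 1 < len(numeral) and _ROMAN_VALUES[numeral[i + 1]] > value:
            total -= value
        else:
            total += value
    return f"{total}{match.group(3)}"


def friendly_label(raw: str) -> str:
    """Drop the ';'-separated abbreviation, convert numerals, title-case words."""
    name = raw.split(";", 1)[0].strip()
    words = []
    for word in name.split():
        core = word.rstrip(",")
        tail = word[len(core):]  # keep a trailing comma, if any
        number = _roman_to_arabic(core)
        if number is None:
            # Title-case each alphabetic run, so "X-LINKED" becomes "X-Linked".
            core = re.sub(r"[A-Za-z]+", lambda m: m.group(0).capitalize(), core)
        else:
            core = number
        words.append(core + tail)
    return " ".join(words)
```

Matching whole whitespace-separated words, rather than every letter run, keeps the "X" in hyphenated words like "X-LINKED" from being read as 10.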
@cmungall, the Turtle syntax definition says: "Literals are written either using double-quotes when they do not contain linebreaks like "simple literal" or """long literal""" when they may contain linebreaks."
So the extra quoting in the definitions reflects this: descriptions that contain linebreaks are written as triple-quoted long literals. Do you want me to remove the linebreaks instead (and replace them with some kind of separator)?
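If we go with removing the linebreaks, the replacement step could look like this sketch (`flatten_linebreaks` and its default separator are hypothetical choices). After flattening, the text fits in an ordinary `"..."` literal, provided embedded quotes and backslashes are still escaped.

```python
import re


def flatten_linebreaks(text: str, separator: str = " ") -> str:
    """Replace each run of line breaks, plus surrounding whitespace, with `separator`."""
    return re.sub(r"\s*[\r\n]+\s*", separator, text.strip())
```

A visible separator such as `" | "` would preserve the paragraph boundaries; a plain space just joins the paragraphs.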
@pnrobinson, I've added a request for a new relationship type that we could use here (and for GWAS data, etc.); see https://code.google.com/p/obo-relations/issues/detail?id=31. Please comment there with any additional requirements, use cases, disambiguations, etc.
@cmungall, OMIM variants are usually referred to in descriptions with an ID like 157140.0009, but they resolve to http://omim.org/entry/157140#0009. For OMIM disease URIs, we use the OMIM PURL. What URI should I use for the variants?
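Whatever URI is chosen, the mapping from the textual variant ID to the omim.org page anchor described here is mechanical. A sketch, assuming variant IDs always have the six-digit entry number, a dot, and a four-digit variant number, as in the example; `variant_entry_url` is a hypothetical helper and does not settle which URI to mint.

```python
import re

# Assumed form of an OMIM allelic-variant ID: "<6-digit entry>.<4-digit variant>".
_VARIANT_ID = re.compile(r"(\d{6})\.(\d{4})")


def variant_entry_url(variant_id: str) -> str:
    """Map e.g. "157140.0009" to the omim.org entry page anchor it resolves to."""
    match = _VARIANT_ID.fullmatch(variant_id)
    if match is None:
        raise ValueError(f"not an OMIM variant id: {variant_id!r}")
    entry, variant = match.groups()
    return f"http://omim.org/entry/{entry}#{variant}"
```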
We can get the OMIM variants straight from ClinVar, so I will punt on pulling the variant info here. Edit: we do have to get at least the variant labels here for the OMIM-style variants, since OMIM is the authoritative source for them.
The association structure, along with the variants and links to publications, is drawn as follows:
@mbrush, please review and close if satisfied.
Add OMIM data source, especially to get classes, labels, and descriptions.