monarch-initiative / dipper

Data Ingestion Pipeline for Monarch
https://dipper.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

add genereviews descriptions #51

Closed nlwashington closed 8 years ago

nlwashington commented 9 years ago

we want to bring in the genereviews data, like: http://www.ncbi.nlm.nih.gov/books/NBK1262/

we may get access to this via word docs. there are some docx parsers available, and discussions are underway with the genereviews folks about getting these.
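If the word docs do arrive, pulling the paragraph text out of them does not strictly need a third-party parser, since a .docx file is a zip archive holding WordprocessingML XML. The following is only a minimal sketch using the Python standard library; the function name `docx_paragraphs` is hypothetical, and how the disease description would then be located among the paragraphs is not known from this thread.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by WordprocessingML elements such as w:p and w:t.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_paragraphs(source):
    """Return the non-empty paragraph texts of a .docx (path or file-like object)."""
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is split across one or more w:t runs.
        text = "".join(node.text or "" for node in para.iter(f"{{{W_NS}}}t"))
        if text.strip():
            paragraphs.append(text.strip())
    return paragraphs
```

This ignores tables, footnotes, and styling; a dedicated docx library would likely be a better fit once the real document layout is known.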

initially, we just want to get their nice concise descriptions of the disease to augment the disease ontology.

each of the gene reviews documents has versioning information recorded on a per-file basis. @mbrush i am not sure how to deal with this. can you list how we would add this to the dataset description for these files? we have this kind of information:

| Document Title | Created (YYYY-MM-DD) | Updated (YYYY-MM-DD) | Revised (YYYY-MM-DD) |
| --- | --- | --- | --- |
| Huntington Disease | 1998-10-23 | 2010-04-22 | 2005-08-30 |

further header fields, with no values in this example: (GeneReviews Project), Label, Indicate “chapter” or “appendix”, Alternate Form (pdf)

as well as specific individuals who have authored the document, like:

| Given Name(s) | Last Name | Suffix | Degrees | Affiliation | Email | CorrAu? (Yes) | Au. Footnote |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Simon C | Warby | | PhD | Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada | simon@cmmt.ubc.ca | | |
| Rona K | Graham | | PhD | Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada | ronakg@cmmt.ubc.ca | | |
| Michael R | Hayden | | MB, ChB, PhD, FRCP(C), FRSC | Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada | mrh@cmmt.ubc.ca | | |

nlwashington commented 9 years ago

the genereviews ids and mappings to omim have been added in f9f7b54. because the mapping between genereviews ids and omim is 1:many, i have added the genereviews ids as grouping classes. these will need to be cleaned up manually, but this is a good start for adding to the disease ontology (@cmungall and @mellybelly). note that because i don't know the types of the omim ids, rogue gene ids may show up as subclasses of the nbk disease ids. also, do let me know if i should make each of the genereviews ids a subclass of genetic disease (DOID:630); right now they are orphaned.
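To make the grouping-class idea concrete, here is a minimal sketch of how a 1:many mapping could be turned into subclass statements, with each omim id placed under its genereviews id. This is not the code from the commit; the function name `grouping_triples`, the tuple representation of triples, and the OMIM ids in the usage example are all hypothetical placeholders.

```python
from collections import defaultdict

RDFS_SUBCLASSOF = "rdfs:subClassOf"


def grouping_triples(pairs, parent=None):
    """Build subclass triples from (genereviews_curie, omim_curie) pairs.

    Each GeneReviews id becomes a grouping class with its mapped OMIM ids
    as subclasses. If `parent` is given (for example "DOID:630"), each
    grouping class is also placed under it instead of being left orphaned.
    """
    grouped = defaultdict(set)
    for nbk, omim in pairs:
        grouped[nbk].add(omim)

    triples = []
    for nbk in sorted(grouped):
        if parent is not None:
            triples.append((nbk, RDFS_SUBCLASSOF, parent))
        for omim in sorted(grouped[nbk]):
            triples.append((omim, RDFS_SUBCLASSOF, nbk))
    return triples


# Usage with placeholder OMIM ids (not real mappings):
example = grouping_triples(
    [("GeneReviews:NBK1262", "OMIM:000002"),
     ("GeneReviews:NBK1262", "OMIM:000001")],
    parent="DOID:630",
)
```

Because nothing here checks whether an omim id is a disease or a gene, the rogue gene ids mentioned above would still come through as subclasses and would need the manual cleanup.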

selewis commented 9 years ago

Just wondering. Might we be able to do something similar with SNOMED? If not the definitions, then at least the IDs for analogous classes. It is something that's been requested while here in Cambridge. I don't believe we have this yet, although DO had the links once upon a time (links are allowed by their licensing I believe)


nlwashington commented 9 years ago

we are in negotiations with genereviews folks to get access to all word docs, but probably not before the june release. parking for now.

nlwashington commented 9 years ago

i notice that the genereviews copyright notice grants others permission to distribute the content: http://www.ncbi.nlm.nih.gov/books/NBK138602/
i have inquired with GeneReviews staff again about getting this content, and with the NCBI helpdesk as well. if i get no reply, i will close this ticket, as we cannot make any progress.

nlwashington commented 9 years ago

ncbi responded that they will not be distributing the genereviews work. i have not heard back from genereviews themselves after many months of trying to contact them. therefore, i am closing this ticket.

nlwashington commented 9 years ago

here is the copyright notice from GeneReviews:

GeneReviews® chapters are owned by the University of Washington, Seattle, © 1993-2015. Permission is hereby granted to reproduce, distribute, and translate copies of content materials provided that (i) credit for source (www.ncbi.nlm.nih.gov/books/NBK1116/) and copyright (University of Washington, Seattle) are included with each copy; (ii) a link to the original material is provided whenever the material is published elsewhere on the Web; and (iii) reproducers, distributors, and/or translators comply with this copyright notice and the GeneReviews Usage Disclaimer.

however, NCBI does not allow bulk download of their content via robots. we can fetch it manually and process it in dipper to get the clinical descriptions.

jmcmurry commented 9 years ago

yay! this is so important

nlwashington commented 9 years ago

i am adding obo-style citation definitions, by appending [GeneReviews:NBK138602] to the definition of the term. this may or may not make it into what is displayed in the interface.
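As an illustration of the appending step, a minimal sketch might look like the following. The helper name `cite_definition` and the sample definition text are hypothetical; this is not dipper's actual implementation.

```python
def cite_definition(definition, source_curie):
    """Append a bracketed source citation to a definition string.

    The citation is added only once, so calling this again on an
    already-cited definition returns it unchanged.
    """
    citation = f"[{source_curie}]"
    text = definition.strip()
    if text.endswith(citation):
        return text
    return f"{text} {citation}"


# Usage with made-up definition text:
cited = cite_definition("A progressive disorder. ", "GeneReviews:NBK138602")
```

Keeping the citation inside the definition string means it travels with the text wherever the definition is shown, at the cost of needing to be stripped if a consumer wants the bare description.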

nlwashington commented 8 years ago

now these just have to get integrated into mondo, but that is out of dipper's hands. https://github.com/monarch-initiative/monarch-disease-ontology/issues/34