monarch-initiative / dipper

Data Ingestion Pipeline for Monarch
https://dipper.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

add swine medical models #129

Open nlwashington opened 9 years ago

nlwashington commented 9 years ago

this page: http://nsrrc.missouri.edu/index.asp

has a nice listing of pig research models. i don't see any way of getting the data that underlies this without scraping. perhaps we can email the maintainers of the website at NSRRC@missouri.edu.

these might need to be hand curated, but there are only a few. let's first check with them to see if they maintain any properly identified data; otherwise we'll curate.

nlwashington commented 9 years ago

i have sent an email to their helpdesk.

nlwashington commented 9 years ago

having reviewed the info on their site, i think we should just hand-curate this resource, and then have some process via dipper that will notify us if there are new models that become available through this site.
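a rough sketch of what that notification check could look like, assuming strain identifiers appear in the page text in the `NSRRC:0011` form and that we keep a set of already-curated ids somewhere. the function name `find_new_strains` and the regex are made up for illustration, not existing dipper code, and fetching the page is left out on purpose:

```python
import re

# Assumption: strain identifiers appear as "NSRRC:" followed by four digits,
# as in the NSRRC:0011 example. The real page markup may differ.
STRAIN_ID = re.compile(r"NSRRC:\d{4}")


def find_new_strains(page_text, curated_ids):
    """Return identifiers found in page_text that are not in curated_ids, sorted."""
    found = set(STRAIN_ID.findall(page_text))
    return sorted(found - set(curated_ids))


if __name__ == "__main__":
    # Hypothetical page text and curated set, for illustration only.
    sample_text = "NSRRC:0011 listed here, NSRRC:0012 listed there"
    print(find_new_strains(sample_text, {"NSRRC:0011"}))
```

anything the function returns would be a candidate for a notification and a new hand-curated row.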

@nicolevasilevsky can you help with this task? the page http://nsrrc.missouri.edu/StrainAvail.asp outlines ~80 lines; each is annotated to some kind of research area, and a subset are linked to individual pages with references.

the task here would be to iterate through each strain (linked from the identifier), and:

for example, NSRRC:0011:

this could be done in a simple excel doc and exported as tsv. let's put it into one of our monarch git repos.

nlwashington commented 9 years ago

@mbrush should these animals just be instances of NCBITaxon:9823 (pig) or is there another class that we should use to indicate an organism (just like for humans we have foaf:Person)?
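to make the first option concrete, here is a minimal sketch that types an animal directly as an instance of NCBITaxon:9823 and prints the result as an N-Triples line. the OBO PURL expansion for NCBITaxon is standard; the `NSRRC_BASE` IRI is a placeholder i made up (we have no agreed namespace for NSRRC yet), and using the strain identifier as the instance is itself an assumption that depends on the answer to the question above:

```python
OBO = "http://purl.obolibrary.org/obo/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Hypothetical placeholder namespace; no NSRRC IRI prefix has been agreed on.
NSRRC_BASE = "http://example.org/NSRRC_"


def curie_to_iri(curie):
    """Expand the two CURIE prefixes used in this sketch into full IRIs."""
    prefix, local_id = curie.split(":", 1)
    if prefix == "NCBITaxon":
        return OBO + "NCBITaxon_" + local_id
    if prefix == "NSRRC":
        return NSRRC_BASE + local_id
    raise ValueError("unknown prefix: " + prefix)


def type_triple(instance_curie, class_curie="NCBITaxon:9823"):
    """Return one N-Triples line asserting instance rdf:type class."""
    return "<%s> <%s> <%s> ." % (
        curie_to_iri(instance_curie),
        RDF_TYPE,
        curie_to_iri(class_curie),
    )


if __name__ == "__main__":
    print(type_triple("NSRRC:0011"))
```

if we end up with a different class for the organism, only the `class_curie` default would change.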

nlwashington commented 9 years ago

@nicolevasilevsky took a first pass on the data here: https://docs.google.com/spreadsheets/d/1cBdWOs2Hk4z3Bbjvz1jEzj0Oh9eOuMbX9ePtzjydkdk/edit#gid=0

This will need a final pass to make it uniform for processing by dipper. Once cleaned up, it should be exported to TSV and put into long-term storage in some monarch-data git repo (TBD).
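A possible starting point for the cleanup and export step, using only the Python standard library. This is a sketch under assumptions: the column names in the example are hypothetical (the actual spreadsheet columns may differ), and "uniform" is taken here to mean trimmed cells, collapsed internal whitespace, and no blank rows.

```python
import csv
import io


def normalize_rows(rows):
    """Trim cells, collapse runs of whitespace to one space, drop blank rows."""
    cleaned = []
    for row in rows:
        cells = [" ".join(cell.split()) for cell in row]
        if any(cells):
            cleaned.append(cells)
    return cleaned


def to_tsv(rows):
    """Serialize normalized rows as tab-separated text with \n line endings."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerows(normalize_rows(rows))
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical header and row, for illustration only.
    example = [
        ["strain_id ", " research  area"],
        ["", " "],
        ["NSRRC:0011", "example area"],
    ]
    print(to_tsv(example), end="")
```

Because cells are split on any whitespace, stray tabs or newlines pasted into a cell are collapsed to single spaces and cannot break the TSV columns.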