monarch-initiative / dipper

Data Ingestion Pipeline for Monarch
https://dipper.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

ingest data sources by external collaborators #495

Open lwinfree opened 7 years ago

lwinfree commented 7 years ago

This ticket is for starting documentation that lets external collaborators prepare their data sources for ingest. It was prompted by Xenbase contacting us for help documenting the core data a source needs in order to be ingested.

We are interested in identifying the core set of data that Monarch typically needs from a source in order to prepare RDF Turtle-formatted data for consumption by SciGraph through the Dipper ingestion pipeline.

Goals:

lwinfree commented 7 years ago

Copied from an email, from Kent: "We don't currently support a standard dipper format, but we aim to support common formats such as GAF, VCF, GFF3. I would be curious what their thoughts are on using Panther for orthology as we already support this, or if they have additional or improved models for orthology.

As always any phenotype or data on disease models would be a top priority, if they have that."
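To make the GAF case concrete, here is a minimal Python sketch of reading a GAF 2.x row and writing one Turtle triple per annotation. It is not Dipper's code and does not use Dipper's model: the `ex:` namespace, the `ex:annotatedWith` predicate, the subject IRI layout, and the example row (including the gene ID) are all made up for illustration. Only the GAF column positions and the OBO PURL form for GO terms come from the published conventions.

```python
# Sketch only: `ex:` and `ex:annotatedWith` are placeholders,
# not Monarch or Dipper vocabulary.

GAF_COLUMNS = 17  # GAF 2.x rows have 17 tab-separated columns

# Hypothetical row; the gene ID and reference are invented.
EXAMPLE_ROW = "\t".join([
    "Xenbase", "XB-GENE-0000000", "gene1", "", "GO:0005575",
    "PMID:0000000", "ND", "", "C", "", "", "gene",
    "taxon:8364", "20180101", "Xenbase", "", "",
])


def parse_gaf_line(line):
    """Return (db, object_id, go_id, evidence), or None for comment/blank lines."""
    if not line.strip() or line.startswith("!"):
        return None
    cols = line.rstrip("\n").split("\t")
    if len(cols) < GAF_COLUMNS:
        raise ValueError(f"expected {GAF_COLUMNS} columns, got {len(cols)}")
    # 1-based GAF columns: 1 DB, 2 DB Object ID, 5 GO ID, 7 Evidence Code
    return cols[0], cols[1], cols[4], cols[6]


def to_turtle(rows):
    """Serialize parsed rows as Turtle using the placeholder predicate."""
    lines = [
        "@prefix ex: <http://example.org/> .",
        "@prefix obo: <http://purl.obolibrary.org/obo/> .",
        "",
    ]
    for db, object_id, go_id, _evidence in rows:
        subject = f"<http://example.org/{db}/{object_id}>"
        obj = "obo:" + go_id.replace(":", "_")  # GO:0005575 -> obo:GO_0005575
        lines.append(f"{subject} ex:annotatedWith {obj} .")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_turtle([parse_gaf_line(EXAMPLE_ROW)]))
```

A real ingest would also carry the evidence code and reference, and would use whatever identifiers and predicates the Monarch model calls for; this only shows the shape of the row-to-triple step.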

kltm commented 7 years ago

Noting #329. Over on that side, we want to have a fairly standardized ingest method using SPARQL, etc.

lwinfree commented 7 years ago

Taken from email from Xenbase, about the info they have compiled so far:

malcolmfisher103 commented 7 years ago

All of those are our best approximate equivalents to the ZFIN files pulled by Dipper. They may not be the best sources for the specific data required by Monarch.

lwinfree commented 7 years ago

Thanks Malcolm, all of the information from y'all is very helpful right now! :)