monarch-initiative / monarch-ingest

Data ingest application for Monarch Initiative knowledge graph using Koza
https://monarchinitiative.org

NCBI Ingest #165

Closed putmantime closed 2 years ago

putmantime commented 2 years ago

Gene integrates information from a wide range of species and includes nomenclature, Reference Sequences (RefSeqs), maps, pathways, variations, phenotypes, and links to genome-, phenotype-, and locus-specific resources worldwide. Taxon lists the taxonomic organization of organisms. Pub2Gene provides links between genes and the PubMed identifiers of the publications in which they are mentioned.

How do we use it?

We use NCBIGene IDs and symbols as the primary identifier and label for human genes in our system, and NCBITaxon identifiers and scientific names for species-specific labeling. For any given gene, we also list the annotated PMIDs from Pub2Gene.

Download RDF

Ingested files:
http://ftp.ncbi.nih.gov/gene/DATA/gene_group.gz
http://ftp.ncbi.nih.gov/gene/DATA/gene2pubmed.gz
http://ftp.ncbi.nih.gov/gene/DATA/gene_info.gz
https://data.omim.org/downloads/mimTitles.txt retrieved on 2021-03-10
http://ftp.ncbi.nih.gov/gene/DATA/gene_history.gz

Monarch Ingest Date: 2021-03-09

License information from the (Re)usable Data Project

putmantime commented 2 years ago

For our first iteration we will do a node ingest only. This includes genes, their IDs, labels, symbols, and the taxon each gene comes from. The file to use for the initial node ingest is gene_info. Limit the node ingest to the current list of taxon IDs in the graph: 9615, 9913, 9823, 9031, 162425.
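A rough sketch of what that taxon-limited node ingest could look like in plain Python follows. It is not the Koza transform used in this repo. It assumes gene_info is tab-separated with a header line starting with `#tax_id` and containing `GeneID` and `Symbol` columns. The function names and the output keys (`id`, `label`, `symbol`, `in_taxon`) are hypothetical and chosen only for illustration.

```python
import csv
import gzip

# Taxon IDs listed in this issue; only genes from these taxa become nodes.
TAXON_IDS = {"9615", "9913", "9823", "9031", "162425"}


def gene_nodes(lines, taxon_ids=TAXON_IDS):
    """Yield one node dict per gene_info row whose tax_id is in taxon_ids.

    `lines` is any iterable of tab-separated text lines, header first.
    """
    # QUOTE_NONE: treat quote characters in descriptions as literal text.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    # Assumption: the header's first column is written as "#tax_id".
    header = [name.lstrip("#") for name in next(reader)]
    for values in reader:
        row = dict(zip(header, values))
        if row["tax_id"] not in taxon_ids:
            continue
        yield {
            "id": f"NCBIGene:{row['GeneID']}",  # hypothetical output key
            "label": row["Symbol"],             # symbol doubles as the label
            "symbol": row["Symbol"],
            "in_taxon": f"NCBITaxon:{row['tax_id']}",
        }


def read_gene_info(path):
    """Stream nodes from a local, gzip-compressed copy of gene_info."""
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        yield from gene_nodes(handle)
```

With a local download, `for node in read_gene_info("gene_info.gz"): ...` would stream the filtered nodes without loading the whole file into memory.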