monarch-initiative / monarch-ingest

Data ingest application for Monarch Initiative knowledge graph using Koza
https://monarchinitiative.org

Dictybase Gene2Phenotype Ingest #182

Closed putmantime closed 2 years ago

putmantime commented 2 years ago

We will ingest gene-to-phenotype associations from dictyBase.

Downloads available at http://dictybase.org/Downloads/

putmantime commented 2 years ago

Dictyostelium: http://dictybase.org/Downloads/all-mutants.html. Data is available in a well-documented, easy-to-parse, GAF-like format with associations to a uPheno-compliant ontology. See, for example, "Dictyostelium discoideum--a model for many reasons": https://pubmed.ncbi.nlm.nih.gov/19387798

putmantime commented 2 years ago

The current download file doesn't have gene IDs (just symbols). The gene column is a comma-separated list of genes. We would only want phenotypes with a single gene listed. We need to reach out to dictyBase and request that they include gene IDs.
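A minimal sketch of the single-gene filter described above, assuming a tab-delimited file with a header row. The column names (`strain`, `genes`, `phenotype`) and the sample rows are hypothetical placeholders, not the actual dictyBase layout.

```python
import csv
import io


def single_gene_rows(handle, gene_column="genes"):
    """Yield only the rows whose gene column lists exactly one gene.

    `gene_column` is a hypothetical column name; adjust it to the real header.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        # The gene column is a comma-separated list; drop empty entries.
        genes = [g.strip() for g in row[gene_column].split(",") if g.strip()]
        if len(genes) == 1:
            yield {**row, gene_column: genes[0]}


# Made-up sample data, for illustration only.
sample = io.StringIO(
    "strain\tgenes\tphenotype\n"
    "strain1\tgeneA\tphenotype X\n"
    "strain2\tgeneA, geneB\tphenotype Y\n"
)
kept = list(single_gene_rows(sample))
```

With this sample, only the `strain1` row survives, because `strain2` lists two genes.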

putmantime commented 2 years ago

@matentzn Do you know who we could reach out to at dictybase to request their g2p files include ontology ids instead of names?

matentzn commented 2 years ago

@pfey03 Petra, could you propose this at dictyBase?

pfey03 commented 2 years ago

@matentzn please write to Sidd, or send it to dictybase@northwestern.edu. I'm off today and tomorrow.

Also, the phenotype data on the download page of the old dictybase.org is outdated. For now I can only say that what I uploaded to uPheno, together with the phenotype ontology I last added, is the latest. Given the ongoing discussion of single-step processes, I am waiting to see what can be done before I start editing again, if I must. Thanks!

pfey03 commented 2 years ago

Also, our phenotypes are linked to strains, and the strains are linked to genes. Sidd should be able to give you that, especially as we are migrating all data to our new database.

pfey03 commented 2 years ago

@putmantime it's true that, for now, this list is OK, as I updated the data whenever I last edited the phenotype ontology: http://dictybase.org/Downloads/all-mutants.html

On the downloads page, http://dictybase.org/Downloads/, there are tab-delimited and Excel downloads, and gene IDs are included in the mutant and phenotype list.

pfey03 commented 2 years ago

I was off for a few days because of traveling; sorry for the late corrected response.

RichardBruskiewich commented 2 years ago

@kevinschaper and @putmantime, "...phenotypes are linked to strains, and the strains are linked to genes...." suggests an indirect model of Strain-[has gene]->Gene and Strain-[has phenotype]->PhenotypicFeature. Or, Tim, if I accept your guidance above ("...We would only want phenotypes with a single gene listed...."), then perhaps the gene-to-strain mapping ought to be restricted to just one gene (assuming all the phenotypes for a given strain are listed on just one line, which I suspect is the case). In that case, we'd just collapse Gene-Strain->(Phenotype)+ to (Gene->Phenotype)+, with 'Strain' perhaps as some kind of contextual annotation(?)
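To make the proposed collapse concrete, here is a rough sketch, assuming the strain-to-gene and strain-to-phenotype links have already been loaded into plain dictionaries. The function name, the dictionary shapes, and all identifiers are hypothetical and only illustrate the idea; this is not the code in the ingest.

```python
def collapse_strain_links(strain_genes, strain_phenotypes):
    """Turn Strain->Gene and Strain->Phenotype links into Gene->Phenotype
    associations, keeping the strain as a contextual annotation.

    Strains linked to zero genes or to more than one gene are skipped.
    """
    associations = []
    for strain, genes in strain_genes.items():
        if len(genes) != 1:
            continue  # ambiguous or missing gene: no direct association
        gene = genes[0]
        for phenotype in strain_phenotypes.get(strain, []):
            associations.append(
                {"gene": gene, "phenotype": phenotype, "strain": strain}
            )
    return associations


# Made-up identifiers, for illustration only.
strain_genes = {"strain1": ["geneA"], "strain2": ["geneA", "geneB"]}
strain_phenotypes = {
    "strain1": ["phenotype X", "phenotype Y"],
    "strain2": ["phenotype Z"],
}
associations = collapse_strain_links(strain_genes, strain_phenotypes)
```

Here `strain1` yields two gene-to-phenotype associations for `geneA`, while `strain2` contributes nothing because it is linked to two genes.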

RichardBruskiewich commented 2 years ago

For genes, the http://dictybase.org/Downloads/gene_information.html page doesn't use gene symbols as gene locus identifiers; rather, it reports them as gene names or synonyms, whereas the mutant page uses gene locus identifiers and symbols interchangeably. However, the gene_information page can clearly be used as a mapping file (I checked some of the gene symbols and they do map onto the gene locus identifiers).
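A small sketch of how such a mapping file could be used, assuming a tab-delimited table with one row per gene. The column names (`gene_id`, `gene_name`, `synonyms`) and the identifiers are hypothetical placeholders rather than the real dictyBase headers.

```python
import csv
import io


def build_symbol_index(handle):
    """Return (known_ids, symbol_to_id) built from a gene information table."""
    known_ids = set()
    symbol_to_id = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        gene_id = row["gene_id"]
        known_ids.add(gene_id)
        symbol_to_id[row["gene_name"]] = gene_id
        for synonym in row["synonyms"].split(","):
            synonym = synonym.strip()
            if synonym:
                # Primary gene names take precedence over synonyms.
                symbol_to_id.setdefault(synonym, gene_id)
    return known_ids, symbol_to_id


def resolve_gene(token, known_ids, symbol_to_id):
    """Accept either a locus identifier or a symbol; return the ID or None."""
    token = token.strip()
    if token in known_ids:
        return token
    return symbol_to_id.get(token)


# Made-up sample data, for illustration only.
gene_info = io.StringIO(
    "gene_id\tgene_name\tsynonyms\n"
    "GENE:0001\tgeneA\tgA, alphaA\n"
    "GENE:0002\tgeneB\t\n"
)
known_ids, symbol_to_id = build_symbol_index(gene_info)
resolved = [
    resolve_gene(t, known_ids, symbol_to_id)
    for t in ["gA", "GENE:0002", "unknown"]
]
```

Because the mutant page mixes identifiers and symbols, `resolve_gene` checks for a known identifier first and falls back to the symbol lookup. Unmapped tokens come back as `None` so they can be logged or skipped.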

RichardBruskiewich commented 2 years ago

Resolved by PR https://github.com/monarch-initiative/monarch-ingest/pull/250