obophenotype / upheno

The Unified Phenotype Ontology (uPheno) integrates multiple phenotype ontologies into a unified cross-species phenotype ontology.
https://obophenotype.github.io/upheno/
Creative Commons Zero v1.0 Universal

Create a uPheno release product for data analysis #932

Open matentzn opened 3 months ago

matentzn commented 3 months ago

@pnrobinson requested a uPheno release product, which we should add to the uPheno2 release before the end of March. Given this picture:

[image]

@pnrobinson, I hope I understood correctly that you want the following table:

| taxon | upheno_id | original_phenotype | gene | human_orthologue |
| --- | --- | --- | --- | --- |
| NCBITaxon:10090 | UPHENO:0034327 | MP:0030719 | ncbi.gene:68646 | hgnc:26404 |

Is this correct?

pnrobinson commented 3 months ago

@matentzn this table would be exactly what we need!

matentzn commented 3 months ago

First draft

Code to generate the table:

```python
import pandas as pd
from neo4j import GraphDatabase

# Connect to the Neo4j database
bolt_url = "ASK_NICO"
driver = GraphDatabase.driver(bolt_url)

# Define the Cypher query
query = """
MATCH (upheno:`biolink:PhenotypicFeature` WHERE upheno.id STARTS WITH "UPHENO:")
      <-[:`biolink:subclass_of`]-(phenotype:`biolink:PhenotypicFeature`)
      <-[gena:`biolink:has_phenotype`]-(gene:`biolink:Gene`)
      -[:`biolink:orthologous_to`]-(human_gene:`biolink:Gene` WHERE "NCBITaxon:9606" IN human_gene.in_taxon)
RETURN
  upheno.id,
  phenotype.id,
  gene.id,
  gena.negated,
  CASE
    WHEN gene.in_taxon IS NOT NULL AND size(gene.in_taxon) > 0
    THEN REDUCE(s = "", x IN gene.in_taxon |
                s + x + CASE WHEN x <> gene.in_taxon[size(gene.in_taxon)-1] THEN "|" ELSE "" END)
    ELSE ""
  END AS gene_in_taxon,
  human_gene.id,
  gena.primary_knowledge_source,
  gena.publications
"""

# Run the query and collect one list of values per result row
data = []
with driver.session() as session:
    results = session.run(query)
    for record in results:
        data.append(record.values())
driver.close()

# Column names follow the order of the RETURN clause
df = pd.DataFrame(data, columns=["upheno_grouping", "phenotype", "gene", "negated", "taxon", "human_orthologue", "source", "publications"])
df
```

Draft result:

| upheno_grouping | phenotype | gene | negated | taxon | human_orthologue | source | publications |
| --- | --- | --- | --- | --- | --- | --- | --- |
| UPHENO:0000508 | ZP:0000606 | ZFIN:ZDB-GENE-040426-1675 | | NCBITaxon:7955 | HGNC:9721 | infores:zfin | ['ZFIN:ZDB-PUB-170311-8'] |
| UPHENO:0000508 | ZP:0000606 | ZFIN:ZDB-GENE-040426-1675 | | NCBITaxon:7955 | HGNC:30262 | infores:zfin | ['ZFIN:ZDB-PUB-170311-8'] |
| UPHENO:0000508 | WBPhenotype:0000848 | WB:WBGene00044068 | | NCBITaxon:6239 | HGNC:12927 | infores:wormbase | ['PMID:16803962'] |
| UPHENO:0000508 | WBPhenotype:0000848 | WB:WBGene00009178 | | NCBITaxon:6239 | HGNC:15664 | infores:wormbase | ['PMID:22073243'] |
| UPHENO:0000508 | WBPhenotype:0000848 | WB:WBGene00009178 | | NCBITaxon:6239 | HGNC:15663 | infores:wormbase | ['PMID:22073243'] |
| UPHENO:0000508 | WBPhenotype:0000848 | WB:WBGene00000914 | | NCBITaxon:6239 | HGNC:9984 | infores:wormbase | ['PMID:29301909'] |
| UPHENO:0000508 | WBPhenotype:0000848 | WB:WBGene00000914 | | NCBITaxon:6239 | HGNC:9983 | infores:wormbase | ['PMID:29301909'] |
| UPHENO:0000508 | WBPhenotype:0000848 | WB:WBGene00000914 | | NCBITaxon:6239 | HGNC:9982 | infores:wormbase | ['PMID:29301909'] |
| UPHENO:0000508 | WBPhenotype:0000848 | WB:WBGene00022620 | | NCBITaxon:6239 | HGNC:20165 | infores:wormbase | ['PMID:25635455'] |
| UPHENO:0000508 | WBPhenotype:0000848 | WB:WBGene00022620 | | NCBITaxon:6239 | HGNC:17407 | infores:wormbase | ['PMID:25635455'] |
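
The draft result carries more columns than the five-column table requested at the top of this issue. As a rough sketch of how the two layouts might relate, the snippet below renames the draft columns and drops the extra ones. The mapping in `COLUMN_MAP` and the helper `to_requested_layout` are illustrative assumptions, not part of the release pipeline, and the snippet uses plain Python rather than the pandas code from the draft.

```python
# Draft column name -> column name in the originally requested table.
# This mapping is an assumption based on comparing the two tables by eye.
COLUMN_MAP = {
    "taxon": "taxon",
    "upheno_grouping": "upheno_id",
    "phenotype": "original_phenotype",
    "gene": "gene",
    "human_orthologue": "human_orthologue",
}


def to_requested_layout(rows):
    """Rename columns, drop the extra ones, and remove duplicate rows."""
    seen = set()
    result = []
    for row in rows:
        slim = {new: row[old] for old, new in COLUMN_MAP.items()}
        key = tuple(slim.values())
        # Rows that differ only in dropped columns collapse into one.
        if key not in seen:
            seen.add(key)
            result.append(slim)
    return result


# Two rows copied from the draft result.
draft_rows = [
    {"upheno_grouping": "UPHENO:0000508", "phenotype": "ZP:0000606",
     "gene": "ZFIN:ZDB-GENE-040426-1675", "negated": "",
     "taxon": "NCBITaxon:7955", "human_orthologue": "HGNC:9721",
     "source": "infores:zfin", "publications": "['ZFIN:ZDB-PUB-170311-8']"},
    {"upheno_grouping": "UPHENO:0000508", "phenotype": "ZP:0000606",
     "gene": "ZFIN:ZDB-GENE-040426-1675", "negated": "",
     "taxon": "NCBITaxon:7955", "human_orthologue": "HGNC:30262",
     "source": "infores:zfin", "publications": "['ZFIN:ZDB-PUB-170311-8']"},
]

for row in to_requested_layout(draft_rows):
    print("\t".join(row.values()))
```

Because one gene can have several human orthologues, the reshaped table keeps one row per gene and orthologue pair instead of merging them.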

@pnrobinson if this works for you, you can do a first experiment with this table:

https://www.dropbox.com/scl/fi/zbjt48afy4efkbki8szy5/upheno_gene_human_orthologues.tsv?rlkey=yr0vl7ky3ldeaura8kllagubn&dl=0
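
For a first experiment, one possible starting point is to group the human orthologues by uPheno grouping and taxon. The sketch below assumes the shared file is tab-separated with the same header as the draft result, and that negated associations appear as the string `true` in the `negated` column; neither has been confirmed in this thread. `SAMPLE_TSV` and `orthologues_by_grouping` are hypothetical names, and the sample rows are copied from the draft result.

```python
import csv
import io
from collections import defaultdict

# Stand-in for the shared TSV file; the header is assumed to match the draft result.
SAMPLE_TSV = (
    "upheno_grouping\tphenotype\tgene\tnegated\ttaxon\thuman_orthologue\tsource\tpublications\n"
    "UPHENO:0000508\tZP:0000606\tZFIN:ZDB-GENE-040426-1675\t\tNCBITaxon:7955\tHGNC:9721\tinfores:zfin\t['ZFIN:ZDB-PUB-170311-8']\n"
    "UPHENO:0000508\tWBPhenotype:0000848\tWB:WBGene00044068\t\tNCBITaxon:6239\tHGNC:12927\tinfores:wormbase\t['PMID:16803962']\n"
    "UPHENO:0000508\tWBPhenotype:0000848\tWB:WBGene00009178\t\tNCBITaxon:6239\tHGNC:15664\tinfores:wormbase\t['PMID:22073243']\n"
)


def orthologues_by_grouping(handle):
    """Map each uPheno grouping to {taxon: set of human orthologue IDs}."""
    groups = defaultdict(lambda: defaultdict(set))
    for row in csv.DictReader(handle, delimiter="\t"):
        # Assumption: negated associations are serialized as "true" and
        # should be left out of the analysis.
        if row["negated"].strip().lower() == "true":
            continue
        groups[row["upheno_grouping"]][row["taxon"]].add(row["human_orthologue"])
    return groups


groups = orthologues_by_grouping(io.StringIO(SAMPLE_TSV))
for grouping, by_taxon in groups.items():
    for taxon, genes in sorted(by_taxon.items()):
        print(grouping, taxon, sorted(genes))
```

To run this against the downloaded file, pass an open file handle, for example `open("upheno_gene_human_orthologues.tsv", newline="")`, in place of the `io.StringIO` wrapper.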

@kevinschaper did all the heavy lifting, so THANK YOU!