Open mbrush opened 8 years ago
cc @balhoff and @jnguyenx to help think about a general solution.
Looking at the complexities of the linked ticket, the easiest solution may be procedural code (that makes use of a declarative query language like SPARQL) which writes new edges into the graph in the manner you suggest.
After that, we could explore a more generic rule engine driven more by the semantics of the ontology.
We also run into this issue when modeling variant-phenotype associations and propagating to the gene.
When modeling variants that cover more than one gene we use the relation: GENO:0000418 ! has_affected_locus
However, in cases where a variant covers more than one gene (some haplotypes, large deletions), we may not want to propagate the variant-phenotype relation to each gene affected by the variation.
We are currently investigating whether we can get around this in our Cypher queries, so far with no luck.
This is how I plan to solve the variant-gene issue:
MATCH (locus:gene)<-[:GENO:0000418!]-(feature)
WITH feature, COUNT(DISTINCT(locus)) as gene_count
WHERE gene_count = 1
AND NOT feature:snp
MATCH path=(subject:gene)<-[geno:GENO:0000418!]-(feature)-[:RO:0002200|RO:0002326|RO:0003302!]->(object:Phenotype)
RETURN DISTINCT path, subject, object
UNION
MATCH path=(subject:gene)<-[geno:GENO:0000418!]-(feature:snp)-[:RO:0002200|RO:0002326|RO:0003302!]->(object:Phenotype)
RETURN DISTINCT path, subject, object
We exclude SNPs from the filter because they can affect more than one gene, either via two genes on opposite strands or via overlapping genes on the same strand. For example: https://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=rs3827760 (two genes, same strand) and https://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=rs1551570 (two genes, opposite strands).
This isn't perfect but will get the job done for now.
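For anyone who wants to check the rule outside the graph, here is a rough Python sketch of the same filter as the query above: propagate a variant-phenotype association to a gene only if the variant has exactly one affected locus, or if the variant is a SNP. The function name, the edge-list inputs, and the identifiers in the usage lines are all made up for illustration; this is not how the pipeline actually stores or loads edges.

```python
from collections import defaultdict


def propagate_to_genes(affected_locus, has_phenotype, snps):
    """Infer (gene, phenotype) pairs from variant-level edges.

    affected_locus: iterable of (variant, gene) pairs, standing in for
                    GENO:0000418 (has_affected_locus) edges
    has_phenotype:  iterable of (variant, phenotype) pairs
    snps:           set of variant ids that are typed as SNPs
    All three inputs are hypothetical in-memory stand-ins for the graph.
    """
    # Collect the distinct genes each variant affects.
    genes_by_variant = defaultdict(set)
    for variant, gene in affected_locus:
        genes_by_variant[variant].add(gene)

    inferred = set()
    for variant, phenotype in has_phenotype:
        genes = genes_by_variant.get(variant, set())
        # SNPs propagate to every gene they affect; any other variant
        # propagates only when it affects exactly one gene.
        if variant in snps or len(genes) == 1:
            for gene in genes:
                inferred.add((gene, phenotype))
    return inferred


# Usage with made-up identifiers: a two-gene deletion, a single-gene
# variant, and a SNP that overlaps two genes.
loci = [
    ("del1", "geneA"), ("del1", "geneB"),
    ("var2", "geneC"),
    ("snp3", "geneD"), ("snp3", "geneE"),
]
phenotypes = [("del1", "HP:1"), ("var2", "HP:2"), ("snp3", "HP:3")]
result = propagate_to_genes(loci, phenotypes, snps={"snp3"})
```

In this toy input the two-gene deletion contributes nothing, while the single-gene variant and the SNP both propagate, which mirrors the two branches of the UNION.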
A couple of separate issues here.