greenelab / snorkeling

Extracting biomedical relationships from literature with Snorkel 🏊

Gold standard of epilepsy-associated genes #9

dhimmel closed this issue 7 years ago

dhimmel commented 7 years ago

@danich1 has been prototyping the extraction of epilepsy-associated genes. This has been convenient because we don't have to deal with mapping PubTator diseases, which use the MEDIC vocabulary. Additionally, PubTator tags genes with Entrez Gene identifiers, which Hetionet uses as well.
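
Because both sides key genes by Entrez Gene ID, a PubTator gene annotation can be compared with Hetionet genes directly, with no vocabulary mapping. The sketch below assumes the tab-separated PubTator annotation layout (PMID, start, end, mention, type, identifier), and assumes that one annotation may list several IDs joined by semicolons. The helper name `gene_ids_from_pubtator` and the sample lines are invented for illustration.

```python
def gene_ids_from_pubtator(lines):
    """Collect Entrez Gene IDs from PubTator-style annotation lines.

    Assumed layout (tab-separated): PMID, start, end, mention, type, identifier.
    Title/abstract lines such as "PMID|t|..." have no tabs and are skipped.
    """
    ids = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6 or fields[4] != "Gene":
            continue  # not an annotation line, or not a gene annotation
        # Assumption: multiple IDs in one annotation are joined by ";"
        for entrez_id in fields[5].split(";"):
            if entrez_id.strip().isdigit():
                ids.add(int(entrez_id))  # ints match Hetionet's gene.identifier
    return ids


# Invented sample lines (SCN1A is Entrez Gene 6323)
sample = [
    "12345|t|Hypothetical title about SCN1A in epilepsy",
    "12345\t25\t30\tSCN1A\tGene\t6323",
    "12345\t34\t42\tepilepsy\tDisease\tMESH:D004827",
]
print(gene_ids_from_pubtator(sample))  # {6323}
```

Disease annotations are skipped here, which mirrors the convenience described above: only the gene side needs matching when the disease is fixed to epilepsy.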

Here is a Cypher query to get a "gold standard" of epilepsy-associated genes from https://neo4j.het.io (adapted from here):

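```cypher
// Genes linked to the 'epilepsy syndrome' node by Disease-associates-Gene edges
MATCH (disease:Disease)-[assoc:ASSOCIATES_DaG]-(gene:Gene)
WHERE disease.name = 'epilepsy syndrome'
RETURN
  gene.name AS gene_symbol,
  gene.description AS gene_name,
  gene.identifier AS entrez_gene_id, // Hetionet gene IDs are Entrez Gene IDs
  assoc.sources AS sources           // databases supporting the association
ORDER BY gene_symbol
```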

There are 399 epilepsy-associated genes. As an aside, not all of these genes are guaranteed to be bona fide epilepsy genes. We integrated several databases; the list is not perfect, but it should be good enough.
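
Since the associations come from several databases, one possible way to trade recall for precision is to keep only genes supported by at least two sources. This is a sketch, not part of the original workflow. It assumes the `sources` column was exported as a JSON-style list such as `["DISEASES","DisGeNET"]` and falls back to comma splitting otherwise. The threshold `min_sources=2`, the helper names, and the sample rows are hypothetical.

```python
import csv
import io
import json


def parse_sources(raw):
    """Parse an exported `sources` cell into a list of database names.

    Assumption: the list was exported as a JSON array string; if it is not
    valid JSON, treat it as a comma-delimited string instead.
    """
    try:
        value = json.loads(raw)
    except json.JSONDecodeError:
        value = raw.split(",")
    if isinstance(value, str):
        value = [value]
    return [s.strip() for s in value if s.strip()]


def well_supported_genes(csv_file, min_sources=2):
    """Return {entrez_gene_id: gene_symbol} for genes with >= min_sources databases."""
    kept = {}
    for row in csv.DictReader(csv_file):
        if len(set(parse_sources(row["sources"]))) >= min_sources:
            kept[int(row["entrez_gene_id"])] = row["gene_symbol"]
    return kept


# Invented rows using the column names from the Cypher query
SAMPLE = (
    "gene_symbol,gene_name,entrez_gene_id,sources\n"
    'GENE_A,hypothetical gene A,111,"[""DISEASES"",""DisGeNET""]"\n'
    'GENE_B,hypothetical gene B,222,"[""DisGeNET""]"\n'
)
print(well_supported_genes(io.StringIO(SAMPLE)))  # {111: 'GENE_A'}
```

For the real export, pass an open file handle instead, e.g. `open("epilepsy-associated-genes.csv.txt", newline="")`.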

I downloaded the results as a CSV: epilepsy-associated-genes.csv.txt. @danich1, does this look like it will suit your needs?
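
One way the CSV could serve as a reference is to score a set of extracted Entrez Gene IDs against it. The sketch below computes precision and recall. The helper names and sample rows are hypothetical, and because the gold standard itself is imperfect, such scores are only approximate.

```python
import csv
import io


def load_gold_standard(csv_file):
    """Return the set of Entrez Gene IDs in the exported gold-standard CSV."""
    return {int(row["entrez_gene_id"]) for row in csv.DictReader(csv_file)}


def precision_recall(predicted_ids, gold_ids):
    """Score extracted Entrez Gene IDs against the gold standard."""
    predicted_ids = set(predicted_ids)
    hits = predicted_ids & gold_ids  # extracted genes that are in the gold standard
    precision = len(hits) / len(predicted_ids) if predicted_ids else 0.0
    recall = len(hits) / len(gold_ids) if gold_ids else 0.0
    return precision, recall


# Invented rows standing in for epilepsy-associated-genes.csv.txt
SAMPLE = (
    "gene_symbol,gene_name,entrez_gene_id,sources\n"
    "GENE_A,hypothetical gene A,111,[]\n"
    "GENE_B,hypothetical gene B,222,[]\n"
    "GENE_C,hypothetical gene C,333,[]\n"
    "GENE_D,hypothetical gene D,444,[]\n"
)
gold = load_gold_standard(io.StringIO(SAMPLE))
precision, recall = precision_recall({111, 222, 999}, gold)
print(precision, recall)  # precision = 2/3, recall = 2/4
```

Comparing plain ID sets keeps the check independent of how candidates are represented upstream; only the Entrez IDs need to be pulled out first.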

danich1 commented 7 years ago

This file will do just fine. It will be a great reference while we work on issue #8!!