greenelab / snorkeling

Extracting biomedical relationships from literature with Snorkel 🏊

Extracting relationships from Hetionet v1.0 #22

Closed dhimmel closed 7 years ago

dhimmel commented 7 years ago

For each relationship type we're trying to model, we'll need to extract the corresponding Hetionet relationships. Right now we'll be using Hetionet v1.0 as the only knowledge base for each relationship type. In the future, we could use multiple resources as knowledge bases for a specific relationship type. Each knowledge base forms its own labeling function.

You can read all Hetionet relationships (with no relationship properties) from this TSV. It's formatted like:

source  metaedge        target
Gene::9021      GpBP    Biological Process::GO:0071357
Gene::51676     GpBP    Biological Process::GO:0098780
Gene::19        GpBP    Biological Process::GO:0055088
Gene::3176      GpBP    Biological Process::GO:0010243
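
As a rough sketch of how the TSV could be used, the snippet below reads it with pandas and keeps only one metaedge. The function name `load_edges`, the `source_id` / `target_id` column names, and the inline `SAMPLE_TSV` string are illustrative assumptions; in practice you would pass the path to the downloaded file instead of the in-memory sample.

```python
import io

import pandas as pd

# Sample rows copied from the excerpt above; the real file is much larger.
SAMPLE_TSV = (
    "source\tmetaedge\ttarget\n"
    "Gene::9021\tGpBP\tBiological Process::GO:0071357\n"
    "Gene::51676\tGpBP\tBiological Process::GO:0098780\n"
    "Gene::19\tGpBP\tBiological Process::GO:0055088\n"
    "Gene::3176\tGpBP\tBiological Process::GO:0010243\n"
)


def load_edges(path_or_buffer, metaedge):
    """Read the edge TSV and return only rows for one metaedge (hypothetical helper)."""
    edges = pd.read_csv(path_or_buffer, sep="\t")
    subset = edges[edges["metaedge"] == metaedge].copy()
    # Node IDs look like "<node type>::<identifier>"; split on the first "::" only,
    # because identifiers such as "GO:0071357" contain colons themselves.
    subset["source_id"] = subset["source"].str.split("::", n=1).str[1]
    subset["target_id"] = subset["target"].str.split("::", n=1).str[1]
    return subset.reset_index(drop=True)


gpbp = load_edges(io.StringIO(SAMPLE_TSV), "GpBP")
print(gpbp[["source_id", "target_id"]])
```

Note that the identifiers come back as strings here, so gene IDs would need an explicit cast if they are later joined against integer columns.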

Alternatively, you can run a Cypher query against https://neo4j.het.io for each relationship type, like:

MATCH path = (disease:Disease)-[:ASSOCIATES_DaG]->(gene:Gene)
RETURN
  disease.identifier AS disease_id,
  gene.identifier AS gene_id
ORDER BY disease_id, gene_id

You can make these queries programmatically to return pandas DataFrames in Python.
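
One possible way to do this is sketched below, using the official `neo4j` Python driver. This is an assumption rather than the approach settled on in this issue: the Bolt address `bolt://neo4j.het.io`, the unauthenticated connection, and the helper names `records_to_frame` and `query_to_frame` are all illustrative. The driver is imported inside the function so the rest of the snippet works without it installed.

```python
import pandas as pd

# The Cypher query from above, unchanged.
QUERY = """
MATCH path = (disease:Disease)-[:ASSOCIATES_DaG]->(gene:Gene)
RETURN
  disease.identifier AS disease_id,
  gene.identifier AS gene_id
ORDER BY disease_id, gene_id
"""


def records_to_frame(records, columns):
    """Turn a list of {column: value} dicts into a DataFrame (hypothetical helper)."""
    # Passing columns explicitly keeps the header even when no rows are returned.
    return pd.DataFrame(records, columns=columns)


def query_to_frame(query, uri="bolt://neo4j.het.io"):
    """Run a Cypher query and return the result as a DataFrame.

    The default URI is an assumption; adjust it (and add auth) as needed.
    """
    from neo4j import GraphDatabase  # imported lazily; requires `pip install neo4j`

    driver = GraphDatabase.driver(uri)
    try:
        with driver.session() as session:
            result = session.run(query)
            columns = list(result.keys())
            records = result.data()  # list of dicts, one per returned row
    finally:
        driver.close()
    return records_to_frame(records, columns)
```

Calling `query_to_frame(QUERY)` would then return a DataFrame with `disease_id` and `gene_id` columns, provided the server is reachable at the assumed address.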