TypeDB Bio: Biomedical Knowledge Graph

Overview | Installation | Datasets | Examples | How You Can Help | Further Learning

Overview

TypeDB Bio is an open-source biomedical knowledge graph that enables research in areas such as drug discovery, precision medicine and drug repurposing. It gives biomedical researchers an intuitive way to query interconnected, heterogeneous biomedical data in a single place.

For example, by querying for the virus SARS-CoV-2, we can find the associated human protein, proteasome subunit alpha type-2 (PSMA2), a component of the proteasome implicated in SARS-CoV-2 replication, and its encoding gene (PSMA2). Additionally, we can identify the drug carfilzomib, a known inhibitor of the proteasome that could therefore be researched as a potential treatment for patients with COVID-19.

image

By examining these specific relationships and their attributes, we can further investigate any connected biological components and better understand how they relate to one another. This helps researchers study the mechanisms of protein interactions, infections and the immune response, and find targets for the development of treatments or drugs more efficiently. We can also expand our search to include contextual information, as shown below:

image

The team behind TypeDB Bio is a partnership between GSK, Oxford PharmaGenesis and Vaticle.

The schema that models the underlying knowledge graph, together with the descriptive query language TypeQL, makes writing complex queries straightforward and intuitive. Furthermore, TypeDB's automated reasoning lets TypeDB Bio act as an intelligent biomedical database that infers implicit knowledge from the explicitly stored data. TypeDB Bio can understand biological facts, draw inferences from new findings and enforce research constraints, all at query time.

Installation

Prerequisites: Python >= 3.10, JDK >= 11, TypeDB Core >= 2.18.0, TypeDB Python Driver >= 2.18.0, TypeDB Studio >= 2.18.0
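
The minimum versions above can be checked programmatically before running the loader. The following is a minimal sketch, not part of this repository: the helper names (parse_version, meets_minimum, python_is_supported) are hypothetical, and it assumes plain dotted numeric version strings such as "2.18.0". Versions are compared as tuples of integers because a string comparison would wrongly rank "2.9.3" above "2.18.0".

```python
import sys

# Minimum versions taken from the prerequisites line above.
MIN_PYTHON = (3, 10)
MIN_TYPEDB = "2.18.0"


def parse_version(text):
    """Turn a dotted version string such as '2.18.0' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_minimum(found, minimum):
    """Return True if version string `found` is at least `minimum`."""
    return parse_version(found) >= parse_version(minimum)


def python_is_supported(version_info=sys.version_info):
    """Return True if the given interpreter version is at least 3.10."""
    return tuple(version_info[:2]) >= MIN_PYTHON
```

How the installed TypeDB version string is obtained is left open here; the sketch only covers the comparison once that string is known.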

Clone this repo:

git clone https://github.com/vaticle/typedb-bio.git

Download the CORD-NER data set from this link and add it to this directory: dataset/cordner

Set up a virtual environment and install the dependencies:

cd <path/to/typedb-bio>/
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Start TypeDB:

typedb server

Start the loader script:

python loader.py

Config options can be set in config.ini. Some options can be overridden with command-line arguments. For help with those arguments:

python loader.py -h

If using TypeDB Enterprise or Cloud, the connection password can only be supplied via command line for security:

python loader.py -p my-password
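
To illustrate the general pattern described above (file-based defaults, command-line overrides, and a password that is never stored in the file), here is a minimal sketch. It is not the code in loader.py: the [typedb] section, its option names, and the -a and -d flags are hypothetical. Only -p and -h come from this README.

```python
import argparse
import configparser

# Hypothetical stand-in for config.ini; the real file's sections and
# option names are not shown in this README.
EXAMPLE_CONFIG = """
[typedb]
address = localhost:1729
database = typedb-bio
username = admin
"""


def load_settings(config_text, argv):
    """Merge config-file values with command-line overrides."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    settings = dict(parser["typedb"])

    cli = argparse.ArgumentParser(description="Illustrative loader options")
    cli.add_argument("-a", "--address")   # hypothetical flag
    cli.add_argument("-d", "--database")  # hypothetical flag
    cli.add_argument("-p", "--password")  # command line only, never in the file
    args = cli.parse_args(argv)

    # Only options actually supplied on the command line override the file.
    for key, value in vars(args).items():
        if value is not None:
            settings[key] = value
    return settings
```

Calling load_settings(EXAMPLE_CONFIG, ["-p", "my-password"]) keeps the file's address and database and adds the password from the command line. Without -p, the returned settings contain no password at all.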

Now grab a coffee (or two) while the loader builds the schema and data for you!

Testing

Install the test dependencies:

pip install -r requirements_test.txt

Run the tests:

python -m pytest -v -s tests

Development

Install the development dependencies:

pip install -r requirements_dev.txt
pre-commit install

Examples

TypeQL queries can be run in TypeDB Studio, in TypeDB Console, or through the driver APIs. We encourage running the queries in TypeDB Studio for the best visual experience.

# Which drugs interact with the genes associated with the SARS virus?

match
$virus isa virus, has virus-name "SARS";
$gene isa gene;
$drug isa drug;
$rel1 ($gene, $virus) isa gene-virus-association;
$rel2 ($gene, $drug) isa drug-gene-interaction;
offset 0; limit 20;
image
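
When running the query above through a driver API rather than Studio, it can help to generate the query text for different virus names and result limits. The helper below is a minimal sketch: build_drug_query is a hypothetical name, not part of TypeDB or this repository, and the sketch only builds the query string. Sending it to the server is left to whichever driver version is installed. Names containing a double quote or backslash are rejected, since the sketch makes no assumption about TypeQL string escaping.

```python
def build_drug_query(virus_name, limit=20, offset=0):
    """Return the example TypeQL query text for a given virus name."""
    if limit < 1 or offset < 0:
        raise ValueError("limit must be positive and offset non-negative")
    # Refuse characters that could break out of the quoted string literal.
    if '"' in virus_name or "\\" in virus_name:
        raise ValueError("virus name must not contain quotes or backslashes")
    return "\n".join([
        "match",
        f'$virus isa virus, has virus-name "{virus_name}";',
        "$gene isa gene;",
        "$drug isa drug;",
        "$rel1 ($gene, $virus) isa gene-virus-association;",
        "$rel2 ($gene, $drug) isa drug-gene-interaction;",
        f"offset {offset}; limit {limit};",
    ])
```

With the defaults, build_drug_query("SARS") reproduces the query shown above line for line.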

Datasets

Currently the datasets we've integrated include:

In progress:

We plan to add many more datasets!

How You Can Help

This is an ongoing project and we need your help! If you want to contribute, you can help us with the following:

If you wish to get in touch, please talk to us on the #typedb-bio channel on our Discord (link here).

Further Learning