MCKB is a pure Python application that extracts and transforms clinically actionable, cancer-linked variants and their metadata into a directed graph.
Its goal is to represent cancer data using controlled vocabularies (ontologies) and to output the result as a directed graph serialized as RDF triples. MCKB is currently a proof of concept for exploring the benefits and challenges of mapping cancer data to available ontologies and storing both the output data and the ontologies in a single datastore. As a test set, MCKB uses a subset of a physician-curated dataset from Dr. Rodrigo Dienstmann, which was curated and transformed into a relational database (RDBMS) by the OHSU Clinical Genomics Database team.
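To make the idea of "a directed graph serialized as RDF triples" concrete, here is a minimal sketch using only the Python standard library. It is not MCKB's actual code: the `http://example.org/mckb/` namespace, the `SequenceVariant` class, the `isAssociatedWith` predicate, and the identifiers `variant1` and `disease1` are all hypothetical placeholders. It emits N-Triples, a line-based subset of Turtle, and handles only basic literal escaping.

```python
# Hypothetical namespace and terms, for illustration only; MCKB's real
# output uses terms from published ontologies instead.
EX = "http://example.org/mckb/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def iri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return "<" + value + ">"


def literal(value):
    """Quote a plain string literal, escaping backslash, quote, and newline."""
    escaped = (
        value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    )
    return '"' + escaped + '"'


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples, one triple per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


# Each tuple is one directed edge: subject --predicate--> object.
triples = [
    (iri(EX + "variant1"), iri(RDF_TYPE), iri(EX + "SequenceVariant")),
    (iri(EX + "variant1"), iri(RDFS_LABEL), literal("example variant")),
    (iri(EX + "variant1"), iri(EX + "isAssociatedWith"), iri(EX + "disease1")),
]

output = to_ntriples(triples)
print(output)
```

Because every node and edge label is an IRI, the data and the ontologies that define those IRIs can be loaded side by side into the same datastore.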
While the output data files can be stored in various databases, we also provide [configuration files](https://github.com/monarch-initiative/mckb/tree/master/conf/SciGraph) for loading the data into a Neo4j graph database using the SciGraph application. These files also extend the default REST services with queries specific to cancer use cases, written in the Cypher query language and using SciGraph query expansion.
Load a MySQL database using the dump file in the resources directory.
Create a configuration file in the conf directory, using example_conf.json as a template.
Run:
./GraphGenerator.py --config conf/conf.json
This creates a directory called out in your working directory, containing the output Turtle files.
Example output can be found here: https://raw.githubusercontent.com/monarch-initiative/mckb/master/ttl/cgd.ttl
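The configuration step above can also be scripted. The following is a rough sketch, assuming the template is a flat JSON object; the keys `host`, `database`, `user`, and `password` and the values filled in are hypothetical, so check the real example_conf.json for the actual field names. The helper `create_config` is not part of MCKB. The demonstration runs in a temporary directory so it does not touch a real checkout.

```python
import json
import tempfile
from pathlib import Path


def create_config(template_path, target_path, overrides):
    """Copy a JSON template to target_path, replacing top-level keys from overrides."""
    config = json.loads(Path(template_path).read_text(encoding="utf-8"))
    config.update(overrides)
    Path(target_path).write_text(json.dumps(config, indent=4), encoding="utf-8")
    return config


with tempfile.TemporaryDirectory() as tmp:
    conf_dir = Path(tmp) / "conf"
    conf_dir.mkdir()

    # Hypothetical template contents; the real example_conf.json may differ.
    template = conf_dir / "example_conf.json"
    template.write_text(
        json.dumps({"host": "localhost", "database": "", "user": "", "password": ""}),
        encoding="utf-8",
    )

    # Fill in placeholder values and write conf/conf.json next to the template.
    created = create_config(
        template,
        conf_dir / "conf.json",
        {"database": "cgd", "user": "mckb_user"},
    )

    # Read the file back to confirm what was written.
    reloaded = json.loads((conf_dir / "conf.json").read_text(encoding="utf-8"))
```

In a real checkout, the same helper would be pointed at conf/example_conf.json and conf/conf.json before running GraphGenerator.py.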