progenetix / arraymap2ga4gh

Schema examples

Implementation of the GA4GH schema based on genome profiles and metadata from arrayMap

This repository will contain data and information about the arrayMap-based implementation of a GA4GH schema structure. Although GA4GH-compliant resources are not expected to mirror the schema in their internal structure, this project aims to show that such an approach is feasible in principle, mainly to test and drive schema development.

The data and schemas represented here are not kept in a stable, versioned state; they are updated alongside, or in anticipation of, GA4GH schema changes.

Structure:

How to import the data

The data is in JSON format; MongoDB can be used for easy import and manipulation.

Download and installation instructions for the MongoDB Community Edition are available on the MongoDB website.

Each zip file contains the demo data in JSON together with a shell script that imports the data into MongoDB. To import, run:

sh importdb.sh

Data manipulation with the MongoDB shell

Example queries from the MongoDB shell:

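use test
// biosamples whose country attribute is "United Kingdom"
db.biosamples.find({'attributes.country.values.string_value' : 'United Kingdom'})
// first biosample whose description contains "breast"
db.biosamples.findOne({'description' : {'$regex' : 'breast'}})
// deletions on chromosome 17 between 30 and 31 Mb; return only the call set IDs (plus _id)
db.variants.find({variant_type:"DEL", reference_name:"17", start:{$gte:30000000}, end:{$lte:31000000}},{"calls.call_set_id":1})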
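The queries above rely on MongoDB dot notation, which descends into arrays of sub-documents (for example the `calls` list of a variant). To illustrate those semantics only, the following plain-Python sketch (standard library, no MongoDB needed) re-implements the path lookup, the regex lookup and the region filter over in-memory documents. The helper names are hypothetical, and the document shapes (`calls` as a list of objects carrying `call_set_id`, `attributes.country.values` as a list of objects carrying `string_value`) are assumptions; the code approximates rather than reproduces MongoDB's matching rules.

```python
from __future__ import annotations

import re
from typing import Any, Iterable, Iterator


def resolve_path(doc: Any, path: str) -> list[Any]:
    """Return every value reachable via a dotted path such as
    "calls.call_set_id", expanding lists of sub-documents along the way
    (a hypothetical approximation of MongoDB dot notation)."""
    values = [doc]
    for key in path.split("."):
        next_values = []
        for value in values:
            # Examine each element when the current value is a list.
            items = value if isinstance(value, list) else [value]
            for item in items:
                if isinstance(item, dict) and key in item:
                    next_values.append(item[key])
        values = next_values
    # Flatten a list found at the end of the path into its elements.
    flat: list[Any] = []
    for value in values:
        flat.extend(value if isinstance(value, list) else [value])
    return flat


def matches(doc: dict, path: str, expected: Any) -> bool:
    """Equality match: true if any value at `path` equals `expected`."""
    return expected in resolve_path(doc, path)


def find_one_regex(docs: Iterable[dict], path: str, pattern: str) -> dict | None:
    """First document with a string at `path` containing `pattern`
    (unanchored search, like an unanchored $regex)."""
    for doc in docs:
        if any(isinstance(v, str) and re.search(pattern, v)
               for v in resolve_path(doc, path)):
            return doc
    return None


def deletions_in_region(variants: Iterable[dict], chrom: str,
                        start_min: int, end_max: int) -> Iterator[list[Any]]:
    """DEL variants on `chrom` with start >= start_min and end <= end_max,
    projected to the call set IDs of their calls."""
    for variant in variants:
        start, end = variant.get("start"), variant.get("end")
        if start is None or end is None:
            continue  # documents lacking either coordinate never match
        if (variant.get("variant_type") == "DEL"
                and variant.get("reference_name") == chrom
                and start >= start_min and end <= end_max):
            yield resolve_path(variant, "calls.call_set_id")
```

For example, `list(deletions_in_region(variants, "17", 30000000, 31000000))` mirrors the `db.variants.find()` call; note that `reference_name` is compared as the string `"17"`, exactly as in the shell query.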

Data manipulation with Python

The tools directory provides IPython/Jupyter notebooks for exploring the datasets and data structures.

Instructions for installing Jupyter are available in the Jupyter documentation.
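Outside the notebooks, the JSON files from the zip archives can also be explored with the Python standard library alone. The sketch below assumes each data file is either a JSON array or newline-delimited JSON (one document per line, which is what `mongoexport` writes by default); the function names and the `variants.json` path in the usage line are hypothetical and not part of the repository's tools.

```python
import json
from collections import Counter


def parse_documents(text: str) -> list:
    """Parse either a JSON array or newline-delimited JSON
    (one document per non-empty line)."""
    stripped = text.strip()
    if stripped.startswith("["):
        return json.loads(stripped)
    return [json.loads(line) for line in stripped.splitlines() if line.strip()]


def load_documents(path: str) -> list:
    """Read a (hypothetical) exported data file from disk."""
    with open(path, encoding="utf-8") as handle:
        return parse_documents(handle.read())


def count_variant_types(variants) -> Counter:
    """Tally (reference_name, variant_type) pairs, e.g. to see how many
    deletions each chromosome carries."""
    return Counter((v.get("reference_name"), v.get("variant_type")) for v in variants)
```

A typical call would be `count_variant_types(load_documents("variants.json")).most_common(5)`. Accepting both layouts keeps the loader usable whether or not the export was made with `--jsonArray`.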

Perl server CGI & Beacon+ implementation

A Perl-based backend and the Beacon+ UI can be found in the beaconplus-server and beaconplus-ui repositories.