mklarqvist / tomahawk

Fast calculations of linkage-disequilibrium in large-scale human cohorts
https://mklarqvist.github.io/tomahawk/
MIT License


Fast calculation of LD in large-scale cohorts

Tomahawk is a machine-optimized library for computing linkage disequilibrium (LD) from population-sized datasets. It permits close to real-time analysis of regions of interest in datasets of many millions of diploid individuals on a standard laptop. All algorithms are embarrassingly parallel and have been successfully tested on datasets with up to 10 million individuals, using thousands of cores across hundreds of machines on the Wellcome Trust Sanger Institute compute farm.

Tomahawk is unique in that it constructs complete haplotype/genotype contingency matrices for each comparison, performs statistical tests on the output data, and provides a framework for investigating the produced data.
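
To make the idea of a haplotype contingency matrix concrete, here is a minimal Python sketch of how the standard LD statistics (D, D' and r^2) can be derived from the four haplotype counts of a 2x2 table for two biallelic sites. This is not Tomahawk's implementation or API: the function name and its arguments are hypothetical and chosen for illustration only, and the sketch assumes phased haplotype counts are already available.

```python
import math


def ld_from_haplotype_counts(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, D', r^2) from a 2x2 haplotype contingency table.

    Hypothetical helper for illustration only; not part of Tomahawk.
    n_AB, n_Ab, n_aB, n_ab are counts of the four possible haplotypes
    at two biallelic sites with alleles A/a and B/b.
    """
    n = n_AB + n_Ab + n_aB + n_ab
    if n == 0:
        raise ValueError("contingency table is empty")

    # Haplotype and marginal allele frequencies.
    p_AB = n_AB / n
    p_A = (n_AB + n_Ab) / n
    p_B = (n_AB + n_aB) / n

    # D: deviation of the observed AB frequency from independence.
    d = p_AB - p_A * p_B

    # r^2 and D' are undefined when either site is monomorphic.
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        return d, math.nan, math.nan

    r2 = (d * d) / denom

    # D' rescales D by its largest attainable magnitude given the
    # allele frequencies; the sign of D is kept in this sketch.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = d / d_max

    return d, d_prime, r2
```

For example, calling `ld_from_haplotype_counts(50, 0, 0, 50)` describes two sites in complete LD, while `ld_from_haplotype_counts(25, 25, 25, 25)` describes two independent sites.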

Get started

Requirements

Installation

For Ubuntu, Debian, and Mac systems, installation is easy: just run

git clone --recursive https://github.com/mklarqvist/tomahawk
cd tomahawk
./install.sh

The install.sh script depends extensively on apt-get, so it is unlikely to run on non-Debian-based systems without extensive modification. If you do not have the super-user (administrator) privileges required to install new packages on your system, run the local installation instead:

./install.sh local

When installing locally, the required dependencies are downloaded and built in the root directory. This approach will require additional effort if you intend to move the compiled libraries to a different directory.

Contributing

Interested in contributing? Fork the repository and submit a pull request, and it will be reviewed.

Support

We are actively developing Tomahawk and are always interested in improving its quality. If you run into an issue, please report the problem on our issue tracker. Be sure to include enough detail in your report for us to reproduce the problem and address it. We have not reached version 1.0, so the specification and/or the API may change.

Version

This is Tomahawk 0.7.0. Tomahawk follows semantic versioning.

Author

Marcus D. R. Klarqvist (mk819@cam.ac.uk)
Department of Genetics, University of Cambridge
Wellcome Sanger Institute

License

Tomahawk is licensed under the MIT License.