robmaz / distmap

Sequence alignment on Hadoop

DistMap

DistMap is a wrapper around different mappers for distributed computation using Hadoop.

This repository contains a modified version of the DistMap pipeline, derived from the original implementation (see the SourceForge DistMap project). It can be used on both Linux and macOS servers and is no longer tied to a specific version of Hadoop, although it was developed and tested on Hadoop 2.7.x.

This unified version locates a Hadoop configuration either via the HADOOP_CONF_DIR environment variable or, optionally, via a command-line argument, and uses the cluster configured there.
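The lookup described above can be sketched roughly as follows. This is only an illustration, not DistMap's actual code: the function name `resolve_hadoop_conf_dir` is hypothetical, the rule that an explicitly passed directory takes precedence over HADOOP_CONF_DIR is an assumption, and checking for `core-site.xml` is just one plausible sanity test for a Hadoop configuration directory.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_hadoop_conf_dir(
    cli_value: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> Path:
    """Return the Hadoop configuration directory to use.

    Assumption: a directory given on the command line wins over the
    HADOOP_CONF_DIR environment variable. DistMap may order these differently.
    """
    candidate = cli_value or env.get("HADOOP_CONF_DIR")
    if not candidate:
        raise RuntimeError(
            "No Hadoop configuration found: set HADOOP_CONF_DIR "
            "or pass a configuration directory explicitly"
        )

    conf_dir = Path(candidate).expanduser()
    # core-site.xml is a standard Hadoop configuration file; its presence is
    # used here as a cheap check that the directory really is a Hadoop config.
    if not (conf_dir / "core-site.xml").is_file():
        raise RuntimeError(f"{conf_dir} does not contain core-site.xml")
    return conf_dir
```

Calling `resolve_hadoop_conf_dir()` with no arguments falls back to the environment, so a cluster already set up with HADOOP_CONF_DIR needs no extra options in this sketch.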

From version 3 onward, DistMap is no longer distributed with mapper binaries. You must provide mapper binaries that are compatible with your cluster.

Requirements

Versioning

The master branch of this repository contains versions >= 3.0.0; releases may be published once the code has passed the alpha stage. From 3.0.0 onward, versioning follows the Semantic Versioning (SemVer) conventions.
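Under SemVer, a pre-release such as an alpha sorts before the corresponding final release. The sketch below illustrates that ordering rule; the helper `semver_key` is hypothetical and not part of DistMap, and the pre-release strings in the usage lines are invented examples rather than actual DistMap tags.

```python
import re

# MAJOR.MINOR.PATCH with optional "-prerelease" and "+build" parts.
_SEMVER = re.compile(
    r"^(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?(?:\+[0-9A-Za-z.-]+)?$"
)


def semver_key(version: str):
    """Return a tuple that sorts version strings by SemVer precedence."""
    match = _SEMVER.match(version)
    if match is None:
        raise ValueError(f"not a semantic version: {version!r}")

    major, minor, patch = (int(match.group(i)) for i in (1, 2, 3))
    prerelease = match.group(4)
    if prerelease is None:
        # A final release ranks above every pre-release of the same core version.
        return (major, minor, patch, 1, ())

    # Numeric identifiers compare numerically and rank below alphanumeric ones;
    # a shorter identifier list ranks below a longer one with the same prefix.
    fields = tuple(
        (0, int(f), "") if f.isdigit() else (1, 0, f)
        for f in prerelease.split(".")
    )
    return (major, minor, patch, 0, fields)


# Hypothetical pre-release names, used only to show the ordering.
versions = ["3.0.0", "2.7.1", "3.0.0-alpha.1", "3.0.0-alpha"]
ordered = sorted(versions, key=semver_key)
```

Build metadata (the part after `+`) is accepted but ignored, since SemVer excludes it from precedence.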

We also provide the old 2.7.1 version of DistMap in two OS-specific branches ("macos" and "linux").

Citation

The original DistMap version is described in Pandey & Schlötterer 2013. If you use the software in this repository, please cite as:

Pandey RV, Schlötterer C (2013) DistMap: A Toolkit for Distributed Short Read Mapping on a Hadoop Cluster. PLOS ONE 8(8): e72614. https://doi.org/10.1371/journal.pone.0072614