ytchen0323 / cloud-scale-bwamem

Apache License 2.0

Cloud-Scale BWAMEM

Introduction

Cloud-scale BWAMEM (CS-BWAMEM) is an ultrafast and highly scalable aligner built on cloud infrastructure, including Spark and the Hadoop Distributed File System (HDFS). It uses the abundant computing resources of a public or private cloud to fully exploit the parallelism available across the enormous number of reads. With CS-BWAMEM, paired-end whole-genome reads (30x) can be aligned within 80 minutes on a 25-node cluster with 300 cores.

Build and Install

  1. git clone git@github.com:ytchen0323/cloud-scale-bwamem.git
  2. cd cloud-scale-bwamem
  3. Update the absolute path in two pom.xml files: src/pom.xml and src/main/jni_fpga/pom.xml
    Change:

    <systemPath>/curr/pengwei/github/cloud-scale-bwamem/target/cloud-scale-bwamem-0.2.2.jar</systemPath>

    to your own path:

    <systemPath>/yourpath/cloud-scale-bwamem/target/cloud-scale-bwamem-0.2.2.jar</systemPath>
  4. ./compile.pl
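
The path edit in step 3 can also be scripted instead of done by hand. The following is a minimal sketch, not part of the project: `update_system_path` is a hypothetical helper name, and it assumes both pom.xml files still contain the exact `/curr/pengwei/github/cloud-scale-bwamem` prefix shown in step 3. It also assumes your checkout path contains no `|` or `&` characters, which would be misread by sed.

```shell
#!/bin/sh
# Sketch only: rewrite the hard-coded <systemPath> prefix in one pom.xml.
# OLD_PREFIX is copied from step 3; update_system_path is a hypothetical helper.
OLD_PREFIX="/curr/pengwei/github/cloud-scale-bwamem"

update_system_path() {
    pom_file="$1"      # pom.xml file to edit
    new_prefix="$2"    # absolute path of your own checkout
    tmp_file="${pom_file}.tmp"
    # Use '|' as the sed delimiter because the paths contain '/'.
    # Write to a temporary file first so the edit works with any POSIX sed.
    sed "s|<systemPath>${OLD_PREFIX}/|<systemPath>${new_prefix}/|" "$pom_file" > "$tmp_file" \
        && mv "$tmp_file" "$pom_file"
}

# Example usage, run from the repository root before ./compile.pl:
#   update_system_path src/pom.xml "$(pwd)"
#   update_system_path src/main/jni_fpga/pom.xml "$(pwd)"
```

Passing `$(pwd)` from the repository root makes the new `<systemPath>` point at the `target` directory of your own checkout, which is where step 3 expects the jar to be.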

Upload FASTQ file(s) to HDFS

Use CS-BWAMEM aligner

Merge the output ADAM folders

Sort the output ADAM folders