gsneha26 / SegAlign

A Scalable GPU-Based Whole Genome Aligner, published in SC20: https://doi.ieeecomputersociety.org/10.1109/SC41405.2020.00043
MIT License
SegAlign is a scalable GPU system for pairwise whole genome alignments, based on LASTZ's seed-filter-extend paradigm.

Overview

The system has been tested on all AWS G3 and P3 GPU instances with the AMI Ubuntu Server 18.04 LTS (HVM), SSD Volume Type (ami-0fc20dd1da406780b, 64-bit x86).

git clone https://github.com/gsneha26/SegAlign.git
export PROJECT_DIR=$PWD/SegAlign

Dependencies

The following dependencies are required by SegAlign:

The dependencies can be installed with the provided script, as shown below. The script installs only the dependencies that are not already present, and it might take a while. It requires sudo to install most packages at the system level. The -c option skips the CUDA installation. Either way, the CUDA toolkit binaries should be in $PATH for SegAlign.

cd $PROJECT_DIR
./scripts/installUbuntu.sh
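
Before choosing whether to pass -c, it can help to check whether the CUDA toolkit is already visible on $PATH. The following is a minimal sketch, not part of SegAlign: the helper name require_on_path is hypothetical, and it assumes that finding nvcc (the compiler driver shipped with the CUDA toolkit) is a good enough signal that the toolkit binaries are on $PATH.

```shell
# Hypothetical helper (not part of SegAlign): succeeds if the shell can resolve
# the given command name. `command -v` also resolves shell functions and
# builtins; for an external tool such as nvcc this amounts to a $PATH lookup.
require_on_path() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "$1 not found in \$PATH" >&2
    return 1
}

# nvcc is the compiler driver shipped with the CUDA toolkit.
if require_on_path nvcc; then
    echo "CUDA toolkit found: ./scripts/installUbuntu.sh -c skips the CUDA installation"
else
    echo "CUDA toolkit not found: run ./scripts/installUbuntu.sh without -c"
fi
```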

How to run SegAlign

run_segalign target query [options]
run_segalign --help
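
Since run_segalign takes a target and a query file, a small pre-flight check can catch a wrong path before a long GPU run starts. The sketch below is not part of SegAlign: the function names is_fasta and check_segalign_inputs are hypothetical, and it assumes both inputs are FASTA files (as in the test run in this README), recognized here only by a leading '>' character.

```shell
# Hypothetical pre-flight check (not part of SegAlign).
# Assumption: inputs are FASTA files, as in the test run in this README.
is_fasta() {
    # True if the file is readable, non-empty, and its first byte is '>'.
    [ -r "$1" ] && [ -s "$1" ] && [ "$(head -c 1 "$1")" = ">" ]
}

check_segalign_inputs() {
    # Check every file given on the command line; stop at the first failure.
    for f in "$@"; do
        if ! is_fasta "$f"; then
            echo "not a readable FASTA file: $f" >&2
            return 1
        fi
    done
    return 0
}

# Usage (target.fa and query.fa are placeholder names):
#   check_segalign_inputs target.fa query.fa && run_segalign target.fa query.fa
```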

Running a test

cd $PROJECT_DIR
mkdir test
cd test
wget https://hgdownload.soe.ucsc.edu/goldenPath/ce11/bigZips/ce11.2bit
wget https://hgdownload-test.gi.ucsc.edu/goldenPath/cb4/bigZips/cb4.2bit 
twoBitToFa ce11.2bit ce11.fa
twoBitToFa cb4.2bit cb4.fa
run_segalign ce11.fa cb4.fa --output=ce11.cb4.maf
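
A quick sanity check on the result of the test run is to count the alignment blocks in the output file. The sketch below is not part of SegAlign: the function name count_maf_blocks is hypothetical, and it assumes the output is in MAF format (as the .maf extension suggests), where each alignment block starts with a line whose first field is "a".

```shell
# Hypothetical helper (not part of SegAlign): counts alignment blocks in a MAF file.
# Assumption: the file is in MAF format, where every alignment block begins
# with a line whose first whitespace-separated field is "a".
count_maf_blocks() {
    # n + 0 makes awk print 0 rather than an empty line when nothing matches.
    awk '$1 == "a" { n++ } END { print n + 0 }' "$1"
}

# Usage, after the test run above:
#   count_maf_blocks ce11.cb4.maf
```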

How to run SegAlign repeat masker

run_segalign_repeat_masker sequence [options]
run_segalign_repeat_masker --help

Running a test

cd $PROJECT_DIR
mkdir test_rm
cd test_rm
wget https://hgdownload.soe.ucsc.edu/goldenPath/ce11/bigZips/ce11.2bit
twoBitToFa ce11.2bit ce11.fa
run_segalign_repeat_masker ce11.fa --output=ce11.seg

Running Docker Image

Running segalign

wget https://hgdownload.soe.ucsc.edu/goldenPath/ce11/bigZips/ce11.2bit
wget https://hgdownload-test.gi.ucsc.edu/goldenPath/cb4/bigZips/cb4.2bit 
sudo docker run -v $(pwd):/data -it gsneha/segalign:v0.1.2 \
                           twoBitToFa \
                           /data/ce11.2bit \
                           /data/ce11.fa
sudo docker run -v $(pwd):/data -it gsneha/segalign:v0.1.2 \
                           twoBitToFa \
                           /data/cb4.2bit \
                           /data/cb4.fa
sudo docker run --ipc=host --gpus all -v $(pwd):/data -it gsneha/segalign:v0.1.2 \
                           run_segalign \
                           /data/ce11.fa \
                           /data/cb4.fa \
                           --output=/data/ce11.cb4.maf

Running segalign_repeat_masker

wget https://hgdownload.soe.ucsc.edu/goldenPath/ce11/bigZips/ce11.2bit
sudo docker run -v $(pwd):/data -it gsneha/segalign:v0.1.2 \
                           twoBitToFa \
                           /data/ce11.2bit \
                           /data/ce11.fa
sudo docker run --ipc=host --gpus all -v $(pwd):/data -it gsneha/segalign:v0.1.2 \
                           run_segalign_repeat_masker \
                           /data/ce11.fa \
                           --output=/data/ce11.seg
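
The Docker commands above repeat the same image name and volume mount each time. One way to cut the repetition is a small shell wrapper, sketched below. It is not part of SegAlign: the function name segalign_docker and the DRY_RUN variable are hypothetical. It mirrors the commands above by adding --ipc=host --gpus all only for run_segalign and run_segalign_repeat_masker.

```shell
# Hypothetical wrapper (not part of SegAlign) around the docker commands above.
# Set DRY_RUN=1 to print the command instead of running it.
SEGALIGN_IMAGE="gsneha/segalign:v0.1.2"

segalign_docker() {
    # Mirror the commands above: GPU flags only for the two SegAlign programs.
    gpu_flags=""
    case "$1" in
        run_segalign|run_segalign_repeat_masker)
            gpu_flags="--ipc=host --gpus all" ;;
    esac
    # $gpu_flags is left unquoted on purpose so that it splits into separate
    # arguments, or disappears entirely when it is empty.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo sudo docker run $gpu_flags -v "$(pwd):/data" -it "$SEGALIGN_IMAGE" "$@"
    else
        sudo docker run $gpu_flags -v "$(pwd):/data" -it "$SEGALIGN_IMAGE" "$@"
    fi
}

# Usage, from the directory that holds the downloaded .2bit files:
#   segalign_docker twoBitToFa /data/ce11.2bit /data/ce11.fa
#   segalign_docker run_segalign_repeat_masker /data/ce11.fa --output=/data/ce11.seg
```

Running with DRY_RUN=1 first shows the exact docker command that would be executed, which is useful for checking the mount path before starting a long run.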

Citing SegAlign

S. Goenka, Y. Turakhia, B. Paten and M. Horowitz, "SegAlign: A Scalable GPU-Based Whole Genome Aligner," in 2020 SC20: International Conference for High Performance Computing, Networking, Storage and Analysis (SC), Atlanta, GA, US, 2020, pp. 540-552. doi: 10.1109/SC41405.2020.00043