
BAMscale is a one-step tool for either 1) quantifying and normalizing the coverage of peaks or 2) generating scaled BigWig files for easy visualization of commonly used DNA-seq capture-based methods.

BAMscale

Overview of BAMscale applications

BAMscale is a one-step tool to

1) Quantify and normalize peak coverages from multiple BAM files
2) Create scaled BigWig files for easy visualization

In the wiki pages we have more detailed tutorials for creating bigWig files and quantifying peaks

Update

20210510: Added support for BAM indexes named "file.bam.bai" and "file.bai". Modified the bigWig writing to decrease file size: blocks of 25 bins are written only if they are non-empty. Changed the default bin size to 20 bp.

20200918: We are working on a heatmap plotting script in R to help with visualization. The script (under development) is available in the "R/Plot_heatmap" folder. Please use RStudio or a similar environment, as the paths have to be set in the script. Meanwhile, we will work on developing a simple GUI for this.

20200423: The full manuscript has been published in Epigenetics & Chromatin

20200326: We added a visualization app written in R. The scripts are available in the "R" sub-folder, and a detailed manual is available in the wiki->visualization section.

20190821: We added support for creating coverage tracks from RNA-seq data. The new method enables accurate representation of exon-intron boundaries (splicing).

Manuals

In the wiki page we have more detailed tutorials for creating bigWig files and quantifying peaks:

  1. OK-seq and RFD Track Generation
  2. Quantifying Peaks
  3. Generating Scaled Coverage Tracks
  4. END-seq data
  5. Log2 Coverage Tracks for Replication Timing Data
  6. Smoothening Function for Coverage Tracks

We also added a few R scripts that might be helpful for basic visualizations:

  1. Creating density plots of quantified peaks

  2. Segmenting replication timing bigwigs

  3. Identifying OK-seq strand switches

For additional information, visit the wiki page.

For any other requests, or if you need help, either open an issue or email me: pongorlorinc@gmail.com

Usage for the impatient

These examples assume you have 4 processing threads, so we set '-t 4' for multithreading.

Peak quantification

BAMscale cov -t 4 --bed <BED_FILE> --bam <BAM1> --bam <BAM2> --bam <BAM3> ... --bam <BAMn>
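
When the list of BAM files is long, the repeated `--bam` flags can be assembled in a Bash array. The following is a minimal sketch, not an official wrapper: the file names `peaks.bed` and `sample1.bam` to `sample3.bam` are hypothetical placeholders, and the command is executed only if `BAMscale` is on the `PATH` and the BED file exists.

```shell
#!/usr/bin/env bash
# Hypothetical inputs: replace with your own BED and BAM files.
bed="peaks.bed"
bams=(sample1.bam sample2.bam sample3.bam)

# Start with the sub-command and the fixed options from the example above.
args=(cov -t 4 --bed "$bed")

# Append one --bam flag per input file.
for bam in "${bams[@]}"; do
  args+=(--bam "$bam")
done

# Show the full command line before running anything.
cmd="BAMscale ${args[*]}"
echo "$cmd"

# Run only when the tool and the BED file are actually present.
if command -v BAMscale >/dev/null 2>&1 && [ -f "$bed" ]; then
  BAMscale "${args[@]}"
fi
```

Keeping the arguments in an array avoids quoting problems with file names that contain spaces.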

Generating scaled coverage tracks

Creating scaled coverage tracks

BAMscale scale -t 4 --bam <BAM_FILE> [--bam <BAM2> .. --bam <BAMn>]
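
The 20210510 update note says that indexes named `file.bam.bai` and `file.bai` are both supported. A small helper can confirm that one of them exists before a run is started. This is a sketch and not part of BAMscale: the function name `find_bam_index` and the file name `sample.bam` are made up for illustration.

```shell
#!/usr/bin/env bash
# Print the path of the index belonging to a BAM file, or return 1 if none exists.
# Checks the two naming schemes mentioned in the update notes:
#   sample.bam.bai and sample.bai
find_bam_index() {
  local bam="$1"
  if [ -f "${bam}.bai" ]; then
    printf '%s\n' "${bam}.bai"
  elif [ -f "${bam%.bam}.bai" ]; then
    printf '%s\n' "${bam%.bam}.bai"
  else
    return 1
  fi
}

# Hypothetical usage: report whether an index is present.
bam="sample.bam"
if idx=$(find_bam_index "$bam"); then
  echo "index found: $idx"
else
  echo "no index for $bam; create one first, e.g. with: samtools index $bam"
fi
```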

Creating stranded RNA-seq coverage tracks

BAMscale scale --operation strandrna --bam <RNAseq.bam>

Creating unstranded coverage from RNA-seq

BAMscale scale --operation rna --bam <RNAseq.bam>
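
In a pipeline that handles both library types, the choice between the two RNA-seq operations can be wrapped in a small function. This is only a sketch: the function name `rna_operation` and the labels `stranded` and `unstranded` are hypothetical, and the command is printed rather than run.

```shell
#!/usr/bin/env bash
# Map a library type to the --operation value used in the two examples above.
# "stranded" -> strandrna, "unstranded" -> rna; anything else is an error.
rna_operation() {
  case "$1" in
    stranded)   echo "strandrna" ;;
    unstranded) echo "rna" ;;
    *)          echo "unknown library type: $1" >&2; return 1 ;;
  esac
}

# Hypothetical usage with a placeholder BAM name.
library="stranded"
bam="RNAseq.bam"
op=$(rna_operation "$library")
echo "BAMscale scale --operation $op --bam $bam"
```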

Getting RFD scores from OK-seq data

BAMscale scale -t 4 --operation rfd --binsize 1000 --bam <BAM_FILE>

Processing replication timing and Repli-seq data

BAMscale scale -t 4 --operation reptime --bam <G1_phase.bam> --bam <S_phase.bam>

Creating stranded END-seq coverages

BAMscale scale -t 4 --operation endseq --bam <ENDseq.bam>

Reference

The BAMscale preprint is available at bioRχiv (https://doi.org/10.1101/669275). The full manuscript has been published in Epigenetics & Chromatin.

Bioconda installation

BAMscale is available through Bioconda. Read the Bioconda Getting Started page for a detailed description of how to set up Bioconda.

Once Bioconda is available, you can install BAMscale using this command:

conda install bamscale

Docker

The BAMscale Docker image is available at quay.io/biocontainers/bamscale.

Pulling the image

docker pull quay.io/biocontainers/bamscale:0.0.5--ha85820d_0

Using the Docker image

Peak quantification with Docker

docker run -v `pwd`:/data bamscale BAMscale cov --bed <BED_FILE> --bam <BAM1> --bam <BAM2> --bam <BAM3> ... --bam <BAMn>

Generating scaled coverage tracks with Docker

docker run -v `pwd`:/data bamscale BAMscale scale --bam <BAM_FILE> [--bam <BAM2> .. --bam <BAMn>]
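
The two `docker run` commands above refer to an image named `bamscale`, which is the name given by the custom build in the next section; the pulled image carries the full `quay.io/biocontainers/bamscale:0.0.5--ha85820d_0` name instead. The sketch below assembles a peak quantification command for the pulled image and prints it rather than running it. The file names are hypothetical, and it assumes `BAMscale` is on the container's `PATH`, as the commands above imply. The working directory is set to the `/data` mount with `-w`, so relative paths resolve to the host's current directory.

```shell
#!/usr/bin/env bash
# Image name as pulled from quay.io (see "Pulling the image").
image="quay.io/biocontainers/bamscale:0.0.5--ha85820d_0"

# Hypothetical input files located in the current directory on the host.
bed="peaks.bed"
bams=(sample1.bam sample2.bam)

# Mount the current directory at /data and make it the working directory.
docker_cmd=(docker run -v "$PWD:/data" -w /data "$image" BAMscale cov --bed "$bed")
for bam in "${bams[@]}"; do
  docker_cmd+=(--bam "$bam")
done

# Dry run: print the command. To execute it, run "${docker_cmd[@]}" instead.
printf '%s\n' "${docker_cmd[*]}"
```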

Creating a custom docker image

docker build -t bamscale https://raw.githubusercontent.com/pongorlorinc/BAMscale/master/Dockerfile

Local compilation

Requirements

We have detailed installation instructions for Linux and macOS (with Homebrew) systems, and for installation through conda. A precompiled version for Linux is also available on the releases page.

samtools

http://www.htslib.org/

libBigWig

Clone the libBigWig repository from GitHub: https://github.com/dpryan79/libBigWig

git clone https://github.com/dpryan79/libBigWig.git

Compile it and set the environment variables for BAMscale

cd libBigWig/
make
export LIBBIGWIG_DIR=`pwd`
export CPPFLAGS="-I $LIBBIGWIG_DIR"
export LDFLAGS="-L $LIBBIGWIG_DIR -Wl,-rpath,$LIBBIGWIG_DIR"

Optionally (and if you have permission), libBigWig can also be installed:

make install

In this case, the flags don't have to be set in the terminal.
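
When the environment-variable route is used instead of `make install`, it can help to confirm that `LIBBIGWIG_DIR` points at a built libBigWig checkout before compiling BAMscale. The helper below is a sketch under the assumption that, after `make`, the directory contains the `bigWig.h` header plus a `libBigWig.so` or `libBigWig.a` library; the function name `check_libbigwig` is hypothetical.

```shell
#!/usr/bin/env bash
# Return 0 if the directory looks like a compiled libBigWig checkout.
# Assumed file names: bigWig.h, libBigWig.so, libBigWig.a
check_libbigwig() {
  local dir="$1"
  [ -f "$dir/bigWig.h" ] || { echo "missing header: $dir/bigWig.h" >&2; return 1; }
  if [ -f "$dir/libBigWig.so" ] || [ -f "$dir/libBigWig.a" ]; then
    return 0
  fi
  echo "no libBigWig library found in $dir (was 'make' run?)" >&2
  return 1
}

# Usage: check the directory exported above, if the variable is set.
if [ -n "${LIBBIGWIG_DIR:-}" ]; then
  if check_libbigwig "$LIBBIGWIG_DIR"; then
    echo "libBigWig looks ready in $LIBBIGWIG_DIR"
  fi
else
  echo "LIBBIGWIG_DIR is not set"
fi
```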

Installation

After compiling the libBigWig library and samtools (if not already installed), clone the BAMscale repository from GitHub:

git clone https://github.com/ncbi/BAMscale.git

and go to the BAMscale folder to compile the program:

cd BAMscale/
make

A bin folder will be created with the BAMscale executable.
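
To call `BAMscale` without typing the full path, the new `bin` folder can be added to the `PATH`. A minimal sketch, assuming the commands are run from inside the `BAMscale/` folder right after `make`; the variable name `bamscale_bin` is arbitrary.

```shell
#!/usr/bin/env bash
# Assumes the current directory is the BAMscale/ folder that was just compiled.
bamscale_bin="$PWD/bin"

# Prepend the bin folder so the freshly built executable is found first.
export PATH="$bamscale_bin:$PATH"

# Report whether the executable is now reachable.
if command -v BAMscale >/dev/null 2>&1; then
  echo "BAMscale found at: $(command -v BAMscale)"
else
  echo "BAMscale not found in $bamscale_bin; check that 'make' succeeded" >&2
fi
```

The change lasts only for the current shell session; adding the `export` line to a shell startup file would make it permanent.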

Public Domain notice

National Center for Biotechnology Information.

This software is a "United States Government Work" under the terms of the United States Copyright Act. It was written as part of the authors' official duties as United States Government employees and thus cannot be copyrighted. This software is freely available to the public for use. The National Library of Medicine and the U.S. Government have not placed any restriction on its use or reproduction.

Although all reasonable efforts have been taken to ensure the accuracy and reliability of the software and data, the NLM and the U.S. Government do not and cannot warrant the performance or results that may be obtained by using this software or data. The NLM and the U.S. Government disclaim all warranties, express or implied, including warranties of performance, merchantability or fitness for any particular purpose.

Please cite NCBI in any work or product based on this material.