tommyau / bamclipper

Remove primer sequence from BAM alignments by soft-clipping
MIT License

BAMClipper

Remove gene-specific primer sequences from SAM/BAM alignments of PCR amplicons by soft-clipping

Download latest version in a ZIP package

Dependencies, as tested on 64-bit CentOS 5.5

Usage

bamclipper.sh soft-clips gene-specific primers from a BAM alignment file based on the genomic coordinates of primer pairs given in BEDPE format.

./bamclipper.sh -b BAM -p BEDPE [-n NTHREAD] [-s SAMTOOLS] [-g GNUPARALLEL] [-u UPSTREAM] [-d DOWNSTREAM]

Given a BAM file called NAME.bam, a new BAM file (NAME.primerclipped.bam) and its associated index (NAME.primerclipped.bam.bai) will be generated in the current working directory.
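The naming rule above can be sketched in shell. This is only an illustration, not code from bamclipper.sh itself: the function name `primerclipped_name` is hypothetical, and the sketch assumes the input file name ends in `.bam`.

```shell
# Hypothetical helper (not part of BAMClipper): derive the output file name
# described above from the path of an input BAM file.
primerclipped_name() {
    # basename drops the directory and the .bam suffix, so the result
    # refers to a file in the current working directory.
    printf '%s.primerclipped.bam\n' "$(basename "$1" .bam)"
}

out=$(primerclipped_name examples/SRR2075598.bam)
echo "$out"        # SRR2075598.primerclipped.bam
echo "$out.bai"    # SRR2075598.primerclipped.bam.bai
```

Because the directory part of the input path is dropped, running the tool on a BAM file stored elsewhere still writes the output next to where the command was launched.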

Notes: For the sake of performance and simplicity, soft-clipping is performed solely based on genomic coordinates without involving the underlying sequence. Reference sequence names and coordinates of BAM and BEDPE are assumed to be derived from identical reference sequences (e.g. hg19).
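To make the coordinate-only approach concrete, here is a minimal sketch of the arithmetic for one forward read. It is not the implementation used by bamclipper.sh. It assumes a read aligned as a single match block (for example `100M`) that starts inside the primer interval and is longer than the overlap, and it assumes the usual conventions that BEDPE starts are 0-based with exclusive ends while SAM positions are 1-based. All variable names and the read values are hypothetical.

```shell
# Hypothetical values for illustration only.
primer_start=31022896   # BEDPE start of the upstream primer (0-based)
primer_end=31022921     # BEDPE end of the upstream primer (exclusive)
read_pos=31022897       # SAM POS of the read (1-based)
read_len=100            # read aligned as 100M, no indels or existing clips

# Convert the read start to 0-based so it can be compared with BEDPE coordinates.
read_start0=$((read_pos - 1))

if [ "$read_start0" -ge "$primer_start" ] && [ "$read_start0" -lt "$primer_end" ]; then
    # Bases from the read start up to the primer end are soft-clipped;
    # the alignment then begins at the first base after the primer.
    clip=$((primer_end - read_start0))
    new_cigar="${clip}S$((read_len - clip))M"
    new_pos=$((read_pos + clip))
else
    # The read does not start within the primer: leave it unchanged.
    clip=0
    new_cigar="${read_len}M"
    new_pos=$read_pos
fi

echo "$new_pos $new_cigar"   # 31022922 25S75M
```

Note that the read sequence is never inspected: only positions are compared, which is why the BAM and BEDPE files must share the same reference coordinates.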

Required arguments

Options

Example using demo data

# Clip primers by BAMClipper
>./bamclipper.sh -b examples/SRR2075598.bam -p examples/trusight_myeloid.bedpe -n 4
# done!
# SRR2075598.primerclipped.bam and its index SRR2075598.primerclipped.bam.bai are generated.

# the new SRR2075598.primerclipped.bam should be identical to the provided example (compare checksum of alignments and ignore headers)
>samtools view SRR2075598.primerclipped.bam | md5sum
6a431457fd6e892646c17d1c3029c24e  -
>samtools view examples/SRR2075598.primerclipped.bam | md5sum
6a431457fd6e892646c17d1c3029c24e  -

# An example line of primer pair BEDPE file (an amplicon targeting ASXL1)
>grep 31022896 examples/trusight_myeloid.bedpe
chr20   31022896        31022921        chr20   31023096        31023123
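As a rough check of what such a line encodes, the sketch below computes the two primer lengths and the distance between the primers with awk. It assumes the standard BEDPE convention of 0-based starts and exclusive ends; the variable names `line` and `summary` are illustrative only.

```shell
# The example BEDPE line from above (fields are whitespace-separated).
line='chr20 31022896 31022921 chr20 31023096 31023123'

# Fields: chrom1 start1 end1 chrom2 start2 end2.
# Print: length of primer 1, length of primer 2, bases between the two primers.
summary=$(printf '%s\n' "$line" | awk '{ printf "%d %d %d", $3 - $2, $6 - $5, $5 - $3 }')

echo "$summary"   # 25 27 175
```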

Details of demo data:

Citation

Au CH, Ho DN, Kwong A, Chan TL and Ma ESK, 2017. BAMClipper: removing primers from alignments to minimize false-negative mutations in amplicon next-generation sequencing. Scientific Reports 7:1567 (doi:10.1038/s41598-017-01703-6)