josiegleeson / BamSlam

Summarise and plot data from long-read ONT (direct RNA/cDNA) BAM files
MIT License
11 stars 4 forks source link
bioinformatics direct-rna-sequencing long-read-sequencing

BamSlam

This script was written for long-read Oxford Nanopore Technologies direct RNA/cDNA sequencing data. It uses BAM files produced after mapping with minimap2 to the reference transcriptome. It will output a summary file and plots from your aligned reads. This script was used in: https://doi.org/10.1093/nar/gkab1129.

Inputs:

To obtain a BAM file align your FASTQ/FASTA files to the transcriptome with minimap2.

minimap2 -ax map-ont --sam-hit-only transcriptome.fasta reads.fastq | samtools view -bh > aligned_reads.bam

If minimap2 is not run with --sam-hit-only you should remove unmapped reads prior to running BamSlam to avoid slowing it down. You can also input a BAM file output from NanoCount: https://github.com/a-slide/NanoCount.

Requirements:

R (tested with v4.2.2) R packages:

Usage:

Download/copy the Rscript from this repository and run it from the terminal as follows:

Rscript BamSlam.R [DATA_TYPE] [BAM_FILE] [OUT_PREFIX]
Rscript BamSlam.R rna undiff1_5Y.bam undiff1

DATA_TYPE, enter either: cdna, rna
BAM_FILE, a BAM file of alignments to the transcriptome
OUT_PREF, output file prefix

The script takes approximately 5 minutes per million reads.

Outputs:

Data

Plots

Details of summary file metrics