ksahlin / ultra

Long-read splice alignment with high accuracy

uLTRA

uLTRA is a tool for splice alignment of long transcriptomic reads to a genome, guided by a database of exon annotations. uLTRA is particularly accurate when aligning to small exons (see some examples).

uLTRA is distributed as a Python package and is supported on Linux and OSX with Python 3.4 or above.

Here is a YouTube video that describes uLTRA.

INSTALLATION

Conda recipe

There is a bioconda recipe, docker image, and a singularity container of uLTRA created by sguizard. You can use, e.g., the bioconda recipe for an easy automated installation.

Alternative installation methods are described below.

Using the INSTALL.sh script

You can clone this repository and run the script INSTALL.sh as

git clone https://github.com/ksahlin/uLTRA.git --depth 1
cd uLTRA
./INSTALL.sh <install_directory>

The install script has been tested in a bash environment.

To run uLTRA, you need to activate the conda environment "ultra":

conda activate ultra

Without the INSTALL.sh script

For more control, you can instead perform the steps below manually.

1. Create conda environment

Create a conda environment called ultra and activate it

conda create -n ultra python=3 pip 
conda activate ultra

2. Install uLTRA

pip install ultra-bioinformatics

3. Install third party tools

Install namfinder and minimap2 and place the generated binaries namfinder and minimap2 in your path.

4. Verify installation

You should now have 'uLTRA' installed. Try it:

uLTRA --help
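
If `uLTRA --help` fails, a common cause is that one of the tools from the previous steps is not on your PATH. The following is a minimal sketch of such a check; the helper name `check_tools` is hypothetical and not part of uLTRA, and the tool names are the ones listed in the steps above.

```shell
# Report which of the given tools are missing from PATH.
# Prints one line per tool and returns non-zero if any tool is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Tool names taken from the installation steps above.
check_tools uLTRA namfinder minimap2 || echo "Install the missing tools or fix your PATH."
```

Run it inside the activated "ultra" environment, since that is where pip installed uLTRA.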

Each time you start or log in to your server or computer, you need to activate the conda environment "ultra" before running uLTRA:

conda activate ultra

You can also download and use test data available in this repository here and run:

uLTRA pipeline /your/full/path/to/test/SIRV_genes.fasta  \
               /your/full/path/to/test/SIRV_genes_C_170612a.gtf  \
               /your/full/path/to/test/reads.fa outfolder/  [optional parameters]

Entirely from source

Make sure the dependencies listed below are installed (installation links below). All of them except namfinder can be installed with pip install X or through conda.

With these dependencies installed, run

git clone https://github.com/ksahlin/uLTRA.git
cd uLTRA
./uLTRA

USAGE

uLTRA can be used with either PacBio Iso-Seq or ONT cDNA/dRNA reads.

Indexing

uLTRA index genome.fasta  /full/path/to/annotation.gtf  outfolder/  [parameters]

Important parameters:

  1. --disable_infer can speed up the indexing considerably, but it only works if you have the gene feature and transcript feature in your GTF file.
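
Before using --disable_infer, you can check whether the GTF file actually contains gene and transcript features. The sketch below assumes a standard tab-separated GTF in which the feature type is in column 3; the function name `gtf_has_gene_and_transcript` is hypothetical and not part of uLTRA.

```shell
# Succeed only if column 3 of the GTF contains both "gene" and "transcript".
gtf_has_gene_and_transcript() {
    awk -F'\t' '
        /^#/ { next }                  # skip comment/header lines
        $3 == "gene"       { g = 1 }   # saw at least one gene feature
        $3 == "transcript" { t = 1 }   # saw at least one transcript feature
        END { exit !(g && t) }         # exit status 0 only if both were seen
    ' "$1"
}

gtf=/full/path/to/annotation.gtf   # placeholder path from the indexing example
if [ -f "$gtf" ] && gtf_has_gene_and_transcript "$gtf"; then
    echo "gene and transcript features found: --disable_infer should be usable"
else
    echo "gene/transcript features not confirmed: index without --disable_infer"
fi
```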

Aligning

For example

uLTRA align genome.fasta reads.[fa/fq] outfolder/  --ont --t 8   # ONT cDNA reads using 8 cores
uLTRA align genome.fasta reads.[fa/fq] outfolder/  --isoseq --t 8 # PacBio isoseq reads

Important parameters:

  1. --index [PATH]: Sets a custom location from which to read the index. If it is not set, uLTRA tries to read the index from outfolder/.
  2. --prefix [PREFIX OF FILE]: The aligned reads will be written to outfolder/reads.sam unless --prefix is set. For example, --prefix sample_X will output the reads in outfolder/sample_X.sam.
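
Together, --index and --prefix make it possible to index once and then align several read files into the same output folder. The sketch below only prints the commands rather than running them. The helper name `print_align_cmds`, the sample file names, and the folder `index_folder/` (assumed to be the outfolder given to `uLTRA index`) are hypothetical, and the --ont and --t 8 settings are copied from the example above.

```shell
# Print (not run) one uLTRA align command per read file.
# $1: genome FASTA, $2: index folder, $3: output folder, rest: read files.
print_align_cmds() {
    genome=$1; index=$2; out=$3
    shift 3
    for reads in "$@"; do
        # Derive the prefix from the file name: path/sample_X.fq -> sample_X
        prefix=$(basename "$reads")
        prefix=${prefix%.*}
        echo "uLTRA align $genome $reads $out --ont --t 8 --index $index --prefix $prefix"
    done
}

# Hypothetical sample files; each output would be written to outfolder/<prefix>.sam.
print_align_cmds genome.fasta index_folder/ outfolder/ sample_A.fq sample_B.fq
```

After reviewing the printed commands, you can pipe the output to `sh` to run them.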

Pipeline

Perform all the steps in one command:

uLTRA pipeline genome.fasta /full/path/to/annotation.gtf reads.fa outfolder/  [parameters]

Common errors

Not having a properly formatted GTF file. uLTRA requires a properly formatted GTF file. If you have a GFF file or another annotation format, it is advised to use AGAT to convert it to GTF, as many other conversion tools do not respect the GTF format. For example, you can run AGAT as:

agat_convert_sp_gff2gtf.pl --gff annot.gff3 --gtf annot.gtf
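
A quick sanity check can catch some formatting problems before indexing. The sketch below only verifies that every data line has the nine tab-separated fields that GTF requires. It is not a full validation and does not replace AGAT. The function name `gtf_count_bad_lines` is hypothetical, and the file name is the one from the AGAT example.

```shell
# Print the number of non-comment, non-empty lines that do not have
# exactly 9 tab-separated fields.
gtf_count_bad_lines() {
    awk -F'\t' '
        /^#/ || /^[[:space:]]*$/ { next }   # skip comments and blank lines
        NF != 9 { bad++ }                   # count lines with a wrong field count
        END { print bad + 0 }               # "+ 0" prints 0 when nothing was counted
    ' "$1"
}

gtf=annot.gtf   # output file name from the AGAT example
if [ -f "$gtf" ]; then
    echo "lines without 9 tab-separated fields: $(gtf_count_bad_lines "$gtf")"
fi
```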

CREDITS

Please cite

  1. Kristoffer Sahlin, Veli Mäkinen, Accurate spliced alignment of long RNA sequencing reads, Bioinformatics, Volume 37, Issue 24, 15 December 2021, Pages 4643–4651, https://doi.org/10.1093/bioinformatics/btab540

when using uLTRA. Please also cite minimap2 as uLTRA incorporates minimap2 for alignment of some genomic reads outside indexed regions. For example "We aligned reads to the genome using uLTRA [1], which incorporates minimap2 [CIT].".

LICENCE

GPL v3.0, see LICENSE.txt.