jehandzlik / Manatee

Manatee: detection and quantification of small non-coding RNAs from next-generation sequencing data
GNU General Public License v3.0

Manatee

Manatee version 1.3

What is Manatee?

Manatee is a tool for detection, quantification, and analysis of small ncRNAs 
from next-generation sequencing data.

DEPENDENCIES

  1. perl
  2. Set::IntervalTree: perl package
  3. SAMtools: needs to be installed and added to your PATH
  4. Bowtie: executable file included in Manatee package, no installation required

INSTALLATION (Unix/Linux)

Install the required dependencies and run the Manatee main script as described in the usage sections below.

Set::IntervalTree

cpan

install Set::IntervalTree
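
Before running Manatee, it can help to confirm that the dependencies listed above are visible from the shell. The following is a minimal sketch in POSIX shell; the helper names `have_cmd` and `have_perl_module` are hypothetical and are not part of the Manatee package.

```shell
# Succeeds if the named command is found on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Succeeds if Perl can load the named module.
have_perl_module() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

# Report each dependency that must be installed separately
# (Bowtie is bundled with the package, so it is not checked here).
for cmd in perl samtools; do
    if have_cmd "$cmd"; then
        echo "found:   $cmd"
    else
        echo "missing: $cmd"
    fi
done

if have_perl_module Set::IntervalTree; then
    echo "found:   Set::IntervalTree"
else
    echo "missing: Set::IntervalTree"
fi
```

Any line reported as missing points to a dependency that still needs to be installed or added to your PATH.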

PACKAGE FILES

The following components are included in the Manatee package.

bowtie-1.0.1       % directory with bowtie aligner

config             % configuration file

Manatee            % Perl core program for sRNA analysis

README.md          % this file

USAGE with configuration file

Syntax:

manatee -config <file> -i <file> -o <dir>

-config Path to configuration file.
-i Path to pre-processed FASTQ or FASTA file. Valid formats: .fa, .fasta, .fastq, .fq, .fa.gz, .fasta.gz, .fastq.gz, .fq.gz.
-o Path to directory where the output will be stored.
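
As a rough illustration, a small wrapper might check the input file name against the formats listed for -i before launching Manatee with a configuration file. The function names `is_supported_input` and `run_with_config` are hypothetical, and the paths in the example call are placeholders. The sketch assumes the program is reachable as `manatee`, as in the syntax line above.

```shell
# Succeeds if the file name ends in one of the extensions listed for -i.
is_supported_input() {
    case "$1" in
        *.fa|*.fasta|*.fastq|*.fq|*.fa.gz|*.fasta.gz|*.fastq.gz|*.fq.gz)
            return 0 ;;
        *)
            return 1 ;;
    esac
}

# Hypothetical wrapper: validates the input name, then runs Manatee
# with a configuration file. Arguments: config file, input file, output dir.
run_with_config() {
    config=$1
    input=$2
    out_dir=$3
    if ! is_supported_input "$input"; then
        echo "unsupported input format: $input" >&2
        return 1
    fi
    manatee -config "$config" -i "$input" -o "$out_dir"
}

# Example call (placeholder paths):
# run_with_config config sample.fastq.gz results
```

The check only inspects the file name, not the file contents.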

USAGE with input parameters

Syntax:

manatee [OPTIONS] -i <file> -o <dir> -index <ebwt> -genome <file> -annotation <file>

-i Path to pre-processed FASTQ or FASTA file. Valid formats: .fa, .fasta, .fastq, .fq, .fa.gz, .fasta.gz, .fastq.gz, .fq.gz.
-o Path to directory where the output will be stored.
-index Path and basename of the genome Bowtie index to be searched. The basename is the name of any of the index files up to but not including the final .1.ebwt/.rev.1.ebwt/etc.
-genome Path to genome FA or FASTA file.
-annotation Path to non-coding annotation file. The file should contain the following tab-separated elements: chromosome, strand, start locus, end locus, biotype, transcript id, transcript name.
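
The annotation layout described above can be sanity-checked before a run. The sketch below is only an illustration: `check_annotation` is a hypothetical helper, and it assumes that the file has no header line and that the start and end coordinates (fields 3 and 4) are plain integers. The file names in the comments are placeholders.

```shell
# Checks that every line of an annotation file has seven tab-separated
# fields and integer start/end coordinates (fields 3 and 4).
# Assumption: the file has no header line.
check_annotation() {
    awk -F '\t' '
        BEGIN { bad = 0 }
        NF != 7 {
            printf "line %d: expected 7 fields, found %d\n", NR, NF
            bad = 1
            next
        }
        $3 !~ /^[0-9]+$/ || $4 !~ /^[0-9]+$/ {
            printf "line %d: start and end must be integers\n", NR
            bad = 1
        }
        END { exit bad }
    ' "$1"
}

# Example (placeholder paths): validate the annotation, then run Manatee
# with input parameters.
# check_annotation ncRNA_annotation.tsv &&
#     manatee -i reads.fq.gz -o results -index idx/genome \
#             -genome genome.fa -annotation ncRNA_annotation.tsv
```

A made-up row that passes the check would look like `chr1`, `+`, `1000`, `1022`, `miRNA`, `tx0001`, `example-miR`, separated by tabs.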

OPTIONS

-t_index Path and basename of the transcriptome Bowtie index to be searched. The basename is the name of any of the index files up to but not including the final .1.ebwt/.rev.1.ebwt/etc. If left blank (for example, when no such index exists yet), Manatee generates a transcriptome index from the provided non-coding annotation and stores it within the transcripts directory.
-cores Number of alignment cores (default: -cores 1).
-collapse Collapse reads with the same genomic sequence. This setting significantly reduces execution time. Possible values: yes/no (default: -collapse yes).
-mismatches Maximum number of mismatches in genomic alignments (default: -mismatches 1).
-m Maximum number of multimapping loci; corresponds to -m in the Bowtie execution. The mapping algorithm is applied only to reads whose number of multimapped loci is less than or equal to m. Reads whose multimapped loci exceed m are aligned against the transcriptome (default: -m 50).
-s Strand-specific mode of the algorithm (default: -s yes).
-cd Minimum number of unannotated reads per cluster (default: -cd 5).
-cdi Clusters of unannotated reads are merged if the distance between them is less than or equal to cdi (default: -cdi 50).
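
The -cdi rule can be pictured with a small sketch. This is not Manatee's implementation. It assumes clusters on a single chromosome and strand, given as start/end pairs sorted by start, and it assumes that "distance" means the gap between the end of one cluster and the start of the next. The function name `merge_clusters` and the coordinates are made up for illustration.

```shell
# Reads "start<TAB>end" pairs (sorted by start) from standard input and
# merges neighbours whose gap is less than or equal to the distance in $1.
merge_clusters() {
    awk -F '\t' -v OFS='\t' -v cdi="$1" '
        NR == 1 { s = $1 + 0; e = $2 + 0; next }
        ($1 - e) <= cdi {
            # Close enough: extend the current cluster.
            if ($2 + 0 > e) e = $2 + 0
            next
        }
        {
            # Too far: emit the current cluster and start a new one.
            print s, e
            s = $1 + 0
            e = $2 + 0
        }
        END { if (NR > 0) print s, e }
    '
}

# Three made-up clusters: the gap between the first two is 30,
# the gap before the third is 130.
printf '100\t120\n150\t170\n300\t320\n' | merge_clusters 50
# prints two tab-separated rows: "100 170" and "300 320"
```

With a distance of 50, the first two clusters merge and the third stays separate. With a smaller value such as 10, all three would be reported unchanged.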

OUTPUT

A successful run produces the following three output files in the output directory:

<inputName>_Manatee_counts.tsv

<inputName>_Manatee_clusters.tsv

<inputName>_Manatee_isomirs.tsv

Depending on the input, <inputName>_Manatee_clusters.tsv might not be generated.
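
After a run, the presence of the expected files can be checked with a short script. The sketch below uses a hypothetical helper, `report_outputs`, and assumes that the caller supplies the inputName prefix directly, since how Manatee derives it from the input file name is not described here. A missing clusters file is reported but not treated as an error.

```shell
# Reports which Manatee output files exist in a directory.
# Arguments: output directory, inputName prefix.
# Returns non-zero if the counts or isomirs file is missing.
report_outputs() {
    out_dir=$1
    name=$2
    status=0
    for kind in counts isomirs; do
        f="$out_dir/${name}_Manatee_${kind}.tsv"
        if [ -f "$f" ]; then
            echo "ok: $f"
        else
            echo "missing: $f"
            status=1
        fi
    done
    # The clusters file is optional, so its absence is only noted.
    f="$out_dir/${name}_Manatee_clusters.tsv"
    if [ -f "$f" ]; then
        echo "ok: $f"
    else
        echo "not generated (may be expected): $f"
    fi
    return "$status"
}

# Example call (placeholder values):
# report_outputs results sample
```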

ADDITIONAL COMMENTS

FUNDING

The project "ELIXIR-GR: Managing and Analysing Life Sciences Data (MIS: 5002780)" is co-financed by Greece and the European Union - European Regional Development Fund.