Manatee version 1.3
Manatee is a tool for detection, quantification, and analysis of small ncRNAs
from next-generation sequencing data.
Install the required dependencies and run the Manatee main script as described in the usage section.
cpan
install Set::IntervalTree
The following components are included in the Manatee package.
bowtie-1.0.1 % directory with bowtie aligner
config % configuration file
Manatee % Perl core program for sRNA analysis
README.md % this file
manatee -config <file> -i <file> -o <dir>
| Option | Description |
|---|---|
| -config | Path to configuration file. |
| -i | Path to pre-processed FASTQ or FASTA file. Valid formats: .fa, .fasta, .fastq, .fq, .fa.gz, .fasta.gz, .fastq.gz, .fq.gz. |
| -o | Path to directory where the output will be stored. |
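
For scripted runs, the configuration-file invocation can be assembled programmatically. The following is a minimal sketch, not part of Manatee: `build_config_command` is a hypothetical helper, it assumes the `manatee` executable is on the `PATH`, and it assumes the extension check is case-sensitive. The flags and valid extensions are the ones listed above.

```python
# Extensions listed as valid for -i in this README.
VALID_SUFFIXES = (".fa", ".fasta", ".fastq", ".fq",
                  ".fa.gz", ".fasta.gz", ".fastq.gz", ".fq.gz")


def build_config_command(config, reads, outdir, executable="manatee"):
    """Return the argument list for a configuration-file run.

    Hypothetical helper: it only checks the input extension and
    assembles the documented flags; it does not run anything.
    """
    if not str(reads).endswith(VALID_SUFFIXES):
        raise ValueError(f"unsupported input format: {reads}")
    return [executable, "-config", str(config),
            "-i", str(reads), "-o", str(outdir)]


# Example with placeholder file names.
cmd = build_config_command("config", "sample.fastq.gz", "results")
print(" ".join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` to start the run.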
manatee [OPTIONS] -i <file> -o <dir> -index <ebwt> -genome <file> -annotation <file>
| Option | Description |
|---|---|
| -i | Path to pre-processed FASTQ or FASTA file. Valid formats: .fa, .fasta, .fastq, .fq, .fa.gz, .fasta.gz, .fastq.gz, .fq.gz. |
| -o | Path to directory where the output will be stored. |
| -index | Path and basename of the genome Bowtie index to be searched. The basename is the name of any of the index files up to but not including the final .1.ebwt/.rev.1.ebwt/etc. |
| -genome | Path to genome FA or FASTA file. |
| -annotation | Path to non-coding annotation file. The file should contain the following tab-separated elements: chromosome, strand, start locus, end locus, biotype, transcript id, transcript name. |
| -t_index | Path and basename of the transcriptome Bowtie index to be searched. The basename is the name of any of the index files up to but not including the final .1.ebwt/.rev.1.ebwt/etc. If left blank and no index exists, Manatee generates a transcriptome index from the provided non-coding annotation and stores it in the transcripts directory. |
| -cores | Number of alignment cores (default: -cores 1). |
| -collapse | Collapse reads with identical genomic sequences. This setting significantly reduces execution time. Possible values: yes/no (default: -collapse yes). |
| -mismatches | Maximum number of mismatches in genomic alignments (default: -mismatches 1). |
| -m | Maximum number of multimapping loci, passed as -m to Bowtie. The mapping algorithm is applied only to reads whose number of multimapped loci is less than or equal to m. Reads whose multimapped loci exceed -m are aligned against the transcriptome (default: -m 50). |
| -s | Strand-specific mode of the algorithm (default: -s yes). |
| -cd | Minimum abundance of unannotated reads per cluster (default: -cd 5). |
| -cdi | Clusters of unannotated reads are merged if the distance between them is less than or equal to cdi (default: -cdi 50). |
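
Because the -annotation file has a fixed seven-column layout, it can be worth checking before a long run. The sketch below is an illustration, not Manatee's own parser. `check_annotation` is a hypothetical helper, and it assumes the file has no header line and that start and end are plain integers with start not greater than end. Strand symbols are not checked because the expected values are not stated here.

```python
import io

# chromosome, strand, start, end, biotype, transcript id, transcript name
EXPECTED_COLUMNS = 7


def check_annotation(handle):
    """Yield (line_number, message) for rows that break the assumed layout."""
    for number, line in enumerate(handle, start=1):
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) != EXPECTED_COLUMNS:
            yield number, f"expected {EXPECTED_COLUMNS} fields, found {len(fields)}"
            continue
        start, end = fields[2], fields[3]
        if not (start.isdigit() and end.isdigit()):
            yield number, "start and end must be non-negative integers"
        elif int(start) > int(end):
            yield number, "start is greater than end"


# Made-up rows for illustration only.
example = io.StringIO(
    "chr1\t+\t100\t200\tmiRNA\ttx1\texample-1\n"
    "chr1\t-\t300\t250\tmiRNA\ttx2\texample-2\n"
)
for number, message in check_annotation(example):
    print(f"line {number}: {message}")
```

For a real file, pass an open handle instead, for example `check_annotation(open(path))`.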
A successful run will produce the following three output files in the output directory:
<*inputName*>_Manatee_counts.tsv
<*inputName*>_Manatee_clusters.tsv
<*inputName*>_Manatee_isomirs.tsv
Depending on the input, <*inputName*>_Manatee_clusters.tsv might not be generated.
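
A pipeline that calls Manatee may want to confirm that the expected files exist while treating the clusters file as optional. The sketch below shows one way to do that. `expected_outputs` and `missing_outputs` are hypothetical helpers, and the value of `input_name` (how Manatee derives the name from the input file) is an assumption the caller must supply.

```python
from pathlib import Path

OUTPUT_KINDS = ("counts", "clusters", "isomirs")


def expected_outputs(outdir, input_name):
    """Map each output kind to the path it would have in outdir."""
    outdir = Path(outdir)
    return {kind: outdir / f"{input_name}_Manatee_{kind}.tsv"
            for kind in OUTPUT_KINDS}


def missing_outputs(outdir, input_name):
    """Return the missing output paths, ignoring the optional clusters file."""
    paths = expected_outputs(outdir, input_name)
    return [str(path) for kind, path in paths.items()
            if kind != "clusters" and not path.exists()]


# Example with placeholder names.
for path in missing_outputs("results", "sample"):
    print(f"missing: {path}")
```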
The project "ELIXIR-GR: Managing and Analysing Life Sciences Data (MIS: 5002780)" is co-financed by Greece and the European Union - European Regional Development Fund.