mhuttner / miRA

GNU General Public License v2.0

miRA: a microRNA identification tool

Software and source code for the paper "Conservation-independent identification of novel miRNAs" by M. Evers, A. Dueck, G. Meister and J. C. Engelmann.

How to install:

If you are running a recent 64-bit Linux, or OS X Lion or later, you can grab the miRA binary here:

You will still need gnuplot and LaTeX installed on your system. You will also need the Varna binary; place it in the same folder as the miRA binary.
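
A minimal sketch for checking that the external tools named above are on the PATH before running miRA. The function names `check_tool` and `check_reporting_tools` are made up for this example and are not part of miRA. Varna is not checked because its file name is not given here.

```shell
#!/bin/sh
# Report whether a tool is available on PATH.
# Returns 0 if it is found, 1 otherwise.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found:   $1"
        return 0
    else
        echo "missing: $1"
        return 1
    fi
}

# Check the two external tools mentioned in this README.
# Returns 1 if at least one of them is missing.
check_reporting_tools() {
    missing=0
    for tool in gnuplot latex; do
        check_tool "$tool" || missing=1
    done
    return "$missing"
}

check_reporting_tools || echo "some reporting tools are missing"
```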

Compiling from source

For an easy setup, simply download the latest bundled release archive: miRA-1.2.0.tar.gz

Unpack it, for example with:

tar -xvf miRA-1.2.0.tar.gz

Make sure your system supplies the following dependencies for miRA:

NOTE: miRA will work without the optional dependencies but will skip some reporting features (creating plots etc.) if they are not available.

Compile it for your system with:

cd miRA-1.2.0
./configure
make

Optionally run the unit tests on your system with:

make test

to check for correct behavior.

How to use:

The simplest and most common way to run miRA is to run the full suite with the command:

./miRA full -c <configuration file> <input SAM file> <input FASTA file> <output directory>

Batching in version 1.2.0+ (beta)

If you are having memory problems, use

./miRA batch -c <configuration file> <input SAM file> <input FASTA file> <output directory>

instead. It splits all files by chromosome (RNAME) and runs miRA separately on each, loading only the essential parts into memory. This reduces the memory footprint of miRA significantly, but is slower.
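
To illustrate what splitting by chromosome means for a SAM file, here is a rough sketch using awk. It is not miRA's own implementation, and the function name `split_sam_by_rname` is invented for this example. It relies only on the SAM format itself: header lines start with `@`, and RNAME is the third tab-separated column.

```shell
#!/bin/sh
# Split a SAM file into one file per reference sequence (RNAME, column 3).
# Illustration only: this is not how miRA batch is implemented.
split_sam_by_rname() {
    sam_file=$1
    out_dir=$2
    mkdir -p "$out_dir"
    awk -F '\t' -v dir="$out_dir" '
        /^@/      { next }       # skip header lines
        $3 == "*" { next }       # skip reads without a reference
        {
            out = dir "/" $3 ".sam"
            print >> out         # append, so existing files are extended
            close(out)           # keep the number of open files small
        }
    ' "$sam_file"
}
```

With the sample data, `split_sam_by_rname example/sample_reads.sam split_reads` would write one `.sam` file per reference name into `split_reads/` (a directory name chosen for this example).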

You can test miRA with sample data provided in ./example/:

./miRA full -c example/sample_configuration.config example/sample_reads.sam example/sample_sequence.fasta example/sample_output/

You can also run only parts of miRA. It is separated into three parts, each with its own command:

| Algorithm | Description | Command |
|---|---|---|
| Clustering | generates a list of main expression contigs based on alignment data | cluster |
| Folding | folds RNA sequences and calculates secondary structure information | fold |
| Coverage Testing | coverage-based verification and reporting of microRNA candidates | coverage |

For additional help and usage information run:

./miRA <command> -h

where <command> is one of "cluster", "fold" or "coverage".
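
To read the help for all three parts in one go, a small loop is enough. This is only a convenience sketch: the function name `show_mira_help` and the `MIRA_BIN` variable are hypothetical and default to the `./miRA` binary used above.

```shell
#!/bin/sh
# Print the help text of each miRA sub-command in turn.
# MIRA_BIN is a hypothetical variable for this sketch; it defaults to ./miRA.
show_mira_help() {
    mira=${MIRA_BIN:-./miRA}
    for cmd in cluster fold coverage; do
        echo "== $cmd =="
        "$mira" "$cmd" -h
    done
}
```

Call `show_mira_help` from the directory that contains the miRA binary, or set `MIRA_BIN` to its path first.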

Results:

After running miRA all result files will be created in the specified output directory. Depending on the configuration and the available external programs the following files will be created:

Additional comments and known issues

SAM file format

Memory requirements

If you are having this issue, consider updating to miRA 1.2.0+ and running miRA batch instead.

miRA stores miRNA candidates that passed the folding and read coverage-based verification steps in memory until the generation of the final reports. The memory footprint of miRA therefore depends on the number of validated candidates.

Under certain conditions, miRA may crash with the error:

ERROR: initialize_Lfold: argument must be greater 0

The error is almost always caused by running out of memory, for example when miRA is run on a desktop or notebook computer with little RAM, on deep-sequencing data that yields many candidates, and/or with relaxed, non-stringent filtering parameters.
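
One way to act on this advice is a wrapper that tries the full run first and retries in batch mode if it fails. This is a sketch under two assumptions that are not confirmed here: that a crashed miRA run reports failure through a non-zero exit status, and that miRA 1.2.0+ is installed so the batch command exists. The function name `run_mira_with_fallback` and the `MIRA_BIN` variable are hypothetical.

```shell
#!/bin/sh
# Run "miRA full"; if it exits with a non-zero status, retry with "miRA batch".
# Assumption: a crashed run signals failure through its exit status.
run_mira_with_fallback() {
    mira=${MIRA_BIN:-./miRA}
    config=$1
    sam=$2
    fasta=$3
    out_dir=$4
    if "$mira" full -c "$config" "$sam" "$fasta" "$out_dir"; then
        return 0
    fi
    echo "miRA full failed; retrying in batch mode" >&2
    "$mira" batch -c "$config" "$sam" "$fasta" "$out_dir"
}
```

Usage follows the argument order of the commands above, for example `run_mira_with_fallback example/sample_configuration.config example/sample_reads.sam example/sample_sequence.fasta example/sample_output/`. A failed full run may leave partial files in the output directory, so it is worth checking that directory before relying on the batch results.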