miRA: a microRNA identification tool

Software and source code for the paper "Conservation-independent identification of novel miRNAs" by M. Evers, A. Dueck, G. Meister and J. C. Engelmann.

How to install:

If you are running a recent 64-bit Linux, or OS X Lion or later, you can grab the miRA binary here:

You will still need gnuplot and LaTeX installed on your system. You will also need the Varna binary: Varna. Please place it in the same folder as the miRA binary.

Compiling from source

For an easy setup, simply download the latest bundled release archive: miRA-1.2.0.tar.gz

Unpack it, for example with:

tar -xvf miRA-1.2.0.tar.gz

Make sure your system supplies the following dependencies for miRA:

a C compiler supporting the C99 standard
a Java virtual machine, version 1.6+ (optional)
a recent version of gnuplot (optional)
a recent version of LaTeX (optional)

NOTE: miRA will work without the optional dependencies but will skip some reporting features (creating plots etc.) if they are not available.
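
A quick way to see which of these tools are available before compiling is a small shell check. This is only a sketch: the executable names `cc`, `java`, `gnuplot` and `latex` are assumptions about how the tools are installed on your system, not names taken from miRA's configure script, and the helper functions are made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a program can be found on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Print one status line per tool name given as an argument.
report_tools() {
    for tool in "$@"; do
        if have_tool "$tool"; then
            echo "found: $tool"
        else
            echo "missing: $tool"
        fi
    done
}

# Assumed executable names; adjust them to match your installation.
report_tools cc java gnuplot latex
```

A "missing" line for one of the optional tools only means that the corresponding reporting features would be skipped.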

Compile it for your system with:

cd miRA-1.2.0
./configure
make

Optionally run the unit tests on your system with:

make test

to check for correct behavior.

How to use:

The simplest and most common way to run miRA is to run the full suite using the command:

./miRA full -c <configuration file> <input SAM file> <input FASTA file> <output directory>

Batching in version 1.2.0+ (beta)

If you run into memory problems, use

./miRA batch -c <configuration file> <input SAM file> <input FASTA file> <output directory>

instead. It will split all files based on the chromosome (rname) and run miRA separately for each, only loading the essential parts into memory. This will reduce the memory footprint of miRA significantly, but will be slower.
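
To illustrate the idea of splitting by chromosome, the following sketch separates a SAM file into one file per reference name (RNAME, the third SAM column) using awk. It is not miRA's batch implementation, only a picture of the kind of partitioning described above. The function name `split_sam_by_rname` and the output naming scheme are made up for this example, and header lines are simply dropped.

```shell
#!/bin/sh
# split_sam_by_rname <input.sam> <output directory>
# Writes <output directory>/<rname>.sam for every reference name found.
# Header lines (starting with '@') and reads without a reference ('*')
# are dropped. Output files are appended to, so start with an empty directory.
split_sam_by_rname() {
    sam=$1
    outdir=$2
    mkdir -p "$outdir" || return 1
    awk -F '\t' -v dir="$outdir" '
        /^@/      { next }     # skip header lines
        $3 == "*" { next }     # skip reads without a reference name
        {
            out = dir "/" $3 ".sam"
            print >> out
            close(out)         # keep the number of open files small
        }
    ' "$sam"
}
```

Calling `split_sam_by_rname example/sample_reads.sam split_output` would produce one SAM fragment per chromosome in the (hypothetical) directory `split_output`.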

You can test miRA with sample data provided in ./example/:

./miRA full -c example/sample_configuration.config example/sample_reads.sam example/sample_sequence.fasta example/sample_output/

You can also run only parts of miRA. It is separated into 3 parts, with a distinct call for each one:

Algorithm	Description	Command
Clustering	generates a list of main expression contigs based on alignment data	cluster
Folding	folds RNA sequences and calculates secondary structure information	fold
Coverage Testing	coverage-based verification and reporting of microRNA candidates	coverage

For additional help and usage information run:

./miRA <command> -h

where <command> is one of "cluster", "fold" or "coverage".

Results:

After running miRA all result files will be created in the specified output directory. Depending on the configuration and the available external programs the following files will be created:

a full PDF report for every microRNA candidate (requires LaTeX)
final_candidates.bed, a file containing location and properties of all candidates in the BED file format.
final_candidates.json, a file containing location and properties of all candidates in the JSON file format.
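
As a quick look at the results, the BED file can be summarized per chromosome. The sketch below assumes only the standard BED layout, with the chromosome name in the first column; the additional columns miRA writes are not described here, so the example does not rely on them. The function name `bed_candidates_per_chrom` is made up for this example.

```shell
#!/bin/sh
# bed_candidates_per_chrom <BED file>
# Prints "<chromosome><TAB><number of entries>", sorted by chromosome name.
bed_candidates_per_chrom() {
    awk '
        /^(track|browser|#)/ || NF == 0 { next }   # skip header, comment and blank lines
        { count[$1]++ }
        END { for (chrom in count) printf "%s\t%d\n", chrom, count[chrom] }
    ' "$1" | sort
}
```

After running the sample data from ./example/, this could be called as `bed_candidates_per_chrom example/sample_output/final_candidates.bed`.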

Additional comments and known issues

SAM file format

It is important to make sure that the SAM file was generated by aligning reads to the same FASTA reference genome as the one that is used within miRA. In other words, all chromosome names found in the SAM file must have a matching entry in the FASTA reference genome.
miRA requires a SAM file that does not contain unmapped reads.
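
Both requirements can be checked before starting a run. The following sketch lists reference names that occur in the SAM file but not in the FASTA file, and counts reads flagged as unmapped (FLAG bit 0x4). It assumes that a FASTA sequence name is the text between '>' and the first whitespace, which is the common convention; whether miRA applies exactly the same rule is not stated here. Both function names are made up for this example.

```shell
#!/bin/sh
# sam_rnames_missing_in_fasta <input.sam> <reference.fasta>
# Prints every reference name used by a read in the SAM file that has no
# matching FASTA header. Prints nothing if all names match.
sam_rnames_missing_in_fasta() {
    awk -F '\t' '
        FILENAME == ARGV[1] {             # first file: the FASTA reference
            if (/^>/) {
                name = substr($0, 2)
                sub(/[ \t].*/, "", name)  # keep text up to the first whitespace
                known[name] = 1
            }
            next
        }
        /^@/ { next }                     # second file: skip SAM header lines
        $3 != "*" && !($3 in known) && !seen[$3]++ { print $3 }
    ' "$2" "$1"
}

# count_unmapped_reads <input.sam>
# Counts alignment lines whose FLAG has bit 0x4 (segment unmapped) set.
count_unmapped_reads() {
    awk -F '\t' '
        /^@/ { next }
        int($2 / 4) % 2 == 1 { n++ }
        END { print n + 0 }
    ' "$1"
}
```

An empty result from the first function and a count of 0 from the second suggest that the SAM file meets the two requirements above.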

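To position-sort a BAM file and convert it to SAM, run the following (syntax for samtools 1.3 or later; older releases expect an output prefix instead, as in `samtools sort reads.bam sorted_reads`):
```
samtools sort -o sorted_reads.bam reads.bam
samtools view -h sorted_reads.bam > sorted_reads.sam
```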
To remove unmapped reads from a BAM file and output a position-sorted BAM file, run
```
samtools view -b -F 4 all_reads.bam | samtools sort - > sorted_mapped_reads.bam
```
To remove unmapped reads from a SAM file and output a position-sorted SAM file, run
```
samtools view -hS -F 4 all_reads.sam | samtools sort -O sam - > sorted_mapped_reads.sam
```

Memory requirements

If you run into the memory problems described below, consider updating to miRA 1.2.0 or later and running miRA batch instead of miRA full.

miRA stores miRNA candidates that passed the folding and read coverage-based verification steps in memory until the generation of the final reports. The memory footprint of miRA therefore depends on the number of validated candidates.

Under certain conditions, miRA may crash with the error:

ERROR: initialize_Lfold: argument must be greater 0

The error is almost always caused by running out of memory, for example when running miRA on a desktop or notebook computer with little RAM on deep sequencing data that yields many candidates, and/or when using relaxed, non-stringent filtering parameters.
