rkweku / miRador

Plant miRNA identification tool that utilizes a variety of filters to validate predicted miRNAs
GNU General Public License v3.0

miRador - Plant miRNA identification tool

[License](https://www.apache.org/licenses/LICENSE-2.0.html)

Description

miRador is a miRNA prediction tool developed to be the first of two components in a miRNA prediction and validation pipeline. While miRNAs can be predicted quite accurately with miRador alone, the ultimate use of this package is to pair it with sPARTA, which uses PARE data to predict and validate targets of novel miRNAs. By running both programs, users can provide evidence of cleavage at the predicted targets of novel miRNAs.

Installation

miRador comes with a conda environment file which can be utilized to ensure all dependencies are satisfied, with versions that miRador was developed for, prior to running. Install either anaconda or miniconda, if you do not have it installed already on your system, following either of the links below:

Anaconda: https://www.anaconda.com/products/distribution

miniconda: https://docs.conda.io/en/latest/miniconda.html

Once conda is installed, a miRador environment can be created with the following command from within the miRador directory:
conda env create -n mirador -f environment.yml

When this command is complete, you are ready to run miRador. All of the dependencies in the following section should now be satisfied.


In the event that there are issues installing via the provided .yml file, you may also try to create your own conda environment that will be miRador ready. First, create a conda environment:
conda create --name mirador

Next, activate the environment:
conda activate mirador

Set the conda channel priority to flexible as some environments will fail without this change. We will change it back to strict at the end.
conda config --set channel_priority flexible

Then, install the following packages:
conda install python=3.7.13
conda install -c conda-forge ghostscript=9.54.0 perl=5.32.1 pypdf2=2.11.1 typing_extensions=4.5.0
conda install -c bioconda biopython=1.78 blast=2.13.0 bowtie=1.3.1 emboss=6.6.0 samtools=1.6 perl-io-string=1.08 viennarna=2.5.1

Finally, reset the channel priority back to strict:
conda config --set channel_priority strict


If you opt not to use conda, the dependencies for miRador can be downloaded separately and their executable paths set in miRador.ini.

Dependencies

miRador has several dependencies, all of which are checked prior to running. If you opt not to use conda, you must download and install the following packages:

Standalone packages

blast: ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/

bowtie: http://bowtie-bio.sourceforge.net/index.shtml

einverted: http://emboss.sourceforge.net/download/

ps2pdfwr: https://www.ghostscript.com/download.html

perl: https://www.perl.org/get.html

python3: https://www.python.org/downloads/

samtools: https://www.htslib.org/download/

ViennaRNA: https://www.tbi.univie.ac.at/RNA/#download

Python3 packages

PyPDF2: https://pypi.org/project/PyPDF2/

Upon python installation, Python packages may be installed via pip. If your system does not have pip installed already, or you do not have pip for your version of python, you can follow the installation instructions here: https://pip.pypa.io/en/stable/installing/.

If you do not have sudo privileges on your system, the --user option can be used when installing packages with pip. This installs the package into your user-level Python directory, so no sudo is needed. To do this, open your terminal and type:
pip3 install --user PackageName

Perl Modules

IO::String: https://metacpan.org/pod/IO::String

This can be installed via CPAN. See instructions here: https://docs.huihoo.com/livejournal/server/lj.install.perl_setup.modules.html
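
If you installed the dependencies by hand, it can help to confirm that the standalone executables are visible on your PATH before running. The following is a minimal sketch of such a check, not miRador's own dependency check; the executable names are assumptions derived from the package list above and the path options in miRador.ini, and find_missing_tools is a hypothetical helper.

```python
import shutil

# Executable names inferred from the standalone packages listed in this
# README; the exact names miRador looks for are an assumption.
REQUIRED_TOOLS = [
    "blastn", "makeblastdb", "bowtie", "bowtie-build",
    "einverted", "ps2pdfwr", "perl", "samtools", "RNAfold", "RNAplot",
]


def find_missing_tools(tools, which=shutil.which):
    """Return the tools that cannot be located on the current PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Missing executables:", ", ".join(missing))
    else:
        print("All executables were found on PATH.")
```

Any executable reported as missing either needs to be installed or have its full path supplied in miRador.ini.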

Configuration

Running miRador requires modifying a configuration file, initially named miRador.ini.

Options

| Option | Default | Description |
| --- | --- | --- |
| genomeFilename | | The path and name of the genome file |
| runEInvertedFlag | 1 | Flag to be set if you wish to run EInverted |
| einvertedPresets | | Presets for einverted parameters to be set. low, medium, or high |
| gap | 6 | EInverted score for gaps |
| match | 3 | EInverted score for matches |
| mismatch | -4 | EInverted penalty score for mismatches |
| threshold | 40 | Einverted scoring threshold for identifying inverted repeats |
| maxRepLen | 300 | Maximum length that an inverted repeat can be |
| libFilenamesList | | List of library file names and their path for each. Should not be set if libFolder is set |
| libFolder | | The name of the folder holding all of the chopped.txt files. Should not be set if libFilenamesList is used |
| organism | | First letter of genus and first 2 letters of species |
| version | CURRENT | Version of miRBase to use for annotation |
| parallel | | Flag to utilize parallelization |
| nthrads | | Number of threads to utilize when running bowtie |
| bowtiePath | | Path of bowtie |
| bowtieBuildPath | | Path of bowtie-build |
| einvertedPath | | Path of einverted |
| RNAFoldPath | | Path of RNAFold |
| RNAPlotPath | | Path of RNAPlot |
| blastnPath | | Path of blastn |
| makeblastdbPath | | Path of makeblastdb |
| ps2pdfwrPath | | Path of ps2pdfwr |
| outputFolder | | Name of specific folder to write data to. If folder exists, data within will be overwritten |
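
Two of the rules above are easy to get wrong: libFilenamesList and libFolder should not both be set, and organism is a three-letter code. The sketch below shows one way to sanity-check a configuration before a run. It is an illustration only, not part of miRador: the section name Example, the sample values, and the helpers load_options, check_options, and organism_code are all hypothetical, and lowercasing the organism code is an assumption.

```python
import configparser

# Hypothetical ini content: the section name "Example" and the values are
# placeholders, not the layout of the real miRador.ini.
SAMPLE_INI = """
[Example]
genomeFilename = genome.fa
libFolder = libs
libFilenamesList =
organism = ath
"""


def load_options(ini_text):
    """Flatten every section of an ini document into one dict of options."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep option names case-sensitive
    parser.read_string(ini_text)
    options = {}
    for section in parser.sections():
        options.update(parser[section])
    return options


def organism_code(genus, species):
    """First letter of the genus plus first two letters of the species."""
    return (genus[:1] + species[:2]).lower()  # lowercase is an assumption


def check_options(options):
    """Return a list of problems found in the option values."""
    problems = []
    if not options.get("genomeFilename"):
        problems.append("genomeFilename is not set")
    if options.get("libFilenamesList") and options.get("libFolder"):
        problems.append("libFilenamesList and libFolder are both set")
    organism = options.get("organism", "")
    if organism and not (len(organism) == 3 and organism.isalpha()):
        problems.append("organism should be a three-letter code")
    return problems


if __name__ == "__main__":
    print(check_options(load_options(SAMPLE_INI)))
    print(organism_code("Arabidopsis", "thaliana"))
```

Sections are flattened into a single dictionary so the sketch does not depend on how the real file groups its options.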

Bowtie

Bowtie has been configured to be run with the following arguments. While these can be changed, there is no simple option to do so within the ini file and thus changes would need to be done within library.py. With that said, the specific options are as follows:

Einverted

Einverted is used to predict an initial set of inverted repeats from the genome FASTA file. Three presets can be selected by the user: low, medium, or high. Each assigns predefined scores for matches and mismatches (use a negative integer for mismatches), a penalty score for gaps (use a positive integer), and the overall scoring threshold for reporting an inverted repeat.

low: match = 3, mismatch = -4, gap = 6, threshold = 40

medium: match = 3, mismatch = -4, gap = 6, threshold = 45

high: match = 3, mismatch = -5, gap = 7, threshold = 50

A maxRepLen variable is editable as well, though we recommend this remain at its default value of 300.

While we generally recommend running miRador with the medium preset, you can edit the individual scores and penalties yourself with the respective variables in the configuration file. If a preset is used, you can override any or all of its values by setting them explicitly. For instance, you can select the medium preset but set mismatch = -5; the pipeline then runs with the medium scores, except that the mismatch score is replaced by the value you provided.
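
The override behavior described above can be sketched as follows. This is an illustration of the rule, not miRador's actual code: the preset values are the ones listed in this section, while the names EINVERTED_PRESETS and resolve_einverted_scores are hypothetical.

```python
# Preset values as listed in this README.
EINVERTED_PRESETS = {
    "low": {"match": 3, "mismatch": -4, "gap": 6, "threshold": 40},
    "medium": {"match": 3, "mismatch": -4, "gap": 6, "threshold": 45},
    "high": {"match": 3, "mismatch": -5, "gap": 7, "threshold": 50},
}


def resolve_einverted_scores(preset, overrides=None):
    """Start from a preset, then replace any value the user set explicitly."""
    if preset not in EINVERTED_PRESETS:
        raise ValueError(f"unknown preset: {preset!r}")
    scores = dict(EINVERTED_PRESETS[preset])  # copy so the preset is untouched
    for name, value in (overrides or {}).items():
        if name not in scores:
            raise KeyError(f"unknown einverted option: {name!r}")
        if value is not None:  # None means "not set by the user"
            scores[name] = value
    return scores


if __name__ == "__main__":
    # Medium preset with only the mismatch score overridden.
    print(resolve_einverted_scores("medium", {"mismatch": -5}))
    # {'match': 3, 'mismatch': -5, 'gap': 6, 'threshold': 45}
```

Copying the preset before applying overrides keeps the three presets unchanged across repeated calls.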

Examples

When all options in the configuration file are set, running miRador is quite simple. From the miRador base directory, type:

python3 miRador miRador.ini

The configuration file included in this repository is set up for the first chromosome of the Arabidopsis TAIR10 genome and two sRNA sequencing libraries, which can be used to test that miRador runs properly. These files must be unzipped before running, which can be done with the following command:
gunzip -r test

Output

miRador writes its output to a folder provided by you, the user, or to a folder named with the date and time as a means of providing a unique folder name. The contents of the folder are: