This repository is intended to accompany our submission. For more information please refer to:
Agarwal V, Subtelny AO, Thiru P, Ulitsky I, Bartel DP. Predicting microRNA targeting efficacy in Drosophila. Genome Biology, 19:152. (2018).
The code is released to enhance reproducibility and as a suite of complementary tools to TargetScan in the hope it might help others in the future who work on new datasets.
These tools can be used in a variety of organisms to:
To compute context scores for flies or other insects while incorporating 3' UTR isoform information, we recommend using the code provided in TargetScan instead of this code.
If you find our code or our precomputed fly miRNA target predictions to be helpful for your work, please cite the paper above.
To better understand the methodological details for the evolutionary analyses, the following resources describe the original implementation [1], extension of parameters to worm and fly [2], and the re-implemented pipeline [3].
BranchLengthScoring.py script from MotifMap and newick 1.2
Installation of the PHAST package
Local installation of a MySQL server. Replace this line in the Fig2 scripts with your server information:
$dbh = DBI->connect("dbi:mysql:database=username;host=HOSTNAME", "username", "password");
The following Perl modules:
UCSC tools installation, including mafFrag and corresponding genome-wide multiz27way alignments
An LSF-based computing cluster that can submit jobs with the "bsub" command
Local installation of MATLAB
Numerous R libraries listed individually in each R script, including R Bioconductor libraries
Not all code may work immediately, because some pieces depend on the computing environment, and not all intermediate files are provided, because some are too large. For the R code to work properly, please copy the contents of .Rprofile in this folder to your local .Rprofile. Adding the location of the allfxns.pm module to PERL5LIB might also be required.
Please read the code closely and modify the commented pieces as appropriate to obtain the desired output in your environment. For example, you will need to install all of the additional R library and Perl module dependencies for the code to work. That said, if you find that crucial files are missing, making the code unusable, or if you identify a major problem in the code, please raise a GitHub issue.
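The two setup steps above can be sketched in shell as follows. This is only an illustration under stated assumptions: the function names install_rprofile and add_to_perl5lib, the marker comment, and the paths in the usage note are hypothetical and are not part of this repository.

```shell
#!/usr/bin/env bash
# Sketch of the one-time environment setup (hypothetical helpers, not shipped with the repo).

# Append the repository's .Rprofile to a target .Rprofile (default: ~/.Rprofile).
# A marker comment is written first so that re-running does not duplicate the contents.
install_rprofile() {
    local repo_dir="$1"
    local target="${2:-$HOME/.Rprofile}"
    local marker="# >>> copied from repository .Rprofile >>>"
    if [ ! -f "$repo_dir/.Rprofile" ]; then
        echo "no .Rprofile found in $repo_dir" >&2
        return 1
    fi
    # Nothing to do if the marker is already present in the target file.
    if [ -f "$target" ] && grep -qF "$marker" "$target"; then
        return 0
    fi
    { echo "$marker"; cat "$repo_dir/.Rprofile"; } >> "$target"
}

# Prepend a directory to PERL5LIB so that Perl can locate allfxns.pm.
# The directory to pass is whichever one contains allfxns.pm in your checkout.
add_to_perl5lib() {
    local dir="$1"
    export PERL5LIB="$dir${PERL5LIB:+:$PERL5LIB}"
}
```

For example, `install_rprofile /path/to/this/repo` followed by `add_to_perl5lib /path/to/dir/containing/allfxns.pm`, with both paths replaced by your own locations. Add the second call to your shell startup file if the setting should persist across sessions.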
To reproduce a figure, change directories to that figure's folder and run "bash runme.sh". Please read runme.sh first, as it provides a general overview of the commands that were used sequentially to pre-process the data and generate the figures. The script should be able to run on the precomputed data provided in the folder to generate the figures.
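A minimal sketch of this workflow in shell is shown below. The helper name run_figure and the log file name runme.log are assumptions made for illustration; only runme.sh itself comes from this repository.

```shell
#!/usr/bin/env bash
# Sketch: run one figure's pipeline from inside its folder and keep a log.
# run_figure and runme.log are hypothetical names, not part of the repo.
run_figure() {
    local fig_dir="$1"
    if [ ! -f "$fig_dir/runme.sh" ]; then
        echo "runme.sh not found in $fig_dir" >&2
        return 1
    fi
    # Use a subshell so the caller's working directory is left unchanged.
    # Output and errors are captured in runme.log inside the figure folder.
    ( cd "$fig_dir" && bash runme.sh > runme.log 2>&1 )
}
```

For example, `run_figure Fig2` would run the Fig2 pipeline and leave its output in Fig2/runme.log. Reading runme.sh beforehand remains advisable, since some commands may need to be adapted to your environment.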
Our naming convention in the code differs slightly from that in the paper. In particular, the "HYBRIDSCORE" and "PLFOLD" features in the code are equivalent to the "3p_energy" and "SA" features in the paper, respectively.
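When comparing output files against the paper, the two renamings above can be captured in a small lookup. The function name paper_feature_name is hypothetical; any name other than the two listed above is passed through unchanged, on the assumption that it matches the paper.

```shell
#!/usr/bin/env bash
# Sketch: translate a feature name used in the code to the name used in the paper.
# Only the two mappings stated in this README are encoded; other names pass through as-is.
paper_feature_name() {
    case "$1" in
        HYBRIDSCORE) echo "3p_energy" ;;
        PLFOLD)      echo "SA" ;;
        *)           echo "$1" ;;
    esac
}
```

For example, `paper_feature_name PLFOLD` prints `SA`.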