jszavits / NEAR

Non-Equilibrium Analysis of Ribosome profiling data
GNU General Public License v3.0
2 stars 2 forks source link

NEAR

Non-Equilibrium Analysis of Ribosome profiling data

Description

Non-Equilibrium Analysis of Ribo-seq (NEAR) is a program that infers codon-dependent elongation rates relative to the initiation rate from A-site density measured by ribosome profiling. The program uses nonlinear optimisation to match the experimental ribosome density to the theoretical density predicted by a mathematical model for ribosome dynamics called the totally asymmetric simple exclusion process (TASEP) [1]. The progam is described in detail in [2].

Limitation

An important limitation of the program is that the input data, namely the A-site densities, must be properly normalised. Let $RFP_i$ denotes the number of ribosome footprints for a given mRNA species at codon $i$. Let $\rho_i$ denotes the ribosomal density at codon $i$, i.e. the probability that the ribosome's A-site is positioned at codon $i$. Then $RFPi/RFP{av}$ is equal to $\rhoi/\rho$, where $RFP{av}$ is the average number of reads for that mRNA species, $RFP=\sum{i}RFP{i}/(L-1)$, $L$ is the number of codons and $\rho$ is the ribosome density, i.e. the number of ribosomes on the mRNA divided by $L-1$, $\rho=\sum_i \rho_i/(L-1)$. Here we normalise by $L-1$ instead of $L$ because the start codon is included in the initiation mechanism in the model. Note that for this program to be useful, one must know the ribosome density $\rho$, which is not normally known in ribosome profiling experiments. It can be obtained, for example, by polysome profiling experiments, or it can be estimated using ribosome proofiling and RNA-seq.

Installation

The program is written in Fortran. Before compiling the program, NLopt library must be installed. NLopt library is available for Unix-like systems and Windows. After installing NLopt, the program can be compiled using gfortran.

Linux

On a Linux system, go to the directory source and set the path to the NLopt library:

export LD_LIBRARY_PATH=\<absolute path to NLopt>/lib64

where \<absolute path to NLopt> is the absolute path to the NLopt directory. Then compile the object file for the Mersenne Twister pseudorandom number generator:

gfortran -c mt19937.f90

Finally, compile the program named NEAR linking to the NLopt library:

gfortran NEAR.f90 -I\<absolute path to nlopt>/include -L\<absolute path to nlopt>/lib64 -lnlopt -lm -o NEAR mt19937.o

After compiling, move the executable file NEAR to the main directory.

Windows

Install Cygwin with Devel packages (cmake, make, gcc). Then follow the instructions from the nlopt website for Linux installation of nlopt. After the installation, go to the directory source and set the path to the NLopt library:

export LD_LIBRARY_PATH=/usr/local/lib

Then compile the object file for the Mersenne Twister pseudorandom number generator:

gfortran -c mt19937.f90

Finally, compile the program named NEAR linking to the NLopt library:

gfortran NEAR.f90 -I/usr/local/include -L/usr/local/lib -lnlopt -lm -o NEAR mt19937.o

After compiling, move the executable file NEAR to the main directory

Input data

NEAR.dat

The main parameters of the program are stored in the file NEAR.dat. The parameters are:

genelist.dat

The file genelist.dat contains names of genes to optimise. The format of the file is:

\<genename> \<average ribosome density>
\<genename> \<average ribosome density>
...

where \<genename> is a name for the gene (maximum 10 characters) and \<average ribosome density> is equal to the number of ribosomes per transcript length, where the transcript length is equal to the number of codons including STOP but excluding START. Note that by definition, \<average ribosome density> must be smaller or equal to 1/l, where l is the ribosome footprint length in codons. An input density that is larger than 1/l will cause the program to stop.

ribo-seq directory

ribo-seq directory should contain files with A-site footprint reads. Each file should be named A-site_\<genename>.dat, where \<genename> should be matched to the \<genename> in the file genelist.dat. The format of the file A-site_\<genename>.dat is:

\<codon> \<number of A-site footprint reads mapped to that codon>
\<codon> \<number of A-site footprint reads mapped to that codon>
...

Output data

All output data are written in the directory output. For each gene in the genelist.dat file, two output files are generated. One file are the results of the optimisation procedure, called \<genename>-results.dat. The other file is a log file, where details of the optimisation procedure are stored, for example if the optimisation was successful. Results are stored in the following format:

\<codon> \<included> \<rhoexp> \<rhomc0> \<rhomc> \<rhopsm> \<omega0> \<omega> \<current0> \<current> \<f1> \<f2> \<f3>

where

The main results are omega, rhomc and current. Other results are for the quality check of the optimisiation procedure. See [2] for more details.

References

  1. MacDonald CT, Gibbs JH, Pipkin AC. Kinetics of biopolymerization on nucleic acid templates. Biopolymers. 1968 Jan;6(1):1–25.
  2. Szavits-Nossan J, Ciandrini L. Inferring efficiency of translation initiation and elongation from ribosome profiling. Nucleic Acids Research. 2020 Sep;48(17):9478-9490.