lncLOOM / LncLOOMv2

The Python 3 version of LncLOOM

LncLOOMv2.0

Ulitsky Lab / Weizmann Institute of Science

Version 2.0

Python 3 compatible

Release Date: February 2021

Developed and maintained by Caroline Ross and Igor Ulitsky

About LncLOOM

LncLOOM is a graph-based framework that uses integer programming to identify combinations of short motifs that are deeply conserved in rapidly evolving sequences. This version is implemented in Python 3 and is supported on Linux/Unix-based systems.

Getting Started (Install with pip)

*If you are unable to install LncLOOM with pip, see the Troubleshooting section at the bottom of this file for examples of how to run LncLOOM from within the LncLOOMv2 directory.

  1. Clone the LncLOOMv2 repository.

    git clone https://github.com/lncLOOM/LncLOOMv2.git

  2. Install Python 3 (if needed). LncLOOM is supported on Linux/Unix-based systems and is run from the command line. Python 3 is available for download at https://www.python.org/downloads/

  3. Install LncLOOM as an executable using pip or pip3.

    • First, ensure that pip is installed:

      sudo apt-get install python3-pip

      If you are using an older version of macOS:

      sudo easy_install pip

      On recent macOS versions, which no longer ship easy_install, use instead:

      python3 -m ensurepip --upgrade

    • Install LncLOOMv2 using pip (the following command ensures that it is set up to run with Python 3):

      python3 -m pip install --user -e ./LncLOOMv2

    • Alternatively, install using pip3

      pip3 install --user -e ./LncLOOMv2
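If you are not sure which of the two pip front ends is available, the check below is a minimal sketch (not part of LncLOOM) that prefers python3 -m pip and falls back to pip3. The function name pick_pip_install_cmd is hypothetical; it only prints the install command rather than running it, and it assumes the repository was cloned into ./LncLOOMv2.

```shell
# Hypothetical helper: print the pip command that installs LncLOOMv2 for Python 3.
# It prefers "python3 -m pip" (which ties pip to python3) and falls back to pip3.
pick_pip_install_cmd() {
  repo_dir="${1:-./LncLOOMv2}"
  if command -v python3 >/dev/null 2>&1 && python3 -m pip --version >/dev/null 2>&1; then
    echo "python3 -m pip install --user -e ${repo_dir}"
  elif command -v pip3 >/dev/null 2>&1; then
    echo "pip3 install --user -e ${repo_dir}"
  else
    echo "No pip for Python 3 found; install python3-pip first." >&2
    return 1
  fi
}

# Usage: print the suggested command, then run it yourself once it looks right.
if cmd="$(pick_pip_install_cmd ./LncLOOMv2)"; then
  echo "Suggested install command: ${cmd}"
fi
```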

  4. Add LncLOOM to your $PATH.

    pip creates an LncLOOM executable. Depending on your OS, this executable is saved to a specific directory, which needs to be added to your $PATH:

    • For Linux systems, LncLOOM will be saved in ~/.local/bin/ (or /home//.local/bin)

      export PATH="$HOME/.local/bin:$PATH"

    • For macOS, LncLOOM will be saved in /Users//Library/Python//bin, e.g. /Users/Mac/Library/Python/3.6/bin

      export PATH="/Users/Mac/Library/Python/3.6/bin:$PATH"

    *Note: the paths above may differ on your system (for example, with your username and Python version).
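As a hedged sketch, the PATH setup in step 4 can be done without hard-coding the directory: Python's site module prints the per-user base directory (python3 -m site --user-base), and pip --user installs executables into its bin subdirectory on Linux and macOS. The helper name prepend_path_once is hypothetical, and the change only lasts for the current shell; to make it permanent you would still add the resulting export line to your shell startup file (e.g. ~/.bashrc).

```shell
# Hypothetical helper: prepend a directory to PATH only if it is not already there.
prepend_path_once() {
  case ":${PATH}:" in
    *":$1:"*) ;;              # already on PATH: leave it unchanged
    *) PATH="$1:${PATH}" ;;
  esac
  export PATH
}

# pip --user installs executables into "<user base>/bin"; the site module
# reports the user base (~/.local on most Linux systems).
if command -v python3 >/dev/null 2>&1; then
  user_bin="$(python3 -m site --user-base)/bin"
  prepend_path_once "$user_bin"
  echo "pip --user scripts directory: ${user_bin}"
fi

# Check whether the shell can now find the LncLOOM executable.
if command -v LncLOOM >/dev/null 2>&1; then
  echo "LncLOOM found at: $(command -v LncLOOM)"
else
  echo "LncLOOM is not on PATH yet; check the directory printed above."
fi
```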

  5. Install LncLOOM dependencies

    LncLOOM requires several packages. Most of these are installed automatically when you install LncLOOMv2 (see the last section of this page for a list of these packages). However, the following additional programs must be installed separately:

    • BLASTN (part of NCBI BLAST+)

      sudo apt-get install ncbi-blast+

    • MAFFT
      To download and set it up, follow the installation steps on the MAFFT website.
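A quick way to confirm step 5 worked is to check that the external programs are on your PATH. This is a minimal sketch, assuming the executables are named blastn (installed by the ncbi-blast+ package) and mafft; check_tools is a hypothetical helper, not part of LncLOOM.

```shell
# Hypothetical helper: report which of the given programs are on PATH.
# Returns the number of missing programs (0 means everything was found).
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   ${tool} -> $(command -v "$tool")"
    else
      echo "missing: ${tool}"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}

check_tools blastn mafft || echo "Install the missing programs before running LncLOOM."
```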
  6. Set the paths to the genome files and eCLIP data that LncLOOM uses for annotation and for generating a custom track for the UCSC Genome Browser

    • In the LncLOOMv2/LncLOOMv2/src/ directory there is a file called for_eclip_annotation.txt. This file tells LncLOOM where to find data needed for annotations. The file looks as follows:

         Query Layer: 1
         Blat: src/hg19.fa
         eCLIP: Data 1: src/eCLIP_narrowPeakApr2019/

      By default, the paths point to data in the LncLOOMv2/LncLOOMv2/src/ folder. However, these files are too large to store on GitHub and must be downloaded separately: hg19.fa and eCLIP_narrowPeakApr2019

      To use this data, download and extract the files into the LncLOOMv2/LncLOOMv2/src/ folder:

      tar xvzf eCLIP_narrowPeakApr2019.tar.gz

      mv eCLIP_narrowPeakApr2019 LncLOOMv2/LncLOOMv2/src/

      tar xvzf hg19.tgz

      mv hg19.fa LncLOOMv2/LncLOOMv2/src/

      The eCLIP data consists of BigBed files retrieved from ENCODE in 2019. If you run the eCLIP annotation option with Blat, by default your query sequence must have at least 95% similarity to the target genome for a match to be considered. This threshold can be adjusted with the --blatID parameter.

      Alternatively, if you have your own data, replace these paths in for_eclip_annotation.txt with the full paths to your genome file and eCLIP data. For example:

         Query Layer: 1
         Blat: /home/MySpace/MyGenomeFiles/hg19.fa
         eCLIP: Data 1: /home/MySpace/My_eCLIP_Data/

      To annotate the motifs found using the eCLIP data specified in for_eclip_annotation.txt, use the --eclip option when running LncLOOM.

      • Explanation:

      • Query Layer: specifies which sequence you would like to annotate. By default this is the top sequence (layer 1) in your input file. Note that LncLOOM always sets the first sequence in your file as the top sequence, but may reorder the other sequences to improve motif discovery. To retain the original order of your sequences, use the --inputorder option.

      • Blat: specifies the full path to a genome file

      • eCLIP: specifies the full paths to eCLIP data. Note: you can add multiple paths as follows:

        Query Layer: 1
        Blat: <specify path to genome fasta file>
        eCLIP: Data 1: <specify path to eCLIP data>
        eCLIP: Data 2: <specify path to eCLIP data>
        eCLIP: Data 3: <specify path to eCLIP data>

        Alternatively, you can provide a BED file instead of running Blat:

        Query Layer: 1
        Bed: <specify path to bedfile>
        eCLIP: Data 1: <specify path to eCLIP data>
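If you manage several eCLIP datasets, the for_eclip_annotation.txt layout described above can be generated from a script. This is a sketch that assumes LncLOOM accepts exactly the line format shown (one "eCLIP: Data N:" line per dataset); write_eclip_config and the directory /home/MySpace/More_eCLIP_Data/ are hypothetical placeholders.

```shell
# Hypothetical helper: write a config in the for_eclip_annotation.txt layout.
# Arguments: output file, query layer, genome FASTA, then one or more eCLIP dirs.
write_eclip_config() {
  out="$1"; layer="$2"; genome="$3"
  shift 3
  n=1
  {
    echo "Query Layer: ${layer}"
    echo "Blat: ${genome}"
    for dir in "$@"; do
      echo "eCLIP: Data ${n}: ${dir}"
      n=$((n + 1))
    done
  } > "$out"
}

# Write to a temporary directory first so the real file is not overwritten.
tmpdir="$(mktemp -d)"
write_eclip_config "${tmpdir}/for_eclip_annotation.txt" 1 \
  /home/MySpace/MyGenomeFiles/hg19.fa \
  /home/MySpace/My_eCLIP_Data/ \
  /home/MySpace/More_eCLIP_Data/
cat "${tmpdir}/for_eclip_annotation.txt"
```

Once the printed file looks right, copy it over LncLOOMv2/LncLOOMv2/src/for_eclip_annotation.txt.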

In the LncLOOMv2/LncLOOMv2/src/ directory there is also a file called for_track_output.txt. Like for_eclip_annotation.txt, it tells LncLOOM where to find a genome file, in this case so that a custom track of conserved motifs can be generated. By default, the path points to hg19.fa in LncLOOMv2/LncLOOMv2/src/. Note that you can specify a different layer and genome from those given in for_eclip_annotation.txt. The file looks as follows:

     ```
     Query Layer: 1
     Blat: <specify path to genome fasta file>
     ```

     To generate a custom track, use the `--track` option.
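Both configuration files above point LncLOOM at Blat, and the two blat steps that follow make sure the bundled executable can run. As a hedged sketch, they can be checked together: the script sets the permissions and compares the current platform (from uname) with linux.x86_64, the platform the bundled blat is built for. check_blat is a hypothetical helper, and the usage line assumes you run it from the directory that contains the cloned LncLOOMv2 repository.

```shell
# Hypothetical helper: give blat mode 755 and warn if this machine is not
# linux.x86_64 (the platform the bundled binary targets).
check_blat() {
  blat="$1"
  if [ ! -f "$blat" ]; then
    echo "blat not found at ${blat}" >&2
    return 1
  fi
  chmod 755 "$blat"
  platform="$(uname -s | tr '[:upper:]' '[:lower:]').$(uname -m)"
  if [ "$platform" != "linux.x86_64" ]; then
    echo "warning: bundled blat is built for linux.x86_64, this machine is ${platform}" >&2
  fi
  [ -x "$blat" ]
}

if ! check_blat LncLOOMv2/LncLOOMv2/src/blat; then
  echo "Fix the blat executable before using Blat-based annotation or tracks."
fi
```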
  7. Make sure that the blat executable has the correct permissions:

    chmod 755 LncLOOMv2/LncLOOMv2/src/blat

  8. Make sure that the blat executable is compatible with your machine type:

    The blat executable in the src folder is built for linux.x86_64 machines. If needed, download the correct executable for your machine type from the UCSC Genome Browser software downloads, and replace the current blat executable.

  9. OPTIONAL: Install the Gurobi solver. Although it is not required, it gives much faster performance on larger datasets. There are two ways to install Gurobi:

    • Option 1: Install through Anaconda.
      • If needed download and install Anaconda
      • Add the gurobi channel to the Anaconda search list
        conda config --add channels http://conda.anaconda.org/gurobi
      • Install Gurobi
        conda install gurobi
      • Initialise Gurobi License
      • A free academic license can be obtained from: https://www.gurobi.com/downloads/end-user-license-agreement-academic/
      • First register an account
      • Verify your account from the link sent to your email; this will take you to a home page
      • Click on Licenses (you may be asked to log in again)
      • On the top navigation bar, select Academia... and Licenses
      • Click the Free Academic License page link; this will issue you a license
      • Scroll to the bottom of the page to Installation: you will see a command similar to this, but specific to your key: grbgetkey YOUR_KEY
      • Copy and run this command in your terminal
    • Option 2:

      • Download Gurobi

      • Once you have downloaded your version of Gurobi, copy the archive to /opt

      sudo cp -r gurobi9.0.2_linux64.tar.gz /opt

      • Extract the file into /opt
      cd /opt/
      sudo tar xvfz gurobi9.0.2_linux64.tar.gz
      • Set environment variables

      Users of the bash shell should add the following lines to their .bashrc files:

      export GUROBI_HOME="/opt/gurobi902/linux64"
      export PATH="${PATH}:${GUROBI_HOME}/bin"
      export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:${GUROBI_HOME}/lib"

      Users of the csh shell should add the following lines to their .cshrc files:

      setenv GUROBI_HOME /opt/gurobi902/linux64
      setenv PATH ${PATH}:${GUROBI_HOME}/bin
      setenv LD_LIBRARY_PATH ${LD_LIBRARY_PATH}:${GUROBI_HOME}/lib

      If LD_LIBRARY_PATH is not already set, use the following instead:

      export LD_LIBRARY_PATH="${GUROBI_HOME}/lib" or setenv LD_LIBRARY_PATH ${GUROBI_HOME}/lib

      • Initialise Gurobi License
      • A free academic license can be obtained from: https://www.gurobi.com/downloads/end-user-license-agreement-academic/
      • First register an account
      • Verify your account from the link sent to your email; this will take you to a home page
      • Click on Licenses (you may be asked to log in again)
      • On the top navigation bar, select Academia... and Licenses
      • Click the Free Academic License page link; this will issue you a license
      • Scroll to the bottom of the page to Installation: you will see a command similar to this, but specific to your key: grbgetkey YOUR_KEY
      • Copy and run this command in your terminal
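For bash users, the two LD_LIBRARY_PATH variants in the Gurobi Option 2 instructions above can be collapsed into one line with the ${VAR:+...} parameter expansion, which expands only when the variable is set and non-empty. This is a sketch of what you might add to ~/.bashrc, assuming the gurobi902 install location used above; adjust GUROBI_HOME for your Gurobi version.

```shell
# Sketch of the Gurobi environment variables (gurobi902 paths as above).
export GUROBI_HOME="/opt/gurobi902/linux64"
export PATH="${PATH}:${GUROBI_HOME}/bin"
# ${LD_LIBRARY_PATH:+...} expands to "$LD_LIBRARY_PATH:" only when the variable
# is set and non-empty, so the same line works whether or not it was set
# before and never produces an empty leading ":" entry.
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+${LD_LIBRARY_PATH}:}${GUROBI_HOME}/lib"
echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH}"
```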

Running LncLOOM

Definitions and troubleshooting tips for calculating motif significance are also given in Definitions.html:

P(i): Probability of finding the exact motif, at the same depth, in a random set of sequences that have the same percentage identities as the input sequences

E(i): Probability of finding any combination of the same number of motifs of the same length, or longer, at the same depth, in a random set of sequences that have the same percentage identities as the input sequences

P(r): Probability of finding the exact motif, at the same depth, in a random set of sequences that have the same dinucleotide composition as the input sequences

E(r): Probability of finding any combination of the same number of motifs of the same length, or longer, at the same depth, in a random set of sequences that have the same dinucleotide composition as the input sequences
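As a worked reading of these definitions, assume the probabilities are estimated empirically by generating N random sequence sets and counting how often the motif is recovered. This is a sketch of that interpretation, not a statement of LncLOOM's exact procedure, and whether N corresponds to the --iterations option is also an assumption. Under it, P(i) is simply a frequency:

```latex
\hat{P}(i) = \frac{1}{N} \sum_{k=1}^{N} \mathbf{1}\!\left[\text{motif } i \text{ is found at the same depth in random set } R_k\right]
```

Here R_k is the k-th random set of sequences with the same percentage identities as the input (for P(r), the same dinucleotide composition instead), and the indicator is 1 when the condition holds and 0 otherwise. E(i) and E(r) follow the same pattern, with the condition replaced by "any combination of the same number of motifs, of the same length or longer, is found at the same depth". For example, a motif recovered in 3 of N = 100 random sets gives an estimate of 3/100 = 0.03.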

All LncLOOM Options

LncLOOM has several options:

Required arguments

Optional arguments

BOOLEAN OPTIONS

The following are boolean options (all default to false; typing --option on the command line sets it to true)

Command Line Examples

More examples of commands:

   LncLOOM --fasta Chaserr.fas --name Chaserr  --startw 10  --solver gurobi --iterations 100 --multiprocess 10 --eclip --targetscan --track
   LncLOOM --fasta Chaserr.fas --name Chaserr  --startw 10  --solver gurobi --iterations 100 --eclip --targetscan --track --tol5 0.1 --tol3 0.5
   LncLOOM --fasta Chaserr.fas --pname Chaserr  --startw 10  --solver gurobi --eclip --targetscan --inputorder
   LncLOOM --fasta Chaserr.fas --pname Chaserr  --startw 10  --solver gurobi --eclip --targetscan --inputorder --newcolours
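To run the same settings over several sequence files, the commands above can be wrapped in a loop. This is a minimal sketch that uses only options from the examples above; run_lncloom_batch and run_or_print are hypothetical helpers, the run name is derived from each file name (assuming a .fas extension), and the commands are only printed unless DRY_RUN=0 is set.

```shell
# Print a command when DRY_RUN=1 (the default here); otherwise execute it.
run_or_print() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: one LncLOOM run per FASTA file, named after the file.
run_lncloom_batch() {
  for fasta in "$@"; do
    name="$(basename "$fasta" .fas)"
    run_or_print LncLOOM --fasta "$fasta" --name "$name" --startw 10 \
      --solver gurobi --iterations 100 --eclip --targetscan --track
  done
}

DRY_RUN=1 run_lncloom_batch Chaserr.fas
```

After checking the printed commands, rerun with DRY_RUN=0 to launch the jobs.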

Troubleshooting (If installation with pip was not successful)