jbkerry / oligo

Pipelines for Capture-C oligo design
https://oligo.readthedocs.io
GNU General Public License v3.0
4 stars 0 forks source link

#################### Capture Oligo Design ####################

For full documentation, see http://oligo.readthedocs.io

oligo provides functionality to automate oligo (bait) design for DNA capture experiments, providing the user with details about capture efficiency of the sequences generated.

.. highlights:: Since version 0.2, oligo provides a more efficient installation process and provides the user with the option of running the tool via a Docker image. This version is not compatible with running the scripts in versions prior 0.2 due to the new installation process. For the older format of oligo, please use version 0.1.2. Documentation specific for v0.1.2 can be found here https://oligo.readthedocs.io/en/0.1.2/.

.. contents:: Table of Contents :depth: 2

Installation

Local

oligo ^^^^^ To install oligo on your local machine, it is recommended to first create a new Python environment ( >=3.8 ) using your preferred method e.g. conda, pyenv etc. Once the new environment is activated, install oligo via pip. Note that the package is called oligo but has a published name of oligo-capture to ensure it had a unique name in the public repository.

.. code-block:: bash

$ pip install oligo-capture

Ensure it has installed correctly by running the following command and verifying that you see the installed version in the standard output

.. code-block:: bash

$ python -m oligo --version oligo v0.2

Before running the full oligo pipeline you will need to install RepeatMasker and either BLAT or STAR, depending on how you intend to run oligo (see below for more details)

Dependencies ^^^^^^^^^^^^

.. highlights::

Since running oligo locally requires a number of third-party software to be installed, users might find it less hassle to run oligo via the Docker image, if this option is available to them. See the Docker_ section for more details.

RepeatMasker oligo uses RepeatMasker (RM) to determine if oligos contain simple sequence repeats as these can reduce the efficiency of the oligo for targeted capture. Follow the instructions on RepeatMasker home page <http://www.repeatmasker.org/RepeatMasker/>_ for installing RM on your local system. The most recent version of RM that oligo has been tested with is v4.1.5. As detailed on the RM home page, its installation depends on a Sequence Search Engine (it is recommended to use HMMER for oligo) and Tandem Repeat Finder (TRF). For the Repeat Database, RM ships with the curated set of Dfam 3.7 Transposable Elements which is sufficient but users are free to use the full set if required; further instructions are on the RM home page.

.. highlights::

The RM home page mentions that it requires the Python library h5py, however this is listed as a dependency of the oligo package so will already be installed in your Python environment from when you ran the pip install step.

BLAT oligo uses the BLAST-Like Alignment Tool (BLAT) to determine any off-target binding sites of an oligo within the genome, in addition to its intended binding site. An oligo that binds to multiple regions will have a reduced score since it will perform a less-specific capture. BLAT <https://genome.ucsc.edu/FAQ/FAQblat.html> executables can be found by going to <http://hgdownload.soe.ucsc.edu/admin/exe/> and locating the BLAT directory in the for your systems archtecture. For example, for Linux.x86 architecture, rsync should be used to get the BLAT executables on your system:

.. code-block:: bash

$ rsync -aP rsync://hgdownload.soe.ucsc.edu/genome/admin/exe/linux.x86_64/blat/ ./

See <http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/blat/>_ for more details.

.. highlights::

Please note, that as specified on its website, "the Blat source and executables are freely available for academic, nonprofit and personal use. Commercial licensing information is available on the Kent Informatics website (http://www.kentinformatics.com/)". Ensure that you are adhering to this licence agreement if you are using oligo with --blat enabled.

STAR As an alternative to BLAT, oligo allows users to use the Spliced Transcripts Alignment to a Reference (STAR) alignment program for increased speed when determining multiple binding events for oligo sequences. BLAT is more widely used to detect off-target binding events, however BLAT can be particulary slow for large designs, especially for the human reference genomes. STAR’s exceptional speed is better suited for designs with >1000 oligos. If you think you would prefer to use STAR instead, visit the STAR GitHub page <https://github.com/alexdobin/STAR>_ for instructions on how to install it.

Docker

Due to oligo requiring various third-party software, it can instead be run from a pre-built Docker image that has everything needed already installed. This should make the setup much easier for users as well as reducing the need to install lots of software on their local machines. Running via Docker is obviously less flexible in terms of the configuration of the third-party software but has been built with the most common use cases in mind and reducing the image size to as small as possible, without losing any of requirements oligo uses from the third-party software. Currently the Docker image only supports running oligo with BLAT, not STAR.

First pull the latest oligo image onto your local machine:

.. code-block:: bash

$ docker pull jbkerry/oligo:latest

You can also specify a version if needed. The Docker image versions match the oligo package version i.e., jbkerry/oligo:0.2 will be running oligo v0.2:

.. code-block:: bash

$ docker pull jbkerry/oligo:0.2

The docker entrypoint is set to run oligo with the config file already set up to point to the install executables of BLAT and RepeatMasker so users can run the image, starting with the oligo subcommand that is required (see the Usage_ section for more details).

In order for your BED file and reference genome FASTA files to be accessible to the Docker container, your local directories with these files must be mounted into the Docker container using the -v option when you call the docker run command on the image. The Docker image runs the oligo command from a top-level directory called /results and stores all of its output files here. In order to see them on your local machine after the run has finished, you will need to mount a local directory where you want to store the results, to this /results directory. Again, this mount with the -v option needs to be done at the image runtime.

For example, running the oligo command with the off-target subcommand might look something like this with the Docker image:

.. code-block:: bash

$ docker run -v /local/path/oligo_results:/results -v /local/human:/genome jbkerry/oligo:latest off-target -f /genome/genome.fa -g hg38 -b ./off_target_sites.bed -o 100 -t 50 -m 300 --blat

With this command, the output results will appear in the example local directory of /local/path/oligo_results. Note that this example command is using the Linux filepath format (i.e., /.../) for the local directories. On Windows (not using WSL) the mounting would look like this:

.. code-block::

$ docker run -v C:\local\path\oligo_results:/results -v C:\local\human:/genome jbkerry/oligo:latest off-target -f /genome/genome.fa -g hg38 -b ./off_target_sites.bed -o 100 -t 50 -m 300 --blat

Because the docker image is built on top of a Debian Linux image, the paths that local directories get mounted to in the container (i.e. the right-hand side of the : for the -v options) still need to use the Linux filepath format, even when running from a Windows machine.

Installation specifics ^^^^^^^^^^^^^^^^^^^^^^ Below is a list of the versions and alterations that have been made to the standard installs of third-party software for the oligo Docker image:

The HMM matrices were generated with the following two commands, run from with the top-level RepeatMasker directory (famdb.py comes bundled with the latest versions of RepeatMasker):

.. code-block:: bash

$ ./famdb.py -i Libraries/RepeatMaskerLib.h5 families --format hmm 'Homo sapiens' --include-class-in-name >humans.hmm $ ./famdb.py -i Libraries/RepeatMaskerLib.h5 families --format hmm 'Mus musculus' --include-class-in-name >mouse.hmm

The Dockerfile in the oligo GitHub repository can be referenced for details of the how the Docker image was built. Some reference data files that get copied into the image at build time are not present in the repository but can be provided to the user if needed.

Usage

oligo can be run with one of three subcommands

These subcommands all generate oligo sequences, based on different underlying behaviours. Methods from the Tools <http://oligo.rtfd.io/en/latest/tools_class.html>_ class in the oligo.tools module are then used to check the off-target binding and repeat content of the oligos. This information is output in a file called oligo_info.txt; oligo sequences are written to a FASTA file called oligo_seqs.fa

Example

The subcommand follows the oligo command and options for the subcommand are then specified afterwards. Note that the config text file, specifying paths to the installed RepeatMasker, BLAT and STAR directories (see Dependencies) must be specified between oligo and the chosen subcommand, with the -cfg argument. An example config file can be viewed here <https://github.com/jbkerry/oligo/blob/main/config.txt>. Below, is an example using the off-target subcommand:

.. code-block:: bash

$ python -m oligo -cfg ./config.txt off-target -f /path/to/human/genome.fa -g hg38 -b ./off_target_sites.bed -o 100 -t 50 -m 300 --blat

Command-line help

The oligo options and subcommands can be viewed with the --help flag from the command-line:

.. code-block:: bash

$ python -m oligo --help Usage: python -m oligo [OPTIONS] COMMAND [ARGS]...

Options:
  --version
  -cfg, --config PATH  [required]
  --help               Show this message and exit.

Commands:
  capture
  off-target
  tiled

For options specific to each subcommand, run oligo with the desired subcommand, followed by --help. Note that you will need to provide the -cfg flag for this to work:

.. code-block:: bash

$ python -m oligo -cfg ./config.txt off-target --help Usage: python -m oligo off-target [OPTIONS]

Options:
  -f, --fasta PATH                Path to reference genome fasta.  [required]
  (etc.)

For full documentation on each of the subcommands and how each of their options affects the run, see the full documentation at http://oligo.readthedocs.io. There you will find a detailed documentation page for each of the subcommands, as well as information regarding the oligo output file and how best to make use of it.