holstegelab / TREAT

Tandem REpeat Annotation Toolkit (TREAT)

TREAT in a nutshell

TREAT is a command-line tool written in Python and R (for plotting) for working with tandem repeats and structural variants. It was developed specifically for long-read sequencing data, but it can potentially be used with any sequencing data, including PacBio, Oxford Nanopore, and Illumina.
TREAT integrates a novel targeted local assembler, otter.

How do you install TREAT

Depending on your system and your preferences, you can install TREAT and otter in several ways: manually, with Conda, or with Docker.

Regardless of how you install TREAT and otter, first clone this repository:
git clone https://github.com/holstegelab/TREAT.git

Manual installation

To install TREAT and otter manually, the steps are:

Install with Conda

To install TREAT using Conda, you can use the INSTALL.sh script. It installs a fresh copy of Python 3.6.15 in a new Conda environment called treat, along with the required packages. The script assumes that Conda is correctly installed on your system. If you are not familiar with Conda, please see the Conda documentation. You can run the script by typing:
cd TREAT/install/conda
source INSTALL.sh
This script will install:

Install with Docker

The easiest way to install TREAT is through the provided Docker image. You can do so with these steps:

What do you need to run TREAT

To run TREAT, you need:

What do you get as output

TREAT output consists of:

Toolkit

TREAT contains several tools to manipulate and analyze sequencing data. Three main analysis strategies are available in TREAT:

Assembly analysis

The assembly analysis takes advantage of all sequencing reads aligning to the target regions to perform local assembly of those regions. Local assembly is done with otter. The procedure is as follows:

  1. extract the reads and relative sequences encompassing the target regions
  2. perform haplotype aware local assembly of the target regions in each sample
  3. perform motif finding at the individual assembly level using tandem repeat finder
  4. perform haplotype calling
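
Step 1 can be illustrated with a minimal Python sketch. It is not TREAT's implementation: the names Interval, parse_bed_line, spans and spanning_reads are hypothetical, alignments are given as plain coordinates rather than read from a BAM file, and it assumes that "encompassing" means the read alignment covers the whole target interval.

```python
from typing import Dict, List, NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # 0-based, exclusive (BED convention)


def parse_bed_line(line: str) -> Interval:
    """Parse the first three columns of a BED line into an Interval."""
    chrom, start, end = line.split()[:3]
    return Interval(chrom, int(start), int(end))


def spans(read: Interval, target: Interval) -> bool:
    """True if the read alignment covers the target from start to end."""
    return (
        read.chrom == target.chrom
        and read.start <= target.start
        and read.end >= target.end
    )


def spanning_reads(alignments: Dict[str, Interval], target: Interval) -> List[str]:
    """Return the names of reads whose alignment fully covers the target."""
    return [name for name, aln in alignments.items() if spans(aln, target)]


# Hypothetical target region and read alignments, for illustration only.
target = parse_bed_line("chr1\t100\t130")
alignments = {
    "read_a": Interval("chr1", 50, 200),   # covers the whole target
    "read_b": Interval("chr1", 110, 200),  # starts inside the target
    "read_c": Interval("chr2", 50, 200),   # different chromosome
}
selected = spanning_reads(alignments, target)
```

Here only read_a is selected, because read_b starts inside the target and read_c maps to another chromosome.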

Required parameters

Same as for the reads analysis.

Optional parameters

Reads analysis

The reads analysis takes advantage of all sequencing reads aligning to the target regions to estimate genotypes. The procedure is as follows:

  1. extract the reads and relative sequences encompassing the target regions
  2. extract the corresponding sequence from the reference genome
  3. perform motif finding at the individual read level using tandem repeat finder
  4. perform haplotype calling
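
To give an idea of what read-level motif finding produces in step 3, here is a deliberately naive Python sketch. It is a stand-in, not the tandem repeat finder that TREAT uses: it only recognises perfect repeats, whereas real repeat finders also handle mismatches and indels. The function name smallest_motif, the max_motif_len parameter and the read sequences are hypothetical.

```python
from typing import Optional, Tuple


def smallest_motif(seq: str, max_motif_len: int = 6) -> Optional[Tuple[str, int]]:
    """Return (motif, copies) for the shortest motif whose exact repetition
    spells out seq, or None if no motif up to max_motif_len does so."""
    n = len(seq)
    # A motif must occur at least twice, so it cannot be longer than n // 2.
    for k in range(1, min(max_motif_len, n // 2) + 1):
        if n % k == 0 and seq[:k] * (n // k) == seq:
            return seq[:k], n // k
    return None


# Hypothetical repeat sequences extracted from individual reads.
reads = {
    "read_1": "CAGCAGCAG",
    "read_2": "CAGCAGCAGCAG",
    "read_3": "ACGT",
}
per_read = {name: smallest_motif(seq) for name, seq in reads.items()}
```

With these inputs, read_1 and read_2 both yield the motif CAG (3 and 4 copies respectively), while read_3 yields None. Per-read motifs and copy numbers of this kind are the sort of information a haplotype-calling step could then summarise per sample.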

Required parameters

Optional parameters

TREAT analysis module

TREAT includes a module for downstream analysis of tandem repeats. It takes as input the VCF file generated by TREAT and performs either an outlier analysis or a case-control analysis:

Optional parameters

TREAT plot module

TREAT includes a module for plotting tandem repeats across samples. This module can be invoked with TREAT.py plot -v input_vcf -r all. The plotting module will produce 2 plots:

Optional parameters

Additional folders in the repository

test_data folder

The test_data folder contains test data that can be used to check that TREAT works correctly. A basic test for a reads analysis is:
TREAT.py reads -b test_data/example.bed -i test_data/example.bam -o test_output -r /path/to/reference_genome_hg38.fa
For an assembly analysis, the following can be used:
TREAT.py assembly -b test_data/example.bed -i test_data/example.bam -o test_output_asm -r /path/to/reference_genome_hg38.fa -s otter
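
If you prefer to drive both tests from a script, the two commands above can be wrapped in Python. This is only a sketch: build_treat_command and run_if_available are hypothetical helpers, not part of TREAT, and the reference path is the same placeholder as above and must be replaced with a real hg38 FASTA file. Only the options shown in the commands above are used.

```python
import shutil
import subprocess


def build_treat_command(mode, bed, bam, out_dir, reference, extra_args=()):
    """Build the argument list for a TREAT reads or assembly run."""
    if mode not in ("reads", "assembly"):
        raise ValueError(f"unsupported mode: {mode!r}")
    return ["TREAT.py", mode, "-b", bed, "-i", bam,
            "-o", out_dir, "-r", reference, *extra_args]


def run_if_available(cmd):
    """Run cmd only if its executable is on PATH.

    Returns the exit code, or None when the run was skipped."""
    if shutil.which(cmd[0]) is None:
        return None
    return subprocess.run(cmd, check=False).returncode


REFERENCE = "/path/to/reference_genome_hg38.fa"  # placeholder, replace it

reads_cmd = build_treat_command(
    "reads", "test_data/example.bed", "test_data/example.bam",
    "test_output", REFERENCE)
assembly_cmd = build_treat_command(
    "assembly", "test_data/example.bed", "test_data/example.bam",
    "test_output_asm", REFERENCE, extra_args=("-s", "otter"))

if __name__ == "__main__":
    for cmd in (reads_cmd, assembly_cmd):
        status = run_if_available(cmd)
        outcome = "skipped (TREAT.py not on PATH)" if status is None else f"exit code {status}"
        print(" ".join(cmd), "->", outcome)
```

Passing the command as a list avoids shell quoting problems with paths that contain spaces, and the PATH check lets the script report a missing installation instead of failing with an exception.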

treat_application folder

The treat_application folder contains several sub-folders related to the different projects from our group in which TREAT was used. Each sub-folder contains additional scripts and downstream analysis scripts that were used. Please look at the README in the specific sub-folders for additional information.