hyunhwan-jeong / SalmonTE

SalmonTE is an ultra-Fast and Scalable Quantification Pipeline of Transposable Element (TE) Abundances
GNU General Public License v3.0

SalmonTE

Change Logs

Notice

What is SalmonTE?

SalmonTE is an ultra-fast and scalable quantification pipeline for transposable element (TE) abundances from next-generation sequencing data. It comes with Salmon, a fast and accurate transcriptome quantification method. You can read the details of the pipeline and an example study on real data in my recently published paper in PSB 2018.

What do I need to run SalmonTE? Why should I use it?

Requirements & Installation

Python and R must be installed before you run SalmonTE.

~* Note: Currently, running SalmonTE on macOS has an issue, and we are trying to fix it soon. Until then, we recommend running it in a Linux environment.~

For Python:

Run the following line in your console:

pip3 install snakemake docopt pandas --user

For R: Run the following lines in the R console:

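install.packages(c("tidyverse", "scales", "WriteXLS", "BiocManager"))
BiocManager::install("DESeq2", version = "3.8")

Then, in your shell, clone the repository, add the SalmonTE directory to your PATH (for example in your ~/.bashrc file), and reload that file:

git clone https://github.com/hyunhwaj/SalmonTE
export PATH=$PATH:/PATH_OF_SALMON_TE/
source ~/.bashrc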
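
Before the first run, it can help to confirm that everything installed above is actually visible. The following is a minimal sketch, not part of SalmonTE: the helper `check_prerequisites` is hypothetical, and it assumes that R exposes an `Rscript` executable and that `SalmonTE.py` is executable and reachable through your PATH.

```python
import importlib.util
import shutil


def check_prerequisites(modules, executables):
    """Return (missing Python modules, missing executables)."""
    # find_spec returns None when a top-level module cannot be located.
    missing_modules = [m for m in modules if importlib.util.find_spec(m) is None]
    # shutil.which returns None when nothing with that name is on PATH.
    missing_executables = [e for e in executables if shutil.which(e) is None]
    return missing_modules, missing_executables


if __name__ == "__main__":
    # Module names come from the pip3 command above; the executable
    # names are assumptions about how R and SalmonTE are exposed.
    mods, exes = check_prerequisites(
        ["snakemake", "docopt", "pandas"],
        ["Rscript", "SalmonTE.py"],
    )
    print("Missing Python modules:", mods or "none")
    print("Missing executables:", exes or "none")
```

If `SalmonTE.py` is reported as missing, the PATH entry is the first thing to recheck. The R packages are not covered by this check.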

Troubleshooting

Q. I am using SalmonTE on macOS, and it fails to run in quant mode with error messages like these:

CalledProcessError in line xx of SOME_PATH:
Command ' set -euo pipefail;  ROOT_OF_SALMON_TE/SalmonTE/salmon/darwin/bin/salmon quant...' returned non-zero exit status 134.

A. You may have a problem running salmon, which is an essential tool for the pipeline. Installing the Threading Building Blocks library may solve the problem. If you are using Homebrew, please use the command below:

brew install tbb
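
A note on reading the error: shells commonly report a process killed by signal N as exit status 128 + N, so 134 corresponds to signal 6, which is SIGABRT on Linux and macOS. That would fit salmon aborting at startup, although the README only goes as far as suggesting the tbb install. The helper below is a hypothetical sketch for decoding such statuses and is not part of SalmonTE.

```python
import signal


def describe_exit_status(status):
    """Describe a shell exit status, decoding the 128+N signal convention."""
    if status > 128:
        try:
            # Signals(N) raises ValueError if N is not a known signal number.
            return "terminated by " + signal.Signals(status - 128).name
        except ValueError:
            pass
    return "exited with status " + str(status)


if __name__ == "__main__":
    print(describe_exit_status(134))
```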

How to use it?

Usage:
    SalmonTE.py index [--ref_name=ref_name] (--input_fasta=fa_file) [--te_only]
    SalmonTE.py quant [--reference=genome] [--outpath=outpath] [--num_threads=numthreads] [--exprtype=exprtype] FILE...
    SalmonTE.py test [--inpath=inpath] [--outpath=outpath] [--tabletype=tabletype] [--figtype=figtype] [--analysis_type=analysis_type] [--conditions=conditions]
    SalmonTE.py (-h | --help)
    SalmonTE.py --version

Options:
    -h --help     Show this screen.
    --version     Show version.

An example of SalmonTE usage from the command line

Running the quant mode to collect TE expressions

Parameters

After the parameters, you can pass a directory that contains your FASTQ files:

SalmonTE.py quant --reference=hs example

Or you can list the files explicitly, as below:

SalmonTE.py quant --reference=hs example/CTRL_1_R1.fastq.gz example/CTRL_2_R1.fastq.gz          
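
If you want to choose the files yourself, for example to script runs over several directories, the explicit-list form can be assembled programmatically. This is a sketch under stated assumptions: `build_quant_command` is a hypothetical helper, the suffix list is a guess at common FASTQ extensions rather than what SalmonTE accepts, and it does not reproduce how SalmonTE scans a directory internally.

```python
from pathlib import Path

# Assumed FASTQ suffixes; the README does not list the accepted ones.
FASTQ_SUFFIXES = (".fastq", ".fastq.gz", ".fq", ".fq.gz")


def build_quant_command(inputs, reference="hs"):
    """Build a SalmonTE.py quant argument list from files and directories."""
    files = []
    for item in inputs:
        path = Path(item)
        if path.is_dir():
            # Expand a directory into its FASTQ files, in sorted order.
            files.extend(
                sorted(p for p in path.iterdir() if p.name.endswith(FASTQ_SUFFIXES))
            )
        else:
            files.append(path)
    return ["SalmonTE.py", "quant", f"--reference={reference}"] + [str(f) for f in files]


# Usage (not executed here):
#   import subprocess
#   subprocess.run(build_quant_command(["example"]), check=True)
```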

Running test mode to perform statistical test

Before you run test mode, you should modify the condition.csv file, which is stored in the outpath. Here are examples of proper modifications:

For the differential expression analysis, change the file as below. Important: The control samples have to be labeled as control. Other labels will cause errors.

SampleID,condition
FASTQ1,control
FASTQ2,control
FASTQ3,treatment
FASTQ4,treatment

For the regression analysis, fill the condition column with numeric values:

SampleID,condition
FASTQ1,1.5
FASTQ2,2.1
FASTQ3,3.8
FASTQ4,9.5
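
Because a wrong label causes errors in test mode, it may be worth checking the condition table before running it. The sketch below is not part of SalmonTE: `check_conditions` is a hypothetical helper, and its `regression` flag is just a switch inside this example, not a SalmonTE option. It only checks the two rules stated above, a `control` label for differential expression and numeric values for regression.

```python
import csv
import io


def check_conditions(csv_text, regression=False):
    """Return a list of problems found in a condition table (empty if none)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows or set(rows[0].keys()) != {"SampleID", "condition"}:
        return ["expected the header SampleID,condition and at least one sample"]
    problems = []
    # A short row yields None for the missing field, so fall back to "".
    values = [(row["condition"] or "").strip() for row in rows]
    if regression:
        for row, value in zip(rows, values):
            try:
                float(value)
            except ValueError:
                problems.append(f"{row['SampleID']}: '{value}' is not numeric")
    elif "control" not in values:
        problems.append("no sample is labeled 'control'")
    return problems


if __name__ == "__main__":
    table = "SampleID,condition\nFASTQ1,control\nFASTQ2,treatment\n"
    print(check_conditions(table))
```

To check a real file, read it first, for example with `open("condition.csv").read()`, and pass the text to the function.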

Once the condition of every sample has been filled in, you can run test mode as in the example command line below:

SalmonTE.py test --inpath=SalmonTE_output --outpath=SalmonTE_statistical_test --tabletype=csv --figtype=png --analysis_type=DE --conditions=control,treatment

How to Cite?

@inbook{doi:10.1142/9789813235533_0016,
author = {Hyun-Hwan Jeong and Hari Krishna Yalamanchili and Caiwei Guo and Joshua M. Shulman and Zhandong Liu},
title = {An ultra-fast and scalable quantification pipeline for transposable elements from next generation sequencing data},
booktitle = {Biocomputing 2018},
chapter = {},
pages = {168-179},
doi = {10.1142/9789813235533_0016},
URL = {http://www.worldscientific.com/doi/abs/10.1142/9789813235533_0016},
eprint = {http://www.worldscientific.com/doi/pdf/10.1142/9789813235533_0016},
publisher = {WORLD SCIENTIFIC},
address = {},
year = {2017},
edition = {}
}