mshakya / piret

Pythonic PiReT
BSD 3-Clause "New" or "Revised" License

PiReT

Pipeline for Reference based Transcriptomics.

0.0 Installing PiReT

PiReT is installed using conda, so please make sure that conda is installed and in your path. Installation can take up to 2 hours, depending on your internet speed.

0.0.1 Install directly from bioconda

Coming soon!

0.0.2 Install dependencies separately using conda

For installation to work, conda must be installed; see the conda documentation for installation instructions. Use the following commands to create a conda environment and install the required packages into it. Make sure that no environment named piret_env exists before you start, and delete it if it does. If you are comfortable with Python and conda, I recommend this route: you control every step of the installation, and if something fails you won't have to start from the beginning.

git clone https://github.com/mshakya/piret.git
cd piret
conda create -n piret_env python=3.6.6 --yes
conda install -c bioconda faqcs -n piret_env --yes
conda install -c bioconda star hisat2 subread -n piret_env --yes
conda install -c bioconda stringtie -n piret_env --yes
conda install -c bioconda samtools bamtools bedtools -n piret_env --yes
conda install -c bioconda diamond=0.9.24 -n piret_env --yes
source activate piret_env
cd thirdparty
rm -rf eggnog-mapper
git clone https://github.com/mshakya/eggnog-mapper.git
cd eggnog-mapper
python download_eggnog_data.py -y
cd ..
cd ..
# install BiocManager
Rscript --no-init-file -e "if('BiocManager' %in% rownames(installed.packages()) == FALSE){install.packages('BiocManager',repos='https://cran.r-project.org')}";
# install optparse
Rscript --no-init-file -e "if('optparse' %in% rownames(installed.packages()) == FALSE){install.packages('optparse',repos='https://cran.r-project.org')}";
# install tidyverse
Rscript --no-init-file -e "if('tidyverse' %in% rownames(installed.packages()) == FALSE){install.packages('tidyverse',repos='https://cran.r-project.org')}";
# install R reshape2 packages
Rscript --no-init-file -e "if('reshape2' %in% rownames(installed.packages()) == FALSE){install.packages('reshape2',repos='https://cran.r-project.org')}";
# install R pheatmap packages
Rscript --no-init-file -e "if('pheatmap' %in% rownames(installed.packages()) == FALSE){install.packages('pheatmap',repos='https://cran.r-project.org')}";
# install R edgeR packages
Rscript --no-init-file -e "if('edgeR' %in% rownames(installed.packages()) == FALSE){BiocManager::install('edgeR')}";
# install R deseq2 packages
Rscript --no-init-file -e "if('DESeq2' %in% rownames(installed.packages()) == FALSE){BiocManager::install('DESeq2')}";
# install R pathview package
Rscript --no-init-file -e "if('pathview' %in% rownames(installed.packages()) == FALSE){BiocManager::install('pathview')}";
# install R gage package
Rscript --no-init-file -e "if('gage' %in% rownames(installed.packages()) == FALSE){BiocManager::install('gage')}";
# install R ballgown package
Rscript --no-init-file -e "if('ballgown' %in% rownames(installed.packages()) == FALSE){BiocManager::install('ballgown')}";
python setup.py install

0.0.3 Install using provided bash script

$ git clone https://github.com/mshakya/piret.git
$ cd piret
$ ./installer.sh <conda_env>

For example:

$ git clone https://github.com/mshakya/piret.git
$ cd piret
$ ./installer.sh piret_env

Make sure that the environment name (e.g. piret_env) doesn't exist yet.
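Checking for an existing environment can be scripted. The following is a minimal sketch, not part of PiReT: the helper names `env_name_taken` and `list_conda_envs` are made up for illustration, and the example paths are hypothetical. It relies on `conda env list --json`, which reports environment paths under the `envs` key.

```python
import json
import subprocess
from pathlib import Path


def env_name_taken(name, env_paths):
    """Return True if any conda environment path ends in `name`."""
    return any(Path(p).name == name for p in env_paths)


def list_conda_envs():
    """Ask conda for its environment paths (requires conda on the PATH)."""
    result = subprocess.run(
        ["conda", "env", "list", "--json"],
        check=True,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return json.loads(result.stdout)["envs"]


# Demonstration on hypothetical paths, so it runs without conda installed.
example_paths = ["/opt/conda", "/opt/conda/envs/piret_env"]
print(env_name_taken("piret_env", example_paths))
print(env_name_taken("another_env", example_paths))
```

On a machine with conda available, `env_name_taken("piret_env", list_conda_envs())` performs the same check against the real environment list before `installer.sh` is run.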

0.0.4 Install using pip

Coming soon!

1.0 Testing Installation

We provide a test data set to check whether the installation was successful. The fastq files are in tests/fastqs and the corresponding reference fasta files are in tests/data. Run the tests from within the piret directory.

For running tests on eukaryote datasets:

$ cd piret
$ source activate piret_env

$ LUIGI_CONFIG_PATH="/panfs/biopan01/scratch-311300/ecoli_usda/ecoli.cfg" bin/piret -c ecoli.cfg -d ecoli_piret -e exp_desn.txt
$ LUIGI_CONFIG_PATH="full_path_to/piret/tests/test_euk.cfg" bin/piret -c tests/test_euk.cfg -d tests/test_euk -e tests/test_euk.txt

For running tests on prokarya datasets:

$ LUIGI_CONFIG_PATH="full_path_to/piret/tests/test_prok.cfg" bin/piret -c tests/test_prok.cfg -d tests/test_prok -e tests/test_prok.txt

For running tests using both prokarya and eukarya datasets:

$ LUIGI_CONFIG_PATH="full_path_to/piret/tests/test_both.cfg" bin/piret -c tests/test_both.cfg -d tests/test_both -e tests/test_both.txt
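Each test command sets LUIGI_CONFIG_PATH to the full path of the same config file that is passed with -c. The sketch below shows one way to build that invocation from Python so the two cannot drift apart. The function names `piret_invocation` and `run_piret` are hypothetical helpers, not part of PiReT, and the sketch assumes it is started from the piret directory with piret_env active.

```python
import os
import subprocess


def piret_invocation(config, workdir, design, piret="bin/piret"):
    """Build the command list and environment for one PiReT run."""
    env = dict(os.environ)
    # Luigi reads its config location from this variable; use a full path.
    env["LUIGI_CONFIG_PATH"] = os.path.abspath(config)
    cmd = [piret, "-c", config, "-d", workdir, "-e", design]
    return cmd, env


def run_piret(config, workdir, design):
    """Run PiReT and raise CalledProcessError if it exits with an error."""
    cmd, env = piret_invocation(config, workdir, design)
    return subprocess.run(cmd, env=env, check=True)


# Show what would be run for the prokaryote test, without running it.
cmd, env = piret_invocation(
    "tests/test_prok.cfg", "tests/test_prok", "tests/test_prok.txt"
)
print(" ".join(cmd))
print(env["LUIGI_CONFIG_PATH"])
```

Calling `run_piret("tests/test_prok.cfg", "tests/test_prok", "tests/test_prok.txt")` would then start the test itself.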

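To get KO ids for genes, PiReT uses emapper, which is included in the conda install of PiReT. However, its database needs to be downloaded separately, following the eggnog-mapper instructions. Briefly, run python download_eggnog_data.py -y from within thirdparty/eggnog-mapper, as in section 0.0.2.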

0.1 Dependencies

PiReT requires the following dependencies, all of which should be installed and in the PATH.
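A quick way to confirm that tools are in the PATH is `shutil.which` from the Python standard library. This is a minimal sketch with a hypothetical helper, `missing_tools`. The list holds only tools from section 0.0.2 whose executable name matches the conda package name, plus Rscript; other dependencies would need their executable names added.

```python
import shutil


def missing_tools(tools):
    """Return the tools that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


# Assumption: each of these packages provides an executable of the same name.
tools = [
    "samtools",
    "bamtools",
    "bedtools",
    "hisat2",
    "stringtie",
    "diamond",
    "Rscript",
]

missing = missing_tools(tools)
if missing:
    print("not found on PATH: " + ", ".join(missing))
else:
    print("all listed tools were found on PATH")
```

Run it with piret_env activated, since the conda-installed tools are only on the PATH inside that environment.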

0.1.0 Programming/Scripting languages

0.1.1 Installing dependencies

0.1.2 Third-party software/packages

0.1.3 R packages

0.1.4 Python packages

2.0 Running PiReT

usage: piret [-h] -d WORKDIR -e EXPDSN -c CONFIG [-v]

piret

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit

required arguments:
  -d WORKDIR            working directory where all output files will be
                        processed and written (default: None)
  -e EXPDSN             tab delimited experimental design file
  -c CONFIG, --config CONFIG
                        luigi config file for setting parameters that control
                        each step, see github repo for an example (default:
                        None)

Example runs:

        piret -d <workdir> -e <design file>  -c <config file>

2.1 Experimental design file

An experimental design file consists of the sample name (SampleID), the full path to the fastq files (Files), and the group each sample belongs to (Group). We recommend using a text editor such as BBEdit or TextWrangler to create the tab-delimited experimental design file, because exporting a tab-delimited file directly from Excel tends to cause formatting problems. If possible, avoid special characters in sample names and group names.

For example:

  samp1, samp_1 : good name
  samp 1, samp.1: not a good name and will likely cause errors.

Sample experimental design files can be found in the tests directory of the repository (for example, tests/test_prok.txt).
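Generating the design file from a script avoids the Excel export problems mentioned above. The following is a sketch under stated assumptions, not PiReT's own code: it assumes the file starts with a header row of SampleID, Files, and Group, it treats only letters, digits, and underscores as safe in names, and the sample rows and fastq paths are hypothetical. How multiple fastq files per sample are written in the Files column is not covered here; check the sample design files for that.

```python
import csv
import os
import re
import tempfile

# Assumption: only letters, digits, and underscores are treated as safe.
VALID_NAME = re.compile(r"^[A-Za-z0-9_]+$")


def check_name(name):
    """Reject names with spaces, dots, or other special characters."""
    if not VALID_NAME.match(name):
        raise ValueError(f"avoid special characters in name: {name!r}")
    return name


def write_design(path, rows):
    """Write (sample_id, fastq_path, group) tuples as a tab-delimited file."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["SampleID", "Files", "Group"])
        for sample_id, fastq, group in rows:
            writer.writerow([check_name(sample_id), fastq, check_name(group)])


# Hypothetical samples; replace the paths with full paths to your fastq files.
rows = [
    ("samp1", "/full/path/to/samp1.fastq", "control"),
    ("samp_2", "/full/path/to/samp_2.fastq", "treated"),
]
design_path = os.path.join(tempfile.mkdtemp(), "exp_design.txt")
write_design(design_path, rows)
with open(design_path) as handle:
    print(handle.read())
```

With this check, names such as `samp 1` or `samp.1` raise a `ValueError` before the file reaches PiReT.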

2.2 Config file

All options are set in the config file.

3.0 Output

All the outputs will be within the working directory. The main output file is a concatenated JSON file called out.json.
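The layout of out.json is not documented here, so the following is only a sketch of one way to read it. It assumes that "concatenated" means the file holds one or more JSON values written back to back; a file containing a single JSON document is handled the same way. The function name `iter_json_documents` and the sample text are made up for illustration.

```python
import json


def iter_json_documents(text):
    """Yield each JSON value from a string holding one or more of them."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # Skip whitespace between (or after) documents.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        # raw_decode parses one value and returns the index where it ended.
        value, pos = decoder.raw_decode(text, pos)
        yield value


# Made-up content standing in for out.json.
sample = '{"step": "qc"}\n{"step": "mapping"}\n'
for document in iter_json_documents(sample):
    print(document)
```

For a real run, read the file from the working directory first, for example `open("<workdir>/out.json").read()`, and pass the text to `iter_json_documents`.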

4.0 Removing PiReT

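To remove PiReT, deleting the PiReT folder (rm -rf) is sufficient, since all dependencies that were not already on your system are installed inside it. If you installed into a conda environment such as piret_env, remove that as well with conda env remove -n piret_env. Before removing anything, check that none of your project files are within the PiReT directory.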

5.0 Contributions

6.0 Citations:

If you use PiReT, please cite the following papers:

Copyright

Copyright (XXXX). Triad National Security, LLC. All rights reserved.

This program was produced under U.S. Government contract 89233218CNA000001 for Los Alamos National Laboratory (LANL), which is operated by Triad National Security, LLC for the U.S. Department of Energy/National Nuclear Security Administration.

All rights in the program are reserved by Triad National Security, LLC, and the U.S. Department of Energy/National Nuclear Security Administration. The Government is granted for itself and others acting on its behalf a nonexclusive, paid-up, irrevocable worldwide license in this material to reproduce, prepare derivative works, distribute copies to the public, perform publicly and display publicly, and to permit others to do so.

This is open source software; you can redistribute it and/or modify it under the terms of the GPLv3 License. If software is modified to produce derivative works, such modified software should be clearly marked, so as not to confuse it with the version available from LANL. Full text of the GPLv3 License can be found in the License file in the main development branch of the repository.