lauringlab / variant_pipeline

Work on the variant_pipeline and the initial R analysis used in calling variants from NGS data
Apache License 2.0

Lauring Variant Pipeline

Nominates candidate variants by comparing the sequences in a test sample to those found in a plasmid control. The pipeline runs as a single phase that takes in fastq files and outputs putative variants as well as all base calls above a set frequency. It is then up to the user to filter the putative variants based on the characteristics provided.

Directory list

bin/variantPipeline.py

This script is a thin Python wrapper that takes a bpipe pipeline, input files, an output directory, and an options YAML file. Each time it is launched, the bpipe scripts are copied from the scripts directory into the output directory as a log of what was run. The output directory is created if it doesn't exist.
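The copy-and-log behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in variantPipeline.py: the function name stage_pipeline_scripts is hypothetical, and it assumes the bpipe scripts are the *.groovy files in the scripts directory.

```python
import shutil
from pathlib import Path


def stage_pipeline_scripts(scripts_dir, output_dir):
    """Hypothetical helper: copy bpipe scripts into the output directory.

    Assumes the bpipe scripts are the *.groovy files in scripts_dir.
    Returns the names of the files that were copied.
    """
    scripts_dir = Path(scripts_dir)
    output_dir = Path(output_dir)

    # Create the output directory (and any missing parents) if needed.
    output_dir.mkdir(parents=True, exist_ok=True)

    copied = []
    for script in sorted(scripts_dir.glob("*.groovy")):
        # Keep a copy next to the results as a record of what was run.
        shutil.copy2(script, output_dir / script.name)
        copied.append(script.name)
    return copied
```

Keeping a copy of the scripts next to the results means a run can be traced back to the exact stages that produced it, even if the scripts directory changes later.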

Usage: python variantPipeline.py -h

See the tutorial for more information.

*NOTE: Your fasta is used in the variant calling step and needs to end in .fa*
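A quick pre-flight check for the note above might look like this. It is only a sketch: check_reference_fasta is a hypothetical helper and is not part of the pipeline. It tests the file extension only, not the contents of the file.

```python
from pathlib import Path


def check_reference_fasta(path):
    """Hypothetical helper: reject reference files that do not end in .fa."""
    path = Path(path)
    if path.suffix != ".fa":
        raise ValueError(
            "Reference fasta must end in .fa, got: {}".format(path.name)
        )
    return path
```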

Outputs

There are 3 main pipelines that can be run. All of the stages for the pipelines are held in ./scripts/variantPipeline.bpipe.stages.groovy

Basic aligning: scripts/aligning_pipeline.groovy

DeepSNV pipeline: scripts/deepsnv_pipeline.groovy

Running this pipeline after the one above is equivalent to running the old single pipeline.

Python pipeline to call all variants and sequencing errors: scripts/python_pipeline.groovy

Dependencies

Note: Flux is the name of the computing core used by our lab at the University of Michigan. Some of the directions may be specific to those working on this platform.

The pipeline comes with many of the required programs (bpipe and pycard); however, bowtie2, samtools and certain R and python libraries are required by the variant calling.

To run any of these pipelines you must have the Java Development Kit (JDK) installed. It can be installed from here. If bpipe doesn't run, this is the first thing to check.
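When something fails to start, it can help to confirm that the external programs are on the PATH. The sketch below is not part of the pipeline: missing_tools is a hypothetical helper, and the executable names (java, bowtie2, samtools, R) are assumed to be the usual command names for the programs mentioned above.

```python
import shutil

# Assumed executable names for the external programs named in this README.
REQUIRED_TOOLS = ("java", "bowtie2", "samtools", "R")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools from `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All required tools were found.")
```

Run it after activating the conda environment or loading the modules, since both change the PATH.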

All the other dependencies, except R and the R packages, are handled by conda. Install conda by following the tutorial here.

We can create the conda environment with the following command (run from the variant_pipeline/ directory):

conda env create -f scripts/environment.yml

We have to activate the environment before running the commands below:

conda activate variant-pipeline

On Flux we can achieve an equivalent environment by loading the following modules:

module load muscle
module load bowtie2
module load python-anaconda2/201704
module load fastqc
module load R

The R packages are managed by packrat. I am using R 3.5.0. From the main directory, run

R
packrat::restore()

to download the needed dependencies. They should be placed in the packrat/lib directory. This is important because the R script will look for them there. You may need to install packrat first if you don't have it.

Adapted and developed by JT McCrone based on work done by Chris Gates/Peter Ulintz UM BCRCF Bioinformatics Core