nanoporetech / Pore-C-Snakemake


We have a new bioinformatic resource that largely replaces the functionality of this project! See our new repository here: https://github.com/epi2me-labs/wf-pore-c.

This repository is now unsupported and we do not recommend its use. Please contact Oxford Nanopore: support@nanoporetech.com for help with your application if it is not possible to upgrade to our new resources, or we are missing key features.


1. Introduction

Overview:

This pipeline manages a pore-c workflow starting from raw fastq files and converting them to standard file formats for use by downstream tools. The steps involved are:

2. Getting started

In most cases, it is best to pre-install conda before starting. All other dependencies will be installed automatically when running the pipeline for the first time.

Requirements:

This pipeline requires a computer running Linux (Ubuntu 16). More than 64 GB of memory is recommended. The pipeline has been tested on minimal server installs of this operating system.
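To compare a machine against the memory recommendation above, the total can be read from /proc/meminfo on Linux. The following is a minimal sketch and not part of the pipeline: the function name `total_mem_gib` is hypothetical, and it assumes the standard `MemTotal` line, which the kernel reports in kB.

```shell
# Hypothetical helper (not part of the pipeline): print total memory in whole GiB.
# Reads a meminfo-style file; defaults to /proc/meminfo, where MemTotal is in kB.
total_mem_gib() {
    awk '/^MemTotal:/ { print int($2 / 1024 / 1024) }' "${1:-/proc/meminfo}"
}
```

Calling `total_mem_gib` with no arguments prints the figure for the current machine. The kernel reserves some memory for itself, so the reported value is usually slightly lower than the installed amount.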

Most software dependencies are managed using conda. To install conda, install miniconda3 by following its installation instructions. You will need to accept the license agreement during installation, and we recommend that you allow the conda installer to prepend its path to your .bashrc file when asked.

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh

Check that conda has installed successfully:

conda -h

If conda has installed correctly, you should see the following output. If you do not see it, you may need to close and reopen your terminal.

$ conda
usage: conda [-h] [-V] command ...

conda is a tool for managing and deploying applications, environments and packages.

Options:

positional arguments:
  command
    clean        Remove unused packages and caches.
    config       Modify configuration values in .condarc. This is modeled
                 after the git config command. Writes to the user .condarc
                 file ($HOME/.condarc) by default.
    create       Create a new conda environment from a list of specified
                 packages.
..............

Installation:

Clone this git repository to the location where you want to run your analysis, then create the conda environment that will be used to run the pipeline:

git clone https://github.com/nanoporetech/Pore-C-Snakemake.git
cd Pore-C-Snakemake
## Create the environment; its dependencies are installed automatically
conda env create
conda activate pore_c_snakemake

Note: before running any of the snakemake commands below, make sure that you have run conda activate pore_c_snakemake.
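One way to guard against forgetting the activation step is a small check in front of each snakemake call. This is a sketch rather than part of the pipeline: the function name `require_env` is hypothetical, and it relies on the `CONDA_DEFAULT_ENV` variable that conda exports when an environment is activated.

```shell
# Hypothetical guard (not part of the pipeline): succeed only if the expected
# conda environment is active. conda exports CONDA_DEFAULT_ENV on activation.
require_env() {
    local wanted="${1:-pore_c_snakemake}"
    if [ "${CONDA_DEFAULT_ENV:-}" != "$wanted" ]; then
        echo "Please run: conda activate $wanted" >&2
        return 1
    fi
}
```

It could then be used as `require_env && snakemake --use-conda -n`, so that the dry-run only starts when the environment is active.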


3. Usage

Testing:

Test data is included in the .test subfolder (git-lfs is required to download it). To run the tests, use:

snakemake --use-conda  test -j 4 --config=output_dir=results.test

The results of the test run will appear in the results.test directory.
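If the repository was cloned without git-lfs, the files under .test are small text pointer stubs rather than real data, and the tests are likely to fail on them. The sketch below shows one way to detect that situation. The function names are hypothetical; the only fact assumed is that a git-lfs pointer file begins with the line `version https://git-lfs.github.com/spec/v1`.

```shell
# Hypothetical helpers (not part of the pipeline).
# A git-lfs pointer stub is a small text file whose first line starts with
# "version https://git-lfs.github.com/spec/v1".
is_lfs_pointer() {
    head -c 200 "$1" 2>/dev/null | grep -q '^version https://git-lfs.github.com/spec/v1'
}

# Print every file under a directory (default: .test) that is still a pointer stub.
list_lfs_pointers() {
    find "${1:-.test}" -type f | while IFS= read -r f; do
        if is_lfs_pointer "$f"; then
            echo "$f"
        fi
    done
}
```

If `list_lfs_pointers` prints any paths, running `git lfs install` followed by `git lfs pull` inside the repository should replace the stubs with the real files.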

Configure workflow:

The pipeline configuration is split across several files:

*  `config/config.yaml` - A yaml file containing settings for the pipeline. Input data is specified in the following tab-delimited files.
*  `config/basecall.tsv` - Metadata and locations of the pore-c sequencing run fastqs.
*  `config/references.tsv` - Locations of the draft/scaffold/reference assemblies that the pore-c reads will be mapped to.
*  `config/phased_vcfs.tsv` - [Optional] Locations of phased vcf files that can be used to haplotag pore-c reads.
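Before a dry-run, it can be useful to confirm that the files listed above are in place. The following is a minimal sketch, not part of the pipeline: the function name `check_config_files` is hypothetical, and treating `phased_vcfs.tsv` as a file that may be absent is an assumption based on its [Optional] tag.

```shell
# Hypothetical helper (not part of the pipeline): report missing config files.
# File names come from the list above; the directory defaults to "config".
check_config_files() {
    local config_dir="${1:-config}"
    local missing=0
    local f
    # Files assumed to be required
    for f in config.yaml basecall.tsv references.tsv; do
        if [ ! -f "$config_dir/$f" ]; then
            echo "missing required file: $config_dir/$f" >&2
            missing=1
        fi
    done
    # Assumed optional, so its absence is only noted
    if [ ! -f "$config_dir/phased_vcfs.tsv" ]; then
        echo "note: optional $config_dir/phased_vcfs.tsv not found" >&2
    fi
    return "$missing"
}
```

Run from the repository root, `check_config_files` returns a non-zero status if any of the three assumed-required files is missing.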

Execute workflow:

Test your configuration by performing a dry-run via

snakemake --use-conda -n

Execute the workflow locally via

snakemake --use-conda --cores $N

using $N cores or run it in a cluster environment via

snakemake --use-conda --cluster qsub --jobs 100

or

snakemake --use-conda --drmaa --jobs 100

in combination with any of the modes above. See the Snakemake documentation for further details.
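For the local mode above, $N has to be set before the command is run. One possible way to choose it is sketched below. The function name `pick_cores` is hypothetical, and the use of `nproc` assumes GNU coreutils is installed, which is typical on Linux.

```shell
# Hypothetical helper (not part of the pipeline): choose a value for $N.
# Uses N if it is already set, otherwise the count reported by nproc
# (GNU coreutils), falling back to 1 if nproc is unavailable.
pick_cores() {
    if [ -n "${N:-}" ]; then
        echo "$N"
    else
        nproc 2>/dev/null || echo 1
    fi
}
```

A local run could then look like `N=$(pick_cores)` followed by `snakemake --use-conda --cores "$N"`.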

Workflow targets:

The pipeline defines several targets that can be specified on the command line:

To build the files for a particular target:

snakemake --use-conda -j 8 <target>

4. Output files

Once the pipeline has run successfully you should expect the following files in the output directory:

License and Copyright:

© 2019 Oxford Nanopore Technologies Ltd.

Bioinformatics-Tutorials is distributed by Oxford Nanopore Technologies under the terms of the MPL-2.0 license.