lskatz / SneakerNet

:feet: QA/QC pipeline for a MiSeq/HiSeq/Ion Torrent/assembly-only run
Apache License 2.0

SneakerNet

Synopsis

A pipeline for processing reads from a sequencing run. It currently supports Illumina and Ion Torrent data and can be extended to other platforms.

# Run SneakerNet on the example data
SneakerNetPlugins.pl --numcpus 4 t/M00123-18-001-test

[Figure: SneakerNet workflow]

Main steps

This is the default workflow in v0.14, but other workflows are available, as described in PLUGINS.md.

Quick start

  1. Install and configure SneakerNet, either from source or with a container
  2. Make an input folder from your MiSeq run (see docs/SneakerNetInput.md)
  3. Run SneakerNetPlugins.pl on the input folder.

Installation

See docs/INSTALL.md

NOTE: To ensure all dependencies are met, follow the dependencies section of the installation document.

Container installation

SneakerNet has been containerized and is available on Docker Hub. For more information, please see our containers documentation.

Here is a summary of Docker commands, from the containers documentation.

# Pull image
docker pull lskatz/sneakernet:latest
# Import data directly from the MiSeq machine, where $MISEQ is a raw run folder exported by the MiSeq machine
# and $INDIR is the newly created SneakerNet input folder
docker run --rm -v $PWD:/data -v $KRAKEN_DEFAULT_DB:/kraken-database -u $(id -u):$(id -g) lskatz/sneakernet:latest SneakerNet.roRun.pl /data/$MISEQ -o /data/$INDIR
# Run SneakerNet on the $INDIR (SneakerNet formatted folder)
docker run --rm -v $PWD:/data -v $KRAKEN_DEFAULT_DB:/kraken-database -u $(id -u):$(id -g) lskatz/sneakernet:latest SneakerNetPlugins.pl --numcpus 12 --no email --no transfer --no save /data/$INDIR
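Both commands depend on $MISEQ, $INDIR, and $KRAKEN_DEFAULT_DB being set, and because $PWD is mounted at /data, the raw run folder has to sit under the current directory. A minimal preflight sketch follows; the function name sneakernet_docker_preflight is hypothetical and not part of SneakerNet.

```shell
# Hypothetical helper (not part of SneakerNet): verify the variables used by
# the docker commands above before starting a container.
sneakernet_docker_preflight() {
  status=0
  # All three variables appear in the docker commands; none may be empty.
  for var in MISEQ INDIR KRAKEN_DEFAULT_DB; do
    eval "val=\${$var:-}"
    if [ -z "$val" ]; then
      echo "ERROR: \$$var is not set" >&2
      status=1
    fi
  done
  [ "$status" -eq 0 ] || return 1

  # $PWD is mounted at /data, so the raw run folder must sit under $PWD.
  if [ ! -d "$PWD/$MISEQ" ]; then
    echo "ERROR: $PWD/$MISEQ is not a directory" >&2
    status=1
  fi
  # The Kraken database directory is mounted at /kraken-database.
  if [ ! -d "$KRAKEN_DEFAULT_DB" ]; then
    echo "ERROR: $KRAKEN_DEFAULT_DB is not a directory" >&2
    status=1
  fi
  return "$status"
}
```

Calling the function before either docker run command and continuing only when it returns 0 catches the most common mount mistakes early.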

Workflow

Creating a SneakerNet project directory

For more information on a SneakerNet-style folder, see docs/SneakerNetInput.md

SneakerNet requires a project directory in a specific format. To create one, you can use SneakerNet.roRun.pl. For example:

SneakerNet.roRun.pl --createsamplesheet -o M1234-18-001-test miseq/working/directory

In this example, M1234-18-001-test is the project folder name. It is dash-delimited and contains the machine name, year, ordinal, and optionally a name. Fastq filenames must use _R1_ and _R2_, rather than _1 and _2, for this particular script to parse the files properly.
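The naming rules can be checked before running anything. The sketch below is illustrative only: parse_run_name and list_untagged_fastq are hypothetical helpers, not SneakerNet scripts, and the .fastq.gz extension is an assumption.

```shell
# Hypothetical helper (not part of SneakerNet): split a run folder name into
# its dash-delimited fields (machine, year, ordinal, optional name).
parse_run_name() (
  name=$(basename "$1")
  # The last variable receives everything after the third dash, so an
  # optional name that itself contains dashes stays intact.
  IFS='-' read -r machine year ordinal label <<EOF
$name
EOF
  if [ -z "$machine" ] || [ -z "$year" ] || [ -z "$ordinal" ]; then
    echo "ERROR: '$name' needs at least machine-year-ordinal" >&2
    return 1
  fi
  printf 'machine=%s year=%s ordinal=%s name=%s\n' \
    "$machine" "$year" "$ordinal" "$label"
)

# Hypothetical helper: list fastq files in a folder whose names lack the
# _R1_ / _R2_ tag. Assumes files end in .fastq.gz.
list_untagged_fastq() {
  for f in "$1"/*.fastq.gz; do
    [ -e "$f" ] || continue
    case "$(basename "$f")" in
      *_R1_*|*_R2_*) ;;
      *) echo "$f" ;;
    esac
  done
}
```

For example, parse_run_name M1234-18-001-test prints the four fields on one line, and list_untagged_fastq prints nothing when every file already carries the expected tag.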

Running SneakerNet

It is generally a good idea to edit the snok.txt file in the run folder to configure the run further. For more information on the workflow, see the configuration section in INSTALL.md. For example:

echo "emails = example@example.com, blah@example.com" > t/data/M00123-18-001/snok.txt
echo "workflow = default" >> t/data/M00123-18-001/snok.txt
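To confirm what was written, a value can be read back from snok.txt. This is a minimal sketch that assumes only the "key = value" layout shown in the example; snok_get is a hypothetical helper, and SneakerNet's own parser may accept more than this does.

```shell
# Hypothetical helper (not part of SneakerNet): print the value stored under
# a key in a snok.txt file that uses "key = value" lines.
snok_get() {
  # $1 = path to snok.txt, $2 = key to look up
  awk -F'=' -v key="$2" '
    {
      # Trim whitespace around the key (text before the first "=").
      k = $1
      sub(/^[ \t]+/, "", k); sub(/[ \t]+$/, "", k)
      if (k == key) {
        # Take everything after the first "=" and trim it.
        v = substr($0, index($0, "=") + 1)
        sub(/^[ \t]+/, "", v); sub(/[ \t]+$/, "", v)
        print v
      }
    }' "$1"
}
```

For example, snok_get t/data/M00123-18-001/snok.txt workflow prints the configured workflow name.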

And then run SneakerNet like so (optionally following the log with tail -f):

SneakerNetPlugins.pl --numcpus 8 t/data/M00123-18-001 > t/data/M00123-18-001/SneakerNet.log 2>&1 &
tail -f t/data/M00123-18-001/SneakerNet.log
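A background run does not report on its own whether it finished cleanly. The sketch below wraps any command so that its output goes to a log file and its exit status is returned once it finishes. run_logged is a hypothetical wrapper, and treating a non-zero exit status from SneakerNetPlugins.pl as a failure is an assumption to verify against your installation.

```shell
# Hypothetical wrapper (not part of SneakerNet): run a command in the
# background with stdout and stderr sent to a log file, wait for it to
# finish, and return its exit status.
run_logged() {
  log=$1
  shift
  "$@" > "$log" 2>&1 &
  pid=$!
  echo "Started PID $pid; follow progress with: tail -f $log" >&2
  # wait returns the exit status of the background job.
  wait "$pid"
}
```

Usage would look like run_logged t/data/M00123-18-001/SneakerNet.log SneakerNetPlugins.pl --numcpus 8 t/data/M00123-18-001, with tail -f running in a second terminal.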

Output

For more information, please see docs/SneakerNetOutput.md

SneakerNet produces a subfolder SneakerNet/ in your run directory. It also emails a report. To view a sample report, please go to t/report.html in this repository.

Plugins

SneakerNet is based on plugins. In this context, a plugin is an independent script that can run an analysis on a run directory, accept standard inputs (e.g., --help), and create standard output files.
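To make the general shape concrete, here is a toy script body that accepts --help, takes a run directory, and writes a result under the SneakerNet/ subfolder. It is only a sketch: toy_plugin, the toy_plugin/ output folder, and summary.tsv are hypothetical names, and the actual plugin contract is defined in the plugins readme, not here.

```shell
# Hypothetical toy plugin (not a real SneakerNet plugin): count the
# *.fastq.gz files in a run directory and write the count to a file under
# the run's SneakerNet/ subfolder.
toy_plugin() {
  case "${1:-}" in
    --help|"")
      echo "Usage: toy_plugin RUN_DIR"
      echo "  Counts *.fastq.gz files in RUN_DIR"
      return 0
      ;;
  esac

  run_dir=$1
  if [ ! -d "$run_dir" ]; then
    echo "ERROR: not a directory: $run_dir" >&2
    return 1
  fi

  # Output location and file name are illustrative assumptions.
  out_dir="$run_dir/SneakerNet/toy_plugin"
  mkdir -p "$out_dir"

  count=0
  for f in "$run_dir"/*.fastq.gz; do
    if [ -e "$f" ]; then
      count=$((count + 1))
    fi
  done
  printf 'fastq_files\t%s\n' "$count" > "$out_dir/summary.tsv"
}
```

Real plugins are standalone scripts run by SneakerNetPlugins.pl; the function form here just keeps the example short.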

For more details, see the plugins readme.

Plugins for developers

You too can develop for SneakerNet! For more information, please look at the readme for plugins and the contributing doc.

Further reading

Please see the docs subfolder for more specific documentation.

For inline documentation on some of the Perl code, run perldoc lib/perl5/SneakerNet.pm.

Citation

Griswold, T., Kapsak, C., Chen, J. C., den Bakker, H. C., Williams, G., Kelley, A., Vidyaprakash, E., & Katz, L. S. (2021). SneakerNet: A modular quality assurance and quality check workflow for primary genomic and metagenomic read data. Journal of Open Source Software, 6(60), 2334. https://doi.org/10.21105/joss.02334