nmdp-bioinformatics / pipeline

Consensus assembly and allele interpretation pipeline.
GNU Lesser General Public License v3.0
7 stars 7 forks source link

pipeline

This is a prototypical pipeline. Greater detail on how this works, why it's supposed to work, and much, much more can be found in the wiki at the github page.

Build Status

DESCRIPTION

splitter is the master script which figures out what files to work on, and divides the workload up -- into one file per core on the machine. It then starts process_fastq on each of these files. Note that it isn't particularly clever about dividing up the workloads. If there are 100 files, and 4 cores, you get 4 files of 25 lines each -- regardless of sample size. Still, some parallelization is better than none, right?

USING pipeline

CAVEATS

This tool requires a fair number of other tools to be installed. The shell script checks for them, and will try to give you breadcrumbs as to how to solve any of the issues it finds (e.g. where to find bwa).

If your machine/instance has less than 4GB of RAM, you'll want to modify the script so it uses chr6.fa as the reference genomic, install of all_chr.fa. Make sure those files are indexed. This script does not do the indexing for you.

This is a work in progress, but we welcome improvements and questions. Please feel free to file issues in github.

This has been primarily tested on Debian-based Linux systems (Ubuntu 14.04.01 LTS, to be exact). However, we do try to support other Unix-like operating systems, such as Apple OSX. If you run into issues on your distribution of choice, please file an issue on github. OSX, mind you, has some VERY old shell utilities, so it's still a bit prototypical.

DEBUGGING

Elapsed times, STDOUT and STDERR messages are stored in files named timedata.[0-9]+ -- that is, timedata, followed by a numeric string derived from the PID of the splitter.bash script.

All timedata files may safely be removed after run validation.

ngs-validation-report

This tool takes in the output of the NGS pipeline and produces a html report of the results.

Using ngs-validation-report

1) Running with no parameters

/path/to/tool/ngs-validation-report

2) Running with only an experiment name

/path/to/ngs-validation-report -x ex014

3) Running with only an experiment name and an input directory

/path/to/ngs-validation-report -x ex014 -d /path/to/directory/of/data

4) Running with the experiment, validated, and observed files.

/path/to/ngs-validation-report -l ex00_validated.txt -e ex00_expected.hml -o ex00_observed.txt

Output

report/
├── index.html
├── results.html
├── errors.html
├── experiments.html
├── help.html
├── log.html
├── subjects/
│   ├── subject1.html
│   ├── subject2.html
│   └── ...
├── css/
│   ├── dashboard.css
│   ├── bootstrap.min.css
│   └── default.css
├── js/
│   ├── Chart.js
│   ├── docs.min.js
│   ├── ie-emulation-modes-warning.js
│   ├── ie10-viewport-bug-workaround.js
│   ├── ie8-responsive-file-warning.js
│   ├── init.js
│   ├── jquery.js
│   ├── raphael.js
│   └── bootstrap.min.js
└── img/
    └── bethematch.jpeg