Closed by johnbradley 3 years ago
After running the pipeline I created an HTML report using snakemake --report, which created a report.html file.
When opened it looks like so:
You can click on each rule and see more information:
The statistics tab shows some runtime statistics:
Adds a Snakemake workflow to run setup steps on the genome.
Input and Output
The input genome is named NC_045512.fasta and stored in a resources directory per the Snakemake docs recommendation. The Snakefile consists of an "all" rule that specifies the files to generate. These output files are the bwa index files, samtools index file and the picard dictionary file. There are three rules for creating these output files.
Per rule conda environment
Instead of a single environment, each rule has its own conda environment. This allows greater flexibility when choosing tools for various steps. Snakemake handles creating and using these environments when snakemake is run with the --use-conda flag.
Logging config
The rules specify a location for their log files. Note the shell "&>{log}" part that saves command output to the log files.
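As a rough illustration of how the pieces described above fit together (an "all" rule, a per-rule conda environment, and a log redirect), here is a minimal sketch of one rule. It is not the Snakefile from this change: the rule name bwa_index, the environment file envs/bwa.yaml, and the log path logs/bwa_index.log are hypothetical names chosen for the example, and only the bwa outputs are listed in "all" to keep it short.

```snakemake
# Sketch only: rule name, env file and log path are assumptions.
rule all:
    input:
        # The real "all" rule would also list the samtools index
        # and the picard dictionary file.
        multiext("resources/NC_045512.fasta", ".amb", ".ann", ".bwt", ".pac", ".sa")

rule bwa_index:
    input:
        "resources/NC_045512.fasta"
    output:
        multiext("resources/NC_045512.fasta", ".amb", ".ann", ".bwt", ".pac", ".sa")
    conda:
        "envs/bwa.yaml"          # hypothetical per-rule environment file
    log:
        "logs/bwa_index.log"     # hypothetical log location
    shell:
        # "&> {log}" sends both stdout and stderr of the command to the log file
        "bwa index {input} &> {log}"
```

The samtools and picard rules would presumably follow the same pattern, each with its own conda and log entries.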
Future Changes
The docs also recommend putting each rule in a separate file but for clarity (while we are getting started) I left the rules in the main Snakefile.
Running on a Slurm cluster requires a "slurm" profile. I will address this need in a later change.
Running
Right now I have installed snakemake on my laptop and run the pipeline like so:
When complete, the genome indexes and dict file will be in the resources/* directory.
multiext function
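The rules make use of the multiext function. This function appends each suffix in a list to a filename and returns a list of filenames. The multiext("resources/NC_045512.fasta", ".amb", ".ann", ".bwt", ".pac", ".sa") function call returns the following list of filenames as a Python list:
["resources/NC_045512.fasta.amb", "resources/NC_045512.fasta.ann", "resources/NC_045512.fasta.bwt", "resources/NC_045512.fasta.pac", "resources/NC_045512.fasta.sa"]
This is part of issue #46.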
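For anyone who wants to check the suffix expansion without Snakemake installed, the behaviour described above can be approximated in plain Python. This is a minimal sketch: multiext_sketch is a hypothetical stand-in written for this example, not Snakemake's own implementation.

```python
def multiext_sketch(prefix, *suffixes):
    """Return one filename per suffix, each formed as prefix + suffix.

    Hypothetical stand-in for Snakemake's multiext, for illustration only.
    """
    return [prefix + suffix for suffix in suffixes]


# Expand the genome path into the bwa index filenames.
index_files = multiext_sketch(
    "resources/NC_045512.fasta", ".amb", ".ann", ".bwt", ".pac", ".sa"
)

# Print one filename per line.
for name in index_files:
    print(name)
```

Using the function in both the "all" rule and the bwa rule keeps the two lists of index files in sync without spelling out every filename twice.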