Set the $SNAKEMAKE_PROFILE variable to use the vdblab-profile (a private repo for vdblab members), or point it to whichever Snakemake profile you will be using. At the same time, add a $TMPDIR environment variable definition to your .bashrc file to define where temporary files should go. On lilac, we recommend pointing this to a location in your /data/ directory.

$SNAKEMAKE_PROFILE: for example, export SNAKEMAKE_PROFILE=/path/to/your/profile/. We recommend adding this to the .bashrc file in your home directory so that the variable is set at startup.

--dry-run: flag that lets the user preview the rules to be executed. Remove this flag to execute the commands.

config/config.yaml: change the paths to reflect where the databases can be found on your machine. For a uniform way to fetch and build all the databases, see https://github.com/vdblab/resources

snakemake --snakefile .test/Snakefile --directory .test/simulated/
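The environment variable setup described above can be sketched as follows. This is a minimal example, not part of the pipeline: the helper name append_once, the BASHRC override variable, and the /data/your_lab/your_user/tmp location are all hypothetical placeholders to adapt to your own system.

```shell
# Hypothetical helper: append a line to a file only if it is not already there,
# so re-running the setup does not duplicate the exports.
append_once() {
    local line="$1" file="$2"
    touch "$file"
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Target file; defaults to ~/.bashrc, override BASHRC to try this on a scratch file.
BASHRC="${BASHRC:-$HOME/.bashrc}"

# Placeholder paths (assumptions): replace with your profile and temp locations.
append_once 'export SNAKEMAKE_PROFILE=/path/to/your/profile/' "$BASHRC"
append_once 'export TMPDIR=/data/your_lab/your_user/tmp' "$BASHRC"
```

Open a new shell (or source the file) afterwards so that both variables are available to Snakemake.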
snakemake \
--directory tmpout/ \
--config \
sample=473 \
R1=[/data/brinkvd/data/shotgun/test/473/473_IGO_12587_1_S132_L003_R1_001.fastq.gz] \
R2=[/data/brinkvd/data/shotgun/test/473/473_IGO_12587_1_S132_L003_R2_001.fastq.gz] \
nshards=4 \
stage=all \
--dry-run
The rule DAG for a single sample looks like this:
Different modules of the workflow can be run independently using the stage config entry.
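As a rough illustration of how the stage entry selects a module, the sketch below prints one dry-run command per stage rather than executing anything. The helper name stage_cmd and the FASTQ paths are hypothetical, and the tmp<stage>/ output directory naming is only a convention borrowed from the examples in this README. Some stages take further config entries (for example nshards, dedup_platform, or assembly), which are left out here.

```shell
# Hypothetical helper: print the dry-run command for a single stage.
stage_cmd() {
    local stage="$1" sample="$2" r1="$3" r2="$4"
    printf 'snakemake --directory tmp%s/ --config sample=%s R1=[%s] R2=[%s] stage=%s --dry-run\n' \
        "$stage" "$sample" "$r1" "$r2" "$stage"
}

# Placeholder inputs (assumptions): replace with your own FASTQ files.
R1=/path/to/sample_R1.fastq.gz
R2=/path/to/sample_R2.fastq.gz

# Print (not run) one command per stage so each can be reviewed first.
for stage in preprocess biobakery kraken assembly rgi; do
    stage_cmd "$stage" 473 "$R1" "$R2"
done
```

Printing the commands first makes it easy to review them, add any stage-specific entries, and then paste the ones you need.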
Just run MultiQC on a directory, no need to use Snakemake
cp -r tmppre/reports tmpreports
cp tmpassembly/quast/quast_473/report.tsv ./tmpreports/
ver="v1.12"
docker run -V $PWD:$PWD docker://ewels/multiqc:${ver} multiqc \
--config vdb_shotgun/multiqc_config.yaml --force \
--title "a multiqc report for some test data" \
-b "generated by ${ver}" --filename multiqc_report.html \
tmpreports/ --interactive
snakemake \
--directory tmppreprocess/ \
--config \
sample=473 \
R1=[/data/brinkvd/data/shotgun/test/473/473_IGO_12587_1_S132_L003_R1_001.fastq.gz] \
R2=[/data/brinkvd/data/shotgun/test/473/473_IGO_12587_1_S132_L003_R2_001.fastq.gz] \
nshards=4 \
dedup_platform=NovaSeq \
stage=preprocess \
--dry-run
snakemake \
--directory tmpbiobakery/ \
--config \
sample=473 \
R1=[/data/brinkvd/data/shotgun/test/473/473_IGO_12587_1_S132_L003_R1_001.fastq.gz] \
R2=[/data/brinkvd/data/shotgun/test/473/473_IGO_12587_1_S132_L003_R2_001.fastq.gz] \
stage=biobakery \
--dry-run
snakemake \
--directory tmpkraken/ \
--config \
sample=473 \
R1=[/data/brinkvd/data/shotgun/test/473/473_IGO_12587_1_S132_L003_R1_001.fastq.gz] \
R2=[/data/brinkvd/data/shotgun/test/473/473_IGO_12587_1_S132_L003_R2_001.fastq.gz] \
dedup_platform=NovaSeq \
stage=kraken \
--dry-run
snakemake \
--directory tmpassembly/ \
--config \
sample=473 \
R1=[/data/brinkvd/data/shotgun/test/473/473_IGO_12587_1_S132_L003_R1_001.fastq.gz] \
R2=[/data/brinkvd/data/shotgun/test/473/473_IGO_12587_1_S132_L003_R2_001.fastq.gz] \
stage=assembly \
--dry-run
snakemake \
--directory tmpannotate/ \
--config \
sample=473 \
R1=[/data/brinkvd/data/shotgun/test/473/473_IGO_12587_1_S132_L003_R1_001.fastq.gz] \
R2=[/data/brinkvd/data/shotgun/test/473/473_IGO_12587_1_S132_L003_R2_001.fastq.gz] \
assembly=tmpassembly/473.contigs.fasta \
stage=annotate \
--dry-run
snakemake \
--directory tmpbinning/ \
--config \
sample=473 \
R1=[/data/brinkvd/data/shotgun/test/473/473_IGO_12587_1_S132_L003_R1_001.fastq.gz] \
R2=[/data/brinkvd/data/shotgun/test/473/473_IGO_12587_1_S132_L003_R2_001.fastq.gz] \
assembly=tmpassembly/473.contigs.fasta \
stage=binning \
--dry-run
snakemake \
--directory tmprgi/ \
--config \
sample=473 \
R1=[/data/brinkvd/data/shotgun/test/473/473_IGO_12587_1_S132_L003_R1_001.fastq.gz] \
R2=[/data/brinkvd/data/shotgun/test/473/473_IGO_12587_1_S132_L003_R2_001.fastq.gz] \
stage=rgi \
--dry-run
This pipeline runs StrainPhlAn for each specified species. StrainPhlAn requires two inputs: sample-level marker pickle files, and strain-level markers extracted from the main database. These are stored in a central subdirectory of the MetaPhlAn database directory to aid re-running. If you provide the .sam.bz2 file for a sample that has already been processed into a .pkl file, the pregenerated result is used.
This workflow accepts as input a list of samples' MetaPhlAn .sam.bz2 alignment files and a list of species of interest. The config argument strainphlan_markers_dir serves as a central place for storing both the species- and the sample-level marker files; these are specific to a version of the MetaPhlAn database, so we recommend placing it within the MetaPhlAn database directory.
snakemake \
--snakefile workflow/strainphlan.smk \
--directory tmpstrain/ \
--config \
sams=[path/to/sample1.sam.bz2,path/to/sample2.sam.bz2] \
strainphlan_markers_dir=/data/brinkvd/resources/dbs/metaphlan/mpa_vJan21_CHOCOPhlAnSGB_202103/marker_outputs/ \
metaphlan_db=/data/brinkvd/resources/dbs/metaphlan/mpa_vJan21_CHOCOPhlAnSGB_202103/ \
marker_in_n_samples=2 \
--dry-run
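The sams entry in the command above is a bracketed, comma-separated list. When there are many samples, it can be tedious to write by hand; one possible way to build it in the shell is sketched below. The helper name join_sams is hypothetical, and the sample paths are the same placeholders used in the command.

```shell
# Hypothetical helper: join its arguments with commas and wrap them in brackets,
# matching the list form passed to --config sams=[...].
join_sams() {
    local IFS=,
    printf '[%s]\n' "$*"
}

# Placeholder paths (assumptions): replace with your MetaPhlAn alignment files.
sams=$(join_sams path/to/sample1.sam.bz2 path/to/sample2.sam.bz2)
echo "sams=${sams}"
```

The resulting string can then be passed as sams="${sams}" after --config in the snakemake call.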
For each input species:
The rule DAG for two example input species looks like this:
Please see development.md.