Open daler opened 5 years ago
After some discussion, we arrived at the following proposal to support reporting. Note that this comes with the additional benefit of documenting created files, but the cost is a change in the patterns yaml files. The new Snakefiles will be compatible with old configs, but users will want to update their configs to take full advantage of the reporting.
rnaseq_patterns.yaml and chipseq_patterns.yaml will change format from:
# old format
fastqc:
    raw: 'data/rnaseq_samples/{sample}/fastqc/{sample}_R1.fastq.gz_fastqc.zip'
    cutadapt: 'data/rnaseq_samples/{sample}/fastqc/{sample}_R1.cutadapt.fastq.gz_fastqc.zip'
    bam: 'data/rnaseq_samples/{sample}/fastqc/{sample}.cutadapt.bam_fastqc.zip'
to
# new format
fastqc:
    raw:
        pattern: 'data/rnaseq_samples/{sample}/fastqc/{sample}_R1.fastq.gz_fastqc.zip'
        description: 'FastQC results for raw FASTQ files'
    cutadapt:
        pattern: 'data/rnaseq_samples/{sample}/fastqc/{sample}_R1.cutadapt.fastq.gz_fastqc.zip'
        description: 'FastQC results for trimmed FASTQ files'
    bam:
        pattern: 'data/rnaseq_samples/{sample}/fastqc/{sample}.cutadapt.bam_fastqc.zip'
        description: 'FastQC results for aligned reads'
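Since the new Snakefiles are meant to keep working with old configs, the loader has to accept both formats. Here is a minimal sketch of one way to do that. The function name split_patterns is hypothetical and not part of lib.patterns_targets. It assumes that every old-format leaf is a plain string and that no rule is itself named "pattern".

```python
def split_patterns(tree):
    """Split a patterns tree into parallel (patterns, descriptions) trees.

    A bare string is treated as an old-format leaf (no description).
    A dict containing a 'pattern' key is treated as a new-format leaf.
    Any other dict is treated as a nesting level and recursed into.
    """
    if isinstance(tree, str):
        # Old format: the value is the pattern itself.
        return tree, None
    if "pattern" in tree:
        # New format: the description is optional.
        return tree["pattern"], tree.get("description")
    patterns, descriptions = {}, {}
    for key, value in tree.items():
        patterns[key], descriptions[key] = split_patterns(value)
    return patterns, descriptions


# The same entry written in the old and the new format.
old_config = {
    "fastqc": {
        "raw": "data/rnaseq_samples/{sample}/fastqc/{sample}_R1.fastq.gz_fastqc.zip",
    }
}
new_config = {
    "fastqc": {
        "raw": {
            "pattern": "data/rnaseq_samples/{sample}/fastqc/{sample}_R1.fastq.gz_fastqc.zip",
            "description": "FastQC results for raw FASTQ files",
        }
    }
}

old_patterns, old_descriptions = split_patterns(old_config)
new_patterns, new_descriptions = split_patterns(new_config)
```

Both configs yield the same patterns tree. Only the new format carries descriptions, so with an old config the reporting step would have nothing to write.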
An additional pre-processing step will be added to lib.patterns_targets to generate .rst files corresponding to each description, living in .rst/fastqc_raw.rst, .rst/fastqc_cutadapt.rst, .rst/fastqc_bam.rst, etc.
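The pre-processing step could look roughly like the sketch below. It is an illustration, not the actual lib.patterns_targets code. The function name write_descriptions and its outdir and prefix parameters are hypothetical, and it assumes descriptions arrive as a nested dict of strings, with None where a config entry has no description. File names join the nested keys with underscores, matching the .rst/fastqc_raw.rst naming above.

```python
from pathlib import Path


def write_descriptions(descriptions, outdir=".rst", prefix=()):
    """Write one .rst file per description and return {flat_name: path}.

    Nested keys are joined with underscores, so descriptions['fastqc']['raw']
    is written to <outdir>/fastqc_raw.rst. Entries that are None are skipped.
    """
    written = {}
    for key, value in descriptions.items():
        name = prefix + (key,)
        if isinstance(value, dict):
            # Recurse into nested levels, carrying the key path along.
            written.update(write_descriptions(value, outdir, name))
        elif value is not None:
            flat_name = "_".join(name)
            path = Path(outdir) / (flat_name + ".rst")
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text(value + "\n")
            written[flat_name] = str(path)
    return written
```

Calling write_descriptions({'fastqc': {'raw': 'FastQC results for raw FASTQ files'}}) would create .rst/fastqc_raw.rst. The returned paths are the kind of value a rule could pass as the caption file to report().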
For those rules that should have reported files, the output will be changed from:
# old
rule rulename:
    output: c.patterns['cutadapt']
to
# new
rule rulename:
    output: report(c.patterns['cutadapt'], c.descriptions['cutadapt'])
For corner cases like the fastqc rule, which does not use c.patterns, one solution is to not report those individual files. In the particular case of the fastqc and read count rules, MultiQC does a better job anyway, so reporting them separately would be redundant.
We should really be using the Snakemake reporting tools: https://snakemake.readthedocs.io/en/stable/snakefiles/reporting.html