lcdb / lcdb-wf

Robust, tested workflows for RNA-seq, ChIP-seq and other high-throughput sequencing analysis
https://lcdb.github.io/lcdb-wf

use snakemake reports #185

Open daler opened 5 years ago

daler commented 5 years ago

We should really be using the Snakemake reporting tools: https://snakemake.readthedocs.io/en/stable/snakefiles/reporting.html

daler commented 5 years ago

After some discussion, we arrived at the following proposal to support reporting. It has the additional benefit of documenting the files each workflow creates, at the cost of a format change in the patterns YAML files. The new Snakefiles will be compatible with old configs, but users will want to update their configs to take full advantage of the reporting.

  1. rnaseq_patterns.yaml and chipseq_patterns.yaml will change format from:
# old format
fastqc:
  raw: 'data/rnaseq_samples/{sample}/fastqc/{sample}_R1.fastq.gz_fastqc.zip'
  cutadapt: 'data/rnaseq_samples/{sample}/fastqc/{sample}_R1.cutadapt.fastq.gz_fastqc.zip'
  bam: 'data/rnaseq_samples/{sample}/fastqc/{sample}.cutadapt.bam_fastqc.zip'

to

# new format
fastqc:
  raw:
    pattern: 'data/rnaseq_samples/{sample}/fastqc/{sample}_R1.fastq.gz_fastqc.zip'
    description: 'FastQC results for raw FASTQ files'
  cutadapt:
    pattern: 'data/rnaseq_samples/{sample}/fastqc/{sample}_R1.cutadapt.fastq.gz_fastqc.zip'
    description: 'FastQC results for trimmed FASTQ files'
  bam:
    pattern: 'data/rnaseq_samples/{sample}/fastqc/{sample}.cutadapt.bam_fastqc.zip'
    description: 'FastQC results for aligned reads'

  2. An additional pre-processing step will be added to lib.patterns_targets to generate .rst files corresponding to each description, living in .rst/fastqc_raw.rst, .rst/fastqc_cutadapt.rst, .rst/fastqc_bam.rst, etc.

  3. For those rules that should have reported files, the output will be changed from:

# old
rule rulename:
    output: c.patterns['cutadapt']

to

# new
rule rulename:
    output: report(c.patterns['cutadapt'], c.descriptions['cutadapt'])
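
The pre-processing step described in the proposal could look roughly like the sketch below. It is only an illustration, not the actual lib.patterns_targets code: the function names `split_patterns` and `write_rst` and the `example` dict are hypothetical, and it assumes that no intermediate level of the patterns file uses `pattern` as a key. Old-format entries (a plain string) pass through unchanged and simply get no description.

```python
from pathlib import Path


def split_patterns(tree):
    """Split a patterns tree into parallel (patterns, descriptions) trees.

    A leaf is either a plain value (old format) or a dict with a 'pattern'
    key and an optional 'description' key (new format).
    """
    patterns, descriptions = {}, {}
    for key, value in tree.items():
        if isinstance(value, dict):
            if "pattern" in value:
                # New-format leaf: keep the pattern, record the description.
                patterns[key] = value["pattern"]
                if "description" in value:
                    descriptions[key] = value["description"]
            else:
                # Intermediate level such as 'fastqc': recurse into it.
                patterns[key], sub = split_patterns(value)
                if sub:
                    descriptions[key] = sub
        else:
            # Old-format leaf: the value is the pattern itself.
            patterns[key] = value
    return patterns, descriptions


def write_rst(descriptions, outdir=".rst", prefix=()):
    """Write one .rst file per description and return a tree of file paths.

    File names join the nested keys with underscores, for example
    fastqc_raw.rst for descriptions['fastqc']['raw'].
    """
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    paths = {}
    for key, value in descriptions.items():
        name = prefix + (key,)
        if isinstance(value, dict):
            paths[key] = write_rst(value, outdir, name)
        else:
            path = outdir / ("_".join(name) + ".rst")
            path.write_text(value + "\n")
            paths[key] = str(path)
    return paths


# Hypothetical config mixing new-format and old-format entries.
example = {
    "fastqc": {
        "raw": {
            "pattern": "data/rnaseq_samples/{sample}/fastqc/{sample}_R1.fastq.gz_fastqc.zip",
            "description": "FastQC results for raw FASTQ files",
        },
        "bam": "data/rnaseq_samples/{sample}/fastqc/{sample}.cutadapt.bam_fastqc.zip",
    }
}
```

Calling `split_patterns(example)` and then `write_rst(...)` on the resulting descriptions yields the file paths that would be passed as captions to `report()`. Keeping the patterns tree in the same shape as before means rules that only read patterns would not need to change.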

For corner cases like the fastqc rule, which does not use c.patterns, one solution is to not report those individual files. In the particular case of the fastqc and read count rules, MultiQC already does a better job of summarizing the results, so reporting them separately would be redundant.
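
One possible way to keep rules working both with old configs (no descriptions) and for outputs that should not be reported is a small helper that only wraps a pattern when a caption is available. This is a sketch under assumptions: `maybe_report` is a hypothetical name, and `wrap` stands in for Snakemake's `report` function, which would be passed in from the Snakefile.

```python
def maybe_report(pattern, caption=None, wrap=None):
    """Wrap `pattern` for reporting only if a caption and a wrapper exist.

    `wrap` is expected to behave like Snakemake's report(), accepting the
    output path and a `caption` keyword. Without a caption (old configs, or
    files that should stay out of the report) the pattern is returned as-is.
    """
    if caption is None or wrap is None:
        return pattern
    return wrap(pattern, caption=caption)
```

In a Snakefile this might be used as `output: maybe_report(c.patterns['cutadapt'], c.descriptions.get('cutadapt'), report)`, assuming `c.descriptions` is the dict-like object introduced by the proposal above.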