Closed saramonzon closed 4 years ago
How do you want to make this report? I have done custom R Markdown-based reporting before. Not sure if you want that or if you prefer to build it directly into MultiQC?
Nice! Looks like it should fit MultiQC very well. Let me know if you’d like any help.
It would be great if we can include the output from ivar trim, kraken2 and varscan mpileup2cns in the MultiQC report as a plugin/custom content. As requested, I have added some test data here @ewels.
Once we are able to add this all into MultiQC then generating a table with a subset of metrics would be great!
PR for iVar to MultiQC: https://github.com/ewels/MultiQC/pull/1159
PR for VarScan2 to MultiQC (supports SNP, INDEL and CNS files): https://github.com/ewels/MultiQC/pull/1160
Feedback greatly appreciated - if any of you wants to have a look 👍
Hi there, I am going through the current MultiQC and this is what I noticed first:
DONE: It would be nice to know what the file names correspond to in "General Statistics" (Sample, Sample_1, Sample_2, Sample_T1, Sample_T1_1, Sample_T1_2). From @drpatelh: The _T1 suffixes won't be visible in the report anymore because the pipeline will now merge the samples right at the beginning, if applicable.
DONE: The order: is it possible to better structure the report such as in "Pipeline Summary" of the README.md? Cutadapt is run for the assembly process (point 6 in README.md), yet Bowtie 2 is reported later in MultiQC although it is run for the variant calling process (point 5 in README.md)
NOT POSSIBLE: Nesting? I would think that a clear order and overview would be great, I don't know if nesting is possible with MultiQC. The feedback I got with the viralrecon MultiQC so far is that people got a bit lost.
... more later.
Thank you!!
Hi @drpatelh, I think the new organisation of your MultiQC report is much nicer. Here are a few comments:
Per Sequence GC Content: Normal random library typically have a roughly normal distribution of GC content. -> libraries? (I am not a native English speaker, but seems like a mix of singular and plural)
DONE: VARIANTS: SAMTools (iVar): This section of the report shows SAMTools counts/statistics after primar sequence removal with iVar. -> primer not primar
Mapped reads per contig: Default is on "Normalised counts" - is that a useful plot for this pipeline? We know the number of mapped reads already from previous tables and as discussed yesterday, the pipeline works for one chromosome genomes for now. Seems redundant to me.
Indel Distribution: Should plots like this one be skipped if there are absolutely none? I guess this is for all Bcftools plots, they look empty in your report. Maybe empty plots can be skipped, and a message says that there are no indels/whatsoever.
Thanks a lot for all the work! Further suggestions will follow :-). Best Katrin
Based on the current version of the pipeline, the files below are collated in the MultiQC work directory. I have listed the files that we could use to create a custom table for the samples as a starting point. We would have to test and check how these are reported for both PE and SE reads. One idea I discussed briefly with @ewels was to use the parsing functionality of MultiQC modules to read the data directly into a custom Python script that we could then use to collate the data and output it as a table.
e.g. https://github.com/ewels/MultiQC/blob/master/multiqc/modules/samtools/flagstat.py
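To make the idea concrete, here is a rough, standalone sketch of pulling the mapped read count out of a `*.flagstat` file. It does not reuse the MultiQC module code; the function name `parse_flagstat` and the regular expression are illustrative assumptions, and it only looks at the QC-passed column of the usual `N + M label` lines that samtools flagstat prints.

```python
import re

# Assumption: flagstat lines look like "950 + 0 mapped (95.00% : N/A)".
# Only a few categories are captured here; extend the alternation as needed.
FLAGSTAT_LINE = re.compile(r"^(\d+) \+ (\d+) (in total|mapped|properly paired)\b")


def parse_flagstat(text):
    """Return QC-passed read counts for a few flagstat categories."""
    counts = {}
    for line in text.splitlines():
        match = FLAGSTAT_LINE.match(line.strip())
        if match:
            passed, _failed, label = match.groups()
            counts[label] = int(passed)
    return counts
```

Lines such as `primary mapped` are deliberately not matched, so the `mapped` entry always comes from the plain `mapped` line. We would still need to check how this behaves for SE samples, where the `properly paired` line is zero.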
###############################
## PREPROCESSING METRICS
###############################
## TOTAL NUMBER OF INPUT READS
├── fastqc
│ ├── SAMPLE1_PE_1.merged_fastqc.html
│ ├── SAMPLE1_PE_1.merged_fastqc.zip
## NUMBER OF READS LEFT AFTER ADAPTER & QUALITY TRIMMING RAW FASTQ
├── fastp
│ └── log
│ ├── SAMPLE1_PE.fastp.html
│ ├── SAMPLE1_PE.fastp.json
│ ├── SAMPLE1_PE.fastp.log
###############################
## VARIANT CALLING METRICS
###############################
## NUMBER OF READS MAPPED TO VIRAL GENOME
├── bowtie2
│ ├── flagstat
│ │ ├── SAMPLE1_PE.sorted.bam.flagstat
│ └── log
│ ├── SAMPLE1_PE.bowtie2.log
## TOTAL NUMBER OF VARIANTS CALLED
## TOTAL NUMBER OF Ns in consensus
├── varscan2
│ ├── quast
│ │ └── highfreq
│ │ └── quast
│ └── variants
│ ├── highfreq
│ │ ├── SAMPLE1_PE.highfreq.varscan2.log
│ └── lowfreq
│ ├── SAMPLE1_PE.lowfreq.varscan2.log
## INSERT SIZE MEAN AND STD DEV
## COVERAGE METRICS?
## OTHERS?
├── picard
│ ├── SAMPLE1_PE.trim.CollectMultipleMetrics.alignment_summary_metrics
│ ├── SAMPLE1_PE.trim.CollectMultipleMetrics.insert_size_metrics
│ ├── SAMPLE1_PE.trim.CollectWgsMetrics.coverage_metrics
## TOTAL NUMBER OF Ns in consensus
## TOTAL NUMBER OF READS LEFT AFTER PRIMER TRIMMING
## TOTAL NUMBER OF VARIANTS CALLED WITH IVAR
├── ivar
│ ├── consensus
│ │ └── quast
│ │ └── quast
│ ├── trim
│ │ ├── flagstat
│ │ │ ├── SAMPLE1_PE.trim.sorted.bam.flagstat
│ │ └── log
│ │ ├── SAMPLE1_PE.trim.ivar.log
│ └── variants
│ ├── counts
│ │ ├── SAMPLE1_PE.variant.counts_mqc.tsv
###############################
## DE NOVO ASSEMBLY METRICS
###############################
## NUMBER OF READS LEFT AFTER PRIMER TRIMMING RAW FASTQ
├── cutadapt
│ └── log
│ ├── SAMPLE1_PE.cutadapt.log
## NUMBER OF CLASSIFIED/UNCLASSIFIED READS
├── kraken2
│ ├── SAMPLE1_PE.kraken2.report.txt
## NUMBER OF Ns in ASSEMBLY
## OTHER ASSEMBLY METRICS
├── spades
│ ├── quast
│ │ └── quast
├── metaspades
│ ├── quast
│ │ └── quast
├── unicycler
│ ├── quast
│ │ └── quast
├── minia
│ ├── quast
│ │ └── quast
###############################
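As a starting point for the collation step, the sketch below reads the before/after read counts from a fastp JSON report and writes a per-sample table in the MultiQC custom content style (commented header lines, then tab-separated rows). The function names, the section id `summary_metrics` and the column names are placeholders I made up; the other metrics listed above would be merged into the same dictionary in the same way.

```python
import json


def fastp_read_counts(json_text):
    """Extract total reads before and after trimming from a fastp JSON report."""
    summary = json.loads(json_text)["summary"]
    return {
        "reads_before_trimming": summary["before_filtering"]["total_reads"],
        "reads_after_trimming": summary["after_filtering"]["total_reads"],
    }


def metrics_to_mqc_table(metrics_by_sample):
    """Render {sample: {metric: value}} as MultiQC custom content table text.

    The id and section name are hypothetical; missing metrics become "NA".
    """
    columns = sorted({key for m in metrics_by_sample.values() for key in m})
    lines = [
        "# id: 'summary_metrics'",
        "# section_name: 'Summary metrics'",
        "# plot_type: 'table'",
        "\t".join(["Sample"] + columns),
    ]
    for sample in sorted(metrics_by_sample):
        row = [str(metrics_by_sample[sample].get(col, "NA")) for col in columns]
        lines.append("\t".join([sample] + row))
    return "\n".join(lines) + "\n"
```

Writing the returned text to a file ending in `_mqc.tsv` (as is already done for `SAMPLE1_PE.variant.counts_mqc.tsv`) should let MultiQC pick it up as custom content.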
Will be mostly fixed in https://github.com/nf-core/viralrecon/pull/102
We create this type of report for researchers, this way they can see at a glance how the experiment worked:
We have also worked on some graphs for the amplicon experiment to see how homogeneous the depth of coverage is among amplicons (using bedtools coverage), but a lot of improvement can be done here. Including it as custom content in MultiQC would be a plus!
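One possible way to put a number on amplicon homogeneity is the coefficient of variation of the mean depth per amplicon. The sketch below is not our current code; it assumes the bedtools coverage output was produced with `-mean`, that the amplicon name sits in the fourth BED column and that the mean depth is the last column. The function names are hypothetical.

```python
import statistics


def amplicon_depths(coverage_text):
    """Map amplicon name to mean depth from tab-separated coverage lines.

    Assumes column 4 is the amplicon name and the last column is mean depth.
    """
    depths = {}
    for line in coverage_text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        depths[fields[3]] = float(fields[-1])
    return depths


def depth_cv(depths):
    """Coefficient of variation of mean depth; 0 means perfectly even coverage."""
    values = list(depths.values())
    mean = statistics.mean(values)
    if mean == 0:
        return float("nan")
    return statistics.pstdev(values) / mean
```

A single value per sample like this could go into the summary table, while the per-amplicon depths could be plotted as custom content.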