nf-core / viralrecon

Assembly and intrahost/low-frequency variant calling for viral samples
https://nf-co.re/viralrecon
MIT License

Generate improved report QC #9

Closed saramonzon closed 4 years ago

saramonzon commented 4 years ago

We create this type of report for researchers so that they can see at a glance how the experiment worked:

| sample | host | Virus sequence | total reads | reads host | % reads host | reads virus | % reads virus | unmapped reads | % unmapped reads | mean DP coverage virus | Coverage > 5x (%) | NumVariantsTrimIVAR | %Nswithoutprimers |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 201397 | human | NC_045512,2 | 2365486 | 23052 | 0,97% | 2335037 | 98,71% | 7397 | 0,3127053 | 15289,14611 | 0,985386 | 5 | 2.50 |
| 201493 | human | NC_045512,2 | 2077038 | 22299 | 1,07% | 2049058 | 98,65% | 5681 | 0,2735145 | 13671,11417 | 0,997024 | 6 | 0.32 |
| 201495 | human | NC_045512,2 | 1983106 | 14431 | 0,73% | 1963048 | 98,99% | 5627 | 0,28374681 | 14110,63539 | 0,997893 | 7 | 0.33 |
| 201575 | human | NC_045512,2 | 2092372 | 4073 | 0,19% | 2080486 | 99,43% | 7813 | 0,37340396 | 14861,83768 | 0,998763 | 7 | 0.34 |
| 201602 | human | NC_045512,2 | 1821320 | 92766 | 5,09% | 1730140 | 94,99% | -1586 | -0,0870797 | 11835,88687 | 0,992308 | 6 | 2.27 |
| 201607 | human | NC_045512,2 | 2531506 | 23880 | 0,94% | 2499503 | 98,74% | 8123 | 0,32087619 | 18076,30037 | 0,997458 | 7 | 0.49 |
| 201617 | human | NC_045512,2 | 2232766 | 10799 | 0,48% | 2212565 | 99,10% | 9402 | 0,42109204 | 16235,09561 | 0,99786 | 3 | 0.32 |
| 201706 | human | NC_045512,2 | 2642668 | 6370 | 0,24% | 2628133 | 99,45% | 8165 | 0,30896806 | 18056,17594 | 0,998763 | 6 | 0.24 |
| 201709 | human | NC_045512,2 | 1690968 | 13871 | 0,82% | 1673082 | 98,94% | 4015 | 0,23743796 | 11272,06076 | 0,998595 | 5 | 0.64 |
| 201738 | human | NC_045512,2 | 2723604 | 3457 | 0,13% | 2708670 | 99,45% | 11477 | 0,42139019 | 18263,53416 | 0,998763 | 2 | 0.32 |
| 202050 | human | NC_045512,2 | 1366142 | 467391 | 34,21% | 898349 | 65,76% | 402 | 0,02942593 | 5552,181253 | 0,95268 | 5 | 10.35 |
| 202052 | human | NC_045512,2 | 558458 | 238928 | 42,78% | 167130 | 29,93% | 152400 | 27,2894291 | 913,072902 | 0,946494 | 4 | 14.17 |
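
The percentage columns in this table can be recomputed from the three read counts. Below is a minimal Python sketch, assuming that unmapped reads are defined as total reads minus host reads minus virus reads. That assumption reproduces the values shown for every row checked, including the negative count for sample 201602, which may indicate that some reads are counted in both the host and virus columns. The function name `derive_read_metrics` is hypothetical and not part of the pipeline.

```python
def derive_read_metrics(total_reads, host_reads, virus_reads):
    """Recompute the derived columns of the summary table from raw counts.

    Assumption: unmapped = total - host - virus (not confirmed by the pipeline).
    """
    unmapped = total_reads - host_reads - virus_reads

    def pct(n):
        # Percentage of the total; guard against an empty sample.
        return 100.0 * n / total_reads if total_reads else 0.0

    return {
        "pct_host": pct(host_reads),
        "pct_virus": pct(virus_reads),
        "unmapped": unmapped,
        "pct_unmapped": pct(unmapped),
    }


# Counts taken from the first row of the table (sample 201397).
row = derive_read_metrics(2365486, 23052, 2335037)
# Counts from sample 201602, where the subtraction goes negative.
odd_row = derive_read_metrics(1821320, 92766, 1730140)
```

A check like `odd_row["unmapped"] < 0` could be used to flag samples whose counts are inconsistent before they reach the report.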

We have also worked on some graphs for the amplicon experiment to show how homogeneous the depth of coverage is among amplicons (using bedtools coverage), but there is a lot of room for improvement here. Including them as custom content in MultiQC would be a plus! (Two example plots were attached as images.)
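
One way to put a number on how homogeneous the depth is across amplicons is the coefficient of variation of the per-amplicon mean depth. The sketch below is only an illustration: it assumes tab-separated input in which column 4 holds the amplicon name and the last column holds the mean depth (the layout expected from `bedtools coverage -mean` run on a 4-column BED file). The function name, amplicon names, coordinates and depths are all invented for the example.

```python
import statistics


def amplicon_depth_summary(lines):
    """Summarise per-amplicon mean depth from tab-separated lines.

    Assumes column 4 is the amplicon name and the last column is the mean depth.
    Returns (depth per amplicon, mean depth, coefficient of variation).
    """
    depths = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        depths[fields[3]] = float(fields[-1])
    if not depths:
        raise ValueError("no amplicon records found")
    values = list(depths.values())
    mean = statistics.mean(values)
    # Population standard deviation relative to the mean; lower is more even.
    cv = statistics.pstdev(values) / mean if mean else float("nan")
    return depths, mean, cv


# Invented records, for illustration only.
example = [
    "NC_045512.2\t30\t410\tamplicon_1\t1200.0",
    "NC_045512.2\t320\t726\tamplicon_2\t800.0",
    "NC_045512.2\t642\t1028\tamplicon_3\t1000.0",
]
depths, mean_depth, cv = amplicon_depth_summary(example)
```

A single per-sample value such as `cv` would also fit naturally as a column in a summary table alongside the read counts.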

stevekm commented 4 years ago

How do you want to make this report? I have done custom R Markdown-based reporting before. I'm not sure whether you want that or would prefer to build it directly into MultiQC.

ewels commented 4 years ago

Nice! Looks like it should fit MultiQC very well. Let me know if you’d like any help.

drpatelh commented 4 years ago

It would be great if we can include the output from ivar trim, kraken2 and varscan mpileup2cns in the MultiQC report as a plugin/custom content. As requested, I have added some test data here @ewels.

Once we are able to add this all into MultiQC then generating a table with a subset of metrics would be great!
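
As a rough illustration of the custom content route, the sketch below builds the text of a tab-separated table with a commented header of the kind MultiQC's custom content feature reads from files ending in `_mqc.tsv`. The header keys used here (`id`, `section_name`, `plot_type`) should be checked against the MultiQC documentation. The function name, section id, column names and output file name are all hypothetical; the counts are taken from the first row of the table at the top of this issue.

```python
def build_mqc_table(rows, columns, section_id="viral_summary",
                    section_name="Summary metrics"):
    """Return the text of a MultiQC custom-content table.

    rows: dict mapping sample name -> dict of metric name -> value.
    columns: ordered list of metric names to output.
    """
    lines = [
        f"# id: '{section_id}'",
        f"# section_name: '{section_name}'",
        "# plot_type: 'table'",
        "\t".join(["Sample"] + columns),
    ]
    for sample, metrics in rows.items():
        lines.append("\t".join([sample] + [str(metrics[c]) for c in columns]))
    return "\n".join(lines) + "\n"


rows = {"201397": {"total_reads": 2365486, "virus_reads": 2335037}}
table_text = build_mqc_table(rows, ["total_reads", "virus_reads"])

# To use it, write the text to a file whose name ends in _mqc.tsv, e.g.:
# with open("summary_metrics_mqc.tsv", "w") as handle:
#     handle.write(table_text)
```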

apeltzer commented 4 years ago

PR for iVar to MultiQC: https://github.com/ewels/MultiQC/pull/1159
PR for VarScan2 to MultiQC (supports SNP, INDEL and CNS files): https://github.com/ewels/MultiQC/pull/1160

Feedback greatly appreciated - if any of you wants to have a look 👍

ktrns commented 4 years ago

Hi there, I am going through the current MultiQC and this is what I noticed first:

... more later.

Thank you!!

ktrns commented 4 years ago

Hi @drpatelh, I think the new organisation of your MultiQC report is much nicer. Here are a few comments:

Thanks a lot for all the work! Further suggestions will follow :-). Best Katrin

drpatelh commented 4 years ago

Based on the current version of the pipeline, the files below are collated in the MultiQC work directory. As a starting point, I have listed the files that we could use to create a custom table for the samples. We would have to test and check how these are reported for both PE and SE reads. One idea I discussed briefly with @ewels was to use the parsing functionality of MultiQC modules to read the data directly into a custom Python script that we could then use to collate all of the data and output it as a table.

e.g. https://github.com/ewels/MultiQC/blob/master/multiqc/modules/samtools/flagstat.py
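
To make the idea concrete, here is a small standalone sketch of pulling two counts out of `samtools flagstat` text and collating them per sample. It does not call the MultiQC module linked above; it uses its own regular expressions as a stand-in, and the names `FLAGSTAT_PATTERNS` and `parse_flagstat` are hypothetical. The example text and its numbers are invented.

```python
import re

# Patterns for two flagstat lines; "primary mapped" lines are deliberately not matched.
FLAGSTAT_PATTERNS = {
    "total": re.compile(r"^(\d+) \+ (\d+) in total"),
    "mapped": re.compile(r"^(\d+) \+ (\d+) mapped \("),
}


def parse_flagstat(text):
    """Return QC-passed read counts for the patterns found in flagstat text."""
    counts = {}
    for line in text.splitlines():
        for key, pattern in FLAGSTAT_PATTERNS.items():
            match = pattern.match(line)
            if match:
                counts[key] = int(match.group(1))
    return counts


# Invented flagstat output, for illustration only.
example = """1000 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
950 + 0 mapped (95.00% : N/A)
"""

# Collate per sample; in practice one entry per *.flagstat file.
summary = {"SAMPLE1_PE": parse_flagstat(example)}
```

The same pattern (one small parser per tool, merged into a per-sample dictionary) could be repeated for the other log files listed below.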

###############################
## PREPROCESSING METRICS
###############################

## TOTAL NUMBER OF INPUT READS
├── fastqc
│   ├── SAMPLE1_PE_1.merged_fastqc.html
│   ├── SAMPLE1_PE_1.merged_fastqc.zip

## NUMBER OF READS LEFT AFTER ADAPTER & QUALITY TRIMMING RAW FASTQ
├── fastp
│   └── log
│       ├── SAMPLE1_PE.fastp.html
│       ├── SAMPLE1_PE.fastp.json
│       ├── SAMPLE1_PE.fastp.log

###############################
## VARIANT CALLING METRICS
###############################

## NUMBER OF READS MAPPED TO VIRAL GENOME
├── bowtie2
│   ├── flagstat
│   │   ├── SAMPLE1_PE.sorted.bam.flagstat
│   └── log
│       ├── SAMPLE1_PE.bowtie2.log

## TOTAL NUMBER OF VARIANTS CALLED
## TOTAL NUMBER OF Ns in consensus
├── varscan2
│   ├── quast
│   │   └── highfreq
│   │       └── quast
│   └── variants
│       ├── highfreq
│       │   ├── SAMPLE1_PE.highfreq.varscan2.log
│       └── lowfreq
│           ├── SAMPLE1_PE.lowfreq.varscan2.log

## INSERT SIZE MEAN AND STD DEV
## COVERAGE METRICS?
## OTHERS?
├── picard
│   ├── SAMPLE1_PE.trim.CollectMultipleMetrics.alignment_summary_metrics
│   ├── SAMPLE1_PE.trim.CollectMultipleMetrics.insert_size_metrics
│   ├── SAMPLE1_PE.trim.CollectWgsMetrics.coverage_metrics

## TOTAL NUMBER OF Ns in consensus
## TOTAL NUMBER OF READS LEFT AFTER PRIMER TRIMMING
## TOTAL NUMBER OF VARIANTS CALLED WITH IVAR
├── ivar
│   ├── consensus
│   │   └── quast
│   │       └── quast
│   ├── trim
│   │   ├── flagstat
│   │   │   ├── SAMPLE1_PE.trim.sorted.bam.flagstat
│   │   └── log
│   │       ├── SAMPLE1_PE.trim.ivar.log
│   └── variants
│       ├── counts
│       │   ├── SAMPLE1_PE.variant.counts_mqc.tsv

###############################
## DE NOVO ASSEMBLY METRICS
###############################

## NUMBER OF READS LEFT AFTER PRIMER TRIMMING RAW FASTQ
├── cutadapt
│   └── log
│       ├── SAMPLE1_PE.cutadapt.log

## NUMBER OF CLASSIFIED/UNCLASSIFIED READS
├── kraken2
│   ├── SAMPLE1_PE.kraken2.report.txt

## NUMBER OF Ns in ASSEMBLY 
## OTHER ASSEMBLY METRICS
├── spades
│   ├── quast
│   │   └── quast
├── metaspades
│   ├── quast
│   │   └── quast
├── unicycler
│   ├── quast
│   │   └── quast
├── minia
│   ├── quast
│   │   └── quast

###############################

drpatelh commented 4 years ago

Will be mostly fixed in https://github.com/nf-core/viralrecon/pull/102