report - Githubissues

hoelzer commented 3 years ago

It would be good to have at least a simple summary report for the reconstructed consensus sequence. This should include:

used version of poreCov
used tools and versions within poreCov
basic stats about the reconstructed consensus sequences (length, N50, number of Ns, maybe pairwise identity to the Wuhan strain ...)
if possible some stats about the called variants
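
The length and N-count stats from the list above could be computed roughly like this. It is only a sketch, not poreCov code: the function name `consensus_stats` and the example record are made up for illustration, it assumes a single-record FASTA, and it leaves out N50 and pairwise identity.

```python
def consensus_stats(fasta_text):
    """Return basic stats for a single-record FASTA passed in as a string."""
    lines = [line.strip() for line in fasta_text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a FASTA header line starting with '>'")
    header = lines[0][1:].split()
    name = header[0] if header else ""
    # Join all sequence lines; upper-case so soft-masked 'n' is counted too.
    sequence = "".join(lines[1:]).upper()
    length = len(sequence)
    n_count = sequence.count("N")
    return {
        "name": name,
        "length": length,
        "n_count": n_count,
        "n_fraction": n_count / length if length else 0.0,
    }


# Hypothetical example record, not taken from a real run.
example = ">sample_1 hypothetical consensus\nACGTNNNN\nACGTACGT\n"
stats = consensus_stats(example)
print(stats)
```

A report table would then just be one such dictionary per sample.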

E.g. in a single PDF report per run.

For the first part (technical stats) it might also be enough to use Nextflow's built-in reporting functions.

hoelzer commented 3 years ago

We can base the report on the following script: https://gitlab.com/RKIBioinformaticsPipelines/ncov_minipipe/-/blob/master/ncov_minipipe.Rmd

and modify it according to the Nanopore needs.

hoelzer commented 3 years ago

@oliverdrechsel @rekm welcome to the reporting issue! :)

So basically @replikation needs to know which inputs (and formats) the different parts of https://gitlab.com/RKIBioinformaticsPipelines/ncov_minipipe/-/blob/master/ncov_minipipe.Rmd expect, so he can generate them.

Maybe we can start basic and try to implement the Rmd script with support for

some raw read metrics
consensus coverage

replikation commented 3 years ago

Yep, basically either an example input for this script or a summary of what the script needs to function properly.

oliverdrechsel commented 3 years ago

The report currently takes a bunch of files. I think we can either modify the report so that all files are optional, or butcher it and make a Nanopore version.

What I find quite important for the report output (HTML) is that figures and tables have a maximum size and use scroll bars. This keeps the report short and easy to scroll, even if each section contains 100 samples.

*coverage.tsv

Loads of 0s, because the example is a negative control.

$ head NK-1xTE_1.coverage.tsv
NC_045512.2     1       0
NC_045512.2     2       0
NC_045512.2     3       0
NC_045512.2     4       0
NC_045512.2     5       0
NC_045512.2     6       0
NC_045512.2     7       0
NC_045512.2     8       0
NC_045512.2     9       0
NC_045512.2     10      0
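
For a consensus coverage section, a file like this could be summarised per sample as follows. This is a sketch under assumptions: the three whitespace-separated columns are taken to be reference, 1-based position and depth, the function name `summarize_coverage` is hypothetical, and the `min_depth=20` threshold is an arbitrary illustration value rather than anything the pipeline defines.

```python
def summarize_coverage(tsv_text, min_depth=20):
    """Summarise a three-column coverage table (reference, position, depth)."""
    depths = []
    for line in tsv_text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        depths.append(int(fields[2]))
    positions = len(depths)
    if positions == 0:
        return {"positions": 0, "mean_depth": 0.0, "fraction_covered": 0.0}
    covered = sum(1 for depth in depths if depth >= min_depth)
    return {
        "positions": positions,
        "mean_depth": sum(depths) / positions,
        "fraction_covered": covered / positions,
    }


# First lines of the negative control shown above: every depth is 0.
negative_control = "NC_045512.2\t1\t0\nNC_045512.2\t2\t0\nNC_045512.2\t3\t0\n"
print(summarize_coverage(negative_control))
```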

fragment size

This is basically the fragment size (TLEN) column of the mapping BAM file.

$ head NK-1xTE_1.fragsize.tsv
0
102
-102
-121
121
102
-102
102
-102
93
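
Since the values are signed (one mate of a pair gets the negative value), a report would probably work on absolute sizes and ignore the zeros of reads without a mapped pair. A small sketch of that idea, with the function name `fragment_size_summary` made up for illustration:

```python
from statistics import median


def fragment_size_summary(text):
    """Summarise one signed fragment size per line, using absolute values."""
    sizes = [abs(int(token)) for token in text.split()]
    nonzero = [size for size in sizes if size > 0]
    return {
        "reads": len(sizes),
        "with_size": len(nonzero),
        "median_abs_size": median(nonzero) if nonzero else 0,
    }


# The ten values shown above.
example = "0\n102\n-102\n-121\n121\n102\n-102\n102\n-102\n93\n"
print(fragment_size_summary(example))
```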

mapping statistics

Statistics for the bwa mem mapping (the layout matches samtools flagstat output)

$ head NK-1xTE_1.bamstats.txt
794 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
2 + 0 supplementary
0 + 0 duplicates
718 + 0 mapped (90.43% : N/A)
792 + 0 paired in sequencing
396 + 0 read1
396 + 0 read2
526 + 0 properly paired (66.41% : N/A)
640 + 0 with itself and mate mapped

Transformed into a pipe-separated table for easier reading in R

$ head NK-1xTE_1.bamstats.pipe.txt
794|0|in total (QC-passed reads + QC-failed reads)
0|0|secondary
2|0|supplementary
0|0|duplicates
718|0|mapped (90.43% : N/A)
792|0|paired in sequencing
396|0|read1
396|0|read2
526|0|properly paired (66.41% : N/A)
640|0|with itself and mate mapped
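
How the transformation is actually done in the pipeline is not stated here. One way to get from the first format to the pipe-separated one is a single regular expression per line, sketched below with a hypothetical function name `flagstat_to_pipe`:

```python
import re

# "<passed> + <failed> <description>" as printed in the statistics file.
FLAGSTAT_LINE = re.compile(r"^(\d+) \+ (\d+) (.*)$")


def flagstat_to_pipe(text):
    """Convert 'N + M description' lines into 'N|M|description' lines."""
    converted = []
    for line in text.splitlines():
        match = FLAGSTAT_LINE.match(line.strip())
        if match:  # lines that do not fit the pattern are dropped
            converted.append("|".join(match.groups()))
    return "\n".join(converted)


example = "794 + 0 in total (QC-passed reads + QC-failed reads)\n718 + 0 mapped (90.43% : N/A)\n"
print(flagstat_to_pipe(example))
```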

version

$ cat pipeline.version
v2.0.4

pangolin (optional)

$ cat NK-TE_1_21-00101.lineage.txt
taxon,lineage,probability,pangoLEARN_version,status,note
NK-TE_1_21-00101_iupac_consensus_v2.0.4,None,0,2021-01-16,fail,N_content:1.0
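
Because the lineage file is plain CSV with a header row, collecting it for a report table needs little more than the standard library. A minimal sketch (the function name `read_lineages` is an assumption, and the column names are simply those shown in the header above):

```python
import csv
import io


def read_lineages(csv_text):
    """Parse pangolin lineage CSV text into a list of row dictionaries."""
    return [dict(row) for row in csv.DictReader(io.StringIO(csv_text))]


example = (
    "taxon,lineage,probability,pangoLEARN_version,status,note\n"
    "NK-TE_1_21-00101_iupac_consensus_v2.0.4,None,0,2021-01-16,fail,N_content:1.0\n"
)
rows = read_lineages(example)
# Keep only samples where the status column does not report a failure.
passed = [row for row in rows if row["status"] != "fail"]
print(len(rows), len(passed))
```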

kraken (optional)

kraken2 result of a run of reads against a human/SARS-CoV-2 database (https://zenodo.org/record/3854856). Kraken read filtering improved our mapping tremendously.

$ head NK-TE_1_21-00101.kraken.report.txt
 22.58  56      56      U       0       unclassified
 77.42  192     0       R       1       root
 77.02  191     0       D       10239     Viruses
 77.02  191     0       D1      2559587     Riboviria
 77.02  191     0       K       2732396       Orthornavirae
 77.02  191     0       P       2732408         Pisuviricota
 77.02  191     0       C       2732506           Pisoniviricetes
 77.02  191     0       O       76804               Nidovirales
 77.02  191     0       O1      2499399               Cornidovirineae
 77.02  191     0       F       11118                   Coronaviridae
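
The report lines follow the usual kraken2 report layout: percentage, reads in the clade, reads assigned directly, rank code, NCBI taxid, and an indented name. A sketch of how such a file could be read for the report, with the function name `parse_kraken_report` made up for illustration:

```python
def parse_kraken_report(text):
    """Parse kraken2 report lines into dictionaries, one per taxon."""
    rows = []
    for line in text.splitlines():
        # Split on whitespace at most five times so names with spaces stay whole.
        fields = line.split(None, 5)
        if len(fields) != 6:
            continue  # skip blank or malformed lines
        percent, clade_reads, direct_reads, rank, taxid, name = fields
        rows.append({
            "percent": float(percent),
            "clade_reads": int(clade_reads),
            "direct_reads": int(direct_reads),
            "rank": rank,
            "taxid": int(taxid),
            "name": name.strip(),
        })
    return rows


example = (
    " 22.58  56      56      U       0       unclassified\n"
    " 77.02  191     0       F       11118                   Coronaviridae\n"
)
for row in parse_kraken_report(example):
    print(row["rank"], row["name"], row["percent"])
```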

replikation commented 3 years ago

@oliverdrechsel thanks, I'll prepare the information on my end and check if I can add all the mandatory items so you don't need to change anything.

oliverdrechsel commented 3 years ago


Thanks a lot. I don't know if things like fragment size make sense, as the MinION does not perform paired-end sequencing.

replikation commented 3 years ago

@oliverdrechsel just checked, it's 0 everywhere. So this might be good as an optional parameter, or do you have another idea for what I can supply here instead? Edit: I'm not sure what the final report looks like (e.g. can I supply some more Nanopore-relevant things here?)

replikation commented 3 years ago

@oliverdrechsel looking at your report script, I tend towards rewriting it, as Nextflow is more "file"-oriented, so using directories and recursively searching them for inputs is counterintuitive here. Would it be okay if I fork your script and adjust it to the workflow?

replikation / poreCov

report #10
