replikation / poreCov

SARS-CoV-2 workflow for nanopore sequence data
https://case-group.github.io/
GNU General Public License v3.0

report #10

Closed hoelzer closed 3 years ago

hoelzer commented 3 years ago

It would be good to have at least a simple summary report for the reconstructed consensus sequence. This should include:

E.g. in a single PDF report per run.

For the first part (technical stats), it might also be enough to use Nextflow's built-in reporting functions.

hoelzer commented 3 years ago

We can base the report on the following script: https://gitlab.com/RKIBioinformaticsPipelines/ncov_minipipe/-/blob/master/ncov_minipipe.Rmd

and modify it according to the Nanopore needs.

hoelzer commented 3 years ago

@oliverdrechsel @rekm welcome to the reporting issue! :)

So basically, @replikation needs to know the inputs (and their formats) for the different parts that https://gitlab.com/RKIBioinformaticsPipelines/ncov_minipipe/-/blob/master/ncov_minipipe.Rmd already provides, so that he can generate them.

Maybe we can start basic and try to implement the Rmd script with support for

replikation commented 3 years ago

Yep, basically either an example of the input to this script or a summary of what the script needs to function properly.

oliverdrechsel commented 3 years ago

The report currently takes a bunch of files. I think we can either modify the report so that all files are optional, or butcher it and make a Nanopore version.

What I find quite important for the report output (HTML) is that figures and tables have a maximum size and use scroll bars. This keeps the report short and easy to scroll through, even though each section might contain 100 samples.

*coverage.tsv

Loads of 0s, because the example is a negative control.

$ head NK-1xTE_1.coverage.tsv
NC_045512.2     1       0
NC_045512.2     2       0
NC_045512.2     3       0
NC_045512.2     4       0
NC_045512.2     5       0
NC_045512.2     6       0
NC_045512.2     7       0
NC_045512.2     8       0
NC_045512.2     9       0
NC_045512.2     10      0
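As a rough illustration of how a report could condense such a three-column table (reference, position, depth) into per-sample numbers, here is a minimal Python sketch. The function name `summarize_coverage` and the `min_depth=20` threshold are assumptions made for this example; they are not taken from the ncov_minipipe report.

```python
import io


def summarize_coverage(handle, min_depth=20):
    """Summarize a coverage table with columns: reference, position, depth.

    min_depth is a hypothetical threshold chosen for illustration only.
    """
    depths = []
    for line in handle:
        fields = line.split()  # tolerates tabs or runs of spaces
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        depths.append(int(fields[2]))

    positions = len(depths)
    if positions == 0:
        return {"positions": 0, "mean_depth": 0.0, "fraction_covered": 0.0}

    covered = sum(1 for depth in depths if depth >= min_depth)
    return {
        "positions": positions,
        "mean_depth": sum(depths) / positions,
        "fraction_covered": covered / positions,
    }


# A negative control like the one above has no covered positions.
example = "NC_045512.2\t1\t0\nNC_045512.2\t2\t0\nNC_045512.2\t3\t0\n"
summary = summarize_coverage(io.StringIO(example))
print(summary)
```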

fragment size

This is basically the fragment size column (TLEN, the 9th column) of the mapping BAM file.

$ head NK-1xTE_1.fragsize.tsv
0
102
-102
-121
121
102
-102
102
-102
93
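Because the values are signed (one mate reports a positive size, the other a negative one) and unpaired reads report 0, a summary probably wants absolute, non-zero values. The following Python sketch shows one way to do that; `summarize_fragment_sizes` is a hypothetical helper, not part of the existing report.

```python
import statistics


def summarize_fragment_sizes(lines):
    """Summarize signed fragment sizes, one integer per line.

    Zeros are dropped and signs are ignored. Returns None when no
    usable value remains (for example, when every entry is 0).
    """
    sizes = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        value = abs(int(line))
        if value > 0:
            sizes.append(value)

    if not sizes:
        return None
    return {
        "count": len(sizes),
        "median": statistics.median(sizes),
        "max": max(sizes),
    }


# The ten values shown in the example above.
example = ["0", "102", "-102", "-121", "121", "102", "-102", "102", "-102", "93"]
result = summarize_fragment_sizes(example)
print(result)
```

Returning `None` for an all-zero file gives the report a simple way to skip the section instead of plotting an empty distribution.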

mapping statistics

mapping statistics for the bwa mem alignment (the format is that of samtools flagstat)

$ head NK-1xTE_1.bamstats.txt
794 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
2 + 0 supplementary
0 + 0 duplicates
718 + 0 mapped (90.43% : N/A)
792 + 0 paired in sequencing
396 + 0 read1
396 + 0 read2
526 + 0 properly paired (66.41% : N/A)
640 + 0 with itself and mate mapped

transformed for improved reading in R

$ head NK-1xTE_1.bamstats.pipe.txt
794|0|in total (QC-passed reads + QC-failed reads)
0|0|secondary
2|0|supplementary
0|0|duplicates
718|0|mapped (90.43% : N/A)
792|0|paired in sequencing
396|0|read1
396|0|read2
526|0|properly paired (66.41% : N/A)
640|0|with itself and mate mapped
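The transformation between the two listings above is simple enough to sketch. This Python version is only an illustration of the idea (the original pipeline may do it differently, e.g. with sed); `flagstat_to_pipe` is a name made up for this example.

```python
import re

# Matches lines of the form "<passed> + <failed> <description>".
STAT_LINE = re.compile(r"^(\d+) \+ (\d+) (.+)$")


def flagstat_to_pipe(text):
    """Convert 'N + M description' lines into 'N|M|description' lines."""
    converted = []
    for line in text.splitlines():
        match = STAT_LINE.match(line.strip())
        if match:  # lines in any other shape are skipped
            converted.append("|".join(match.groups()))
    return "\n".join(converted)


example = (
    "794 + 0 in total (QC-passed reads + QC-failed reads)\n"
    "718 + 0 mapped (90.43% : N/A)\n"
)
piped = flagstat_to_pipe(example)
print(piped)
```

Only the first ` + ` is treated as a separator, so the `+` inside "QC-passed reads + QC-failed reads" stays in the description field.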

version

$ cat pipeline.version
v2.0.4

pangolin (optional)

$ cat NK-TE_1_21-00101.lineage.txt
taxon,lineage,probability,pangoLEARN_version,status,note
NK-TE_1_21-00101_iupac_consensus_v2.0.4,None,0,2021-01-16,fail,N_content:1.0
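Since the lineage file is plain CSV with a header row, it can be read without any special handling. The sketch below is a minimal Python example; the column names come from the file shown above and may differ in other pangolin versions, and `read_lineages` is a hypothetical helper name.

```python
import csv
import io


def read_lineages(handle):
    """Read a lineage CSV into a list of dicts keyed by the header row."""
    return [dict(row) for row in csv.DictReader(handle)]


example = (
    "taxon,lineage,probability,pangoLEARN_version,status,note\n"
    "NK-TE_1_21-00101_iupac_consensus_v2.0.4,None,0,2021-01-16,fail,N_content:1.0\n"
)
rows = read_lineages(io.StringIO(example))

# Keep only samples whose assignment did not fail.
assigned = [row for row in rows if row["status"] != "fail"]
print(len(rows), len(assigned))
```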

kraken (optional)

kraken2 result of a run of the reads against a human/SARS-CoV-2 database (https://zenodo.org/record/3854856). Kraken read filtering improved our mapping tremendously.

$ head NK-TE_1_21-00101.kraken.report.txt
 22.58  56      56      U       0       unclassified
 77.42  192     0       R       1       root
 77.02  191     0       D       10239     Viruses
 77.02  191     0       D1      2559587     Riboviria
 77.02  191     0       K       2732396       Orthornavirae
 77.02  191     0       P       2732408         Pisuviricota
 77.02  191     0       C       2732506           Pisoniviricetes
 77.02  191     0       O       76804               Nidovirales
 77.02  191     0       O1      2499399               Cornidovirineae
 77.02  191     0       F       11118                   Coronaviridae
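For reference, the six columns of this report are: percentage of reads in the clade, number of reads in the clade, number of reads assigned directly to the taxon, rank code, taxonomy ID, and the (indented) name. A minimal Python sketch for reading it follows; `parse_kraken_report` and the dictionary keys are names chosen for this example only.

```python
def parse_kraken_report(text):
    """Parse a six-column kraken2 report into a list of dicts.

    The indentation of the name column (taxonomy depth) is discarded.
    """
    records = []
    for line in text.splitlines():
        # Split into at most six fields so names containing spaces stay whole.
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue
        percent, clade_reads, direct_reads, rank, taxid, name = fields
        records.append(
            {
                "percent": float(percent),
                "clade_reads": int(clade_reads),
                "direct_reads": int(direct_reads),
                "rank": rank,
                "taxid": int(taxid),
                "name": name.strip(),
            }
        )
    return records


example = (
    " 22.58  56      56      U       0       unclassified\n"
    " 77.02  191     0       F       11118                   Coronaviridae\n"
)
records = parse_kraken_report(example)
print(records[1]["name"], records[1]["percent"])
```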

replikation commented 3 years ago

@oliverdrechsel thanks, I'll prepare the information on my end and check whether I can add all the mandatory items so you don't need to change anything.

oliverdrechsel commented 3 years ago

> @oliverdrechsel thanks, I'll prepare the information on my end and check whether I can add all the mandatory items so you don't need to change anything.

Thanks a lot. I don't know if things like fragment size make sense, as the MinION does not perform paired-end sequencing.

replikation commented 3 years ago

@oliverdrechsel just checked: it's 0 everywhere. So this might be good as an optional parameter, or do you have another idea for what I can supply here instead? I'm not sure what the final report looks like (e.g. can I supply some more Nanopore-relevant things here?).

replikation commented 3 years ago

@oliverdrechsel looking at your report script, I tend towards rewriting it, as Nextflow is more "file"-oriented, so using directories and recursively looking for inputs is counterintuitive here. Would it be okay if I fork your script and adjust it to the workflow?
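To make the "file-oriented" idea concrete: instead of pointing the report at a directory and searching it recursively, each input type could be passed as an explicit list of files, with the Nanopore-irrelevant ones left optional. The Python sketch below only illustrates that interface. All flag names are hypothetical, and treating coverage and mapping statistics as mandatory is an assumption based on which inputs were marked "(optional)" above.

```python
import argparse


def build_parser():
    """Build a CLI that takes explicit input files instead of a directory."""
    parser = argparse.ArgumentParser(
        description="Collect report inputs as explicit file lists (sketch)."
    )
    # Assumed mandatory: one or more files each.
    parser.add_argument("--coverage", nargs="+", required=True,
                        help="per-sample coverage tables")
    parser.add_argument("--bamstats", nargs="+", required=True,
                        help="per-sample mapping statistics")
    # Optional inputs default to empty lists so sections can be skipped.
    parser.add_argument("--fragsize", nargs="*", default=[],
                        help="fragment size files (paired-end data only)")
    parser.add_argument("--lineage", nargs="*", default=[],
                        help="pangolin lineage files")
    parser.add_argument("--kraken", nargs="*", default=[],
                        help="kraken2 report files")
    return parser


# Example invocation with only the assumed-mandatory inputs.
args = build_parser().parse_args(
    ["--coverage", "sampleA.coverage.tsv", "--bamstats", "sampleA.bamstats.txt"]
)
print(args.coverage, args.fragsize)
```

With this shape, a workflow process can stage exactly the files it emits and hand them over by name, and the report can skip any section whose list is empty.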