Closed by hoelzer 3 years ago.
We can base the report on the following script, https://gitlab.com/RKIBioinformaticsPipelines/ncov_minipipe/-/blob/master/ncov_minipipe.Rmd, and adapt it to Nanopore needs.
@oliverdrechsel @rekm welcome to the reporting issue! :)
So basically, @replikation needs to know the inputs (and their formats) for the different parts that https://gitlab.com/RKIBioinformaticsPipelines/ncov_minipipe/-/blob/master/ncov_minipipe.Rmd already provides, so that he can generate them.
Maybe we can start simple and try to implement the Rmd script with support for
Yep, basically either an example input for this script or a summary of what the script needs to function properly.
The report currently takes a bunch of files. I think we can either modify the report to make all files optional, or butcher it and make a Nanopore version.
What I find quite important for the report output (HTML) is that figures and tables have a maximum size and use scroll bars. This keeps the report short and easy to scroll through, even if each section contains 100 samples.
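A minimal sketch of the size-capping idea, written in Python rather than R purely for illustration: the hypothetical `scrollable_table` helper wraps an HTML table in a `div` whose CSS `max-height` and `overflow:auto` make the browser add scroll bars once the content exceeds the cap. The 400 px default is an arbitrary assumption; the same CSS could presumably be applied to the HTML the Rmd report generates.

```python
import html


def scrollable_table(rows, header, max_height_px=400):
    """Render rows as an HTML table inside a height-capped, scrollable div.

    Hypothetical helper: max-height caps the visible area and overflow:auto
    adds scroll bars only when the table is taller (or wider) than that.
    """
    head = "".join(f"<th>{html.escape(str(h))}</th>" for h in header)
    body = "".join(
        "<tr>" + "".join(f"<td>{html.escape(str(c))}</td>" for c in row) + "</tr>"
        for row in rows
    )
    return (
        f'<div style="max-height:{max_height_px}px; overflow:auto;">'
        f"<table><thead><tr>{head}</tr></thead><tbody>{body}</tbody></table>"
        "</div>"
    )


# 100 made-up samples: the section stays 400 px tall and scrolls.
rows = [(f"sample_{i:03d}", i * 10) for i in range(100)]
snippet = scrollable_table(rows, header=("sample", "reads"))
```

Escaping every cell keeps sample names containing `<` or `&` from breaking the page layout.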
Lots of 0s here, because the example is a negative control:
$ head NK-1xTE_1.coverage.tsv
NC_045512.2 1 0
NC_045512.2 2 0
NC_045512.2 3 0
NC_045512.2 4 0
NC_045512.2 5 0
NC_045512.2 6 0
NC_045512.2 7 0
NC_045512.2 8 0
NC_045512.2 9 0
NC_045512.2 10 0
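The coverage file appears to have three columns: reference name, 1-based position and read depth. A minimal sketch of per-sample numbers a report might derive from it; the function name and the 20x threshold are assumptions for illustration, not pipeline settings.

```python
def coverage_summary(lines, min_depth=20):
    """Summarise a per-position coverage table (reference, position, depth).

    min_depth is an arbitrary, hypothetical threshold for "covered".
    """
    depths = []
    for line in lines:
        fields = line.split()  # works for tab- or space-separated columns
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        depths.append(int(fields[2]))
    if not depths:
        return {"positions": 0, "mean_depth": 0.0, "fraction_covered": 0.0}
    covered = sum(1 for d in depths if d >= min_depth)
    return {
        "positions": len(depths),
        "mean_depth": sum(depths) / len(depths),
        "fraction_covered": covered / len(depths),
    }


negative_control = coverage_summary(
    ["NC_045512.2\t1\t0", "NC_045512.2\t2\t0", "NC_045512.2\t3\t0"]
)
```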
This is essentially the fragment-size (template length, TLEN) column of the mapping BAM file:
$ head NK-1xTE_1.fragsize.tsv
0
102
-102
-121
121
102
-102
102
-102
93
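Because the fragment sizes above are signed, averaging them directly would be misleading. A sketch under stated assumptions: 0 means no fragment size is available (as for unpaired reads), and each pair shows up once as a positive and once as a negative value, so only positive values are kept to count every fragment once. The hypothetical `all_zero` flag is one way a report could detect data without pairs and skip this section.

```python
from statistics import median


def fragment_size_summary(values):
    """Summarise signed fragment sizes (one integer per line).

    Assumptions: 0 means "no fragment size available" (e.g. unpaired reads);
    each pair appears once as +n and once as -n, so keeping only positive
    values counts every fragment once.
    """
    sizes = [int(v) for v in values if v.strip()]
    positive = [s for s in sizes if s > 0]
    return {
        "fragments": len(positive),
        "median": median(positive) if positive else None,
        # True when every value is 0, e.g. reads sequenced without mates.
        "all_zero": all(s == 0 for s in sizes),
    }


example = ["0", "102", "-102", "-121", "121", "102", "-102", "102", "-102", "93"]
summary = fragment_size_summary(example)
```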
samtools flagstat output for the bwa mem mapping:
$ head NK-1xTE_1.bamstats.txt
794 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
2 + 0 supplementary
0 + 0 duplicates
718 + 0 mapped (90.43% : N/A)
792 + 0 paired in sequencing
396 + 0 read1
396 + 0 read2
526 + 0 properly paired (66.41% : N/A)
640 + 0 with itself and mate mapped
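Each line above follows the samtools flagstat pattern `<QC-passed> + <QC-failed> <label>`. A minimal parsing sketch (function names are hypothetical) that keys counts by label and recomputes the mapping rate from the QC-passed counts. Only the trailing percentage annotation is stripped, so labels such as `with mate mapped to a different chr (mapQ>=5)` stay distinct.

```python
import re

# "<QC-passed> + <QC-failed> <label>", as printed by samtools flagstat.
FLAGSTAT_LINE = re.compile(r"^(\d+) \+ (\d+) (.+)$")
# Trailing percentage annotation such as "(90.43% : N/A)" or "(N/A : N/A)".
PERCENT_SUFFIX = re.compile(r"\s*\((?:[\d.]+%|N/A) : [^)]*\)$")


def parse_flagstat(lines):
    """Map each flagstat label to its (QC-passed, QC-failed) counts."""
    stats = {}
    for line in lines:
        m = FLAGSTAT_LINE.match(line.strip())
        if not m:
            continue
        label = PERCENT_SUFFIX.sub("", m.group(3))
        stats[label] = (int(m.group(1)), int(m.group(2)))
    return stats


def mapped_percent(stats):
    """Recompute the mapping rate (in %) from the QC-passed counts."""
    total = stats["in total (QC-passed reads + QC-failed reads)"][0]
    return 100.0 * stats["mapped"][0] / total if total else 0.0


sample = [
    "794 + 0 in total (QC-passed reads + QC-failed reads)",
    "718 + 0 mapped (90.43% : N/A)",
    "526 + 0 properly paired (66.41% : N/A)",
]
stats = parse_flagstat(sample)
```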
The same stats, converted to a pipe-delimited format for easier reading in R:
$ head NK-1xTE_1.bamstats.pipe.txt
794|0|in total (QC-passed reads + QC-failed reads)
0|0|secondary
2|0|supplementary
0|0|duplicates
718|0|mapped (90.43% : N/A)
792|0|paired in sequencing
396|0|read1
396|0|read2
526|0|properly paired (66.41% : N/A)
640|0|with itself and mate mapped
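The thread does not show how the pipe-delimited file was produced. One way to reproduce the conversion, as a sketch with a hypothetical function name, is to split each flagstat line into its two counts and its label and join them with `|`:

```python
import re

FLAGSTAT_LINE = re.compile(r"^(\d+) \+ (\d+) (.+)$")


def flagstat_to_pipe(lines):
    """Rewrite 'P + F label' flagstat lines as 'P|F|label'.

    The label is kept verbatim (including any '(90.43% : N/A)' suffix);
    lines that do not match the pattern are skipped.
    """
    converted = []
    for line in lines:
        m = FLAGSTAT_LINE.match(line.strip())
        if m:
            converted.append("|".join(m.groups()))
    return converted


pipe_lines = flagstat_to_pipe([
    "794 + 0 in total (QC-passed reads + QC-failed reads)",
    "2 + 0 supplementary",
    "718 + 0 mapped (90.43% : N/A)",
])
```

A `|` delimiter works here because the labels contain spaces, `+` and `:` but no pipes, so reading the file in R with something like `read.table(..., sep = "|")` should yield three clean columns.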
$ cat pipeline.version
v2.0.4
$ cat NK-TE_1_21-00101.lineage.txt
taxon,lineage,probability,pangoLEARN_version,status,note
NK-TE_1_21-00101_iupac_consensus_v2.0.4,None,0,2021-01-16,fail,N_content:1.0
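For the negative control the lineage call fails, presumably because the consensus is entirely N (`N_content:1.0`). A sketch that separates failed calls from usable ones with Python's `csv` module; only the `fail` status value is taken from the example above, and treating every other status as usable is an assumption.

```python
import csv
import io


def split_lineage_calls(text):
    """Parse the lineage CSV and separate failed calls from the rest.

    Only the "fail" status is known from the example; any other status
    is treated as a usable call (an assumption).
    """
    rows = list(csv.DictReader(io.StringIO(text)))
    failed = [r for r in rows if r["status"] == "fail"]
    usable = [r for r in rows if r["status"] != "fail"]
    return usable, failed


example = (
    "taxon,lineage,probability,pangoLEARN_version,status,note\n"
    "NK-TE_1_21-00101_iupac_consensus_v2.0.4,None,0,2021-01-16,fail,N_content:1.0\n"
)
usable, failed = split_lineage_calls(example)
```

Keeping the `note` column next to failed calls lets the report say why a sample has no lineage rather than just showing `None`.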
Kraken2 report from classifying the reads against a human/SARS-CoV-2 database (https://zenodo.org/record/3854856). Kraken-based read filtering improved our mapping tremendously.
$ head NK-TE_1_21-00101.kraken.report.txt
22.58 56 56 U 0 unclassified
77.42 192 0 R 1 root
77.02 191 0 D 10239 Viruses
77.02 191 0 D1 2559587 Riboviria
77.02 191 0 K 2732396 Orthornavirae
77.02 191 0 P 2732408 Pisuviricota
77.02 191 0 C 2732506 Pisoniviricetes
77.02 191 0 O 76804 Nidovirales
77.02 191 0 O1 2499399 Cornidovirineae
77.02 191 0 F 11118 Coronaviridae
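The report uses Kraken2's standard six columns: percentage of reads in the clade, reads in the clade, reads assigned directly to the taxon, rank code, NCBI taxonomy ID and the indented name; taxid 0 (`U`) is the unclassified line. A parsing sketch with hypothetical function names, assuming exactly these six columns (a report with extra columns would need a different split):

```python
def parse_kraken2_report(lines):
    """Parse report lines: percent, clade reads, direct reads, rank, taxid, name."""
    records = []
    for line in lines:
        # maxsplit=5 keeps multi-word names intact and drops their indentation.
        fields = line.strip().split(None, 5)
        if len(fields) != 6:
            continue
        pct, clade, direct, rank, taxid, name = fields
        records.append({
            "percent": float(pct),
            "clade_reads": int(clade),
            "direct_reads": int(direct),
            "rank": rank,
            "taxid": int(taxid),
            "name": name,
        })
    return records


def clade_percent(records, taxid):
    """Percentage of reads in the clade rooted at `taxid` (0.0 if absent)."""
    for r in records:
        if r["taxid"] == taxid:
            return r["percent"]
    return 0.0


report = [
    "22.58\t56\t56\tU\t0\tunclassified",
    "77.42\t192\t0\tR\t1\troot",
    "77.02\t191\t0\tF\t11118\t          Coronaviridae",
]
records = parse_kraken2_report(report)
```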
@oliverdrechsel thanks, I'll prepare the information on my end and check whether I can provide all the mandatory items, so you don't need to change anything.
Thanks a lot. I'm not sure things like fragment size make sense, as the MinION does not perform paired-end sequencing.
@oliverdrechsel just checked: it's 0 everywhere, so this might work better as an optional parameter. Or do you have another idea for what I could supply here instead? I'm also not sure what the final report looks like (e.g. could I supply some more Nanopore-relevant things here?).
@oliverdrechsel looking at your report script, I'm inclined to rewrite it: Nextflow is more "file"-oriented, so using directories and recursively searching them for inputs is counterintuitive here. Would it be okay if I fork your script and adapt it to the workflow?
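A sketch of what a file-oriented interface could look like, written in Python for illustration: the report script receives explicit per-sample paths from a Nextflow process instead of crawling directories, and optional inputs simply switch sections off. The script, option names and the required/optional split are all hypothetical; only the file suffixes come from the examples in this thread.

```python
import argparse


def build_parser():
    """Hypothetical report CLI: every input is an explicit path, none are searched for."""
    p = argparse.ArgumentParser(description="Per-sample report inputs (sketch)")
    p.add_argument("--sample", required=True, help="sample name")
    p.add_argument("--coverage", required=True, help="*.coverage.tsv")
    p.add_argument("--bamstats", required=True, help="*.bamstats.pipe.txt")
    p.add_argument("--lineage", required=True, help="*.lineage.txt")
    # Optional inputs: the matching report section is skipped when absent,
    # e.g. fragment sizes, which are all 0 for the Nanopore data.
    p.add_argument("--fragsize", help="*.fragsize.tsv")
    p.add_argument("--kraken-report", help="*.kraken.report.txt")
    p.add_argument("--version-file", help="pipeline.version")
    return p


def sections_to_render(args):
    """List the optional sections whose input file was supplied."""
    optional = {"fragsize": "fragment sizes", "kraken_report": "read classification"}
    return [title for dest, title in optional.items() if getattr(args, dest)]


args = build_parser().parse_args([
    "--sample", "NK-TE_1_21-00101",
    "--coverage", "NK-1xTE_1.coverage.tsv",
    "--bamstats", "NK-1xTE_1.bamstats.pipe.txt",
    "--lineage", "NK-TE_1_21-00101.lineage.txt",
    "--kraken-report", "NK-TE_1_21-00101.kraken.report.txt",
])
```

Passing files explicitly matches how Nextflow hands inputs to a process, and it makes a missing optional input an explicit choice rather than a failed directory search.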
It would be good to have at least a simple summary report for the reconstructed consensus sequence. This should include:
E.g. in a single PDF report per run.
For the first part (technical stats), it might also be enough to use Nextflow's built-in reporting features.