Open hoelzer opened 3 months ago
Another interesting method might be VirPool: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-022-05100-3
They already illustrated the advantage that longer amplicons bring w/ their method in particular:
"[...] is able to use the entire length of reads instead of just the most informative positions, and can also capture haplotype dependencies within a single read. A crucial property of our model is its ability to capture long-range dependencies within reads, which is particularly relevant when coupled with use of long amplicons and nanopore sequencing."
They also tested the method on real data by sequencing a mixture of eight clinical samples using long amplicons (2kb).
(Side note: Victor brought this to my attention and shared the information, thx!)
The question is, as usual, how up-to-date is the tool?
https://github.com/fmfi-compbio/virpool
They also provide scripts to create your own profiles: https://github.com/fmfi-compbio/virpool?tab=readme-ov-file#creating-a-custom-variant-profile
Maybe covsonar could also create these profiles easily. Or Ashkan's sc2mfc tool.
Yep, it's completely fine to extend poreCov to wastewater surveillance now. We just need to make sure the normal "user experience" does not get convoluted.
ad Floria: You would use the VCF + BAM from ARTIC, right? Just to keep in mind: mixed indels might be tricky. I saw an overlapping x nt deletion and y nt deletion where neither was called by medaka, and/or a frameshift-introducing indel was called instead.
edit: corrected tool name; Florida would also be a fun name
ad Florida: You would use the VCF + BAM from ARTIC, right? Just to keep in mind: mixed indels might be tricky. I saw an overlapping x nt deletion and y nt deletion where neither was called by medaka, and/or a frameshift-introducing indel was called instead.
Yes, I would like to use the output that poreCov produces anyway, to change as little as possible.
However, good point. I would live with such issues for now. Deconvoluting lineages from wastewater is the wild west anyway :) but of course, it is important to keep such situations in mind.
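To make the mixed-indel concern concrete, here is a minimal toy sketch in Python. It is not how medaka or ARTIC call variants; it only assumes a simple majority threshold on per-read deletion support. The read counts, the positions, and the 3 nt and 6 nt deletion lengths are hypothetical stand-ins for the unspecified "x nt" and "y nt" deletions above.

```python
from collections import Counter

# Hypothetical toy data: each read is summarized by the deletion it carries
# relative to the reference, as (start, length), or None for no deletion.
reads = (
    [(100, 3)] * 40      # assumed lineage A: 3 nt deletion starting at 100
    + [(101, 6)] * 40    # assumed lineage B: overlapping 6 nt deletion at 101
    + [None] * 20        # reads without any deletion
)


def deletion_support(reads):
    """Return the fraction of all reads supporting each distinct deletion."""
    counts = Counter(d for d in reads if d is not None)
    return {d: n / len(reads) for d, n in counts.items()}


def fraction_with_any_deletion(reads):
    """Return the fraction of reads carrying some deletion in the region."""
    return sum(d is not None for d in reads) / len(reads)


def majority_calls(reads, min_fraction=0.5):
    """Naive caller (an assumption): report deletions above min_fraction."""
    return [d for d, f in deletion_support(reads).items() if f > min_fraction]


support = deletion_support(reads)
any_del = fraction_with_any_deletion(reads)
calls = majority_calls(reads)

print("support per deletion:", support)
print("reads with any deletion:", any_del)
print("majority calls:", calls)
```

In this toy mixture most reads carry a deletion in the region, yet neither allele alone passes the majority threshold, so a consensus-style caller reports nothing. That is the kind of situation to keep in mind when reusing the VCF for a mixed sample.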
PS: Floria ;)
Not sure if it is possible to implement a subcommand or a second "main.nf" solely for the wastewater part? (thinking samtools subcommand for instance)
I think subcommands are rather unusual. Afaik, this is mainly handled by a parameter.
E.g. viralrecon has a platform parameter for either Illumina or nanopore: https://github.com/nf-core/viralrecon/blob/3731dd3a32a67a2648ea22c2bd980c224abdaee2/main.nf#L62-L76
Also, it could cause trouble with execution directly from GitHub via nextflow run replikation/poreCov ... (or it would need some extra configuration)
Started working on this in a branch, ww-porecov. The first step was a container for CONCOMPRA, which was already a bit of a pain, but I finally made it: rkimf1/concompra:v0.0.1--f6c273d
The authors of CONCOMPRA provide a Docker image now: willemstock/concompra:version0.0.2
However, we need to see if that works w/ Nextflow.
This might be even more interesting to add as a process to ww-poreCov:
One big question is probably: how do they handle the reference used for lineage assignments at the end? And: can they detect something new/cryptic?
I suggest using the poreCov pipeline as the backend for SARS-CoV-2 wastewater lineage deconvolution from nanopore long reads. You already added freyja (#274, #270), which is great as the current community standard. However, we are also interested in detecting new stuff, aka "cryptic lineages" or novel mutation profiles.
To do this, I would like to test/implement two recent approaches:
1) CONCOMPRA
2) Floria
This way, we would get known lineage abundances from freyja plus potential new lineages from one or both of the other tools. Finally, we could also write a little ww-poreCov extension paper ;)