ww-poreCoV extension - GitHub issues

hoelzer commented 3 months ago

I suggest using the poreCov pipeline as the backend for SARS-CoV-2 wastewater lineage deconvolution from nanopore long reads. You already added freyja (#274, #270), which is great as the current community standard.

However, we are also interested in detecting new stuff, aka "cryptic lineages" or novel mutation profiles.

To do this, I would like to test/implement two recent approaches:

1) CONCOMPRA

https://github.com/willem-stock/CONCOMPRA
consensus approach for community profiling with nanopore amplicon sequencing data, focused on 16S rDNA
I already tested this on example nanopore data from mixed patient samples (simulating wastewater) and it looked very promising
should be easy to add as a single new process
challenge is that the tool only works on one primer pair (https://github.com/willem-stock/CONCOMPRA/issues/1)

2) Floria

https://github.com/bluenote-1577/floria | https://doi.org/10.1093/bioinformatics/btae252
Strain-level haplotyping for metagenomes with short or long reads.
I am curious how well this works on a) amplicon data and b) SARS-CoV-2
input is a VCF plus mapped reads (BAM), so I think it should also be easy to add

This way, we would get known lineage abundances from freyja plus potential new lineages from one or both of the other tools.
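To make the "known plus potentially new" idea concrete, here is a minimal Python sketch. It assumes the haplotypes or consensus sequences from the second tool have already been reduced to sets of mutations with an abundance each, and that known lineages are available as mutation profiles. The function name, the data structures, the placeholder lineage and mutation labels, and the Jaccard threshold are all hypothetical; none of this reflects the actual output formats of freyja, CONCOMPRA, or Floria.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical, already-parsed inputs (NOT real tool output formats).
Haplotype = Tuple[FrozenSet[str], float]      # (mutations, abundance)
KnownProfiles = Dict[str, FrozenSet[str]]     # lineage -> defining mutations


def split_known_and_candidates(
    haplotypes: List[Haplotype],
    known_profiles: KnownProfiles,
    min_similarity: float = 0.9,              # assumed cutoff, needs tuning
) -> Tuple[Dict[str, float], List[Haplotype]]:
    """Assign each haplotype to its best-matching known lineage, or flag it
    as a candidate "cryptic" haplotype if nothing matches well enough."""
    assigned: Dict[str, float] = {}
    candidates: List[Haplotype] = []
    for mutations, abundance in haplotypes:
        best_lineage, best_score = None, 0.0
        for lineage, profile in known_profiles.items():
            union = mutations | profile
            # Jaccard similarity between observed and defining mutations.
            score = len(mutations & profile) / len(union) if union else 1.0
            if score > best_score:
                best_lineage, best_score = lineage, score
        if best_lineage is not None and best_score >= min_similarity:
            assigned[best_lineage] = assigned.get(best_lineage, 0.0) + abundance
        else:
            candidates.append((mutations, abundance))
    return assigned, candidates


# Toy example with placeholder names.
known = {
    "lineage_A": frozenset({"m1", "m2", "m3"}),
    "lineage_B": frozenset({"m1", "m4"}),
}
observed = [
    (frozenset({"m1", "m2", "m3"}), 0.6),
    (frozenset({"m1", "m4"}), 0.3),
    (frozenset({"m1", "m5", "m6"}), 0.1),     # matches no known profile
]
assigned, candidates = split_known_and_candidates(observed, known)
print(assigned)
print(candidates)
```

In practice, the known-lineage side would come from freyja directly; the sketch only illustrates how leftover haplotypes could be singled out for a closer look.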

Finally, we could also write a little ww-poreCoV extension paper ;)

hoelzer commented 3 months ago

Another interesting method might be VirPool: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-022-05100-3

They already illustrated the advantage that longer amplicons bring, with their method in particular.

"[...] is able to use the entire length of reads instead of just the most informative positions, and can also capture haplotype dependencies within a single read. A crucial property of our model is its ability to capture long-range dependencies within reads, which is particularly relevant when coupled with use of long amplicons and nanopore sequencing."

They also tested the method on real data by sequencing a mixture of eight clinical samples using long amplicons (2kb).
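To illustrate why using the entire read helps, here is a toy Python sketch of estimating mixture proportions with expectation-maximization (EM), where each read is scored against every variant over all positions it covers. This is explicitly not VirPool's model or code: the fixed per-base error rate, the tiny "genomes", and all function names are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Read = Tuple[int, str]  # (0-based start in the toy genome, observed bases)


def read_likelihood(read: Read, variant_seq: str, error_rate: float = 0.05) -> float:
    """P(read | variant), using every covered position, with a flat
    per-base error rate (a strong simplification)."""
    start, bases = read
    likelihood = 1.0
    for offset, base in enumerate(bases):
        if variant_seq[start + offset] == base:
            likelihood *= 1.0 - error_rate
        else:
            likelihood *= error_rate / 3.0
    return likelihood


def estimate_proportions(
    reads: List[Read],
    variants: Dict[str, str],
    iterations: int = 50,
    error_rate: float = 0.05,
) -> Dict[str, float]:
    """Plain EM for the mixture weights of known variant sequences."""
    names = list(variants)
    weights = {name: 1.0 / len(names) for name in names}
    if not reads:
        return weights
    # Per-read likelihoods do not change between iterations, so cache them.
    likelihoods = [
        {name: read_likelihood(read, variants[name], error_rate) for name in names}
        for read in reads
    ]
    for _ in range(iterations):
        totals = {name: 0.0 for name in names}
        for per_variant in likelihoods:
            # E-step: responsibility of each variant for this read.
            norm = sum(weights[n] * per_variant[n] for n in names)
            for n in names:
                totals[n] += weights[n] * per_variant[n] / norm
        # M-step: new weight = average responsibility.
        weights = {n: totals[n] / len(reads) for n in names}
    return weights


# Two toy variants differing at two distant positions (index 4 and 7).
variants = {"A": "ACGTACGT", "B": "ACGTTCGA"}
reads = [(0, "ACGTACGT")] * 3 + [(0, "ACGTTCGA")]
print(estimate_proportions(reads, variants))
```

Because each toy read spans both differing positions, the two sites are evaluated jointly per read, which is the kind of within-read linkage that long amplicons make available.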

(Side note: Victor brought this to my attention and shared the information! Thx!)

The question is, as usual, how up to date the tool is.

https://github.com/fmfi-compbio/virpool

They also provide scripts to create your own profiles: https://github.com/fmfi-compbio/virpool?tab=readme-ov-file#creating-a-custom-variant-profile

Maybe covsonar could also create these profiles easily. Or Ashkan's sc2mfc tool.

replikation commented 3 months ago

Yep, it's completely fine to extend poreCov to wastewater surveillance now. We just need to make sure the normal "user experience" does not become convoluted.

MarieLataretu commented 3 months ago

ad Floria:

You would use the VCF + BAM from ARTIC, right? Just to keep in mind: mixed indels might be tricky. I saw an overlapping x nt deletion and y nt deletion where neither was called by medaka, and/or a frameshift-introducing indel was called instead.

edit: corrected tool name; Florida would also be a fun name

hoelzer commented 3 months ago

ad Florida:

You would use the VCF + BAM from ARTIC, right? Just to keep in mind: mixed indels might be tricky. I saw an overlapping x nt deletion and y nt deletion where neither was called by medaka, and/or a frameshift-introducing indel was called instead.

Yes, I would like to use the output that poreCov produces anyway, to change as little as possible.

However, good point. I would live with such issues for now. Deconvoluting lineages from wastewater is the wild west anyway :) but of course, it is important to keep such situations in mind.
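One way to keep such situations in mind is to screen the VCF for suspicious indels before handing it to a haplotyping step. The sketch below is a minimal, assumption-laden Python example: it expects plain-text VCF lines with left-anchored indels, treats any length change not divisible by 3 as a potential frameshift (which is only meaningful inside coding regions), and uses made-up records. It can only flag overlapping deletions that were actually called; it does not recover calls that medaka missed.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Indel(NamedTuple):
    pos: int   # 1-based VCF POS
    ref: str
    alt: str

    @property
    def length_change(self) -> int:
        return len(self.alt) - len(self.ref)

    @property
    def is_deletion(self) -> bool:
        return self.length_change < 0

    @property
    def is_frameshift(self) -> bool:
        # Assumption: the indel lies in a coding region.
        return self.length_change % 3 != 0

    @property
    def deleted_interval(self) -> Tuple[int, int]:
        """Deleted reference bases (1-based, inclusive), assuming a
        left-anchored representation such as REF=ATTTT, ALT=A."""
        return self.pos + len(self.alt), self.pos + len(self.ref) - 1


def parse_vcf_indels(lines: Iterable[str]) -> List[Indel]:
    """Collect indels (REF and ALT of different length) from VCF text lines."""
    indels = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        pos, ref = int(fields[1]), fields[3]
        for alt in fields[4].split(","):      # handle multi-allelic records
            if len(alt) != len(ref):
                indels.append(Indel(pos, ref, alt))
    return indels


def find_overlapping_deletions(indels: List[Indel]) -> List[Tuple[Indel, Indel]]:
    """Return all pairs of deletions whose deleted intervals overlap."""
    deletions = sorted(
        (i for i in indels if i.is_deletion), key=lambda i: i.deleted_interval
    )
    pairs = []
    for idx, first in enumerate(deletions):
        for second in deletions[idx + 1:]:
            if second.deleted_interval[0] > first.deleted_interval[1]:
                break  # sorted by start, so nothing later overlaps `first`
            pairs.append((first, second))
    return pairs


# Made-up records for illustration only.
example = [
    "##fileformat=VCFv4.2",
    "ref\t100\t.\tATTTT\tA\t.\tPASS\t.",      # 4 nt deletion
    "ref\t102\t.\tTTTCCCC\tT\t.\tPASS\t.",    # 6 nt deletion, overlaps the first
    "ref\t200\t.\tA\tAT\t.\tPASS\t.",         # 1 nt insertion
    "ref\t300\t.\tC\tT\t.\tPASS\t.",          # SNV, ignored
]
indels = parse_vcf_indels(example)
print(find_overlapping_deletions(indels))
print([i for i in indels if i.is_frameshift])
```

Flagged sites could simply be reported alongside the haplotyping results so that downstream lineage calls touching them are read with caution.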

Ps: Floria ;)

replikation commented 3 months ago

Not sure if it is possible to implement a subcommand or a second "main.nf" solely for the wastewater part? (thinking samtools subcommand for instance)

MarieLataretu commented 3 months ago

I think subcommands are rather unusual - afaik, this is mainly handled by a parameter. E.g. viralrecon has a platform parameter for either Illumina, or nanopore: https://github.com/nf-core/viralrecon/blob/3731dd3a32a67a2648ea22c2bd980c224abdaee2/main.nf#L62-L76

Also, it could cause trouble with execution directly from GitHub (nextflow run replikation/poreCov ...), or it would need some extra configuration.

hoelzer commented 3 months ago

Started working on this in a branch, ww-porecov. The first step was a container for CONCOMPRA, which was already a bit of a pain, but I finally got it working:

rkimf1/concompra:v0.0.1--f6c273d

hoelzer commented 4 weeks ago

The authors of CONCOMPRA now provide a Docker image: willemstock/concompra:version0.0.2

However, we need to see if that works w/ nextflow

hoelzer commented 1 week ago

This might be even more interesting to add as a process to ww-poreCov:

https://www.medrxiv.org/content/10.1101/2024.08.27.24312690v1
Unsupervised detection of SARS-CoV-2 mutations and lineages in Norwegian wastewater samples using long-read sequencing
https://github.com/garcia-nacho/HERCULES

One big question is probably: how do they handle the reference used for lineage assignments at the end? And: can they detect something new/cryptic?

replikation / poreCov

ww-poreCoV extension #275