replikation / poreCov

SARS-CoV-2 workflow for nanopore sequence data
https://case-group.github.io/
GNU General Public License v3.0

ww-poreCoV extension #275

Open hoelzer opened 1 month ago

hoelzer commented 1 month ago

I suggest using the poreCov pipeline as the backend for SARS-CoV-2 wastewater lineage deconvolution from nanopore long reads. You already added freyja (#274, #270), which is great, as it is the current community standard.

However, we are also interested in detecting new stuff, aka "cryptic lineages" or novel mutation profiles.

To do this, I would like to test/implement two recent approaches:

1) CONCOMPRA

2) Floria

This way, we would get known lineage abundances from freyja plus potential new lineages from one or both of the other tools.

Finally, we could also write a little ww-poreCoV extension paper ;)

hoelzer commented 1 month ago

Another interesting method might be VirPool: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-022-05100-3

They already illustrated the advantage that longer amplicons bring, with their method in particular. Quote from the paper:

"[VirPool] is able to use the entire length of reads instead of just the most informative positions, and can also capture haplotype dependencies within a single read. A crucial property of our model is its ability to capture long-range dependencies within reads, which is particularly relevant when coupled with use of long amplicons and nanopore sequencing."

They also tested the method on real data by sequencing a mixture of eight clinical samples using long amplicons (2kb).

(Side note: Victor brought this to my attention and shared the information, thx!)

The question is, as usual, how up to date the tool is:

https://github.com/fmfi-compbio/virpool

They also provide scripts to create custom profiles: https://github.com/fmfi-compbio/virpool?tab=readme-ov-file#creating-a-custom-variant-profile

Maybe covsonar could also create these profiles easily. Or Ashkan's sc2mfc tool.

replikation commented 1 month ago

Yep, it's completely fine to extend poreCov to wastewater surveillance now. We just need to make sure the normal "user experience" does not become convoluted.

MarieLataretu commented 1 month ago

ad Floria:

You would use the VCF + BAM from ARTIC, right? Just to keep in mind: mixed indels might be tricky. I saw an overlapping x nt deletion and y nt deletion where neither was called by medaka, and/or a frameshift-introducing indel was called instead.

edit: corrected tool name; Florida would also be a fun name
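To make the indel caveat above concrete, here is a minimal Python sketch of a sanity check that could be run on a VCF before handing it to a downstream tool. It is not part of poreCov, ARTIC, or Floria: the names `Deletion`, `parse_deletions`, and `overlapping_deletions` are hypothetical, and the sketch assumes plain tab-separated VCF text with left-anchored deletions (REF starts with ALT). It flags deletions whose deleted intervals overlap and marks deletion lengths that are not a multiple of three. The frameshift flag ignores gene coordinates, so it is only a rough hint.

```python
from typing import List, NamedTuple, Tuple


class Deletion(NamedTuple):
    chrom: str
    start: int  # first deleted reference base, 1-based
    end: int    # last deleted reference base, inclusive

    @property
    def frameshift(self) -> bool:
        # Length not divisible by 3; only meaningful inside coding regions.
        return (self.end - self.start + 1) % 3 != 0


def parse_deletions(vcf_text: str) -> List[Deletion]:
    """Collect deleted reference intervals from tab-separated VCF records."""
    deletions = []
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        chrom, pos, ref = fields[0], int(fields[1]), fields[3]
        for alt in fields[4].split(","):
            # Left-anchored deletion: REF is longer than ALT and starts with it.
            if len(ref) > len(alt) and ref.startswith(alt):
                deletions.append(
                    Deletion(chrom, pos + len(alt), pos + len(ref) - 1)
                )
    return deletions


def overlapping_deletions(
    deletions: List[Deletion],
) -> List[Tuple[Deletion, Deletion]]:
    """Return all pairs of deletions that share at least one deleted base."""
    ordered = sorted(deletions)  # by chrom, then start, then end
    pairs = []
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            if second.chrom != first.chrom or second.start > first.end:
                break  # sorted input: no later record can overlap `first`
            pairs.append((first, second))
    return pairs
```

Records flagged this way could be inspected by hand in the BAM; whether and how they should be filtered is a separate decision.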

hoelzer commented 1 month ago

ad Florida:

You would use the VCF + BAM from ARTIC, right? Just to keep in mind: mixed indels might be tricky. I saw an overlapping x nt deletion and y nt deletion where neither was called by medaka, and/or a frameshift-introducing indel was called instead.

Yes, I would like to use the output that poreCov produces anyway, to change as little as possible.

Good point, though. I would live with such issues for now. Deconvoluting lineages from wastewater is the wild west anyway :) but of course it is important to keep such situations in mind.

Ps: Floria ;)

replikation commented 1 month ago

Not sure if it is possible to implement a subcommand or a second "main.nf" solely for the wastewater part? (I'm thinking of samtools-style subcommands, for instance.)

MarieLataretu commented 1 month ago

I think subcommands are rather unusual; afaik, this is mainly handled by a parameter. E.g. viralrecon has a platform parameter for either Illumina or nanopore: https://github.com/nf-core/viralrecon/blob/3731dd3a32a67a2648ea22c2bd980c224abdaee2/main.nf#L62-L76

Also, a second main.nf could cause trouble when executing directly from GitHub via nextflow run replikation/poreCov ... (or it would need some extra configuration).

hoelzer commented 3 weeks ago

Started working on this in a branch, ww-porecov. The first step was a container for CONCOMPRA, which was already a bit of a pain, but I finally got it working:

rkimf1/concompra:v0.0.1--f6c273d