replikation / poreCov

SARS-CoV-2 workflow for nanopore sequence data
https://case-group.github.io/
GNU General Public License v3.0

Integration of VarSkip NEB primers #201

Closed BoehmONT closed 2 years ago

BoehmONT commented 2 years ago

First, I would like to thank you for developing this workflow for scientists who have no bioinformatics knowledge.

Now to my question: is it possible to integrate the VarSkip primers from the NEBNext ARTIC SARS-CoV-2 Companion Kit (E7660) into your workflow? NEB technical support told me that, for sequencing omicron samples, I should prepare two libraries (one with the V3 primers and one with the VarSkip primers) with two separate barcoding steps. After sequencing on the MinION, I have to merge the two resulting sequences, for which we should use Picard's MergeSamFiles. Unfortunately, my knowledge of bioinformatics is very limited. If the VarSkip primers are available in poreCov, is there also a tool for this merging step?

replikation commented 2 years ago

hi,

we currently have the VarSkipV1a implemented via the --primerV option. Did you try to run this already? or with V3?

the main problem is that any SARS-CoV-2 workflow (not just this one) needs all the primer information (usually stored in a .bed file) to remove the primers from the amplicons. so in your case, you probably need to combine the V3 and VarSkip .bed files and use the result for the genome assembly.

for the next release, we want to add "custom" bed files for edge cases like yours.

replikation commented 2 years ago

usually, i would recommend using the artic primers version 4.1, as they address this omicron part directly, or the midnight primers (V1200), as they seem to have been stable throughout the pandemic so far.

the primer mixing part is a bit problematic, as we can't support all the possible mixing options

BoehmONT commented 2 years ago

Hi Christian,

first of all, thank you for the good advice. Where can I get the V4.1 or V1200 primers? Can we buy them commercially? We are a diagnostic lab, and so far we have used commercial kits like the one from NEB to keep the sequencing as simple as possible. At the moment, we are sequencing with the V3 primers and get a lot of dropouts, which prevents us from sending the sequences to the RKI dashboard. For us, it would be easier to get the ARTIC V4.1 primers so we can simply use your poreCov workflow. NEB tech support wouldn't give me a concrete answer on whether they plan to include the V4.1 primers in their kit. Furthermore, I found a VarSkip repository from bwlang on GitHub, which contains several BED files of VarSkip primers. If I integrate these primer sequences into the poreCov workflow, will genome assembly then be done automatically by your workflow?

Sorry for all my questions, but I am struggling due to the omicron variant and the new primer set-up.

replikation commented 2 years ago

@hoelzer can you comment on the 4.1 ?

the V1200 or midnight primers are sold via nanopore I think

BoehmONT commented 2 years ago

Thanks a lot.

hoelzer commented 2 years ago

Hey, I'm not ordering the wet lab stuff but can ask regarding V4.1 @BoehmONT ;)

hoelzer commented 2 years ago

Got answer: IDT now has V4 with V4.1 spike-in as a finished product but unfortunately no "pure" V4.1 @BoehmONT - hope that helps!

bwlang commented 2 years ago

I'm the designer of the VarSkip primers (designed in more conserved regions of the genome, hopefully for better resilience to mutations). This approach worked well with beta, gamma, delta, mu etc, but omicron seems to have reset the definition of conservation in some regions. https://primer-monitor.neb.com/lineages

VarSkip v2 was specifically modified to handle omicron variants (and all previous variants). Note that this is not a spiked-in mix containing old and new primers but a dedicated reformulation.

@oliverdrechsel was planning to add these to porecov via a PR I think. Meanwhile, I think it is possible to use the primers from https://github.com/nebiolabs/varskip . Let me know if you'd like me to make the PR instead.

replikation commented 2 years ago

@bwlang do you have a bed file for the new primer set VarSkip v2? we could then simply add it like V1.

oliverdrechsel commented 2 years ago

Hi all, it needed a little clean up to add the bed file to porecov. I'll issue a pull request today or tomorrow.

cheers

bwlang commented 2 years ago

@replikation: Yep - it's available here: https://github.com/nebiolabs/VarSkip/blob/main/neb_vss2.primer.bed

(v1a and VarSkip long are there too)

I thought this format would work - but maybe it needs a bit more love to meet porecov's standards.

I'll try to get porecov working on a local VarSkip v2 dataset today. I'm curious if it will handle an ONT-only TTTTT->TTTTTT issue (around position 5386) better than field bioinformatics.

bwlang commented 2 years ago

@oliverdrechsel : thanks for working on this!

hoelzer commented 2 years ago

@bwlang thanks for the information on the VarSkip v2, that's very helpful and thx @oliverdrechsel for preparing the PR!

A bit offtopic, but regarding your TTTTT->TTTTTT issue, actually the basecalling model might have the largest impact? Running sup instead of hac might help here as well, I could imagine. And/or switching from Medaka to Nanopolish within the ARTIC workflow (also possible via poreCov).

BoehmONT commented 2 years ago

First of all, I would like to thank all of you for your support.

@hoelzer: sorry for asking, what is IDT? @bwlang: NEB technical support has recommended using separate ONT barcodes for the V3 primer and VSS primer DNA libraries, because the two libraries must be processed bioinformatically separately, each with the corresponding .bed file, to remove the primer sequences from the amplicons. Subsequently, the two BAM files have to be merged via Picard's MergeSamFiles to generate a combined consensus sequence for variant calling.

Is it still possible to work with the poreCov workflow? I really depend on the poreCov workflow because my bioinformatics knowledge is severely limited.

BoehmONT commented 2 years ago

Is a MergeSamFiles step included in the poreCov workflow? Or do further steps have to be taken to get standardized data for the RKI dashboard?

hoelzer commented 2 years ago

@BoehmONT IDT: https://eu.idtdna.com/pages/landing/coronavirus-research-reagents (at least I think so, srry bioinformatician speaking here that only passes information from the wet lab ;) )

replikation commented 2 years ago

we don't have an input for sam/bam files.

hoelzer commented 2 years ago

... and I think (not 100% sure, needs proper testing and double-checking your data e.g. via a Genome Browser) that you can also provide a BED file with mixed primer positions e.g. from V3 and VSS. In the pipeline, all these sequences will be checked and primer-clipped then. It's basically what we are doing w/ V4 and spike-in primers from V4.1 and this worked.
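To make the mixed-BED idea concrete, here is a minimal Python sketch of how two primer BED files could be concatenated, de-duplicated and sorted by position. The function names `parse_bed` and `combine_primer_beds` are made up for illustration and are not part of poreCov. The sketch assumes plain tab-separated BED with at least three columns, and it leaves primer and pool names untouched, so whether the result is accepted by a given pipeline (and whether it masks too many sites) still has to be checked on real data.

```python
def parse_bed(text):
    """Split BED text into rows of tab-separated fields, skipping blanks and header lines."""
    rows = []
    for line in text.splitlines():
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(f"not a BED line (need chrom, start, end): {line!r}")
        rows.append(fields)
    return rows


def combine_primer_beds(*bed_texts):
    """Merge several primer BED texts into one, dropping exact duplicate lines.

    Rows are sorted by (chrom, start, end). Names and pools are NOT rewritten.
    """
    seen = set()
    combined = []
    for text in bed_texts:
        for fields in parse_bed(text):
            key = tuple(fields)
            if key not in seen:
                seen.add(key)
                combined.append(fields)
    combined.sort(key=lambda f: (f[0], int(f[1]), int(f[2])))
    return "".join("\t".join(f) + "\n" for f in combined)
```

With two local files (the file names here are placeholders), the texts could be read via `pathlib.Path("v3.primer.bed").read_text()` and `pathlib.Path("vss.primer.bed").read_text()`, passed to `combine_primer_beds`, and the returned string written to a new `.bed` file.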

BoehmONT commented 2 years ago

@hoelzer do you mean I could use the same ONT barcode for both the V3 and VSS primers when sequencing? So, if we use the poreCov workflow, will my sequences be merged by your workflow?

bwlang commented 2 years ago

@BoehmONT: Sorry if I wasn't clear - I work at NEB and made that recommendation for people to get as close to complete coverage as possible with the reagents in existing kits while we finished manufacturing the new V2 reagents. I do it by manually merging BAM files and generating consensus files in Galaxy. I can share a workflow with you if that's helpful. You can also reach out to me directly using my @neb.com.

I don't think it's possible to do a combined analysis in only one pass using any existing pipeline, since each pipeline asks for only a single bed file describing the primer locations. It would not be good to combine the bed files, since too many sites would be masked. Instead one would need to run the analysis twice, once for each primer set. A third analysis pass to combine the bam files could then be used to call variants and build a consensus. It sounds like poreCov does not support starting at bam files - though that would be cool. This specific scenario is pretty unusual, but I often want to combine technical replicates for analysis and that would be more efficient than restarting analysis from combined fastq files.

I want to be really clear that this approach requires 2 distinct libraries with different ONT barcodes, so it's not really practical except for samples that require absolutely complete coverage. With a single VarSkip 1a library (or ARTIC v3/v4), a consensus can be generated and the lineage can be reliably identified - but there will likely be at least 2 ~400 bp stretches of Ns unless you sequence for a very long time.
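For the BAM-merging pass, a minimal Python sketch that only builds the Picard MergeSamFiles command line (classic `I=`/`O=` argument style) could look like this. The helper name `merge_sam_files_cmd`, the jar location and the BAM file names are assumptions for illustration; the two per-primer-set BAM files are expected to exist already, and variant calling and consensus building would still be separate steps afterwards.

```python
def merge_sam_files_cmd(bam_paths, out_bam, picard_jar="picard.jar"):
    """Return the argument list for a Picard MergeSamFiles call.

    bam_paths:  BAM files to merge, e.g. one per primer set (hypothetical names)
    out_bam:    path of the merged BAM file to create
    picard_jar: location of the Picard jar (assumed to be installed locally)
    """
    if len(bam_paths) < 2:
        raise ValueError("need at least two BAM files to merge")
    cmd = ["java", "-jar", picard_jar, "MergeSamFiles"]
    cmd += [f"I={path}" for path in bam_paths]  # one I= argument per input
    cmd.append(f"O={out_bam}")
    return cmd
```

The returned list can be handed to `subprocess.run(cmd, check=True)` once Java and Picard are available, for example with `merge_sam_files_cmd(["v3.trimmed.bam", "vss.trimmed.bam"], "merged.bam")`.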
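To check how large the N stretches in a given consensus actually are, a small Python sketch like the following can list them. The function name `find_n_runs` and the default threshold are assumptions chosen for illustration; the input is the consensus sequence as a plain string, without the FASTA header line.

```python
import re


def find_n_runs(sequence, min_length=100):
    """Return (start, end, length) for each run of N at least min_length long.

    start and end are 1-based, inclusive positions in the sequence.
    """
    runs = []
    for match in re.finditer(r"[Nn]+", sequence):
        length = match.end() - match.start()
        if length >= min_length:
            runs.append((match.start() + 1, match.end(), length))
    return runs
```

An empty result for a sensible `min_length` would suggest that the consensus has no large dropout regions of that size.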

By the way, IDT is a US oligo synthesis company (sort of like eurofins in EU).

replikation commented 2 years ago