vtsyvina / CliqueSNV

MIT License

UMI Nanopore Amplicons #3

Open chrisLanderson opened 4 years ago

chrisLanderson commented 4 years ago

Hello! Thanks for putting together CliqueSNV; I think the ability to leverage long reads should be really powerful across many fields. I was curious whether it would be possible to use long reads that have already gone through error correction with CliqueSNV. Specifically, I am referring to amplicons tagged with unique molecular identifiers (UMIs) and sequenced with Oxford Nanopore (see: https://www.biorxiv.org/content/10.1101/645903v3.full). In short, attaching UMIs to amplicons allows extensive error correction of the sequences. Could these sequences be used effectively with CliqueSNV? If so, which pipeline (Illumina or PacBio) would you suggest for inputting the amplicon sequences? I believe the PacBio pipeline filters out ~10% of reads based on quality; since these reads have already been error corrected, is there perhaps a way for a user to skip that filtering step? Any insight on how to leverage the unique properties of these reads with CliqueSNV would be greatly appreciated.
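To illustrate why UMIs enable error correction: every read that carries the same UMI should derive from the same original molecule, so random sequencing errors in individual reads can be outvoted. The toy sketch below shows only that idea. It is not the pipeline from the linked preprint and has nothing to do with CliqueSNV itself. The function name `umi_consensus` and the example reads are hypothetical, and the sketch assumes that the reads within each UMI group are already aligned to equal length. Real pipelines also have to cluster noisy UMIs and align reads containing indels.

```python
from collections import Counter, defaultdict


def umi_consensus(reads):
    """Toy UMI consensus: group (umi, sequence) pairs by UMI and take a
    per-position majority vote within each group.

    Assumption: reads sharing a UMI are pre-aligned and of equal length.
    """
    groups = defaultdict(list)
    for umi, seq in reads:
        groups[umi].append(seq)

    consensus = {}
    for umi, seqs in groups.items():
        length = len(seqs[0])
        if any(len(s) != length for s in seqs):
            raise ValueError(f"reads for UMI {umi} have different lengths")
        # Majority vote at each position; isolated errors are outvoted.
        bases = [Counter(s[i] for s in seqs).most_common(1)[0][0]
                 for i in range(length)]
        consensus[umi] = "".join(bases)
    return consensus


# Hypothetical reads: each UMI group contains one read with a single error.
reads = [
    ("AAGT", "ACGTAC"),
    ("AAGT", "ACGTTC"),  # error at position 4
    ("AAGT", "ACGTAC"),
    ("CCTA", "TTGCAA"),
    ("CCTA", "TTGGAA"),  # error at position 3
    ("CCTA", "TTGCAA"),
]
print(umi_consensus(reads))  # {'AAGT': 'ACGTAC', 'CCTA': 'TTGCAA'}
```

A simple majority vote only helps when each UMI group has enough reads (at least three) for errors to be outnumbered. This is one reason UMI-based protocols typically require a minimum read count per UMI before calling a consensus.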

Thanks again!

gallardo-seq commented 3 years ago

Hi Chris, I saw your message and figured I'd respond. We have successfully used CliqueSNV with error-corrected reads, in our case produced with our internally developed methodology (https://www.biorxiv.org/content/10.1101/2021.01.27.428469v1.full). We got residual error rates down to 0.1%, and we were able to use CliqueSNV to unambiguously identify haplotype clusters both in our validation/control datasets and in our patient isolates. The conditions we used are described in the Methods section titled "Generation of viral haplotype clusters from high accuracy Gag-Pol reads". Hopefully this helps!