torognes / swarm

A robust and fast clustering method for amplicon-based studies
GNU Affero General Public License v3.0

Does swarm work with nanopore short amplicons? #145

Closed dcm9123 closed 4 years ago

dcm9123 commented 4 years ago

Hi! I have used this software in the past to determine different haplotypes across short malaria gene markers with Illumina data. However, I am currently working on a project that involves nanopore data. My depth is good (about 50,000) across gene markers that are less than 1 kb long. However, after I polish my nanopore reads (given their high error rate) and run swarm with -d 1, -d 2, and -d 3, I get clusters with very little amplicon abundance in them (my main OTU is never above 5 across all of my samples). Could I be getting this because I am using MinION data? Could it also be because my alignment is not very good across all of my reads, so it excludes them?

Thanks in advance!

frederic-mahe commented 4 years ago

If you cluster 1 kb-long sequences with d = 1, 2 or 3, you will link pairs of sequences with 1, 2 or 3 differences. Only 3 differences over 1,000 bp is not much (0.3% dissimilarity), which could explain why you cannot link a lot of your sequences. You could try higher d values, but a more interesting approach with a relatively small dataset such as this one might be full-knowledge clustering, where you know the pairwise distances between all your sequences. If you can share a fasta file with me, I can have a look too.
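
To make the arithmetic above concrete, here is a minimal Python sketch. It is not swarm's implementation: `edit_distance`, `percent_dissimilarity` and `pairwise_distances` are hypothetical helper names, and a plain Levenshtein distance (each substitution, insertion or deletion counts as one difference) is assumed as a stand-in for the number of differences between two amplicons. The all-against-all table is only practical for small datasets.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: substitutions, insertions and deletions cost 1."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                        # deletion from a
                current[j - 1] + 1,                     # insertion into a
                previous[j - 1] + (char_a != char_b),   # match or substitution
            ))
        previous = current
    return previous[-1]


def percent_dissimilarity(differences: int, length: int) -> float:
    """Differences expressed as a percentage of the sequence length."""
    return 100.0 * differences / length


def pairwise_distances(sequences: dict[str, str]) -> dict[tuple[str, str], int]:
    """All-against-all distances (quadratic, so small datasets only)."""
    return {
        (x, y): edit_distance(sequences[x], sequences[y])
        for x, y in combinations(sorted(sequences), 2)
    }


if __name__ == "__main__":
    # d = 1, 2, 3 on 1,000 bp amplicons
    for d in (1, 2, 3):
        print(d, percent_dissimilarity(d, 1000))

    # toy sequences, made up for illustration only
    toy = {"a": "ACGT", "b": "AGGT", "c": "ACT"}
    print(pairwise_distances(toy))
```

With a table like this in hand, the distribution of pairwise distances can be inspected directly before a threshold is chosen.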

dcm9123 commented 4 years ago

I see, that makes sense, especially if I have a high error rate in the nanopore output (mean Q score is about 12 after polishing). I'll give it a try with higher d values, although I am not sure how to justify such high values for genotyping when it comes to publishing. I was also thinking of adding a SNP validation step where I accept a SNP at a given nucleotide position only if at least 50% of the reads support it, but then again I would have to justify this number. And sure! Let me send you a couple of fasta files! Thanks a lot!
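
For reference, a rough Python sketch of the two ideas in this comment. It assumes the standard Phred relation P = 10^(-Q/10) and, as a simplification, treats the mean Q score as if every base had that quality (a mean of Q scores does not map exactly onto a mean error rate). `phred_to_error_probability`, `expected_errors` and `supported_variants` are hypothetical names, and the 50% cut-off is just the value mentioned above, not a recommendation.

```python
from collections import Counter


def phred_to_error_probability(q: float) -> float:
    """Standard Phred relation: P(error) = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def expected_errors(q: float, length: int) -> float:
    """Expected number of erroneous bases in a read of the given length,
    assuming (simplification) a uniform per-base quality q."""
    return phred_to_error_probability(q) * length


def supported_variants(column: str, reference_base: str,
                       min_fraction: float = 0.5) -> list[str]:
    """Non-reference bases seen in at least min_fraction of the reads
    covering one alignment column (one character per read)."""
    counts = Counter(column)
    total = sum(counts.values())
    return sorted(
        base for base, n in counts.items()
        if base != reference_base and n / total >= min_fraction
    )


if __name__ == "__main__":
    print(phred_to_error_probability(12))   # per-base error probability at Q12
    print(expected_errors(12, 1000))        # expected errors in a 1 kb read
    # toy column: 3 reads carry A, 5 reads carry G; reference base is A
    print(supported_variants("AAAGGGGG", "A"))
```

Under these assumptions, Q12 corresponds to an error probability of roughly 6% per base, so on the order of 60 errors in a 1 kb read, far more than d = 1 to 3 can absorb.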

dcm9123 commented 4 years ago

Hi Frederic! Thank you for your fast reply. Yes, I was thinking of doing the same and increasing the 'd' parameter, although I am not sure how I could justify that when I come to publish, as I am currently having a hard time with reviewers for using '-d 2' in another project that involves Illumina sequencing. I was also considering adjusting the global alignment parameters to something more 'relaxed'. Could this work? And yes! I am sending you an email with 4 files as I type! Thanks a lot!