torognes / swarm

A robust and fast clustering method for amplicon-based studies
GNU Affero General Public License v3.0

pooled samples? #132

Closed: dcm9123 closed this issue 5 years ago

dcm9123 commented 5 years ago

Hello! I have run into a problem that swarm might be able to solve, but I am not sure. I currently have 20 sequenced samples. Each sample's .fastq file contains reads from two different genes (msp1 and msp2). After trimming, demultiplexing, pairing, and converting to FASTA, I will have refined files that still contain both genes. Given the nature of my files, when I run swarm I will most likely get a plot showing two or more distinct clusters. Is there any way for swarm to handle multiple genes per sample? Or will I have to deal with this by mapping the reads to each gene and then extracting them individually (.bam to .fastq)?

Thanks!

frederic-mahe commented 5 years ago

If I understand correctly, you have fastq files corresponding to samples produced using a mixed PCR (two markers amplified simultaneously). If that's the case, you can use cutadapt (or atropos) to separate the markers by their primer sequences before clustering.
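To illustrate the idea, a primer-based split could be sketched in Python as below. This is a minimal illustration, not a replacement for cutadapt or atropos: it assumes reads are already oriented and begin with the forward primer, compares only the read's 5' end by counting substitutions (no indels, no reverse-complement search), and uses placeholder primer sequences (the `msp1`/`msp2` values are hypothetical, not the real primers). The helper names `parse_fasta`, `primer_mismatches` and `split_by_primer` are also hypothetical.

```python
"""Minimal sketch: split pooled (mixed-PCR) reads by marker using forward primers.

Assumptions (hypothetical): reads start with the forward primer, already
oriented 5'->3'; only substitutions are tolerated (no indels, no reverse
complement search). Primer sequences used below are placeholders.
"""
from typing import Dict, Iterable, Iterator, List, Tuple

# IUPAC nucleotide codes -> bases they match (primers may be degenerate)
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA lines (multi-line records allowed)."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def primer_mismatches(primer: str, read: str) -> int:
    """Count substitutions between a (possibly degenerate) primer and the read's 5' end."""
    if len(read) < len(primer):
        return len(primer)  # read too short: count as a complete mismatch
    return sum(base not in IUPAC[code] for code, base in zip(primer, read))


def split_by_primer(
    records: Iterable[Tuple[str, str]],
    primers: Dict[str, str],
    max_mismatches: int = 2,
) -> Dict[str, List[Tuple[str, str]]]:
    """Bin reads by the single primer they match and trim that primer off.

    Reads matching no primer, or more than one, go to "unassigned" untrimmed.
    """
    bins: Dict[str, List[Tuple[str, str]]] = {name: [] for name in primers}
    bins["unassigned"] = []
    for header, seq in records:
        hits = [name for name, primer in primers.items()
                if primer_mismatches(primer, seq) <= max_mismatches]
        if len(hits) == 1:
            name = hits[0]
            bins[name].append((header, seq[len(primers[name]):]))
        else:
            bins["unassigned"].append((header, seq))
    return bins


# Usage with hypothetical primers and toy reads
primers = {"msp1": "ACGTNNACGT", "msp2": "TTGGCCAARG"}
fasta = [">r1", "ACGTGGACGTAAAA", ">r2", "TTGGCCAAGGCCCC", ">r3", "GGGGGGGGGGGG"]
bins = split_by_primer(parse_fasta(fasta), primers)
for marker, reads in bins.items():
    print(marker, reads)
```

Each marker's bin can then be written to its own FASTA file and clustered separately. Reads that match neither primer, or both, end up in `unassigned`; in practice, cutadapt's error-tolerant matching (and its `--untrimmed-output` option for reads without a primer match) is the safer way to do this.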

If for some reason you cannot do that, an alternative would be to keep everything mixed, cluster with swarm, then perform a double taxonomic assignment on the cluster representative sequences (assign them against a reference database for each marker). You can then compare the two assignments and tag each representative sequence as marker 1 or marker 2 (note that some representative sequences will be hard to tag with certainty).
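The tagging step could look like the Python sketch below. It assumes (hypothetically) that each of the two assignments produced, for every representative sequence, the identity of its best hit in that marker's reference database, stored as a dict from representative ID to identity in [0, 1]. The function name `tag_representatives` and the `min_identity` and `min_margin` thresholds are illustrative choices, not swarm features or recommended values. Representatives that are too far from both databases, or almost equally close to both, are tagged `ambiguous`, reflecting the caveat that some cannot be tagged with certainty.

```python
"""Minimal sketch: tag swarm representatives as msp1 or msp2 from two assignments.

Assumption (hypothetical): each assignment gives, per representative sequence,
the identity (0-1) of its best hit in that marker's reference database.
Representatives absent from a table are treated as having no hit (identity 0).
"""
from typing import Dict


def tag_representatives(
    hits_marker1: Dict[str, float],
    hits_marker2: Dict[str, float],
    min_identity: float = 0.80,  # illustrative threshold, not a recommendation
    min_margin: float = 0.05,    # required gap between the two identities
) -> Dict[str, str]:
    """Return {representative: "msp1" | "msp2" | "ambiguous"}."""
    tags: Dict[str, str] = {}
    for rep in sorted(set(hits_marker1) | set(hits_marker2)):
        id1 = hits_marker1.get(rep, 0.0)
        id2 = hits_marker2.get(rep, 0.0)
        if max(id1, id2) < min_identity or abs(id1 - id2) < min_margin:
            tags[rep] = "ambiguous"  # too distant from both, or too close to call
        elif id1 > id2:
            tags[rep] = "msp1"
        else:
            tags[rep] = "msp2"
    return tags


# Usage with made-up identities
hits_msp1 = {"otu1": 0.98, "otu2": 0.62, "otu3": 0.81}
hits_msp2 = {"otu1": 0.55, "otu2": 0.97, "otu3": 0.79, "otu4": 0.70}
tags = tag_representatives(hits_msp1, hits_msp2)
print(tags)
```

Requiring a margin between the two identities, rather than simply taking the higher one, keeps borderline representatives out of either marker's set so they can be inspected by hand.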