Closed deprekate closed 9 years ago
Hi @deprekate,
Here is what I normally do for my own multi-sample analyses:
In my own experience, I always have to assemble paired-end reads with pear, and I always have to trim primers and adaptors from sequences with cutadapt (if your experience is different, I'd be interested to hear about it). For swarm to be able to work directly with fastq files, we would need to duplicate these rather complex pieces of software.
We are trying to keep swarm light and streamlined: swarm should be an element in a pipeline, not a pipeline in itself.
Thanks for understanding,
I have amplicon unpaired reads, which are already trimmed, in fastq format. I just want to remove all technical replicates (allow ~1bp mismatch for sequencing error).
I guess VSEARCH is the tool I want, and not SWARM?
And yep, I am using SWARM in a pipeline to replace a sequence assembly step, when users have amplicon reads, instead of shotgun reads.
@frederic-mahe :clap:
> We are trying to keep swarm light and streamlined: swarm should be an element in a pipeline, not a pipeline in itself.
Thank you for developing Swarm according to the Linux philosophy of 'do one thing well'.
@deprekate
I would recommend converting your fastq files to fasta, then dereplicate with vsearch.
sed -n '1~4s/^@/>/p;2~4p' sequences.fastq > sequences.fasta
vsearch -derep_fulllength sequences.fasta -sizeout -output sequences.derep.fna
If you are only interested in removing '~1bp mismatch', you may be able to use swarm with d=1.
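The `1~4` step addresses in the sed command are a GNU sed extension, so the conversion may not work with BSD or macOS sed. Here is a minimal portable sketch using POSIX awk, assuming strict four-line fastq records (no wrapped sequence lines). The function name `fastq_to_fasta` is hypothetical and only used for illustration:

```shell
# Convert a four-line-per-record fastq file to fasta (assumption: no wrapped lines).
fastq_to_fasta() {
    awk 'NR % 4 == 1 { print ">" substr($0, 2) }  # header line: swap the leading "@" for ">"
         NR % 4 == 2 { print }                    # sequence line: copy unchanged
        ' "$1"
}

# Example usage (file names taken from the commands above):
# fastq_to_fasta sequences.fastq > sequences.fasta
```

Quality lines and the `+` separator are simply skipped, which matches what the sed one-liner does.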
Thanks @colinbrislawn,
indeed the next command is:
swarm -d 1 -z -w sequences.derep.seeds.fna sequences.derep.fna > /dev/null
to cluster using the ;size= abundance annotations (option -z) and to collect the representative sequences in fasta format (option -w), hence using swarm as a denoising method.
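As a quick sanity check on that step, and assuming swarm writes each seed with the summed abundance of its cluster (as its documentation describes for the seeds output), the total of the ;size= values should be the same in the dereplicated input and in the seeds file. A rough sketch follows; `sum_abundances` is a hypothetical helper, not part of swarm or vsearch:

```shell
# Sum the ";size=N" abundance annotations over all fasta headers in a file.
sum_abundances() {
    awk '/^>/ && match($0, /;size=[0-9]+/) {
             # RSTART points at ";", so skip the 6 characters of ";size="
             total += substr($0, RSTART + 6, RLENGTH - 6)
         }
         END { print total + 0 }' "$1"
}

# Example usage (file names taken from the commands above):
# sum_abundances sequences.derep.fna
# sum_abundances sequences.derep.seeds.fna
```

If the two totals differ, some reads were probably lost or counted twice somewhere between dereplication and clustering.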
I am going to close that issue. Feel free to re-open it if need be.
It would be helpful if SWARM could process fastq files, even if it did not use the quality score information and only used the raw nucleotides (incorporating quality scores is a large undertaking).