kemin711 closed this issue 5 years ago.
Hi @kemin711, swarm only accepts sequences made of unambiguous nucleotides (ACGT). This is a design choice that allows swarm to be very fast; supporting ambiguous nucleotides would mean a major slowdown.
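To find the offending lines before running swarm, a small awk sketch can list every sequence line that contains something other than A, C, G or T, together with its line number in the file. It assumes that ACGT in either case is the legal alphabet (check swarm's manual for the exact set it accepts), and the function name `find_illegal_lines` is hypothetical.

```shell
# Hypothetical helper: print "line_number: sequence" for every non-header
# FASTA line containing a character outside A, C, G, T (case-insensitive).
# Reads the files given as arguments, or standard input if there are none.
find_illegal_lines() {
  awk '!/^>/ && /[^ACGTacgt]/ { print NR ": " $0 }' "$@"
}
```

For example, `find_illegal_lines input.fasta` prints nothing when every sequence line uses only ACGT.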
With Illumina sequencing, ambiguous nucleotides are rare (very few sequences contain Ns), and swarm's strict input requirements have never been a problem for me. If for some reason you end up with sequences containing Ns, you can filter them out with vsearch, or you can simply remove (or replace) the Ns:
sed '/^>/ ! s/[Nn]//g' input.fasta | swarm
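Removing Ns with sed shortens the affected sequences. If you would rather discard whole records that contain an N and vsearch is not at hand, a minimal awk sketch can do it. This is not the vsearch filter mentioned above; the function name `drop_records_with_n` is hypothetical, and the sketch handles wrapped (multi-line) FASTA by buffering each record until the next header.

```shell
# Hypothetical helper: copy FASTA records to standard output, skipping any
# record whose sequence lines contain N or n. Headers are kept verbatim.
drop_records_with_n() {
  awk '
    # Print the buffered record unless one of its sequence lines had an N.
    function flush() {
      if (header != "" && !has_n) printf "%s%s", header, seq
    }
    /^>/ { flush(); header = $0 "\n"; seq = ""; has_n = 0; next }
    {
      seq = seq $0 "\n"
      if ($0 ~ /[Nn]/) has_n = 1
    }
    END { flush() }
  ' "$@"
}
```

Usage mirrors the sed pipeline: `drop_records_with_n input.fasta | swarm`. The trade-off is the one raised in the issue: filtering keeps every remaining sequence intact but throws away some data.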
Error: Illegal character 'N' in sequence on line 28062
This is not really a bug, but it could be handled by the algorithm: whenever an N is encountered, that base could be counted as a difference. I'm not sure how much work this would require. Users can filter out sequences containing N, but this removes some data.
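The rule proposed in the issue can be illustrated with a simplified sketch. It is not how swarm works (swarm rejects N and computes differences from a full alignment that includes insertions and deletions); it only compares two equal-length sequences position by position, counting a mismatch or any position involving N as one difference. The function name `count_diffs_n_mismatch` is hypothetical.

```shell
# Hypothetical helper: count substitution differences between two
# equal-length sequences, treating any position with an N as a difference
# (even N against N). Comparison is case-insensitive. Exits 1 on a length
# mismatch, since indels are out of scope for this simplified sketch.
count_diffs_n_mismatch() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    if (length(a) != length(b)) { print "length mismatch" > "/dev/stderr"; exit 1 }
    a = toupper(a); b = toupper(b)
    d = 0
    for (i = 1; i <= length(a); i++) {
      x = substr(a, i, 1); y = substr(b, i, 1)
      # An N never matches, so it always adds one difference.
      if (x == "N" || y == "N" || x != y) d++
    }
    print d
  }'
}
```

Under this rule, `count_diffs_n_mismatch ANGT ANGT` prints 1, so with swarm's default threshold of one difference a sequence carrying a single N would use up the whole budget against every neighbour, including an otherwise identical copy of itself.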