torognes / swarm

A robust and fast clustering method for amplicon-based studies
GNU Affero General Public License v3.0

Support ambiguous nucleotides (N) #138

Closed: sfehrmann closed this issue 4 years ago

sfehrmann commented 4 years ago

Dear Frederic, dear Torbjorn,

Usually, my sequences contain a small number of ambiguous nucleotides (N). Such data is currently rejected by swarm. Will it be possible to support N in the future? My current workaround is something like sed 's/N/A/g' <in.fasta >out.fasta.
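A plain s/N/A/g applied to every line also rewrites any capital N found in header lines (for example in sample names), which silently changes sequence identifiers. Below is a minimal sketch of a header-safe variant, assuming standard FASTA input where header lines start with '>'. The function name mask_n_as_a is hypothetical, and mapping lowercase n to a is an added assumption, not something swarm requires.

```shell
# Hypothetical helper: replace ambiguous N/n with A/a in sequence lines only.
# Assumption: FASTA input where every header line starts with '>'.
mask_n_as_a() {
    # '/^>/!' restricts the command to lines NOT starting with '>',
    # so headers (and any N they contain) pass through unchanged.
    # 'y/Nn/Aa/' transliterates N->A and n->a, preserving case.
    sed '/^>/!y/Nn/Aa/'
}
```

Usage: mask_n_as_a < in.fasta > out.fasta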

It's no big deal, as these are low-frequency sequences and most of them are filtered out anyway. But I guess it's a common problem.

Best, C

sfehrmann commented 4 years ago

Apologies, I saw you dismissed a similar request in the past for performance reasons. Fair enough, closing this. Duplicate of #134

frederic-mahe commented 4 years ago

@Carambakaracho Yes, swarm only accepts ACGT. In my experience, though, sequences with Ns typically account for only a fraction of a percent of a dataset. Ignoring or transforming them should have no impact whatsoever on downstream analyses.
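For the "ignoring" option, here is a minimal sketch that drops FASTA records whose sequence contains anything other than A, C, G or T, and reports on stderr what fraction of records was removed, so the fraction-of-a-percent estimate can be checked on your own data. The function name drop_non_acgt is hypothetical; accepting lowercase bases and joining multi-line sequences onto a single line are assumptions of this sketch, not documented swarm behavior.

```shell
# Hypothetical helper: keep only records whose sequence is pure ACGT
# (upper or lower case); headers, including ';size=' annotations, are kept as-is.
drop_non_acgt() {
    awk '
        # Emit the buffered record if its sequence is pure ACGT, else count it as dropped.
        function flush() {
            if (header == "") return
            total++
            if (seq ~ /^[ACGTacgt]+$/) {
                print header
                print seq
            } else {
                dropped++
            }
        }
        # A new header closes the previous record.
        /^>/ { flush(); header = $0; seq = ""; next }
        # Sequence lines are concatenated, so multi-line FASTA becomes single-line.
        { seq = seq $0 }
        END {
            flush()
            if (total > 0)
                printf("dropped %d of %d sequences (%.2f%%)\n", dropped, total, 100 * dropped / total) > "/dev/stderr"
        }
    '
}
```

Usage: drop_non_acgt < in.fasta > filtered.fasta (the summary line goes to stderr, the filtered FASTA to stdout).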

Thanks though for taking the time to open an issue (and to read through previous issues).