torognes / swarm

A robust and fast clustering method for amplicon-based studies
GNU Affero General Public License v3.0

Option to exclude seeds or clusters with low abundance #176

Open torognes opened 1 year ago

torognes commented 1 year ago

A user has suggested an option to exclude seed sequences or clusters with low abundance (e.g. singletons). This could be applied to the dereplicated input sequences or to the output clusters.

Even if this does not save much time, a large amount of output could be avoided, especially when there are lots of singletons.

torognes commented 1 year ago

To be a bit more precise: clusters whose centroid/seed has a low abundance could optionally be excluded, while low-abundance sequences could still be members of other clusters. This cannot be done by simply filtering the input or output on sequence abundance.
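
To illustrate the distinction, here is a toy Python sketch. It is not swarm code and does not reflect any existing swarm option: the in-memory cluster layout (a list of `(identifier, abundance)` pairs with the seed first), the function names, and the threshold are all assumptions made for the example. It only contrasts the requested behaviour (drop clusters whose seed is rare, keep rare members of other clusters) with naive abundance filtering of the input.

```python
from typing import List, Tuple

Amplicon = Tuple[str, int]   # (identifier, abundance); hypothetical layout
Cluster = List[Amplicon]     # assumption: the first entry is the seed


def drop_low_abundance_seeds(clusters: List[Cluster],
                             min_seed_abundance: int) -> List[Cluster]:
    """Keep only clusters whose seed reaches the abundance threshold.

    Members of the kept clusters are left untouched, whatever their abundance.
    """
    return [c for c in clusters if c and c[0][1] >= min_seed_abundance]


def filter_by_abundance(amplicons: List[Amplicon],
                        min_abundance: int) -> List[Amplicon]:
    """Naive filtering: drop every low-abundance sequence."""
    return [a for a in amplicons if a[1] >= min_abundance]


# Toy clustering result: one abundant seed with two singleton members,
# plus two clusters seeded by singletons.
clusters = [
    [("a", 10), ("b", 1), ("c", 1)],
    [("d", 1)],
    [("e", 1), ("f", 1)],
]

# Requested behaviour: singleton-seeded clusters are dropped, but the
# singletons "b" and "c" stay in the cluster seeded by "a".
kept = drop_low_abundance_seeds(clusters, min_seed_abundance=2)

# Naive filtering of all sequences would also remove "b" and "c".
all_amplicons = [a for c in clusters for a in c]
survivors = filter_by_abundance(all_amplicons, min_abundance=2)
```

With a threshold of 2, `kept` holds the single cluster seeded by `a` with all three members, whereas `survivors` contains only `a` itself, which is the loss of information the comment above warns about.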