Closed thermokarst closed 4 years ago
In case this helps direct decision-making on this issue (esp. re: question 1), I have added a method to one of my plugins that uses vsearch to filter a FeatureData[Sequence] artifact by length. I am planning on releasing this in 2020.6.
So if it's possible, you could disable the min-length filter (or set it to the lowest threshold possible), and users can apply their own length filter post-clustering if desired.
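A post-clustering length filter of the kind described here could look roughly like the sketch below. It is plain Python over an ID-to-sequence mapping, not the vsearch-based plugin method mentioned above; the function name `filter_by_length` and its parameters are hypothetical and chosen only for illustration.

```python
def filter_by_length(seqs, min_len=1, max_len=None):
    """Keep sequences whose length falls within [min_len, max_len].

    seqs is a dict mapping sequence ID -> sequence string.
    max_len=None means no upper bound. (Hypothetical helper.)
    """
    kept = {}
    for seq_id, seq in seqs.items():
        if len(seq) < min_len:
            continue  # too short
        if max_len is not None and len(seq) > max_len:
            continue  # too long
        kept[seq_id] = seq
    return kept
```

With the defaults nothing is removed, so a user would opt in to a threshold explicitly, for example `filter_by_length(seqs, min_len=32)`.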
Yes, vsearch applies a minimum sequence length filter of 32 nucleotides for clustering, dereplication and search commands (cluster_smallmem, cluster_fast, cluster_size, cluster_unoise, derep_fulllength, derep_prefix, makeudb_usearch, sintax, usearch_global) and 1 for other commands. This was implemented for maximum compatibility with usearch (version 7). It can be turned off with the option --minseqlength 1 for the commands where it is relevant.
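As a rough illustration of passing that option from Python, the sketch below assembles a `cluster_size` command line with `--minseqlength 1`. The helper names, file paths and the 0.97 identity default are assumptions made for the example; this is not the plugin's actual internal call.

```python
import subprocess


def build_cluster_cmd(fasta_in, centroids_out, uc_out,
                      perc_identity=0.97, min_seq_length=1):
    """Assemble a vsearch cluster_size call (hypothetical helper).

    min_seq_length=1 disables the default 32-nucleotide filter.
    """
    return [
        "vsearch",
        "--cluster_size", str(fasta_in),
        "--id", str(perc_identity),
        "--centroids", str(centroids_out),
        "--uc", str(uc_out),
        "--minseqlength", str(min_seq_length),
    ]


def run_cluster(*args, **kwargs):
    """Run the assembled command; requires vsearch on PATH."""
    subprocess.run(build_cluster_cmd(*args, **kwargs), check=True)
```

Keeping command construction separate from execution makes it easy to check that the flag is present without having vsearch installed.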
Thanks @torognes!
@Oddant1 - can you please work on this bug when you get a chance? It would be great if we could resolve it in time for 2020.6. I think using the --minseqlength 1 flag in the internal calls to vsearch should handle this (which means we go with option 2 above, in my original post, as long as @nbokulich agrees).
Bug Description
vsearch apparently applies a minimum length filter of 32 nts to input sequences. Our cluster-features-* actions appear to assume that no reads are going to be filtered by vsearch, so there is no cross-referencing or post-vsearch filtering applied.
Steps to reproduce the behavior
Expected behavior
I see at least two ways to solve this, detailed in questions 1 and 2, below.
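To make the missing cross-referencing step concrete, here is a minimal sketch that compares the input feature IDs against the query labels recorded in a vsearch `.uc` file. It assumes the usual tab-separated layout with the record type in column 1 and the query label in column 9, and that labels may carry `;size=N` annotations; the function names are hypothetical and this is not how the plugin currently works.

```python
def labels_in_uc(uc_lines):
    """Collect query labels from S (seed) and H (hit) records."""
    seen = set()
    for line in uc_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # C records only summarise clusters, so skip them.
        if len(fields) < 9 or fields[0] not in ("S", "H"):
            continue
        # Drop annotations such as ";size=5" from the label.
        seen.add(fields[8].split(";")[0])
    return seen


def dropped_features(input_ids, uc_lines):
    """Return input IDs that never appear in the .uc output."""
    return set(input_ids) - labels_in_uc(uc_lines)
```

A non-empty result from `dropped_features` would indicate that vsearch silently discarded some input sequences, for example because they were shorter than the minimum length.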
Questions
References