nf-core / ampliseq

Amplicon sequencing analysis workflow using DADA2 and QIIME2
https://nf-co.re/ampliseq
MIT License

Edge case: Clustering with VSEARCH fails at QIIME2_INSEQ #668

Closed d4straub closed 11 months ago

d4straub commented 11 months ago

Description of the bug

With --vsearch_cluster, in some (potentially rare) edge cases QIIME2_INSEQ complains that the FASTA file isn't valid. The cause is that VSEARCH masks low-complexity regions (in this case, multiple G's in a row) by writing them in lowercase, while QIIME2 expects all nucleotide symbols to be capitalized.

Masking can be prevented with --qmask "none", so a config that contains

process {
    withName: VSEARCH_CLUSTER {
        ext.args = '--id 0.97 --usersort --qmask "none"'
        ext.args2 = '--cluster_smallmem'
        ext.args3 = '--clusters'
    }
}

will fix the issue.
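
To check whether a given FASTA file is affected, a minimal Python sketch like the following could list the records that contain lowercase symbols. It assumes, as described above, that masked regions show up as lowercase letters. The function name `find_lowercase_records` and the example records are hypothetical and not part of the pipeline.

```python
from typing import Iterable, Iterator, Tuple


def find_lowercase_records(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (record_id, n_lowercase) for each FASTA record with lowercase symbols.

    Hypothetical helper for illustration; not part of nf-core/ampliseq.
    """
    record_id = None
    n_lower = 0
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            # A new header closes the previous record, so report it if needed.
            if record_id is not None and n_lower > 0:
                yield record_id, n_lower
            fields = line[1:].split()
            record_id = fields[0] if fields else ""
            n_lower = 0
        elif record_id is not None:
            # Count lowercase symbols on this sequence line.
            n_lower += sum(1 for symbol in line if symbol.islower())
    # Report the final record, which has no following header.
    if record_id is not None and n_lower > 0:
        yield record_id, n_lower


# Made-up records: seq2 mimics a masked run of G's.
example = [
    ">seq1",
    "ACGTACGT",
    ">seq2",
    "ACGT" + "g" * 7 + "ACGT",
]
masked = list(find_lowercase_records(example))
print(masked)
```

For a real file, the same function can be given an open file handle, for example `find_lowercase_records(open(path))` with `path` pointing at the clustered sequences. An empty result would suggest that masking is not the cause of the QIIME2_INSEQ error.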

Command used and terminal output

No response

Relevant files

No response

System information

No response

d4straub commented 11 months ago

That's in dev and will be in 2.8.0.