nf-core / ampliseq

Amplicon sequencing analysis workflow using DADA2 and QIIME2
https://nf-co.re/ampliseq
MIT License

Edge case: Clustering with VSEARCH fails at QIIME2_INSEQ #668

Closed by d4straub 9 months ago

d4straub commented 10 months ago

Description of the bug

With --vsearch_cluster, in some (potentially rare) edge cases, QIIME2_INSEQ complains that the FASTA file isn't valid. This happens because VSEARCH masks low-complexity regions (in this case a run of multiple G's) by writing them in lowercase, while QIIME2 expects all nucleotide symbols to be uppercase.
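To make the failure mode concrete, here is a minimal sketch of a hypothetical helper (not part of ampliseq, VSEARCH, or QIIME2) that scans FASTA text for lowercase runs, which is how soft-masked regions show up. It assumes plain, uncompressed FASTA and treats any lowercase letter as masked.

```python
import re

# Any run of lowercase letters is treated as a soft-masked stretch
# (assumption: masked bases are written in lowercase, as described above).
SOFT_MASKED = re.compile(r"[a-z]+")


def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def find_soft_masked(text):
    """Return {header: [(start, end, run), ...]} for records containing lowercase runs."""
    hits = {}
    for header, seq in read_fasta(text):
        runs = [(m.start(), m.end(), m.group()) for m in SOFT_MASKED.finditer(seq)]
        if runs:
            hits[header] = runs
    return hits


# A masked poly-G stretch in one sequence, a fully uppercase second sequence.
example = ">ASV_1\nACGTAC" + "g" * 10 + "TACG\n>ASV_2\nACGTACGTACGT\n"
print(find_soft_masked(example))  # {'ASV_1': [(6, 16, 'gggggggggg')]}
```

Only ASV_1 is reported: its lowercase poly-G run is the kind of stretch that makes the file invalid for QIIME2, while ASV_2 is entirely uppercase.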

Masking can be disabled with --qmask "none", so supplying a custom config (e.g. via -c) that contains

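process {
    withName: VSEARCH_CLUSTER {
        // Extra VSEARCH options; --qmask "none" turns off low-complexity masking
        ext.args = '--id 0.97 --usersort --qmask "none"'
        // Clustering command
        ext.args2 = '--cluster_smallmem'
        // Output option
        ext.args3 = '--clusters'
    }
}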

will fix the issue.

Command used and terminal output

No response

Relevant files

No response

System information

No response

d4straub commented 9 months ago

That's in dev and will be in release 2.8.0.