steineggerlab / Metabuli

Metabuli: specific and sensitive metagenomic classification via joint analysis of DNA and amino acid.
GNU General Public License v3.0

add --match-per-kmer option in classify workflow #22

Closed jaebeom-kim closed 1 year ago

jaebeom-kim commented 1 year ago

Metabuli allocates memory for storing matches between query k-mers and reference k-mers. By default, --match-per-kmer is 4, so enough memory is allocated to store four matches per query k-mer. Four matches per k-mer are enough in most cases, but we added an option to adjust the value because it has been reported that four is not enough when the input reads contain many low-complexity sequences. We had already added the --mask option to classify to prevent matches from low-complexity regions. If you still get the overflow!!! signal even with --mask 1, set --match-per-kmer to a value greater than 4.
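
To illustrate why a larger --match-per-kmer costs more memory, here is a rough Python sketch of the sizing arithmetic described above. It is not Metabuli's actual code: the function names, the linear sizing rule, and the 16-byte size of one stored match are all assumptions made for illustration only.

```python
# Hypothetical sketch of match-buffer sizing; not taken from Metabuli's source.
# ASSUMPTION: the buffer holds (query k-mers) x (match-per-kmer) match records.
# ASSUMPTION: one match record occupies 16 bytes (illustrative value only).

def match_buffer_bytes(num_query_kmers: int,
                       match_per_kmer: int = 4,
                       bytes_per_match: int = 16) -> int:
    """Return the bytes reserved for matches under the assumed sizing rule."""
    if num_query_kmers < 0 or match_per_kmer < 1 or bytes_per_match < 1:
        raise ValueError("sizes must be non-negative and factors positive")
    return num_query_kmers * match_per_kmer * bytes_per_match


def would_overflow(num_matches: int,
                   num_query_kmers: int,
                   match_per_kmer: int = 4) -> bool:
    """Return True if the matches found exceed the reserved capacity."""
    return num_matches > num_query_kmers * match_per_kmer
```

Under these assumptions, one million query k-mers that produce five million matches (for example, from low-complexity reads) exceed the default capacity of four million, while --match-per-kmer 8 leaves room for them at twice the memory.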