rcedgar / muscle

Multiple sequence and structure alignment with top benchmark scores scalable to thousands of sequences. Generates replicate alignments, enabling assessment of downstream analyses such as trees and predicted structures.
https://drive5.com/muscle
GNU General Public License v3.0

usorter.cpp(88) assert failed: L >= m_WordLength #67

Closed by yifengyuan 10 months ago

yifengyuan commented 10 months ago

Hi Developer,

I am using MUSCLE 5.1 to align ~2000 bacterial ribosomal proteins, and I got the error below.

muscle 5.1.linux64 [12f0e2] 99.0Gb RAM, 16 cores
Built Jan 13 2022 23:17:13
(C) Copyright 2004-2021 Robert C. Edgar.
https://drive5.com

Input: 2146 seqs, length avg 273 max 392

00:00 5.7Mb 100.0% Derep 1083 uniques, 1062 dupes
00:00 6.1Mb CPU has 16 cores, running 4 threads
01:01 12Mb 100.0% UCLUST 1084 seqs EE<0.01, 327 centroids, 756 members

muscle5.1.linux_intel64 -super5 riboseq2203genomes/S2p.fasta -output riboseq2203genomes/S2p.fasta.MUSCLE5 -threads 4
Elapsed time 01:01
Max memory 12Mb

---Fatal error---
usorter.cpp(88) assert failed: L >= m_WordLength

The AA sequences look like this:

>fig|3019912.3.peg.1395|PF049_06905 SSU ribosomal protein S2p (SAe) [Erythrobacteraceae bacterium WH01K | 3019912.3]
MAAPTVTMQQLIEAGAHFGHQTHRWNPRMKPYIFGARNGVHIIDLSQTVPLMARALDFVA
STAASGGKVLFVGTKRQAQEPMAEAARMSGQHFVNHRWLGGMLTNWKTISQSIKRLKSLD
EQLGGEISGLTKKEVLQLTRERDKLELSLGGIRDMGGIPDVMIVVDANKEDLAIKEANVL
GIPVVGILDTNVDPSGISFPVPGNDDAARAVRLYTEAFGEAAAAGRNQAQGKNAGEMENP
PAEAAA
>fig|3019602.3.peg.412|P6U16_01770 SSU ribosomal protein S2p (SAe) [Rhizobium sp. 32/5-1 | 3019602.3]
MALPDFSMRQLLEAGVHFGHQTHRWNPKMKPYIFGDRSNVHIIDLAQTVPMLSRALQMVS
DTVAKGGRVLFVGTKRQASEIIADAAKRSAQYYVNARWLGGMMTNWKTISNSIQRLRKLD
EILNSEASGFTKKERLNLEREREKLNRALGGIRDMGGVPDLMFIIDTNKESIAIDEAKRL
GIPVVAIIDSNCDPDRIDYPIPGNDDASRAISLYCDLISRAALDGIARQQGASGRDLGAS
AELPVEPALEEAAEA

Could you please advise? Thank you.

rcedgar commented 10 months ago

This is a bug (there should be a better error message) caused by a very short sequence. You have at least one sequence that is around 3 letters or shorter.
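One way to locate the offending records before rerunning the alignment is to scan the input FASTA for unusually short sequences. The following is a minimal sketch in Python, not part of MUSCLE: the function names and the `min_length=10` cutoff are illustrative assumptions (the thread only says the problem sequence is around 3 letters or shorter, and the actual word length MUSCLE uses is not stated here).

```python
import sys
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def find_short_sequences(
    lines: Iterable[str], min_length: int = 10
) -> List[Tuple[str, int]]:
    """Return (header, length) for every record shorter than min_length.

    min_length is a hypothetical cutoff chosen for illustration only.
    """
    return [
        (header, len(seq))
        for header, seq in read_fasta(lines)
        if len(seq) < min_length
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_short.py riboseq2203genomes/S2p.fasta
    with open(sys.argv[1]) as handle:
        for header, length in find_short_sequences(handle):
            print(f"{length}\t{header}")
```

Any record it reports can then be removed or inspected by hand before running `-super5` again. Records with a header but no sequence are reported with length 0.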

yifengyuan commented 10 months ago

You are correct. Thank you very much.