Open jolespin opened 1 year ago
That's not supposed to happen. These are null bytes that separate entries in MMseqs2 databases. For some reason, MMseqs2 read past an entry boundary and included the following entries as well. Can you send us the input FASTA file (or an excerpt of it) so we can try to debug this, please?
Thanks for looking into this!
Here's the clustered file that produced the null bytes. It's relatively small. mmseqs2_rep_seq.gt11.fasta.gz
Not sure if it helps, but I've been searching for null bytes with grep -Pa '\x00' [filename]
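If grep -P is not available (not every grep build supports it), or you want to check the gzipped file directly, the same search can be sketched in Python. This is only a minimal illustration: the function name find_null_bytes is made up for this example and is not part of MMseqs2.

```python
import gzip


def find_null_bytes(path):
    """Return a list of (line_number, null_byte_count) for lines containing NUL bytes.

    Hypothetical helper for illustration only; not part of MMseqs2.
    Files ending in .gz are read through gzip, everything else as plain bytes.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    hits = []
    # Read in binary mode so NUL bytes are preserved exactly as stored.
    with opener(path, "rb") as handle:
        for line_number, line in enumerate(handle, start=1):
            count = line.count(b"\x00")
            if count:
                hits.append((line_number, count))
    return hits
```

Calling find_null_bytes("mmseqs2_rep_seq.gt11.fasta.gz") would return one tuple per affected line, and an empty list means no null bytes were found.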
You can add the --createdb-mode 0 parameter as a workaround.
Edit: A space-saving optimization is going wrong. The check that decides whether the optimization can be applied safely depends on --dbtype not being set. The check should not depend on this parameter, as it's unrelated. Leaving out --dbtype should also fix the problem.
This should now be fixed in 6b93884.
Awesome! Thank you for such a quick turnaround. What is the best way to update my installation?
You don't need to. Either drop the --dbtype parameter or add --createdb-mode 0. Either should fix your issue.
Expected Behavior
Current Behavior
Here's the rep_seq.fasta file:
Steps to Reproduce (for bugs)
MMseqs Output (for bugs)
Please make sure to also post the complete output of MMseqs. You can use gist.github.com for large output.
No errors
Context
I'm confused about what the ^@ characters are doing. It looks like they are concatenating the proteins?
Your Environment
Include as many relevant details about the environment you experienced the bug in.