konstin opened this issue 3 years ago
In `ffindex.h` you can increase the following define:

#define FFINDEX_MAX_ENTRY_NAME_LENTH 32

We didn't want to change it because we are not 100% sure it won't break anything downstream.
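A fixed-size name field of `FFINDEX_MAX_ENTRY_NAME_LENTH` bytes also has to hold the terminating NUL byte, which fits the 31-character limit reported in this issue. The sketch below only illustrates that arithmetic. It assumes the define sizes a NUL-terminated `char` array; `example_entry`, `name_fits` and `set_entry_name` are hypothetical names, not ffindex's actual API.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Same value as the define in ffindex.h (can be overridden at compile time). */
#ifndef FFINDEX_MAX_ENTRY_NAME_LENTH
#define FFINDEX_MAX_ENTRY_NAME_LENTH 32
#endif

/* Hypothetical index entry for illustration: the name buffer must also
 * store the terminating NUL, so at most 31 visible characters fit. */
struct example_entry {
    size_t offset;
    size_t length;
    char name[FFINDEX_MAX_ENTRY_NAME_LENTH];
};

/* True if `name` fits into example_entry.name without truncation. */
static bool name_fits(const char *name)
{
    assert(name != NULL);
    return strlen(name) < FFINDEX_MAX_ENTRY_NAME_LENTH;
}

/* Copies `name` into the entry, or returns false instead of silently
 * truncating or writing past the end of the buffer. */
static bool set_entry_name(struct example_entry *e, const char *name)
{
    assert(e != NULL);
    if (!name_fits(name))
        return false;
    memcpy(e->name, name, strlen(name) + 1);
    return true;
}
```

Rejecting the name up front, rather than copying it blindly, is what turns the opaque failure into an explicit error.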
You can build a database with MMseqs2 (`mmseqs createdb`) and symlink the required file names for HH-suite as described in the other thread (https://github.com/soedinglab/hh-suite/issues/262#issuecomment-831439909).
I changed the source, did some quick tests, and found no problems so far (this doesn't mean I encourage people to do this).
@milot-mirdita This is somewhat related to this thread: I was trying to create `ffdata` and `ffindex` files from A3M alignments using `ffindex_build` and got the character-limit error. Could I use `mmseqs createdb` here? As far as I know, there's no way to create an MSA DB from a set of A3M files with MMseqs2 (`convertmsa` only accepts Stockholm).
`mmseqs tar2db` might be one of the easiest ways to create a database like that.
Sorry, my `createdb` suggestion doesn't actually work; for the original problem you'd need something using `mmseqs createseqfiledb`.
I tried to search a FASTA file containing IDs longer than 31 characters with hhblits, which produced an opaque error message when hhblits reached that sequence.
Expected Behavior
hhblits works with sequence IDs longer than 31 characters.
Current Behavior
hhblits should either work with sequence IDs longer than 31 characters or inform the user (ideally at startup) that long IDs are not supported.
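Until hhblits reports this itself, a pre-flight check over the FASTA headers can surface long IDs before a search starts, which is the "inform the user in the beginning" behavior described above. A minimal sketch, assuming the ID is the first whitespace-delimited token after `>` and a limit of 31 characters; `MAX_ID_LEN`, `fasta_id_length` and `report_long_ids` are hypothetical names, not part of HH-suite.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Assumption: FFINDEX_MAX_ENTRY_NAME_LENTH (32) minus the NUL byte. */
#define MAX_ID_LEN 31

/* Length of the ID in a FASTA header line: the characters after '>'
 * up to the first whitespace (assumed ID convention). */
static size_t fasta_id_length(const char *header)
{
    assert(header != NULL && header[0] == '>');
    const char *p = header + 1;
    size_t n = 0;
    while (p[n] != '\0' && !isspace((unsigned char)p[n]))
        n++;
    return n;
}

/* Prints every ID longer than `max_len` to `report` and returns how many
 * were found, so a caller can abort before starting a long search. */
static size_t report_long_ids(FILE *in, size_t max_len, FILE *report)
{
    char line[4096];
    size_t bad = 0;
    int at_line_start = 1;
    while (fgets(line, sizeof line, in) != NULL) {
        /* fgets may split very long lines into several chunks; only a
         * chunk that starts a new line can be a header. */
        int is_header = at_line_start && line[0] == '>';
        at_line_start = strchr(line, '\n') != NULL;
        if (!is_header)
            continue;
        size_t len = fasta_id_length(line);
        if (len > max_len) {
            fprintf(report, "ID too long (%zu > %zu characters): %.*s\n",
                    len, max_len, (int)len, line + 1);
            bad++;
        }
    }
    return bad;
}
```

A wrapper script could run this on the query or database FASTA and exit non-zero when the returned count is above zero.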
Steps to Reproduce (for bugs)
Minimized example
`cath_5.fasta`:

Try to build a custom database:
This gives the following output:
Looking into `cath_5_fas.ffindex` (notice the non-ASCII characters):

The non-ASCII characters are rendered differently depending on the editor:
To check, let's use only IDs shorter than 32 characters:
This file now passes (with v3.1.0; with v3.3.0 it crashes due to #260).
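Keeping only IDs shorter than 32 characters can also be done programmatically when narrowing down a larger input. A minimal sketch, assuming the ID is the first whitespace-delimited token after `>`; `filter_short_ids` is a hypothetical helper, and dropping records is only a diagnostic workaround, not a fix for the limit itself.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copies to `out` only the FASTA records whose ID is at most `max_len`
 * characters long; returns the number of records dropped. */
static size_t filter_short_ids(FILE *in, FILE *out, size_t max_len)
{
    assert(in != NULL && out != NULL);
    char line[4096];
    size_t dropped = 0;
    int keep = 0;          /* is the current record being copied? */
    int at_line_start = 1; /* fgets may split long lines into chunks */
    while (fgets(line, sizeof line, in) != NULL) {
        if (at_line_start && line[0] == '>') {
            /* ID = characters after '>' up to the first whitespace. */
            size_t n = strcspn(line + 1, " \t\r\n\v\f");
            keep = n <= max_len;
            if (!keep)
                dropped++;
        }
        at_line_start = strchr(line, '\n') != NULL;
        if (keep)
            fputs(line, out);
    }
    return dropped;
}
```

Records are dropped whole (header and sequence lines together), so the output stays valid FASTA.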
Your Environment