genomewalker closed this issue 7 years ago.
Thanks for the bug report.
```cpp
case '\n':
    if (inHeader) {
        inHeader = false;
    } else {
        if (seqLength > maxSeqLength) {
            maxSeqLength = seqLength;
        }
        seqLength = 0;
    }
    break;
default:
    if (!inHeader) {
        seqLength++;
    }
    break;
```
These lines in util/msa2profile.cpp are wrong; I'll see if I can fix the issue tomorrow.
(Side remark: databases created with ffindex_build are somewhat dangerous for MMseqs2 modules that do random access; msa2profile only does linear access. Please sort the index file numerically first: `LC_ALL=C sort -n DB.index > DB.index_sorted && mv -f DB.index_sorted DB.index`.)
Thanks, Milot!
Thanks again for the bug report. The issue should now be resolved.
Another caveat about msa2profile: it requires a sensible query sequence, since all columns in which the query has a gap are discarded. You might first want to run hhconsensus from our hh-suite software on each MSA. With the -M parameter you can choose which columns are treated as match columns and included in the consensus sequence computation. This consensus sequence is then prepended to the MSA, so msa2profile will not discard potentially useful columns.
Thanks! I just tested it and it seems to work fine now. Regarding the consensus building, we already have HMMER3 profile HMMs for those MSAs. According to the MMseqs2 help, convertprofiledb seems to be able to convert HMMER3 HMMs to the MMseqs2 format. Do you recommend it, or would it be better to go through hhconsensus and use msa2profile?
Using the HMMER3 HMMs is probably a bad idea: they already include pseudocounts, which will negatively affect the sensitivity of MMseqs2.
I am currently evaluating all those tools again, but I don't have a clear recommendation yet.
If the first sequence in your alignment is not a real query, your two options are `hhconsensus` + `msa2profile`, or `hhconsensus` + `hhmake -nocontxt -diff 1000` + `convertprofiledb` (much slower). `msa2profile` is the newer tool and not as well tested as `convertprofiledb`, although the testing we have done so far indicates that it works well.
Expected Behavior
When given an MSA database built from a multi-line FASTA MSA, msa2profile should create a profile.
Current Behavior
Currently it fails with the error:
Member sequence 0 in entry 0 too long!
Steps to Reproduce (for bugs)
FASTA MSA with multi-line sequences: test_msa_ml.fa.gz
FASTA MSA with single-line sequences: test_msa_sl.fa.gz
When creating the profile with the multi-line MSA:
It results in:
But when using the single-line FASTA MSA:
It seems to work perfectly: