Closed salvoc81 closed 5 years ago
Hi Salvatore,
The Uniclust profiles need a different search strategy. The default profile search only works for at most a few hundred thousand profiles; beyond that, the memory requirements explode. We are currently working on a different profile search strategy for large databases. I'll update you once it's ready.
Best regards, Milot
Thanks a lot @milot-mirdita . By the way, do you think the HHblits PfamA profiles will be updated to version 32 of Pfam anytime soon? I might consider using those...
Thanks a lot,
Salvo
Thanks for letting me know that there was an update. I just started the job; because Pfam releases are irregular, it's not automated. If it doesn't run into any problems, we should have a new release up in a few days.
I just finished generating and uploaded the PfamA 32 db: http://gwdu111.gwdg.de/~compbiol/data/hhsuite/databases/hhsuite_dbs/pfamA_32.0.tar.gz
Thanks a lot Milot!
Do the HHblits Pfam profiles work with MMseqs2?
Anyway, HHblits runs very slowly compared to MMseqs2, so if the Pfam profiles for HHblits don't work with MMseqs2, I'd suggest using the Pfam profiles generated for MMseqs2 instead. They work great in my hands.
The HHblits Pfam profiles work with MMseqs2. However, I recently compared them to the Pfam-A.full MSAs and they performed about equally, while needing more effort to build the database. I would recommend sticking with the workflow described in the wiki.
HHblits will however be more sensitive than MMseqs2, due to its iterative profile-profile search capabilities.
Thanks Milot, I'm using what's described in the wiki, only using the Pfam-A.fasta.gz because I get results much more consistent with those obtained using HMMER with the Pfam-A HMM database (mmseqs does it in a fraction of the time, of course, for which I'm eternally grateful to you). Sorry for going off the topic here.
Happy to hear :)
I just added a small remark regarding k-mer size for the profile searches to the wiki entry (if you have enough system memory, use -k 6).
If I understood correctly, then the Pfam-A.full should be closer to our pfamA HHblits database, which represents three search iterations of the seed alignments against the Uniclust.
Yep. The Pfam-A.full should be closer to your HHblits database. However, it contained fewer families (yes, I know, I am surprised too) than the seed alignments (Pfam-A.fasta) last time I checked.
Yes, I'm using -k 6.
I'd like to suggest once more that your program report the memory and hard drive requirements to the user in gigabytes, even if it keeps them in bytes internally (please).
Best and thanks again.
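Until something like that exists in the program itself, a byte count copied from the output can be converted by hand. The following is only a sketch: bytes_to_gib is a hypothetical helper name, not part of MMseqs2, and it assumes binary gigabytes (GiB, 1024^3 bytes).

```shell
# Hypothetical helper, not part of MMseqs2: print a byte count in GiB.
# Assumes 1 GiB = 1024^3 bytes; awk does the floating-point division.
bytes_to_gib() {
    awk -v b="$1" 'BEGIN { printf "%.2f GiB\n", b / (1024 * 1024 * 1024) }'
}
```

For example, `bytes_to_gib 1610612736` prints 1.50 GiB.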
Sorry, what I meant was that the Pfam-A.seed was the one that concurs with the hmm database. The fasta one is not a multiple alignment. I was a bit mistaken because I've been working with the CDD database too (which has multiple alignments in fasta format).
Sorry for the confusion if any was caused by my comment.
@salvoc81 The PfamA HH-suite database had one broken entry that was causing hhsearch to always fail and hhblits to possibly sometimes fail. Please download it again.
Is this solved now?
Hello @martin-steinegger and @milot-mirdita . Sorry that I could not test this earlier... Today I tried to convert the HMMs (Pfam 32) to profiles, but I think some files are missing. The symlink to pfam_hhm.ffdata is missing, and pfam_hhm_db.index is missing.
I am not sure how to create the pfam_hhm_db.index file.
The following commands work with pfamA_31.0.tgz:
mmseqs convertprofiledb pfam_hhm_db pfam31_hhblits_profile --threads 36 -v 3
mmseqs createindex ./pfam31_hhblits_profile ./tmp -k 6 -s 7 --threads 36 -v 3
For version 31.0 of the package everything works fine, and search is completed correctly.
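The two steps above can be wrapped into one function so that the index is only built if the conversion succeeded. This is a minimal sketch: build_profile_index and the MMSEQS variable are names made up for illustration, and the flags are simply the ones used in the commands above.

```shell
# MMSEQS is an assumed override hook for this sketch (handy for dry runs);
# it is not an MMseqs2 feature. It defaults to the mmseqs binary on PATH.
MMSEQS=${MMSEQS:-mmseqs}

# Hypothetical wrapper: convert an HH-suite HHM database to an MMseqs2
# profile database, then precompute its index tables.
# Arguments: <hhm_db> <profile_db> <tmp_dir> [threads]
build_profile_index() {
    hhm_db=$1
    profile_db=$2
    tmp_dir=$3
    threads=${4:-1}

    mkdir -p "$tmp_dir" || return 1
    # Stop here if the conversion fails, so no index is built on a broken DB.
    "$MMSEQS" convertprofiledb "$hhm_db" "$profile_db" --threads "$threads" -v 3 || return 1
    "$MMSEQS" createindex "$profile_db" "$tmp_dir" -k 6 -s 7 --threads "$threads" -v 3
}
```

It would be called as `build_profile_index pfam_hhm_db pfam31_hhblits_profile ./tmp 36`; setting MMSEQS=echo beforehand prints the two commands instead of running them.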
When I open version 32 of the package (pfamA_32.0.tar.gz), it does not contain the following files: pfam_hhm_db -> pfam_hhm.ffdata, a symlink I can create myself, and pfam_hhm_db.index (a ~500K file which I am not sure how to create).
When I run the following command (after creating the symlink):
mmseqs convertprofiledb pfam_hhm_db pfam32_hhblits_profile --threads 36 -v 3
it fails with the following output:
convertprofiledb pfam_hhm_db pfam31_hhblits_profile --threads 36 -v 3
MMseqs Version: d36dea228b039f652a7d3e1c79e3e8d40df83125
Substitution matrix blosum62.out
Profile type 0
Threads 36
Compressed 0
Verbosity 3
No datafile could be found for pfam_hhm_db!
I have generated the symlinks as you suggested, but the file pfam_hhm_db.index contained in version 31 is not a symlink...
Sorry for the confusion. The _db.index files were meant for compatibility with HH-suite 2.x. However, we dropped support for those.
The confusing part is that HHblits produces a data file with the suffix .ffdata and an index file with the suffix .ffindex, while MMseqs2 expects the same data file without a suffix and the index file with the suffix .index.
You can make the following two symlinks:
ln -s pfam_hhm.ffdata pfam_hhm
ln -s pfam_hhm.ffindex pfam_hhm.index
And then call MMseqs2:
convertprofiledb pfam_hhm ...
Alternatively, I now changed this behavior in c9ac77558aa06391ead4dd95b5cf89eea715f348 to look for .ffdata and .ffindex first in convertprofiledb
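For MMseqs2 builds from before that commit, the two symlinks described above can be created by a small function that first checks that both HH-suite files exist. This is a sketch under assumptions: link_hhsuite_db is a hypothetical name, and it expects the .ffdata and .ffindex files to sit next to each other under the same prefix.

```shell
# Hypothetical helper: expose an HH-suite database (PREFIX.ffdata and
# PREFIX.ffindex) under the names MMseqs2 expects (PREFIX and PREFIX.index).
link_hhsuite_db() {
    prefix=$1
    name=$(basename "$prefix")

    # Refuse to create dangling links if either HH-suite file is missing.
    for suffix in ffdata ffindex; do
        if [ ! -e "${prefix}.${suffix}" ]; then
            echo "missing ${prefix}.${suffix}" >&2
            return 1
        fi
    done

    # Link targets are relative, so the links stay valid if the
    # containing directory is moved as a whole.
    ln -sf "${name}.ffdata" "${prefix}" || return 1
    ln -sf "${name}.ffindex" "${prefix}.index"
}
```

After `link_hhsuite_db pfam_hhm`, the call `convertprofiledb pfam_hhm ...` from above should find both files.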
Expected Behavior
Precomputed MMseqs2 index tables are generated using createindex
Current Behavior
Fails after a few minutes of computation with the following error message: indexdb died
Steps to Reproduce (for bugs)
MMseqs Output (for bugs)
Context
Trying to generate a profile DB from the file uniclust30_2018_08_hhm_db contained in the 18-08 release of Uniclust30: http://gwdu111.gwdg.de/~compbiol/uniclust/2018_08/uniclust30_2018_08_hhsuite.tar.gz
I am using convertprofiledb and then createindex...
NOTE: I have used the same procedure to generate the profile DB using the HHblits profiles for Pfam 31 downloaded from:
http://wwwuser.gwdg.de/%7Ecompbiol/data/hhsuite/databases/hhsuite_dbs/pfamA_31.0.tgz
Your Environment