soedinglab / hh-suite

Remote protein homology detection suite.
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-3019-7
GNU General Public License v3.0

Why do I need to refer to a database when I build a customized database? #332

Open EileenLLL opened 1 year ago

EileenLLL commented 1 year ago

Dear all, I want to use hhblits to build my own database (1,040,000 sequences). I followed the tutorial "Building customized databases", whose second step is to build an MSA with HHblits for each sequence. The command is as follows: """mpirun -np \ hhblits_mpi -i _fas -d <path_to/uniclust30> -oa3m _a3m_wo_ss -n 2 -cpu 1 -v 0"""

I wonder why I need to specify -d <path_to/uniclust30>. Does this step use my sequences as queries to search uniclust30 for homologous sequences? Why is that needed? I also learned that I can split my FASTA database into many single-sequence files and then run hhblits against uniclust30 for each one to produce an hhr file. However, I don't understand why we should do that. Why can't I just convert my FASTA database to HHMs directly?

Thanks

Citugulia40 commented 1 year ago

Hi, I am facing the same problem. Were you able to figure it out? I cannot find the exact steps to run my 2 million query sequences against my own database of 250 sequences.

Please let me know.

Thanks

milot-mirdita commented 11 months ago

HMM-HMM comparisons require a diverse MSA on both sides to build good profiles for both the query and the target.
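
As a rough illustration of why diversity matters, here is a toy Python sketch. It is not HH-suite code: the function names and the example sequences are made up, and real profile construction also applies sequence weighting and pseudocounts, which are left out here. It only shows that a profile built from a single sequence says nothing about which substitutions are tolerated at each position, while a profile built from several aligned homologs does.

```python
from collections import Counter
import math


def column_profile(msa):
    """Per-column residue frequencies for an alignment of equal-length rows."""
    length = len(msa[0])
    if any(len(row) != length for row in msa):
        raise ValueError("all alignment rows must have the same length")
    profile = []
    for col in range(length):
        # Gap characters are ignored when counting residues.
        counts = Counter(row[col] for row in msa if row[col] != "-")
        total = sum(counts.values())
        profile.append({aa: n / total for aa, n in counts.items()} if total else {})
    return profile


def mean_entropy(profile):
    """Average Shannon entropy in bits over all columns."""
    per_column = [sum(p * math.log2(1 / p) for p in col.values()) for col in profile]
    return sum(per_column) / len(per_column)


# Hypothetical toy data: one sequence on its own vs. the same sequence
# aligned with three made-up homologs.
single = ["MKVLA"]
diverse = ["MKVLA", "MRVIA", "MKILS", "LKVLA"]

single_entropy = mean_entropy(column_profile(single))
diverse_entropy = mean_entropy(column_profile(diverse))

# Every column of the single-sequence profile has exactly one residue at
# frequency 1.0, so its entropy is zero; the diverse MSA gives a positive value.
print(single_entropy, diverse_entropy)
```

In this simplified picture, searching each sequence against a large database such as uniclust30 is what collects the homologs that turn the one-residue-per-column profile into an informative one.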

If you just want to do a sequence-sequence, profile-sequence or sequence-profile search, please use MMseqs2 as it doesn't require involved steps to build databases.