soedinglab / hh-suite

Remote protein homology detection suite.
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-3019-7
GNU General Public License v3.0

hhalign vs hhsearch #255

Open ksteczk opened 3 years ago

ksteczk commented 3 years ago

I have two profiles in the Pfam database: PF00336.20 and PF17919.3. Querying the database with hhsearch, using the PF00336 HMM as the query, I got a hit to PF17919:

ffindex_get pfam_hhm.ffdata pfam_hhm.ffindex PF00336.20 | hhsearch -i stdin -d pfam -shift 0.01 -mact 0.1 -v 0 -o stdout -cpu 24

The lines of interest from the results:

5 PF17919.3 ; RT_RNaseH_2 ; RNas 92.7 0.49 2.6E-05 29.2 7.7 48 69-116 1-52 (99)
Probab=92.67 E-value=0.49 Score=29.24 Aligned_cols=48 Identities=10% Similarity=0.076 Sum_probs=16.2 Template_Neff=9.800

Yet if I take the two profiles from the database and align them with hhalign:

hhalign -i ~/PF00336.20.hhm -t ~/PF17919.3.hhm -mact 0.1 -shift 0.01

I get a longer alignment (which is what I'm trying to get):

1 PF17919.3 ; RT_RNaseH_2 ; RNas 92.7 2.6E-05 2.6E-05 29.2 7.7 78 69-146 1-94 (99)
Probab=92.67 E-value=2.6e-05 Score=29.24 Aligned_cols=78 Identities=13% Similarity=0.103 Sum_probs=45.4 Template_Neff=9.800

The scores are identical, but the alignments differ. Does hhsearch align the profiles differently from hhalign? Is it possible to tweak hhsearch so that it behaves like hhalign?
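
To compare the two hit lines field by field, a small parser can help. The following is only a sketch: it assumes the trailing columns of a hit summary line are whitespace-separated and ordered as Prob, E-value, P-value, Score, SS, Cols, query range, template range, and template length, as in the lines quoted above. The names `HIT_TAIL` and `parse_hit` are made up for this example and are not part of HH-suite.

```python
import re

# Trailing columns of a hit summary line, matched up to the end of the line.
# Assumed order: Prob E-value P-value Score SS Cols Query-range Template-range (Template-length)
HIT_TAIL = re.compile(
    r"(?P<prob>\S+)\s+(?P<evalue>\S+)\s+(?P<pvalue>\S+)\s+(?P<score>\S+)\s+"
    r"(?P<ss>\S+)\s+(?P<cols>\d+)\s+"
    r"(?P<qstart>\d+)-(?P<qend>\d+)\s+"
    r"(?P<tstart>\d+)-(?P<tend>\d+)\s*\((?P<tlen>\d+)\)\s*$"
)


def parse_hit(line):
    """Parse the numeric tail of one hit summary line into a dict (hypothetical helper)."""
    match = HIT_TAIL.search(line)
    if match is None:
        raise ValueError("unrecognised hit line: %r" % line)
    fields = match.groupdict()
    return {
        "prob": float(fields["prob"]),
        "evalue": float(fields["evalue"]),
        "score": float(fields["score"]),
        "cols": int(fields["cols"]),
        "query": (int(fields["qstart"]), int(fields["qend"])),
        "template": (int(fields["tstart"]), int(fields["tend"])),
        "template_len": int(fields["tlen"]),
    }


# The two hit lines reported in this issue.
hhsearch_hit = parse_hit(
    "5 PF17919.3 ; RT_RNaseH_2 ; RNas 92.7 0.49 2.6E-05 29.2 7.7 48 69-116 1-52 (99)"
)
hhalign_hit = parse_hit(
    "1 PF17919.3 ; RT_RNaseH_2 ; RNas 92.7 2.6E-05 2.6E-05 29.2 7.7 78 69-146 1-94 (99)"
)

# Report every field whose value differs between the two runs.
for key in hhsearch_hit:
    if hhsearch_hit[key] != hhalign_hit[key]:
        print(key, hhsearch_hit[key], hhalign_hit[key])
```

Run on the two lines above, this flags the E-value, the number of aligned columns, and both aligned ranges as differing, while probability and score agree.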

ksteczk commented 3 years ago

I think I found the reason: I was comparing HMM profiles prepared with an altered Neff threshold, but I had not filtered the a3m dataset. So when hhsearch searched the database, it was using the unfiltered a3m profiles even though the HMM database was already filtered. Am I thinking about this correctly? Does hhsearch use both the HMM and the a3m when aligning HMM profiles?

milot-mirdita commented 3 years ago

When doing a database lookup, HHsearch and HHblits first check whether an HHM exists and prefer it. If no HHM exists, they take the a3m and compute the HHM on the fly.
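
The lookup order described here can be illustrated with a short sketch that reports, for a given entry, which source a search would draw on. This is not HH-suite's actual code. It assumes each line of an ffindex index file has the form name, offset, length, separated by tabs. The helper names, the offsets, and the entry `PF99999.1` are invented for illustration.

```python
def read_ffindex_names(lines):
    """Collect entry names from index lines (assumed format: name<TAB>offset<TAB>length)."""
    names = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        names.add(line.split("\t")[0])
    return names


def profile_source(name, hhm_names, a3m_names):
    """Mirror the described preference: a stored HHM first, otherwise the a3m."""
    if name in hhm_names:
        return "hhm"  # precomputed HHM is used as-is
    if name in a3m_names:
        return "a3m"  # HHM would be computed on the fly from the alignment
    return None       # entry is in neither index


# Made-up index contents; real ones would be read from the database's
# HHM and a3m index files (for example pfam_hhm.ffindex).
hhm_index = ["PF00336.20\t0\t1000", "PF17919.3\t1000\t900"]
a3m_index = ["PF00336.20\t0\t5000", "PF17919.3\t5000\t4000", "PF99999.1\t9000\t300"]

hhm_names = read_ffindex_names(hhm_index)
a3m_names = read_ffindex_names(a3m_index)

for entry in ["PF17919.3", "PF99999.1"]:
    print(entry, profile_source(entry, hhm_names, a3m_names))
```

Under this reading, an entry that has a stored HHM never touches its a3m during the search, so a quick check like this shows which entries depend on the alignment file at all.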