soedinglab / hh-suite

Remote protein homology detection suite.
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-3019-7
GNU General Public License v3.0
529 stars 133 forks source link

HHsearch - inconsistent results #355

Open majnusova opened 1 year ago

majnusova commented 1 year ago

Hi, I'm trying to use hhsearch with a customized database. Both query and database alignments represent an entire family of homologs (there is no seed sequence). Hence, I reckon that -M 50 parameter should be applied to both query and database HMMs inferred from these alignments.

I tried to apply the -M 50 parameter and then, to test if it works, I changed the order of sequences in the alignments that were used to build the query HMMs, ran hhsearch, and it seemed to be OK (the results were basically the same when compared to the results inferred from the original alignments). But then, I changed the order of sequences in the alignments that were used to build the customized database and I got different results - it seems that the -M 50 parameter does not work...

Could you please check the bash script I've been using and help me to use the tool correctly? Thanks a lot!

#!/bin/sh

cd /home/users/mvladka/miniconda3/envs/hhsuite/msa

mamba activate hhsuite

ffindex_build -s ../mydatabase.ff{data,index} .
cd ..

ffindex_apply mydatabase.ff{data,index} -i mydatabase_a3m_wo_ss.ffindex -d mydatabase_a3m_wo_ss.ffdata -- hhconsensus -M 50 -maxres 65535 -i stdin -oa3m stdout -v 0

rm mydatabase.ff{data,index}

mv mydatabase_a3m_wo_ss.ffindex mydatabase_a3m.ffindex
mv mydatabase_a3m_wo_ss.ffdata mydatabase_a3m.ffdata

ffindex_apply mydatabase_a3m.ff{data,index} -i mydatabase_hhm.ffindex -d mydatabase_hhm.ffdata -- hhmake -i stdin -o stdout -v 0

cstranslate -f -x 0.3 -c 4 -I a3m -i mydatabase_a3m -o mydatabase_cs219 

sort -k3 -n -r mydatabase_cs219.ffindex | cut -f1 > sorting.dat

ffindex_order sorting.dat mydatabase_hhm.ff{data,index} mydatabase_hhm_ordered.ff{data,index}
mv mydatabase_hhm_ordered.ffindex mydatabase_hhm.ffindex
mv mydatabase_hhm_ordered.ffdata mydatabase_hhm.ffdata

ffindex_order sorting.dat mydatabase_a3m.ff{data,index} mydatabase_a3m_ordered.ff{data,index}
mv mydatabase_a3m_ordered.ffindex mydatabase_a3m.ffindex
mv mydatabase_a3m_ordered.ffdata mydatabase_a3m.ffdata

for query in $(ls *.fasta); do perl /home/users/mvladka/miniconda3/envs/hhsuite/scripts/reformat.pl fas a3m $query $(basename $query .fasta).a3m -M 50 &> $(basename $query .fasta).reformatlog; done

for a3m_query in $(ls *.a3m); do hhmake -i $a3m_query -M a3m -add_cons &> $(basename a3m_query .a3m).hhmakelog; done
for hhm in $(ls *.hhm); do hhsearch -i $hhm -d mydatabase -M a3m &> $(basename $hhm .hhm).hhsearchlog; done
altaetran commented 12 months ago

Could it be because the HHsuite databases may only use the first 100 sequences (or some other number) in the alignment?