I put one chain from a PDB entry into my library, then run either HHblits or HHsearch with a homologous chain as the query. The two chains differ by indels, and in the resulting alignment the indels do not line up between query and target.
Expected Behavior - indels should align
Current Behavior - indels do not align, and the sequence identity is lower than it "obviously" would be if they did. NCBI BLAST reports 97.37% sequence identity (with the indels in the right place), whereas HHblits reports 88%.
Steps to Reproduce (for bugs)
Put the sequence of chain C from 5vol into the library, then run chain A from 5vol as the query against it. Chain C has a leading PW at the N-terminus and an indel, QGAVPAD, at residues 184-190. Chain A has a G at the C-terminus. In all other respects the two chains have 100% sequence identity.
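For reference, the 97.37% figure can be reproduced from the chain differences described above and the chain lengths (260 and 268) printed in the HHblits output. This is a minimal sketch, assuming identity is counted as identical columns divided by the total number of alignment columns, gap columns included. The function name and the toy sequences are hypothetical and only illustrate how a misplaced gap lowers the score.

```python
def percent_identity(aligned_query: str, aligned_target: str) -> float:
    """Identical columns / total alignment columns (gap columns included), in %."""
    if len(aligned_query) != len(aligned_target):
        raise ValueError("aligned rows must have equal length")
    identical = sum(
        1 for q, t in zip(aligned_query, aligned_target) if q == t and q != "-"
    )
    return 100.0 * identical / len(aligned_query)


# Toy example (hypothetical sequences): the same 3-residue deletion in the
# query, with the gap placed correctly versus shifted to the right.
target = "ACDEFGHIKLMNPQRS"
gap_correct = "ACDEF---KLMNPQRS"
gap_misplaced = "ACDEFKLM---NPQRS"

print(percent_identity(gap_correct, target))    # 81.25
print(percent_identity(gap_misplaced, target))  # 62.5

# 5vol arithmetic: chain A (260) minus its C-terminal G leaves 259 residues,
# which equals chain C (268) minus the leading PW and the 7-residue QGAVPAD.
shared = 260 - 1
assert shared == 268 - 2 - 7
columns = shared + 7  # identical columns plus the 7 gap columns
print(round(100.0 * shared / columns, 2))  # 97.37
```

With the gap in the right place, every non-gap column is identical; shifting the gap turns the residues between the true and the reported gap position into mismatches, which is the effect reported here.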
See the attached file; the interesting part is below. Note that the indel appears around residues 168-174 in the target (c5volC) but around residues 196-202 in the query (c5volA).
Q ss_dssp CCSGGGEEEEEETHHHHHHHHHHHHTTTTCSEEEEESCCSSCCCCTTSHHHHHHHHHHHT
Q sspred ccchhheeecccchhHHHHHHHHhhcccccceeeeeccccCccCccccccccccccCCCC
Q c5volA 121 IGDRQHRAIAGLSMGGGGATNYGQRHSDMFCAVYAMSALMSIPEDPNSKIAILTRSVIEN 180 (260)
Q Consensus 121 ~~g~s~g~a~~~~~~~~~~~ 180 (260)
..+..++.+.|.|.|+..+...+...+..+..++..++......................
T Consensus 123 ~~G~S~Gga~~~~~~~~~~~ 182 (268)
T c5volC_ 123 IGDRQHRAIAGLSMGGGGATNYGQRHSDMFCAVYAMSALMSIPEQGAVPADDPNSKIAIL 182 (268)
T ss_dssp CCSGGGEEEEEETHHHHHHHHHHHHCTTTCSEEEEESCCSSCCSSC---CCCTTSHHHHH
T ss_pred CCCCcccEEEEEccchHHHHHHHHhChHHhHHHhhccccccccccccccccccccCccch
Q ss_dssp CHHHHHHTCCHHHHH-------HHTTSEEEEECCTTCTTHHHHHHHHHHHHHTTCCCEEE
Q sspred chHHHHhhcchhhhh-------ccccccccccccccCccchHHHHHHHHHHHCCCcEEEE
Q c5volA 181 SCVKYVMEADEDRKA-------DLRSVAWFVDCGDDDFLLDRNIEFYQAMRNAGVPCQFR 233 (260)
Q Consensus 181 ~~~-------~~~~~L~g~~ 233 (260)
............... ....+++++.+++.|....++++++++|++.|+++++.
T Consensus 183 ~~~~~~~gD~~l~g~~ 242 (268)
T c5volC_ 183 TRSVIENSCVKYVMEADEDRKADLRSVAWFVDCGDDDFLLDRNIEFYQAMRNAGVPCQFR 242 (268)
T ss_dssp HHHHHHTCHHHHHHTCCHHHHHHHTTSEEEEECCTTCTTHHHHHHHHHHHHHTTCCCEEE
T ss_pred hHHHHhcCHHHHHHhcChhhhhhccCceEEEEecCchHhHHHHHHHHHHHHHCCCCcEEE
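To pin down where the gap was placed without counting by eye, the aligned rows above can be scanned for runs of '-'. This is a minimal sketch; the helper name is hypothetical, and the start number is the residue number printed at the beginning of each row in the alignment block.

```python
from typing import List, Tuple


def find_gap_runs(aligned_row: str, start: int) -> List[Tuple[int, int]]:
    """Return (last residue number before the gap, gap length) for each run of '-'.

    `start` is the residue number of the first residue in the row. A row that
    begins with a gap reports start - 1 as the preceding residue.
    """
    runs = []
    residue = start - 1  # number of the most recent residue seen
    gap_len = 0
    for ch in aligned_row:
        if ch == "-":
            gap_len += 1
        else:
            if gap_len:
                runs.append((residue, gap_len))
                gap_len = 0
            residue += 1
    if gap_len:  # row ends inside a gap
        runs.append((residue, gap_len))
    return runs


# Second query row from the alignment above (c5volA, starting at residue 181).
query_row = "SCVKYVMEADEDRKA-------DLRSVAWFVDCGDDDFLLDRNIEFYQAMRNAGVPCQFR"
print(find_gap_runs(query_row, 181))  # [(195, 7)]
```

The 7-column gap in the query follows residue 195, whereas the extra QGAVPAD residues sit in the first target row, roughly 30 columns earlier.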
Context
If a straightforward comparison between two homologous chains appears to give an erroneous alignment, how can I trust the results for more complicated alignments with lower sequence identity?
Your Environment
Version/Git commit used: last publicly released version
Server specifications (especially CPU support for AVX2/SSE and amount of system memory): Intel(R) Xeon(R) Gold 6326 CPU @ 2.90GHz (happy to upload the output of 'more /proc/cpuinfo' if that would help), 264GB physical RAM
Operating system and version: Red Hat Enterprise Linux Workstation release 6.6 (Santiago)
command to run:
/bmm/soft/linux64/src/hh-suite-bin/bin/hhblits -n 1 -i /bmm/www/servers/phyre2/test/hmm/testc7xrt//c5volA.hhblits.hhm -d /bmm/www/servers/phyre2/test/hmm/full -o /bmm/www/servers/phyre2/test/hmm/testc7xrt//c5volA.hhblits.hhr -b 100 -norealign -z 500 -alt 1 -aliw 60
HH-suite Output (for bugs)
c5volA_.hhblits.txt