steineggerlab / foldseek

Foldseek enables fast and sensitive comparisons of large structure sets.
https://foldseek.com
GNU General Public License v3.0

foldseek easy-multimersearch returns empty results #304

Open Rohit-Satyam opened 1 month ago

Rohit-Satyam commented 1 month ago

Expected Behavior

I was testing the foldseek multimer search on these two structurally similar PDB files obtained from RCSB PDB (I also tried PDB-REDO). However, no results are returned: the output files are empty. I've attached the log file produced by foldseek. When I align these structures on the PDB database using TM-align, I get the following result:

Screenshot 2024-07-10 220930

I also tried removing the water atoms (HOH) and the ANISOU records, and I tried PDB files from the PDB-REDO project, but the output files are still empty.
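For anyone reproducing this, the cleanup step described above can be sketched in Python. This is a minimal sketch, not the script actually used in this report: the function name `strip_waters_and_anisou` is hypothetical, and it assumes standard fixed-column PDB records in which the residue name occupies columns 18-20.

```python
def strip_waters_and_anisou(pdb_text: str) -> str:
    """Return PDB text without ANISOU records and without HOH (water) atoms.

    Hypothetical helper for illustration; assumes fixed-column PDB format.
    """
    kept = []
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        # Drop anisotropic temperature factor records entirely.
        if record == "ANISOU":
            continue
        # The residue name sits in columns 18-20 (0-based slice 17:20).
        if record in ("ATOM", "HETATM") and line[17:20] == "HOH":
            continue
        kept.append(line)
    return "\n".join(kept) + "\n"
```

The function could be applied to the text of each input file before writing a cleaned copy and passing that copy to foldseek.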

Steps to Reproduce (for bugs)

foldseek easy-multimersearch 4r19.pdb 8qus.pdb result tmpFolder 1> log

log.zip

Your Environment

foldseek Version: 9.427df8a
mamba 1.5.6
conda 23.11.0

LSB Version:    core-11.1.0ubuntu2-noarch:security-11.1.0ubuntu2-noarch
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.6 LTS
Release:    20.04
Codename:   focal

Architecture:                       x86_64
CPU op-mode(s):                     32-bit, 64-bit
Byte Order:                         Little Endian
Address sizes:                      46 bits physical, 57 bits virtual
CPU(s):                             112
On-line CPU(s) list:                0-111
Thread(s) per core:                 2
Core(s) per socket:                 28
Socket(s):                          2
NUMA node(s):                       2
Vendor ID:                          GenuineIntel
CPU family:                         6
Model:                              106
Model name:                         Intel(R) Xeon(R) Gold 6330 CPU @ 2.00GHz
Stepping:                           6
Frequency boost:                    enabled
CPU MHz:                            994.869
CPU max MHz:                        2001.0000
CPU min MHz:                        800.0000
BogoMIPS:                           4000.00
Virtualization:                     VT-x
L1d cache:                          2.6 MiB
L1i cache:                          1.8 MiB
L2 cache:                           70 MiB
L3 cache:                           84 MiB

Rohit-Satyam commented 1 month ago

I suspect that some parameter suppresses the results when the structural identity is below roughly 90% or the TM-align score is too low, but I cannot find which one. Could you please tell me which parameters control this?

Woosub-Kim commented 1 month ago

In this case, 8qus_D is the better partner for both 4r19_A and 4r19_B, while the alignments with 8qus_H are poor. We built the algorithm to suppress poor alignments, so only a chain-to-chain alignment remains and the hit is not alignable as a multimer. You can still use foldseek easy-search to compute the individual monomer alignments.
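To see which target chain is the best partner for each query chain, the monomer alignments from foldseek easy-search can be tabulated. The following is a minimal sketch, assuming the default tab-separated output columns (query, target, fident, alnlen, mismatch, gapopen, qstart, qend, tstart, tend, evalue, bits). The function name `best_partner_per_query` is hypothetical, and ranking by bit score is an illustrative choice, not necessarily the criterion the multimer search applies.

```python
import csv
import io


def best_partner_per_query(m8_text: str) -> dict:
    """Map each query ID to (target ID, bit score) of its highest-scoring hit.

    Hypothetical helper; assumes the default 12-column tab-separated output.
    """
    best = {}
    for row in csv.reader(io.StringIO(m8_text), delimiter="\t"):
        if len(row) < 12:
            continue  # skip blank or malformed lines
        query, target, bits = row[0], row[1], float(row[11])
        # Keep only the hit with the highest bit score for each query.
        if query not in best or bits > best[query][1]:
            best[query] = (target, bits)
    return best
```

If several query chains report the same target chain as their best hit, that is consistent with the situation described above.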

Rohit-Satyam commented 1 month ago

@Woosub-Kim Thanks for your reply. So do you mean that if one chain of the query PDB aligns with more than one chain in the target multimer PDB, it is considered a poor alignment? I tried running foldseek after removing 8qus_H, but the result files are still empty.