steineggerlab / foldseek

Foldseek enables fast and sensitive comparisons of large structure sets.
https://foldseek.com
GNU General Public License v3.0

foldseek failed to search/process Pymol generated files #209

Open igortru opened 7 months ago

igortru commented 7 months ago

1) https://search.foldseek.com/queue/Z2fX2gw_ZuE4fLFe1iI-MSF4W-Sj2jktXz1nvw

I have searched https://tinyurl.com/39hdture or https://tinyurl.com/bdz38eeh

2)

foldseek structureto3didescriptor AF-A0A009GDF2-F1-model_v4.cif.1.cif AF-A0A009GDF2-F1-model_v4.cif.1.cif.3di produces empty output.

The files look OK when I load them in https://www.rcsb.org/3d-view. If I export CIF files from that viewer, structureto3didescriptor works just fine.

3) The structures have identical protein sequences and a TM-score of 0.99, but their 3Di sequences differ by about 15% (the original AF and EsmAtlas structures).

The original structures have different orientations. They were aligned in PyMOL using cealign and exported for downstream analysis, but unexpectedly foldseek completely rejected the exported structures.

igortru commented 7 months ago

Continuing: I found EsmAtlas and AlphaFold 3D structures with identical protein sequences and an average pLDDT > 90% on both sides, then compared their 3Di sequences. The attachment shows the global distribution of 100 - pident; the Y axis is the number of AF-ESM pairs.
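
For readers who want to reproduce this kind of comparison, here is a minimal sketch of the 100 - pident measure for two 3Di strings of the same length. It assumes the strings are already position-matched (identical protein sequences, no gaps). The function name and the toy strings are hypothetical and are not foldseek output.

```python
def three_di_mismatch_percent(a: str, b: str) -> float:
    """Return 100 - pident for two position-matched 3Di strings."""
    if len(a) != len(b):
        raise ValueError("3Di strings must have the same length")
    if not a:
        raise ValueError("3Di strings must not be empty")
    # Count positions where the two 3Di states differ.
    mismatches = sum(x != y for x, y in zip(a, b))
    return 100.0 * mismatches / len(a)


# Toy strings (hypothetical): one of eight positions differs.
print(three_di_mismatch_percent("DVVLVVCQ", "DVVLAVCQ"))
```

Collecting this value over many AF-ESM pairs and plotting a histogram would give a distribution of the kind described above.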

[Attachment: Screenshot 2023-11-12 at 9 50 23 PM]

igortru commented 7 months ago

Feature assignments look a little unstable for a 0.99 TM-score. i - j fluctuations:

6 1 -4
10 -1 -3
24 -1 4
36 -1 -3
52 -4 4
58 1 -1
60 -4 3
78 1 -3
102 -1 1
117 3 -4
118 4 -1
130 -4 -3
132 4 -4
149 1 -1
151 1 -1
156 1 -1
162 4 -1
176 1 -1
180 -3 1
188 4 -1
199 1 4
200 -1 4
203 1 -4
211 -1 1
213 3 -1
215 -4 -3
216 1 -3
218 4 -4
219 1 -3
253 -4 1
255 0 3521000000000

I think findResiduePartners could be much more sophisticated than a trivial nearest-neighbor choice. I am really interested to know how exactly you arrived at this implementation. Have you tried other approaches?
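
To illustrate why a plain nearest-neighbor choice can flip between near-identical structures, here is a minimal sketch. It is not foldseek's findResiduePartners code: the function name, the one-point-per-residue representation, and the clamping of i - j to [-4, 4] (suggested only by the range of the values listed above) are all assumptions.

```python
import math


def nearest_partner_offsets(coords, clamp=4):
    """For each residue i, pick the nearest other residue j; return clamped i - j."""
    if len(coords) < 2:
        raise ValueError("need at least two residues")
    offsets = []
    for i, ci in enumerate(coords):
        best_j, best_d = None, math.inf
        for j, cj in enumerate(coords):
            if j == i:
                continue
            d = math.dist(ci, cj)
            if d < best_d:
                best_j, best_d = j, d
        # Clamp the sequence offset to [-clamp, clamp] (assumed range).
        offsets.append(max(-clamp, min(clamp, i - best_j)))
    return offsets


# Two toy chains (hypothetical coordinates) that differ by 0.1 in one residue.
chain_a = [(0.0, 0.0, 0.0), (3.7, 0.0, 0.0), (7.5, 0.0, 0.0), (11.4, 0.0, 0.0)]
chain_b = [(0.0, 0.0, 0.0), (3.7, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
print(nearest_partner_offsets(chain_a))
print(nearest_partner_offsets(chain_b))
```

In this toy case residue 2 has two candidates at almost the same distance, so a 0.1 shift changes its partner and the sign of i - j, even though the two chains are practically identical.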

What about using an ML-based approach, trained on millions of identical AF/ESM pairs, to try to achieve identical i - j matrices?