Open igortru opened 7 months ago
Continuing: I found EsmAtlas and AlphaFold 3D structures with identical protein sequences and an average pLDDT > 90 in both models, then compared their 3Di sequences. The attachment shows the distribution of global 100 - pident; the Y axis is the number of AF-ESM pairs.
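To make the 100 - pident measure concrete, here is a minimal Python sketch of how it could be computed for one AF-ESM pair. It assumes the two 3Di strings have equal length (true here because the protein sequences are identical) and compares them position by position without alignment. The function name `three_di_divergence` and the toy strings are hypothetical, not foldseek output.

```python
def three_di_divergence(seq_a: str, seq_b: str) -> tuple[float, list[int]]:
    """Return (100 - pident, 1-based mismatch positions) for two 3Di strings.

    Assumes an ungapped, position-by-position comparison, which only makes
    sense when both structures share the same protein sequence.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("3Di strings must have the same length")
    if not seq_a:
        raise ValueError("3Di strings must not be empty")
    mismatches = [
        pos + 1
        for pos, (a, b) in enumerate(zip(seq_a, seq_b))
        if a != b
    ]
    return 100.0 * len(mismatches) / len(seq_a), mismatches


# Toy strings for illustration only, not real foldseek output.
divergence, positions = three_di_divergence("DVVLLCPPDD", "DVVLVCPPDA")
print(f"{divergence:.1f}% of 3Di states differ, at positions {positions}")
```

Collecting the first return value over all pairs would give the histogram described above.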
Feature assignments look a little unstable for a pair with a TM-score of 0.99. i - j fluctuations:
6 1 -4; 10 -1 -3; 24 -1 4; 36 -1 -3; 52 -4 4; 58 1 -1; 60 -4 3; 78 1 -3; 102 -1 1; 117 3 -4; 118 4 -1; 130 -4 -3; 132 4 -4; 149 1 -1; 151 1 -1; 156 1 -1; 162 4 -1; 176 1 -1; 180 -3 1; 188 4 -1; 199 1 4; 200 -1 4; 203 1 -4; 211 -1 1; 213 3 -1; 215 -4 -3; 216 1 -3; 218 4 -4; 219 1 -3; 253 -4 1; 255 0 3521000000000
I think findResiduePartners could be much more sophisticated than a trivial nearest-neighbor search. I am really interested to know how exactly you arrived at this implementation. Have you tried other approaches?
What about using an ML-based approach, trained on millions of identical AF/ESM pairs, to try to achieve identical i - j matrices?
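To illustrate why a plain nearest-neighbor rule can flip between two nearly identical structures, here is a small Python sketch. It is not foldseek's findResiduePartners: it assumes one point per residue, picks the closest other residue by Euclidean distance, and clips i - j to the range -4..4 (an assumption based on the range of values listed above). The function name `nearest_partner_offsets` and the coordinates are hypothetical.

```python
import math


def nearest_partner_offsets(coords, max_offset=4):
    """For each residue i, pick the nearest other residue j and return the
    clipped sequence offset i - j. Simplified stand-in, not foldseek code."""
    offsets = []
    for i, ci in enumerate(coords):
        best_j, best_d = None, math.inf
        for j, cj in enumerate(coords):
            if j == i:
                continue
            d = math.dist(ci, cj)
            if d < best_d:
                best_j, best_d = j, d
        if best_j is None:
            offsets.append(0)  # a single residue has no partner
        else:
            diff = i - best_j
            offsets.append(max(-max_offset, min(max_offset, diff)))
    return offsets


# Two toy "structures" that differ only by a 0.3 shift of the middle point.
model_a = [(0.0, 0.0, 0.0), (0.9, 0.0, 0.0), (2.1, 0.0, 0.0)]
model_b = [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0), (2.1, 0.0, 0.0)]

print(nearest_partner_offsets(model_a))
print(nearest_partner_offsets(model_b))
```

In this toy case the middle residue switches partner between the two models, so its offset changes sign even though the coordinates barely move. A partner rule with a tie margin or a soft weighting over several neighbors might be less sensitive, though that is a suggestion and not something tested here.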
1) https://search.foldseek.com/queue/Z2fX2gw_ZuE4fLFe1iI-MSF4W-Sj2jktXz1nvw
I have searched https://tinyurl.com/39hdture or https://tinyurl.com/bdz38eeh
2)
foldseek structureto3didescriptor AF-A0A009GDF2-F1-model_v4.cif.1.cif AF-A0A009GDF2-F1-model_v4.cif.1.cif.3di produces empty output.
The files look OK when I load them in https://www.rcsb.org/3d-view. If I export the CIF files from the viewer, structureto3didescriptor works just fine.
3) The structures have identical protein sequences and a TM-score of 0.99, but their 3Di sequences differ by about 15% (original AF and EsmAtlas sequences).
The original structures have different orientations. They were aligned in PyMOL using cealign and exported for downstream analysis, but unexpectedly foldseek completely rejected the exported structures.