steineggerlab / foldseek

Foldseek enables fast and sensitive comparisons of large structure sets.
https://foldseek.com
GNU General Public License v3.0

Backbone-only search returns no hits. #256

Open realfenston opened 3 months ago

realfenston commented 3 months ago

Hello, I have a naive question about Foldseek search. I noticed in the paper that only C-alpha relations are extracted and used to predict 3Di tokens. So I manually curated a backbone-only dataset containing only the N, C-alpha, and C atoms, built the corresponding PDB files from it, and then ran a Foldseek search.
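
Stripping a full-atom PDB down to a chosen atom set, as described here, comes down to filtering ATOM records by their atom-name field (PDB columns 13-16). Below is a minimal Python sketch. It assumes well-formed, fixed-column PDB files. `filter_atoms` is a hypothetical helper and is not part of Foldseek.

```python
def filter_atoms(pdb_text: str, keep: set[str]) -> str:
    """Keep only ATOM records whose atom name (PDB columns 13-16) is in `keep`.

    `keep` holds stripped names such as "CA". Other records (HEADER, TER,
    END, ...) are passed through unchanged; HETATM records are dropped.
    """
    out = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM"):
            # Columns 13-16 (1-based) are indices 12..15 (0-based).
            if line[12:16].strip() in keep:
                out.append(line)
        elif not line.startswith("HETATM"):
            out.append(line)
    return "\n".join(out) + "\n"
```

Calling `filter_atoms(text, {"N", "CA", "C"})` reproduces the N/C-alpha/C set described above, and `{"CA"}` keeps only C-alpha atoms. Dropping HETATM records is a choice made in this sketch.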

However, the search always reports that no hits were found.

I am not sure what is going wrong. Any help would be highly appreciated.

milot-mirdita commented 3 months ago

Our check to rebuild the backbone with pulchra seems to be incomplete.

If both N and C are present, it will not try to rebuild the backbone, resulting in broken 3Di-tokens. If you want Foldseek to reconstruct the backbone, please only pass C-alpha and not other atoms.

Otherwise, make sure you pass all of N, C, C-alpha, and C-beta.
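
The rule described in this comment can be written as a per-residue check. This sketch paraphrases the comment for illustration and is not Foldseek's actual logic. `classify_residues` and its labels are hypothetical.

```python
from collections import defaultdict


def classify_residues(pdb_text: str) -> dict[tuple[str, str], str]:
    """Label each residue by which atoms it has, following the rule above.

    Residues are keyed by (chain ID, residue number + insertion code).
    """
    atoms: dict[tuple[str, str], set[str]] = defaultdict(set)
    for line in pdb_text.splitlines():
        if line.startswith("ATOM"):
            key = (line[21], line[22:27].strip())  # chain, resSeq + iCode
            atoms[key].add(line[12:16].strip())    # atom name
    labels = {}
    for key, names in atoms.items():
        if names == {"CA"}:
            labels[key] = "ca_only"      # backbone would be rebuilt
        elif {"N", "CA", "C", "CB"} <= names:
            labels[key] = "complete"     # N, C, CA and CB all present
        else:
            # e.g. N/CA/C without CB. Note: glycine has no CB, so this
            # naive check also labels glycine residues as incomplete.
            labels[key] = "incomplete"
    return labels
```

For the N/C-alpha/C files described in the original question, every residue would come out as `incomplete`.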

realfenston commented 3 months ago

Hello, many thanks for your quick response. I followed your suggestion and passed only the C-alpha atoms of my backbone, but the search still finds no matching records. I have attached a sample PDB file to this message and would really appreciate it if you could take a look!

HEADER    HYDROLASE                               19-JUL-00   1FCV
ATOM      1  CA  XAA A   1       8.306 -88.396  56.274  1.00  0.00           C
ATOM      2  CA  XAA A   2      11.628 -86.719  57.108  1.00  0.00           C
ATOM      3  CA  XAA A   3      13.334 -87.879  53.911  1.00  0.00           C
ATOM      4  CA  XAA A   4      10.647 -86.303  51.710  1.00  0.00           C
ATOM      5  CA  XAA A   5      10.812 -83.125  53.810  1.00  0.00           C
ATOM      6  CA  XAA A   6      14.529 -82.658  53.081  1.00  0.00           C
ATOM      7  CA  XAA A   7      13.838 -83.207  49.393  1.00  0.00           C
ATOM      8  CA  XAA A   8      11.092 -80.560  49.435  1.00  0.00           C
ATOM      9  CA  XAA A   9      13.354 -78.095  51.254  1.00  0.00           C
ATOM     10  CA  XAA A  10      15.850 -78.536  48.447  1.00  0.00           C
ATOM     11  CA  XAA A  11      13.292 -77.487  45.823  1.00  0.00           C
ATOM     12  CA  XAA A  12      12.116 -74.575  47.987  1.00  0.00           C
ATOM     13  CA  XAA A  13      15.711 -73.421  48.333  1.00  0.00           C
ATOM     14  CA  XAA A  14      16.222 -73.474  44.547  1.00  0.00           C
ATOM     15  CA  XAA A  15      12.973 -71.500  44.290  1.00  0.00           C
ATOM     16  CA  XAA A  16      14.097 -68.951  46.883  1.00  0.00           C
ATOM     17  CA  XAA A  17      17.465 -68.518  45.192  1.00  0.00           C
ATOM     18  CA  XAA A  18      15.914 -67.992  41.767  1.00  0.00           C
ATOM     19  CA  XAA A  19      13.578 -65.299  43.159  1.00  0.00           C
ATOM     20  CA  XAA A  20      16.473 -63.621  44.975  1.00  0.00           C
ATOM     21  CA  XAA A  21      18.342 -63.575  41.670  1.00  0.00           C
ATOM     22  CA  XAA A  22      15.317 -61.965  39.941  1.00  0.00           C
ATOM     23  CA  XAA A  23      15.084 -59.368  42.724  1.00  0.00           C
ATOM     24  CA  XAA A  24      18.804 -58.518  42.717  1.00  0.00           C
ATOM     25  CA  XAA A  25      18.816 -58.156  38.932  1.00  0.00           C
ATOM     26  CA  XAA A  26      15.666 -56.022  38.911  1.00  0.00           C
ATOM     27  CA  XAA A  27      17.119 -53.803  41.668  1.00  0.00           C
ATOM     28  CA  XAA A  28      20.386 -53.535  39.763  1.00  0.00           C
ATOM     29  CA  XAA A  29      18.376 -52.372  36.736  1.00  0.00           C
ATOM     30  CA  XAA A  30      16.562 -49.904  38.920  1.00  0.00           C
ATOM     31  CA  XAA A  31      19.815 -48.452  40.310  1.00  0.00           C
ATOM     32  CA  XAA A  32      20.883 -47.809  36.746  1.00  0.00           C
ATOM     33  CA  XAA A  33      17.530 -46.270  35.836  1.00  0.00           C
ATOM     34  CA  XAA A  34      17.415 -44.089  38.947  1.00  0.00           C
ATOM     35  CA  XAA A  35      20.874 -42.742  38.154  1.00  0.00           C
ATOM     36  CA  XAA A  36      19.685 -41.896  34.637  1.00  0.00           C
ATOM     37  CA  XAA A  37      16.546 -40.194  36.011  1.00  0.00           C
ATOM     38  CA  XAA A  38      18.814 -38.096  38.250  1.00  0.00           C
ATOM     39  CA  XAA A  39      21.120 -37.120  35.396  1.00  0.00           C
ATOM     40  CA  XAA A  40      18.242 -36.137  33.143  1.00  0.00           C
ATOM     41  CA  XAA A  41      16.381 -34.277  35.882  1.00  0.00           C
ATOM     42  CA  XAA A  42      19.550 -32.228  36.482  1.00  0.00           C
ATOM     43  CA  XAA A  43      19.717 -31.280  32.774  1.00  0.00           C
ATOM     44  CA  XAA A  44      16.077 -30.274  32.660  1.00  0.00           C
ATOM     45  CA  XAA A  45      15.873 -28.567  36.066  1.00  0.00           C
ATOM     46  CA  XAA A  46      13.367 -30.836  37.771  1.00  0.00           C
ATOM     47  CA  XAA A  47      13.049 -31.838  41.400  1.00  0.00           C
END

You can ignore the PDB ID in the header, which comes from my favorite protein entry. This backbone comes from a CATH dataset, so it should be a reasonable structure.
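
One quick way to inspect a file like this is to read the C-alpha records back from their fixed columns. You can then list the residues whose names are not among the 20 standard three-letter codes. Below is a minimal sketch. The helper names are hypothetical.

```python
STANDARD_RESIDUES = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}


def read_ca_records(pdb_text: str) -> list[tuple[str, int, tuple[float, float, float]]]:
    """Return (residue name, residue number, (x, y, z)) for every CA atom.

    Uses the fixed PDB columns (1-based): atom name 13-16, residue name
    18-20, residue number 23-26, x/y/z 31-38/39-46/47-54.
    """
    records = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            records.append((line[17:20].strip(), int(line[22:26]), xyz))
    return records


def unknown_residues(pdb_text: str) -> list[int]:
    """Residue numbers whose names are not standard amino acids (e.g. XAA)."""
    return [num for name, num, _ in read_ca_records(pdb_text)
            if name not in STANDARD_RESIDUES]
```

In the attached file every residue is named `XAA`, so `unknown_residues` would return all 47 residue numbers.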

realfenston commented 3 months ago

My bad. It seems I was running in 3Di+AA mode, which breaks everything. I will try the local alignment mode with structure only. In any case, thank you very much for your response; I will report back once I have tried it.

milot-mirdita commented 3 months ago

You can try the 3Di-only mode. Foldseek doesn't work as well with 3Di alone, though; normally you'd need to pass both the C-alpha coordinates and the amino-acid letter.
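
If you drive Foldseek from Python, a 3Di-only search could be assembled as below. This assumes the `--alignment-type` option of `easy-search`, where 0 selects 3Di-only alignment and 2 (the default) selects 3Di+AA. Please confirm this with `foldseek easy-search -h` for your version. The file names are placeholders and `easy_search_cmd` is a hypothetical helper.

```python
def easy_search_cmd(query: str, target: str, result: str, tmp_dir: str,
                    alignment_type: int = 0) -> list[str]:
    """Build a `foldseek easy-search` command line as an argument list.

    Assumption: --alignment-type 0 = 3Di only, 1 = TM-align,
    2 = 3Di+AA (default). Check `foldseek easy-search -h` to confirm.
    """
    return ["foldseek", "easy-search", query, target, result, tmp_dir,
            "--alignment-type", str(alignment_type)]
```

To run it, pass the returned list to `subprocess.run(cmd, check=True)` with the `foldseek` binary on your `PATH`.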

Additionally, I don't think the pulchra backbone reconstruction works if you don't tell it which amino acid each residue is.
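
Following this advice, a C-alpha-only file should carry the real residue names rather than `XAA`. Below is a minimal sketch that writes one C-alpha ATOM record per residue from a one-letter sequence and matching coordinates. `ca_only_pdb` is hypothetical. The sequence must come from your own data (e.g. the CATH entry) and should not be guessed.

```python
ONE_TO_THREE = {
    "A": "ALA", "R": "ARG", "N": "ASN", "D": "ASP", "C": "CYS",
    "Q": "GLN", "E": "GLU", "G": "GLY", "H": "HIS", "I": "ILE",
    "L": "LEU", "K": "LYS", "M": "MET", "F": "PHE", "P": "PRO",
    "S": "SER", "T": "THR", "W": "TRP", "Y": "TYR", "V": "VAL",
}


def ca_only_pdb(sequence: str, coords: list[tuple[float, float, float]],
                chain: str = "A") -> str:
    """Write CA-only ATOM records with real residue names (PDB fixed columns)."""
    if len(sequence) != len(coords):
        raise ValueError("sequence and coordinates must have the same length")
    lines = []
    for i, (aa, (x, y, z)) in enumerate(zip(sequence, coords), start=1):
        # ONE_TO_THREE[aa] raises KeyError for non-standard letters on purpose.
        lines.append(
            f"ATOM  {i:5d}  CA  {ONE_TO_THREE[aa]} {chain}{i:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}           C"
        )
    lines.append("END")
    return "\n".join(lines) + "\n"
```

Any letter outside the 20 standard amino acids raises a `KeyError`. This is deliberate: it avoids silently falling back to a placeholder name such as `XAA`.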