steineggerlab / foldseek

Foldseek enables fast and sensitive comparisons of large structure sets.
https://foldseek.com
GNU General Public License v3.0

createdb extracts (0,0,0) coordinates for residues that don't have C-alphas #214

Closed: milot-mirdita closed this issue 4 months ago

milot-mirdita commented 7 months ago

In https://www.rcsb.org/structure/4YXG, LYS 148 of chain B has no resolved C-alpha atom; only its C and O atoms are resolved.

ATOM   2955  N   VAL B 147      -3.594 -19.441  14.168  1.00 34.28           N  
ATOM   2956  CA  VAL B 147      -2.519 -18.538  14.505  1.00 34.00           C  
ATOM   2957  C   VAL B 147      -1.254 -19.343  14.398  1.00 34.01           C  
ATOM   2958  O   VAL B 147      -0.294 -18.848  13.842  1.00 34.02           O  
ATOM   2959  CB  VAL B 147      -2.678 -17.797  15.865  1.00 20.00           C  
ATOM   2960  CG2 VAL B 147      -3.996 -17.077  15.968  1.00 20.00           C  
ATOM   2961  C   LYS B 148       0.686 -22.003  13.154  1.00 42.88           C  
ATOM   2962  O   LYS B 148       1.735 -21.355  13.201  1.00 43.55           O  
ATOM   2963  N   ALA B 149      -0.290 -21.745  12.288  1.00 42.00           N  
ATOM   2964  CA  ALA B 149      -0.095 -21.900  10.852  1.00 41.73           C  
ATOM   2965  C   ALA B 149       0.722 -20.748  10.267  1.00 43.02           C  
ATOM   2966  O   ALA B 149       1.614 -20.978   9.451  1.00 43.31           O  
ATOM   2967  CB  ALA B 149      -1.434 -22.015  10.139  1.00 41.47           C  
ATOM   2968  N   PHE B 150       0.425 -19.519  10.688  1.00 43.24           N  

The coordinate that is extracted for this residue is therefore (0,0,0).

The same happens elsewhere in the same structure.

We should probably skip these atoms completely and treat them just like residue index jumps. We already have code that does this for HETATM records.
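To make the proposed behaviour concrete, here is a minimal Python sketch of it. This is not Foldseek's code (Foldseek is written in C++), and the function name `extract_ca_coords` is hypothetical. It assumes fixed-column PDB `ATOM` records like the excerpt above, groups atoms by residue, and drops any residue without a CA atom instead of emitting a (0,0,0) placeholder.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Coord = Tuple[float, float, float]
ResidueKey = Tuple[str, int, str]  # (chain ID, residue number, insertion code)


def extract_ca_coords(pdb_lines: Iterable[str]) -> List[Tuple[str, int, str, Coord]]:
    """Return (chain, residue number, residue name, CA xyz) per residue.

    Hypothetical helper for illustration only. Residues with no CA atom
    are skipped, so downstream code sees a gap (like a residue index jump)
    rather than a (0,0,0) coordinate.
    """
    # Dicts preserve insertion order, so residues stay in file order.
    residues: Dict[ResidueKey, Tuple[str, Optional[Coord]]] = {}

    for line in pdb_lines:
        # Only ATOM records are read; HETATM records are ignored here.
        if not line.startswith("ATOM  "):
            continue

        atom_name = line[12:16].strip()
        res_name = line[17:20].strip()
        key = (line[21], int(line[22:26]), line[26:27])

        # Register the residue even if this atom is not its CA.
        if key not in residues:
            residues[key] = (res_name, None)

        # Keep the first CA seen (ignores later alternate locations).
        if atom_name == "CA" and residues[key][1] is None:
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            residues[key] = (res_name, xyz)

    # Skip residues whose CA was never found.
    return [
        (chain, res_seq, name, ca)
        for (chain, res_seq, _icode), (name, ca) in residues.items()
        if ca is not None
    ]
```

Applied to the excerpt above, this would yield entries for VAL 147 and ALA 149 and nothing for LYS 148, so the residue numbering shows a jump from 147 to 149 in the same way it would for a genuinely missing residue.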