haddocking / pdb-tools

A dependency-free cross-platform swiss army knife for PDB files.
https://haddocking.github.io/pdb-tools/
Apache License 2.0

pdb_fixinsert can't handle a PDB file with a missing residue in an insertion-code region #148

Open hw-protein opened 2 years ago

hw-protein commented 2 years ago

pdb_fixinsert does not renumber residues correctly when a residue with an insertion code is missing.
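In the PDB format, the residue number (resSeq) occupies columns 23-26 of each ATOM/HETATM record and the insertion code (iCode) occupies column 27, so "100A" is residue number 100 with insertion code "A". A minimal Python sketch of reading these fields; the function name `residue_id` is hypothetical and not part of pdb-tools:

```python
def residue_id(line):
    """Return (chain, resSeq, iCode) from a fixed-width ATOM/HETATM record.

    PDB columns are 1-based: chainID is column 22, resSeq columns 23-26,
    iCode column 27. Python slices are 0-based, hence the offsets below.
    """
    chain = line[21]
    resseq = int(line[22:26])
    icode = line[26].strip()  # '' when the residue has no insertion code
    return chain, resseq, icode


line = "ATOM   2315  N   GLY H 100A    -47.951   2.963  10.083  1.00 41.11           N"
print(residue_id(line))  # ('H', 100, 'A')
```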

For example:

1-1) Input: full PDB of the 7JMO heavy chain (excerpt)

ATOM   2306  N   ALA H  99     -52.017   1.887  12.504  1.00 37.00           N  
ATOM   2307  CA  ALA H  99     -51.622   0.503  12.717  1.00 46.76           C  
ATOM   2308  C   ALA H  99     -50.170   0.230  12.336  1.00 43.32           C  
ATOM   2309  O   ALA H  99     -49.696  -0.891  12.551  1.00 42.90           O  
ATOM   2310  CB  ALA H  99     -52.554  -0.435  11.942  1.00 29.26           C  
ATOM   2311  N   GLY H 100     -49.452   1.207  11.780  1.00 38.57           N  
ATOM   2312  CA  GLY H 100     -48.031   1.013  11.563  1.00 45.52           C  
ATOM   2313  C   GLY H 100     -47.348   1.851  10.499  1.00 38.13           C  
ATOM   2314  O   GLY H 100     -46.260   1.485  10.051  1.00 46.04           O  
ATOM   2315  N   GLY H 100A    -47.951   2.963  10.083  1.00 41.11           N  
ATOM   2316  CA  GLY H 100A    -47.279   3.829   9.122  1.00 41.50           C  
ATOM   2317  C   GLY H 100A    -46.986   3.152   7.785  1.00 45.26           C  
ATOM   2318  O   GLY H 100A    -47.531   2.098   7.446  1.00 43.32           O  
ATOM   2319  N   MET H 100B    -46.097   3.786   7.016  1.00 38.57           N  
ATOM   2320  CA  MET H 100B    -45.665   3.253   5.722  1.00 40.39           C  
ATOM   2321  C   MET H 100B    -44.512   2.288   5.965  1.00 44.36           C  
ATOM   2322  O   MET H 100B    -43.340   2.670   5.985  1.00 40.96           O  
ATOM   2323  CB  MET H 100B    -45.262   4.377   4.774  1.00 33.52           C  
ATOM   2324  CG  MET H 100B    -46.433   5.114   4.139  1.00 41.30           C  
ATOM   2325  SD  MET H 100B    -45.911   6.355   2.934  1.00 46.75           S  
ATOM   2326  CE  MET H 100B    -47.492   6.828   2.238  1.00 50.79           C  
ATOM   2327  N   ASP H 101     -44.847   1.010   6.143  1.00 46.96           N  
ATOM   2328  CA  ASP H 101     -43.864   0.026   6.576  1.00 38.63           C  
ATOM   2329  C   ASP H 101     -43.156  -0.683   5.429  1.00 46.59           C  
ATOM   2330  O   ASP H 101     -42.049  -1.195   5.632  1.00 48.32           O  
ATOM   2331  CB  ASP H 101     -44.528  -1.019   7.480  1.00 46.37           C  
ATOM   2332  CG  ASP H 101     -45.602  -1.817   6.762  1.00 54.42           C  
ATOM   2333  OD1 ASP H 101     -46.242  -1.270   5.839  1.00 43.57           O  
ATOM   2334  OD2 ASP H 101     -45.803  -2.996   7.119  1.00 51.72           O  
ATOM   2335  N   VAL H 102     -43.751  -0.734   4.239  1.00 39.00           N  
ATOM   2336  CA  VAL H 102     -43.159  -1.425   3.100  1.00 41.76           C  
ATOM   2337  C   VAL H 102     -42.959  -0.433   1.967  1.00 44.62           C  
ATOM   2338  O   VAL H 102     -43.911   0.234   1.539  1.00 42.63           O  
ATOM   2339  CB  VAL H 102     -44.017  -2.614   2.634  1.00 46.36           C  
ATOM   2340  CG1 VAL H 102     -43.351  -3.304   1.450  1.00 43.45           C  
ATOM   2341  CG2 VAL H 102     -44.224  -3.591   3.778  1.00 40.81           C  

1-2) Output of pdb_fixinsert; in this case it works correctly

ATOM   2306  N   ALA H 102     -52.017   1.887  12.504  1.00 37.00           N  
ATOM   2307  CA  ALA H 102     -51.622   0.503  12.717  1.00 46.76           C  
ATOM   2308  C   ALA H 102     -50.170   0.230  12.336  1.00 43.32           C  
ATOM   2309  O   ALA H 102     -49.696  -0.891  12.551  1.00 42.90           O  
ATOM   2310  CB  ALA H 102     -52.554  -0.435  11.942  1.00 29.26           C  
ATOM   2311  N   GLY H 103     -49.452   1.207  11.780  1.00 38.57           N  
ATOM   2312  CA  GLY H 103     -48.031   1.013  11.563  1.00 45.52           C  
ATOM   2313  C   GLY H 103     -47.348   1.851  10.499  1.00 38.13           C  
ATOM   2314  O   GLY H 103     -46.260   1.485  10.051  1.00 46.04           O  
ATOM   2315  N   GLY H 104     -47.951   2.963  10.083  1.00 41.11           N  
ATOM   2316  CA  GLY H 104     -47.279   3.829   9.122  1.00 41.50           C  
ATOM   2317  C   GLY H 104     -46.986   3.152   7.785  1.00 45.26           C  
ATOM   2318  O   GLY H 104     -47.531   2.098   7.446  1.00 43.32           O  
ATOM   2319  N   MET H 105     -46.097   3.786   7.016  1.00 38.57           N  
ATOM   2320  CA  MET H 105     -45.665   3.253   5.722  1.00 40.39           C  
ATOM   2321  C   MET H 105     -44.512   2.288   5.965  1.00 44.36           C  
ATOM   2322  O   MET H 105     -43.340   2.670   5.985  1.00 40.96           O  
ATOM   2323  CB  MET H 105     -45.262   4.377   4.774  1.00 33.52           C  
ATOM   2324  CG  MET H 105     -46.433   5.114   4.139  1.00 41.30           C  
ATOM   2325  SD  MET H 105     -45.911   6.355   2.934  1.00 46.75           S  
ATOM   2326  CE  MET H 105     -47.492   6.828   2.238  1.00 50.79           C  
ATOM   2327  N   ASP H 106     -44.847   1.010   6.143  1.00 46.96           N  
ATOM   2328  CA  ASP H 106     -43.864   0.026   6.576  1.00 38.63           C  
ATOM   2329  C   ASP H 106     -43.156  -0.683   5.429  1.00 46.59           C  
ATOM   2330  O   ASP H 106     -42.049  -1.195   5.632  1.00 48.32           O  
ATOM   2331  CB  ASP H 106     -44.528  -1.019   7.480  1.00 46.37           C  
ATOM   2332  CG  ASP H 106     -45.602  -1.817   6.762  1.00 54.42           C  
ATOM   2333  OD1 ASP H 106     -46.242  -1.270   5.839  1.00 43.57           O  
ATOM   2334  OD2 ASP H 106     -45.803  -2.996   7.119  1.00 51.72           O  
ATOM   2335  N   VAL H 107     -43.751  -0.734   4.239  1.00 39.00           N  
ATOM   2336  CA  VAL H 107     -43.159  -1.425   3.100  1.00 41.76           C  
ATOM   2337  C   VAL H 107     -42.959  -0.433   1.967  1.00 44.62           C  
ATOM   2338  O   VAL H 107     -43.911   0.234   1.539  1.00 42.63           O  
ATOM   2339  CB  VAL H 107     -44.017  -2.614   2.634  1.00 46.36           C  
ATOM   2340  CG1 VAL H 107     -43.351  -3.304   1.450  1.00 43.45           C  
ATOM   2341  CG2 VAL H 107     -44.224  -3.591   3.778  1.00 40.81           C  

However, when I delete residues 100B and 101, the renumbering goes wrong.

2-1) Input: the same file with the lines for residues 100B and 101 deleted

ATOM   2306  N   ALA H  99     -52.017   1.887  12.504  1.00 37.00           N  
ATOM   2307  CA  ALA H  99     -51.622   0.503  12.717  1.00 46.76           C  
ATOM   2308  C   ALA H  99     -50.170   0.230  12.336  1.00 43.32           C  
ATOM   2309  O   ALA H  99     -49.696  -0.891  12.551  1.00 42.90           O  
ATOM   2310  CB  ALA H  99     -52.554  -0.435  11.942  1.00 29.26           C  
ATOM   2311  N   GLY H 100     -49.452   1.207  11.780  1.00 38.57           N  
ATOM   2312  CA  GLY H 100     -48.031   1.013  11.563  1.00 45.52           C  
ATOM   2313  C   GLY H 100     -47.348   1.851  10.499  1.00 38.13           C  
ATOM   2314  O   GLY H 100     -46.260   1.485  10.051  1.00 46.04           O  
ATOM   2315  N   GLY H 100A    -47.951   2.963  10.083  1.00 41.11           N  
ATOM   2316  CA  GLY H 100A    -47.279   3.829   9.122  1.00 41.50           C  
ATOM   2317  C   GLY H 100A    -46.986   3.152   7.785  1.00 45.26           C  
ATOM   2318  O   GLY H 100A    -47.531   2.098   7.446  1.00 43.32           O  
ATOM   2335  N   VAL H 102     -43.751  -0.734   4.239  1.00 39.00           N  
ATOM   2336  CA  VAL H 102     -43.159  -1.425   3.100  1.00 41.76           C  
ATOM   2337  C   VAL H 102     -42.959  -0.433   1.967  1.00 44.62           C  
ATOM   2338  O   VAL H 102     -43.911   0.234   1.539  1.00 42.63           O  
ATOM   2339  CB  VAL H 102     -44.017  -2.614   2.634  1.00 46.36           C  
ATOM   2340  CG1 VAL H 102     -43.351  -3.304   1.450  1.00 43.45           C  
ATOM   2341  CG2 VAL H 102     -44.224  -3.591   3.778  1.00 40.81           C

2-2) Output: WRONG numbering. Compare with 1-2: VAL 102 becomes 106 here, but 107 in 1-2.

ATOM   2306  N   ALA H 102     -52.017   1.887  12.504  1.00 37.00           N  
ATOM   2307  CA  ALA H 102     -51.622   0.503  12.717  1.00 46.76           C  
ATOM   2308  C   ALA H 102     -50.170   0.230  12.336  1.00 43.32           C  
ATOM   2309  O   ALA H 102     -49.696  -0.891  12.551  1.00 42.90           O  
ATOM   2310  CB  ALA H 102     -52.554  -0.435  11.942  1.00 29.26           C  
ATOM   2311  N   GLY H 103     -49.452   1.207  11.780  1.00 38.57           N  
ATOM   2312  CA  GLY H 103     -48.031   1.013  11.563  1.00 45.52           C  
ATOM   2313  C   GLY H 103     -47.348   1.851  10.499  1.00 38.13           C  
ATOM   2314  O   GLY H 103     -46.260   1.485  10.051  1.00 46.04           O  
ATOM   2315  N   GLY H 104     -47.951   2.963  10.083  1.00 41.11           N  
ATOM   2316  CA  GLY H 104     -47.279   3.829   9.122  1.00 41.50           C  
ATOM   2317  C   GLY H 104     -46.986   3.152   7.785  1.00 45.26           C  
ATOM   2318  O   GLY H 104     -47.531   2.098   7.446  1.00 43.32           O  
ATOM   2335  N   VAL H 106     -43.751  -0.734   4.239  1.00 39.00           N  
ATOM   2336  CA  VAL H 106     -43.159  -1.425   3.100  1.00 41.76           C  
ATOM   2337  C   VAL H 106     -42.959  -0.433   1.967  1.00 44.62           C  
ATOM   2338  O   VAL H 106     -43.911   0.234   1.539  1.00 42.63           O  
ATOM   2339  CB  VAL H 106     -44.017  -2.614   2.634  1.00 46.36           C  
ATOM   2340  CG1 VAL H 106     -43.351  -3.304   1.450  1.00 43.45           C  
ATOM   2341  CG2 VAL H 106     -44.224  -3.591   3.778  1.00 40.81           C 
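
Both outputs are consistent with a simple rule: every residue that carries an insertion code raises a running offset by one, and that offset is added to the number of every later residue. The Python model below only illustrates that rule on the excerpts; it is not taken from the pdb_fixinsert source. The starting offset of 3 is an assumption inferred from ALA 99 becoming 102, i.e. three insertion residues earlier in the chain that are not shown here.

```python
def renumber(residues, start_offset=0):
    """Simplified model: each residue with an insertion code bumps the offset.

    `residues` is a list of (resSeq, iCode) tuples in file order.
    Returns the new residue numbers, with insertion codes dropped.
    """
    offset = start_offset
    new_numbers = []
    for resseq, icode in residues:
        if icode:  # an insertion residue takes the next free number
            offset += 1
        new_numbers.append(resseq + offset)
    return new_numbers


# Residues of the excerpts 1-1 and 2-1 (2-1 lacks 100B and 101).
full = [(99, ""), (100, ""), (100, "A"), (100, "B"), (101, ""), (102, "")]
gapped = [(99, ""), (100, ""), (100, "A"), (102, "")]

# start_offset=3 is assumed from ALA 99 -> 102 (earlier insertions not shown).
print(renumber(full, start_offset=3))    # [102, 103, 104, 105, 106, 107]
print(renumber(gapped, start_offset=3))  # [102, 103, 104, 106]
```

Under this rule, deleting ASP 101 has no effect on later numbers, while deleting MET 100B lowers the offset by one, which is why VAL 102 ends up as 106 rather than 107.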

joaomcteixeira commented 2 years ago

Hi @hw-protein

First of all, thanks for your feedback.

I'm not sure this is a bug, because the insertions are handled correctly (to my knowledge and expectations). But, from what I understand, you were expecting VAL106 to become VAL105, is that it? The philosophy of pdb-tools is "one-tool-one-job", so I think the residue-number correction should be done by other tools. You may need to chain several tools, for example:
1. split the chain in two,
2. correct the numbers,
3. merge the chain again.

pdb_selres -:100 my.pdb > first_100.pdb
pdb_selres -102: my.pdb > second_half.pdb
pdb_shiftres --1 second_half.pdb > shifted.pdb
pdb_merge first_100.pdb shifted.pdb | pdb_fixinsert > final.pdb

This could be an approach if your PDB has only one chain.
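On input 2-1, the pipeline above is meant to lower every residue number from 102 onwards by one and leave the rest untouched. A minimal Python sketch of that combined step; the helper `shift_residues` and its `first` threshold are hypothetical and stand in for the split/merge, they are not pdb-tools options. Rewriting only columns 23-26 keeps the fixed-width layout and any insertion code intact.

```python
def shift_residues(lines, delta, first=None):
    """Add `delta` to resSeq on ATOM/HETATM records.

    If `first` is given, only residues with resSeq >= first are shifted.
    Only columns 23-26 are rewritten; everything else is kept verbatim.
    """
    out = []
    for line in lines:
        if line.startswith(("ATOM", "HETATM")):
            resseq = int(line[22:26])
            if first is None or resseq >= first:
                line = line[:22] + f"{resseq + delta:4d}" + line[26:]
        out.append(line)
    return out


lines = [
    "ATOM   2318  O   GLY H 100A    -47.531   2.098   7.446  1.00 43.32           O",
    "ATOM   2335  N   VAL H 102     -43.751  -0.734   4.239  1.00 39.00           N",
]
for line in shift_residues(lines, delta=-1, first=102):
    print(line)  # GLY 100A is unchanged, VAL 102 becomes VAL 101
```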

@JoaoRodrigues Do you think the example reported by @hw-protein is a real bug in pdb_fixinsert?

hw-protein commented 2 years ago

I was expecting VAL106 (in 2-2) to become VAL107 (as in 1-2), because input proteins 1-1 and 2-1 are essentially the same. The only difference is that some residues are missing (e.g., not resolved) in input 2-1.

I would like to ask about the algorithm of this program.

1) Can "pdb_fixinsert" also be used for PDB files with residues missing from the ATOM lines? For example, the structures of residues 100B and 101 are not determined, so there are no ATOM lines for residue numbers 100B and 101. Can "pdb_fixinsert" renumber the input PDB taking this kind of gap into account, using the SEQRES lines?

2) I thought that this program builds a reference sequence from the SEQRES lines and renumbers the ATOM records by aligning them to that reference sequence. Is that right?

I think the second question is related to the first.
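The approach described in question 2 can be sketched as follows. This is a hypothetical illustration, not how pdb_fixinsert works: it matches the observed residue names, in order, against a reference sequence (assumed to come from SEQRES) and numbers each observed residue by its position in that reference. Greedy in-order matching is a simplification standing in for a real sequence alignment; with repeated residue types next to a gap (e.g. GLY-GLY) it can place the gap on the wrong side.

```python
def renumber_by_reference(observed, reference, first_number=1):
    """Number observed residues by their position in a reference sequence.

    observed:  residue names in ATOM-record order, e.g. ["ALA", "GLY", ...]
    reference: full list of residue names, e.g. read from SEQRES records
    Greedy in-order matching stands in for a proper alignment here.
    """
    numbers = []
    pos = 0
    for name in observed:
        # Skip reference residues that are absent from the structure.
        while pos < len(reference) and reference[pos] != name:
            pos += 1
        if pos == len(reference):
            raise ValueError(f"{name} cannot be placed in the reference sequence")
        numbers.append(first_number + pos)
        pos += 1
    return numbers


reference = ["ALA", "GLY", "GLY", "MET", "ASP", "VAL"]  # residues of excerpt 1-1
observed = ["ALA", "GLY", "GLY", "VAL"]                 # residues of excerpt 2-1
# first_number=102 simply reuses ALA's number from output 1-2.
print(renumber_by_reference(observed, reference, first_number=102))
# [102, 103, 104, 107]
```

This gives VAL 107, the numbering expected in this report, but only because the reference already contains the unresolved residues 100B and 101.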

joaomcteixeira commented 2 years ago

Can "pdb_fixinsert" renumber input PDB considering this kind of missing using SEQRES line?

No. pdb_fixinsert only reads the ATOM/HETATM lines. Therefore, I don't expect VAL102 to become VAL107, because 100B and ASP101 are not there, unlike in 1-1. pdb_fixinsert behaves as we expect it to: the 2-2 result matches our expectations and tests.
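Reading only ATOM/HETATM records is still enough to notice that something is missing, though not how much. A common heuristic (not something pdb_fixinsert does) compares the backbone C of one residue with the N of the next: a peptide bond is about 1.33 Å long, so a much longer distance signals a chain break. A minimal Python sketch using coordinates from input 2-1; the 2.0 Å cutoff is an assumed threshold.

```python
import math


def is_chain_break(c_xyz, n_xyz, cutoff=2.0):
    """True if the C(i)-N(i+1) distance is too long for a peptide bond (~1.33 Å)."""
    return math.dist(c_xyz, n_xyz) > cutoff


# Backbone atoms copied from input 2-1.
gly100_c = (-47.348, 1.851, 10.499)
gly100a_n = (-47.951, 2.963, 10.083)
gly100a_c = (-46.986, 3.152, 7.785)
val102_n = (-43.751, -0.734, 4.239)

print(round(math.dist(gly100_c, gly100a_n), 2))  # 1.33
print(round(math.dist(gly100a_c, val102_n), 2))  # 6.18
print(is_chain_break(gly100a_c, val102_n))       # True
```

The break between GLY 100A and VAL 102 is flagged, but the distance alone cannot reliably say whether one or two residues are missing.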

Having clarified your concerns and the fixinsert algorithm, I see this is not a bug but a feature request :wink: Thanks for this insightful conversation! :rocket:

@JoaoRodrigues @amjjbonvin should pdb_fixinsert give priority to SEQRES lines when they are present? Is this a feature we would be willing to add?
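Giving SEQRES priority would start with reading those records. In the PDB format, a SEQRES record holds the chain identifier in column 12 and up to 13 residue names in columns 20-70. A minimal Python sketch; the function name `read_seqres` is hypothetical, and the record below is made up for illustration, not the real 7JMO SEQRES.

```python
from collections import defaultdict


def read_seqres(lines):
    """Collect SEQRES residue names per chain, in record order.

    Column layout (1-based): chainID in column 12, residue names in
    columns 20-70 (up to 13 names per record).
    """
    sequences = defaultdict(list)
    for line in lines:
        if line.startswith("SEQRES"):
            chain = line[11]
            sequences[chain].extend(line[19:70].split())
    return dict(sequences)


# Hypothetical record for illustration only (not the real 7JMO SEQRES).
records = ["SEQRES   1 H    6  ALA GLY GLY MET ASP VAL"]
print(read_seqres(records))  # {'H': ['ALA', 'GLY', 'GLY', 'MET', 'ASP', 'VAL']}
```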

Cheers,