openmm / pdbfixer

PDBFixer fixes problems in PDB files

PDBs from RCSB failed to reconstruct #219

Status: Open. slawek111 opened this issue 3 years ago.

slawek111 commented 3 years ago

Hi,

I have been testing PDBFixer for some time, and it works great in most cases. Unfortunately, I have also found several PDB entries (6RPA, 6RPB and 1AO7) for which the missing residues are not reconstructed. PDBFixer simply does not detect the missing parts of the backbone, so it does not rebuild them. I have tried several approaches. First, I checked that the SEQRES records are well defined (I even created my own correct SEQRES), and they were. Then I noticed that the residues in these PDBs are not numbered consecutively within the continuous (non-missing) segments, so I fixed that as well. Finally, I removed the letters appended to residue numbers that mark the variant mutations introduced in the experiments (relative to the reference sequence). Unfortunately, none of this worked, and I am currently going through the code to find the reason. Maybe you can help, though. Any suggestions will be much appreciated!
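The renumbering and insertion-code cleanup described above can be sketched in Python as follows. This is a hypothetical helper (renumber_residues is not part of PDBFixer) that assumes standard fixed-column PDB records. Consecutive renumbering also erases genuine numbering gaps, so it only makes sense for segments already known to be continuous.

```python
# Record types whose columns 18-27 hold resName, chainID, resSeq and iCode.
RESIDUE_RECORDS = ("ATOM  ", "HETATM", "ANISOU", "TER   ")


def renumber_residues(pdb_lines, start=1):
    """Renumber residues consecutively within each chain and blank insertion codes.

    Hypothetical helper. Assumes fixed-column PDB records: chainID in column 22,
    resSeq in columns 23-26 and iCode in column 27 (0-based: 21, 22:26, 26).
    A residue is identified by its (resSeq, iCode) pair within a chain.
    Any other record (or a record too short to hold these fields) is passed through.
    """
    renumbered = []
    last_key = {}   # chain ID -> (resSeq, iCode) of the previous residue
    counter = {}    # chain ID -> most recently assigned residue number
    for line in pdb_lines:
        if line.startswith(RESIDUE_RECORDS) and len(line) >= 27:
            chain = line[21]
            key = (line[22:26], line[26])
            if last_key.get(chain) != key:
                # A new residue in this chain: advance the counter.
                counter[chain] = counter.get(chain, start - 1) + 1
                last_key[chain] = key
            # Write the new number right-aligned in 4 columns and clear iCode.
            line = line[:22] + f"{counter[chain]:>4}" + " " + line[27:]
        renumbered.append(line)
    return renumbered
```

Atoms of the same residue keep a shared number, and residues that differ only by insertion code (for example 36 and 36A) receive distinct consecutive numbers.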

All the best, Sławek

peastman commented 3 years ago (31 Mar 2021)

I downloaded 6RPA, and it looks to me like there are problems in the residue numbering. Take a look at chain D. The SEQRES and ATOM records match through residue 29 (GLY). Then the residue number jumps to 36, indicating six missing residues. But they don't appear in the SEQRES records. It carries right on at residue 36 as if nothing were there.

As a result, PDBFixer can't figure out any way to match up the sequence of chain D to the SEQRES records. If it can't align them, then it can't identify what residues are missing.
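The following minimal Python sketch makes that reasoning concrete by showing how a jump in residue numbering reads as missing residues. It is not PDBFixer's actual code, and the helper name find_numbering_gaps is hypothetical. The input is the chain D numbering described above (GLY 29 followed directly by residue 36), with the other residue names taken from the mmCIF excerpt later in the thread.

```python
def find_numbering_gaps(residues):
    """Return (previous, next, implied_missing) for each jump > 1 in residue numbers.

    `residues` is a list of (residue_number, residue_name) tuples in file order.
    This looks at numbering only; it does not consult SEQRES.
    """
    gaps = []
    for (prev_num, _), (next_num, _) in zip(residues, residues[1:]):
        if next_num - prev_num > 1:
            gaps.append((prev_num, next_num, next_num - prev_num - 1))
    return gaps


# Chain D residue numbers around the jump.
chain_d = [(27, "VAL"), (28, "SER"), (29, "GLY"), (36, "ASN"), (37, "PRO")]
print(find_numbering_gaps(chain_d))  # [(29, 36, 6)]
```

Judged by numbering alone, six residues are missing between 29 and 36. If SEQRES has nothing at that position, the two sources disagree, and the chain cannot be aligned.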

slawek111 commented 3 years ago

Hi Peter, thank you for your answer. Yes, that is correct. But no residues are actually missing between 29 and 36; the jump is just a consequence of numbering according to the IMGT standard, and the SEQRES records correctly represent the structure. Nevertheless, in an attempt to solve this, I renumbered the residues consecutively in the input PDB for PDBFixer, but that did not resolve the problem. Another possible issue was that some residue numbers in the 29-36 region end with the letter "A", which indicates the construct's mutations. I removed those from the input PDB as well, yet the structure is still not reconstructed. So the issue remains unresolved. Cheers, Sławek
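Sławek's point can be checked by ignoring the residue numbers and comparing only residue names. If the observed residues form one unbroken run in SEQRES, nothing is missing in that region. The Python sketch below uses a hypothetical helper, find_contiguous_run, that does exact matching only. A real tool would use a sequence alignment that tolerates gaps and mismatches. The SEQRES fragment is copied from the chain D mmCIF excerpt later in the thread.

```python
def find_contiguous_run(observed, seqres):
    """Return the 0-based index where `observed` occurs unbroken in `seqres`, or -1."""
    n = len(observed)
    for start in range(len(seqres) - n + 1):
        if seqres[start:start + n] == observed:
            return start
    return -1


# Part of the chain D SEQRES (seq_id 26-35 in the mmCIF excerpt).
seqres = ["TYR", "SER", "VAL", "SER", "GLY", "ASN", "PRO", "TYR", "LEU", "PHE"]
# Residues observed in the coordinates, numbered 27, 28, 29, 36, 37.
observed = ["VAL", "SER", "GLY", "ASN", "PRO"]
print(find_contiguous_run(observed, seqres))  # 2
```

The run is found intact, so the sequence shows no gap at 29/36, even though the numbering implies six missing residues.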


peastman commented 3 years ago

Can you post your modified PDB file where you've fixed those problems? I can investigate what else is happening.

slawek111 commented 3 years ago

Hi, sorry for the late response; I had Covid and was unable to work on the issue. The 6RPA PDB with fixed numbering is attached: 6RPA_numbers_fixed.zip

NatureGeorge commented 3 years ago

PDBFixer should use the _pdbx_poly_seq_scheme category in the mmCIF format rather than the SEQRES records of the legacy PDB format. For chain D of 6RPA it reads:

loop_
_pdbx_poly_seq_scheme.asym_id 
_pdbx_poly_seq_scheme.entity_id 
_pdbx_poly_seq_scheme.seq_id 
_pdbx_poly_seq_scheme.mon_id 
_pdbx_poly_seq_scheme.ndb_seq_num 
_pdbx_poly_seq_scheme.pdb_seq_num 
_pdbx_poly_seq_scheme.auth_seq_num 
_pdbx_poly_seq_scheme.pdb_mon_id 
_pdbx_poly_seq_scheme.auth_mon_id 
_pdbx_poly_seq_scheme.pdb_strand_id 
_pdbx_poly_seq_scheme.pdb_ins_code 
_pdbx_poly_seq_scheme.hetero 
...
D 4 1   MET 1   0   ?   ?   ?   D . n 
D 4 2   ALA 2   1   1   ALA ALA D . n 
D 4 3   GLN 3   2   2   GLN GLN D . n 
D 4 4   SER 4   3   3   SER SER D . n 
D 4 5   VAL 5   4   4   VAL VAL D . n 
D 4 6   ALA 6   5   5   ALA ALA D . n 
D 4 7   GLN 7   6   6   GLN GLN D . n 
D 4 8   PRO 8   7   7   PRO PRO D . n 
D 4 9   GLU 9   8   8   GLU GLU D . n 
D 4 10  ASP 10  9   9   ASP ASP D . n 
D 4 11  GLN 11  10  10  GLN GLN D . n 
D 4 12  VAL 12  11  11  VAL VAL D . n 
D 4 13  ASN 13  12  12  ASN ASN D . n 
D 4 14  VAL 14  13  13  VAL VAL D . n 
D 4 15  ALA 15  14  14  ALA ALA D . n 
D 4 16  GLU 16  15  15  GLU GLU D . n 
D 4 17  GLY 17  16  16  GLY GLY D . n 
D 4 18  ASN 18  17  17  ASN ASN D . n 
D 4 19  PRO 19  18  18  PRO PRO D . n 
D 4 20  LEU 20  19  19  LEU LEU D . n 
D 4 21  THR 21  20  20  THR THR D . n 
D 4 22  VAL 22  21  21  VAL VAL D . n 
D 4 23  LYS 23  22  22  LYS LYS D . n 
D 4 24  CYS 24  23  23  CYS CYS D . n 
D 4 25  THR 25  24  24  THR THR D . n 
D 4 26  TYR 26  25  25  TYR TYR D . n 
D 4 27  SER 27  26  26  SER SER D . n 
D 4 28  VAL 28  27  27  VAL VAL D . n 
D 4 29  SER 29  28  28  SER SER D . n 
D 4 30  GLY 30  29  29  GLY GLY D . n 
D 4 31  ASN 31  36  36  ASN ASN D . n       <---------------------------
D 4 32  PRO 32  37  37  PRO PRO D . n 
D 4 33  TYR 33  38  38  TYR TYR D . n 
D 4 34  LEU 34  39  39  LEU LEU D . n 
D 4 35  PHE 35  40  40  PHE PHE D . n 
D 4 36  TRP 36  41  41  TRP TRP D . n 
D 4 37  TYR 37  42  42  TYR TYR D . n 
D 4 38  VAL 38  43  43  VAL VAL D . n 
D 4 39  GLN 39  44  44  GLN GLN D . n 
D 4 40  TYR 40  45  45  TYR TYR D . n 
D 4 41  PRO 41  46  46  PRO PRO D . n 
D 4 42  ASN 42  47  47  ASN ASN D . n 
...

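Here a ? in the _pdbx_poly_seq_scheme.auth_seq_num column marks a missing (unmodeled) residue. At the marked row, the author numbering jumps from 29 to 36 while seq_id simply continues from 30 to 31, so no residues are missing at that point.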
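Below is a minimal Python sketch of how rows like these could be used to separate truly unmodeled residues from pure numbering jumps. The helper names are hypothetical, and the naive whitespace split assumes no quoted values containing spaces, which holds for this excerpt. A proper mmCIF parser, such as Biopython's MMCIF2Dict, would be more robust. This only helps when the input is PDBx/mmCIF.

```python
COLUMNS = ["asym_id", "entity_id", "seq_id", "mon_id", "ndb_seq_num",
           "pdb_seq_num", "auth_seq_num", "pdb_mon_id", "auth_mon_id",
           "pdb_strand_id", "pdb_ins_code", "hetero"]


def parse_loop_rows(text):
    """Split whitespace-separated loop rows into dicts keyed by column name."""
    return [dict(zip(COLUMNS, line.split())) for line in text.strip().splitlines()]


def summarize(rows):
    """Return (unmodeled residues, author-numbering jumps between consecutive seq_ids)."""
    unmodeled = [(int(r["seq_id"]), r["mon_id"]) for r in rows if r["auth_seq_num"] == "?"]
    jumps = []
    for prev, cur in zip(rows, rows[1:]):
        consecutive = int(cur["seq_id"]) == int(prev["seq_id"]) + 1
        modeled = "?" not in (prev["auth_seq_num"], cur["auth_seq_num"])
        if consecutive and modeled:
            a, b = int(prev["auth_seq_num"]), int(cur["auth_seq_num"])
            if b - a > 1:
                # Consecutive in the sequence but not in numbering: a numbering artifact.
                jumps.append((a, b))
    return unmodeled, jumps


# A few rows copied from the excerpt above.
excerpt = """
D 4 1   MET 1   0   ?   ?   ?   D . n
D 4 29  SER 29  28  28  SER SER D . n
D 4 30  GLY 30  29  29  GLY GLY D . n
D 4 31  ASN 31  36  36  ASN ASN D . n
D 4 32  PRO 32  37  37  PRO PRO D . n
"""
print(summarize(parse_loop_rows(excerpt)))  # ([(1, 'MET')], [(29, 36)])
```

Only the initial MET is reported as unmodeled. The 29 to 36 jump appears as a numbering jump between consecutive seq_id values rather than as missing residues.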

Hoping everyone is alright.

peastman commented 3 years ago

I'm not sure what you mean. His input file is a PDB, not a PDBx/mmCIF.

Ruibin-Liu commented 3 years ago

Has anyone figured out how to solve the problem? I think 5J7S is problematic too.