openmm / pdbfixer

PDBFixer fixes problems in PDB files

Enhancement requests if not already existing? #247

Open BJWiley233 opened 2 years ago

BJWiley233 commented 2 years ago

I think pdbfixer would benefit from two enhancements, if they don't already exist. I am willing to work on at least the first (the easier of the two 😄) and eventually the second.

First: Some PDB files contain chain-termination TER records (see TER in Chimera PDB Format), for example 3NBQ.pdb. It would be great to be able to 1) remove these chain terminators, since we don't simulate them, and 2) renumber the remaining atoms when the TER record carries a serial number in the PDB file. Let me know where in the code this would fit best, or I can just find a nice place :) and create a pull request.

ATOM   4548  HB3 SER A 308     -29.752  11.441 -51.010  1.00  0.00           H  
ATOM   4549  OG  SER A 308     -29.229   9.532 -50.835  1.00  0.00           O  
ATOM   4550  HG  SER A 308     -28.082   9.553 -51.132  1.00  0.00           H  
ATOM   4551  OXT SER A 308     -31.463   8.688 -49.735  1.00  0.00           O  
TER    4552      SER A 308
ATOM   4553  N   ASP B  16       6.316  11.005   5.784  1.00  0.00           N  
ATOM   4554  H   ASP B  16       6.004  11.744   6.672  1.00  0.00           H  
ATOM   4555  H2  ASP B  16       7.348  11.570   5.523  1.00  0.00           H 
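To make the request concrete, here is a minimal sketch of the preprocessing I have in mind, written as standalone Python outside of PDBFixer. The function name `strip_ter_and_renumber` and the `start` parameter are made up for illustration; it assumes fixed-column PDB lines with decimal serial numbers in columns 7-11, and it does not update CONECT records that refer to the old serials.

```python
def strip_ter_and_renumber(lines, start=1):
    """Drop TER records and renumber ATOM/HETATM serials consecutively.

    Hypothetical helper for illustration only. Assumes fixed-column PDB
    records with the serial number in columns 7-11 (line[6:11]).
    CONECT records are passed through unchanged, so any serials they
    reference would become stale.
    """
    out = []
    serial = start
    for line in lines:
        record = line[:6].strip()
        if record == "TER":
            # Skip the chain terminator entirely.
            continue
        if record in ("ATOM", "HETATM"):
            # Rewrite only the serial field; keep the rest of the line as-is.
            line = f"{line[:6]}{serial:>5d}{line[11:]}"
            serial += 1
        out.append(line)
    return out


example = [
    "ATOM   4551  OXT SER A 308     -31.463   8.688 -49.735  1.00  0.00           O  ",
    "TER    4552      SER A 308",
    "ATOM   4553  N   ASP B  16       6.316  11.005   5.784  1.00  0.00           N  ",
]
cleaned = strip_ter_and_renumber(example, start=4551)
```

With the three lines above, the TER record is dropped and the first atom of chain B takes the serial that the TER record used to hold.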

Second: It would also be great to be able to add hydrogens to ligands, since when running Tinker/Poltype2 we want to model them in our simulations, and the SDF/MOL2 files contain the hydrogens that we optimize in Poltype2. This might require importing openbabel.

Brian

peastman commented 2 years ago

I don't understand what you mean by the first suggestion. TER records are how PDB files indicate the breaks between chains. What do you mean by removing them?

For the second one, it can add hydrogens to ligands, but you need to give it some help to tell it where to add them. This cookbook entry describes how to do it.
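As a rough sketch of what that "help" looks like: hydrogen definitions are supplied as an XML file that lists each hydrogen and the heavy atom it is bonded to. The snippet below only builds such a file with the standard library. The residue name `LIG` and the atom names are placeholders, not taken from any real ligand, and the helper `build_hydrogen_definitions` is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_hydrogen_definitions(residue_name, hydrogens):
    """Return hydrogen-definition XML for one residue as a string.

    `hydrogens` is a list of (hydrogen_name, parent_atom_name) pairs.
    Hypothetical helper; the names passed in below are placeholders.
    """
    root = ET.Element("Residues")
    residue = ET.SubElement(root, "Residue", {"name": residue_name})
    for h_name, parent in hydrogens:
        # Each <H> element names the hydrogen and the atom it attaches to.
        ET.SubElement(residue, "H", {"name": h_name, "parent": parent})
    return ET.tostring(root, encoding="unicode")


xml_text = build_hydrogen_definitions(
    "LIG", [("H1", "N1"), ("H3", "N3"), ("H6", "C6")]
)

# Assumed usage, not run here: write xml_text to a file, then register it
# before adding hydrogens, e.g.
#   from openmm.app import Modeller
#   Modeller.loadHydrogenDefinitions("lig-hydrogens.xml")
```

The atom names in the XML have to match the names used for that residue in the PDB file, otherwise the definitions will not be applied to it.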

BJWiley233 commented 2 years ago

Yeah, I understand chain termination, but sometimes these TER records have an atom serial number. So if you are manipulating PDB files, for example with GenerateUncomplexedProteinPDBFromComplexedPDB in Poltype2, which removes non-ATOM atoms from the PDB using openbabel and GetAtom, it will include these records and then mess up writing the residues to file.

peastman commented 2 years ago

The index numbers on TER records are actually required by the spec:

The TER record has the same residue name, chain identifier, sequence number and insertion code as the terminal residue. The serial number of the TER record is one number greater than the serial number of the ATOM/HETATM preceding the TER.
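For illustration, the serial-number rule quoted above can be checked with a few lines of plain Python. This is only a sketch: `check_ter_serials` is a made-up helper, and it assumes fixed-column records with decimal serials in columns 7-11.

```python
def check_ter_serials(lines):
    """Return (line_number, expected, found) for TER records breaking the rule.

    The rule checked: a TER serial is one greater than the serial of the
    preceding ATOM/HETATM record. Hypothetical helper; assumes decimal
    serials in columns 7-11 (line[6:11]).
    """
    problems = []
    last_serial = None
    for number, line in enumerate(lines, start=1):
        record = line[:6].strip()
        if record in ("ATOM", "HETATM"):
            last_serial = int(line[6:11])
        elif record == "TER" and last_serial is not None:
            field = line[6:11].strip()
            found = int(field) if field else None  # some writers leave it blank
            if found != last_serial + 1:
                problems.append((number, last_serial + 1, found))
    return problems


excerpt = [
    "ATOM   4551  OXT SER A 308     -31.463   8.688 -49.735  1.00  0.00           O  ",
    "TER    4552      SER A 308",
    "ATOM   4553  N   ASP B  16       6.316  11.005   5.784  1.00  0.00           N  ",
]
violations = check_ter_serials(excerpt)
```

The excerpt posted earlier in this thread follows the rule, so the check reports no violations for it.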

Don't ask me why they did it that way! But any program that can't deal with them is broken.

BJWiley233 commented 2 years ago

Yeah, I now think the issue was probably that I was missing hydrogens on 5-FU, plus I have a crystal structure with 4 chains and 4 molecules of 5-FU. Poltype2's pdbfixer is playing much nicer with a single molecule of ribociclib and a single chain of CDK6. Just wondering whether people like to do multi-subunit simulations with multiple small molecules. It will probably just take a little light manual work.

BJWiley233 commented 2 years ago

Thanks for sending the notes on XML for non-standard residues. I believe it worked.