best way to insert ACE/NME caps?

nitroamos commented 6 years ago

I've got code that does something like this:

def addACENMECaps(fixer,origFilename):
   chainWithGaps = getChainWithGapsFromRemark465(origFilename,fixer)
   fixer.findMissingResidues(chainWithGapsOverride=chainWithGaps)
   displayMissingResidues(fixer,"pdbfixer",True,
                          loopExtension=0,maxLoopLength=None,substituteCaps=True)
   fixer.findMissingAtoms()
   displayMissingAtoms(fixer)
   fixer.addMissingAtoms()
   fixer.addMissingHydrogens(7)

where my displayMissingResidues has a section that looks like this:

  if substituteCaps:
     if isNTerminal:
        residues = ["ACE"]
     elif isCTerminal:
        residues = ["NME"]
     else:
        residues = ["NME","ACE"]
     fixer.missingResidues[key] = residues

this seems to work pretty well except that when I call addMissingHydrogens to add hydrogens to my new ACE and NME residues it's changing/minimizing all my previously placed hydrogens. This is a bit of chicken-and-egg for my default workflow which looks partially like this:

remove any ACE/NME residues because Rosetta doesn't handle them gracefully
do loop modeling (possibly with Rosetta)
place hydrogens so I can get the correct restypes
use Rosetta for structure refinement
put ACE/NME back any unmodelled loops in my structure (using code shown above)

That is, I want to do add ACE/NME after I've placed hydrogens. One approach could be to grab code from modeler.addHydrogens and add the hydrogens to the topology myself. However, there are other more elegant approaches if you were willing to implement them:

addHydrogens in OpenMM could have a more conservative mode and only add hydrogens to residues which don't have any (exposing an appropriate kw in PDBFixer's addMissingAtoms)
but the best approach might be to expose the heavyAtomsOnly option in _addAtomsToTopology to addMissingAtoms.

Any suggestions?

As an aside, I'm not sure if residues = ["NME","ACE"] is valid for OpenMM because there's a chain break implied by that sequence. This approach is ok for me in PDBFixer because it seems like it can save the PDB ok.

Another aside, I'll comment here about my getChainWithGapsFromRemark465 when I've tested it better.

peastman commented 6 years ago

Could you clarify exactly what the unwanted behavior is? Is it just that it does an energy minimization that modifies the positions of all hydrogens, including ones that already existed? Or is the problem that it's actually adding more hydrogens to residues you don't want?

If it's the former, a simple and reasonable possibility would be to change the behavior of addHydrogens() so it only minimizes the hydrogens it just added itself, not ones that already existed.

nitroamos commented 6 years ago

I would definitely not want it to change the protonation state of residues that already have hydrogens, but I didn't investigate if that's happening or not. Maybe that's implied by your proposal, which sounds great to me.

peastman commented 6 years ago

That's my question: which problem are you encountering? That it's adding hydrogens you don't want, or that it's changing the positions of existing hydrogens?

nitroamos commented 6 years ago

I wrote some sample code like this:

pp = pprint.PrettyPrinter(indent=3,width=200,compact=False)
def displayStatistics(fixer,summary):
   summary["chains"] = []
   for c in fixer.topology.chains():
      numResidues = len(list(c.residues()))
      numAtoms = len(list(c.atoms()))
      composition = defaultdict(int)

      for r in c.residues():
         atoms = set()
         for a in r.atoms():
            atoms.add(a.name)
         restype = r.name
         if restype == "HIS":
            if set(['HD1', 'HE1', 'HE2']).issubset(atoms):
               restype = 'HIP'
            elif 'HD1' in atoms:
               restype = 'HID'
            elif 'HE2' in atoms:
               restype = 'HIE'
         else:
            continue
         atoms = sorted(list(atoms))
         key = ','.join(atoms)
         key = restype+":"+key
         composition[key] += 1

      summary["chains"].append({
         "id":c.id,
         "numResidues":numResidues,
         "composition":dict(composition),
         "numAtoms":numAtoms
      })
   pp.pprint(summary)

summary = {}
fixer.findMissingResidues()
fixer.findNonstandardResidues()
fixer.replaceNonstandardResidues()
fixer.removeHeterogens(True)
fixer.findMissingAtoms()
fixer.addMissingAtoms()

idx = 1
displayStatistics(fixer,summary)

for ph in [7,7,14,0,7]:
   idx += 1
   print(idx,"pH:",ph)
   fixer.addMissingHydrogens(ph)
   displayStatistics(fixer,summary)

and here's the output:

{  'chains': [  {  'composition': {  'HID:C,CA,CB,CD2,CE1,CG,H,HA,HB2,HB3,HD1,HD2,HE1,N,ND1,NE2,O': 5,
                                     'HIE:C,CA,CB,CD2,CE1,CG,H,HA,HB2,HB3,HD2,HE1,HE2,N,ND1,NE2,O': 2,
                                     'HIP:C,CA,CB,CD2,CE1,CG,H,HA,HB2,HB3,HD1,HD2,HE1,HE2,N,ND1,NE2,O': 2},
                   'id': 'A',
                   'numAtoms': 3960,
                   'numResidues': 248},
                {'composition': {}, 'id': ' ', 'numAtoms': 120, 'numResidues': 40}]}
2 pH: 7
{  'chains': [  {  'composition': {  'HID:C,CA,CB,CD2,CE1,CG,H,HA,HB2,HB3,HD1,HD2,HE1,N,ND1,NE2,O': 5,
                                     'HIE:C,CA,CB,CD2,CE1,CG,H,HA,HB2,HB3,HD2,HE1,HE2,N,ND1,NE2,O': 2,
                                     'HIP:C,CA,CB,CD2,CE1,CG,H,HA,HB2,HB3,HD1,HD2,HE1,HE2,N,ND1,NE2,O': 2},
                   'id': 'A',
                   'numAtoms': 3969,
                   'numResidues': 248},
                {'composition': {}, 'id': ' ', 'numAtoms': 120, 'numResidues': 40}]}
3 pH: 7
{  'chains': [  {  'composition': {  'HID:C,CA,CB,CD2,CE1,CG,H,HA,HB2,HB3,HD1,HD2,HE1,N,ND1,NE2,O': 5,
                                     'HIE:C,CA,CB,CD2,CE1,CG,H,HA,HB2,HB3,HD2,HE1,HE2,N,ND1,NE2,O': 2,
                                     'HIP:C,CA,CB,CD2,CE1,CG,H,HA,HB2,HB3,HD1,HD2,HE1,HE2,N,ND1,NE2,O': 2},
                   'id': 'A',
                   'numAtoms': 3969,
                   'numResidues': 248},
                {'composition': {}, 'id': ' ', 'numAtoms': 120, 'numResidues': 40}]}
4 pH: 14
{  'chains': [  {  'composition': {  'HID:C,CA,CB,CD2,CE1,CG,H,HA,HB2,HB3,HD1,HD2,HE1,N,ND1,NE2,O': 5,
                                     'HIE:C,CA,CB,CD2,CE1,CG,H,HA,HB2,HB3,HD2,HE1,HE2,N,ND1,NE2,O': 2,
                                     'HIP:C,CA,CB,CD2,CE1,CG,H,HA,HB2,HB3,HD1,HD2,HE1,HE2,N,ND1,NE2,O': 2},
                   'id': 'A',
                   'numAtoms': 3969,
                   'numResidues': 248},
                {'composition': {}, 'id': ' ', 'numAtoms': 120, 'numResidues': 40}]}
5 pH: 0
{  'chains': [  {'composition': {'HIP:C,CA,CB,CD2,CE1,CG,H,HA,HB2,HB3,HD1,HD2,HE1,HE2,N,ND1,NE2,O': 9}, 'id': 'A', 'numAtoms': 4008, 'numResidues': 248},
                {'composition': {}, 'id': ' ', 'numAtoms': 120, 'numResidues': 40}]}
6 pH: 7
{  'chains': [  {'composition': {'HIP:C,CA,CB,CD2,CE1,CG,H,HA,HB2,HB3,HD1,HD2,HE1,HE2,N,ND1,NE2,O': 9}, 'id': 'A', 'numAtoms': 4008, 'numResidues': 248},
                {'composition': {}, 'id': ' ', 'numAtoms': 120, 'numResidues': 40}]}

This pdb had hydrogens in the starting structure. Here are the undesirable things I'm seeing:

the number of atoms never goes down. once a histidine becomes HIP, it never goes back to HIE/HID.
all hydrogens in the system are repositioned every time
the residue assignment in my starting structure is not preserved. the original hydrogens are preserved, but adding more hydrogens is changing the residue assignment.

In my thinking, the ideal approach would be to only add hydrogens to residues that don't have any hydrogens: i.e., either adds all hydrogens or adds none. If the user wants to change the pH, then you could offer to remove all hydrogens first.

peastman commented 6 years ago

the residue assignment in my starting structure is not preserved.

What do you mean by residue assignment?

nitroamos commented 6 years ago

Residues which were initially assigned to be HIE or HID are becoming HIP.

peastman commented 6 years ago

It only does that when you specify pH 0. On each of the first four calls, the assignments remain fixed. But on the fifth call it changes them all to HIP, because that's the correct variant for them to be at pH 0. Is that not what you were expecting?

Also see the API documentation, especially the following paragraph which is relevant:

A special case is when the model already contains a hydrogen that should not be present in the desired variant. If you explicitly specify a variant using the ‘variants’ parameter, the residue will be modified to match the desired variant, removing hydrogens if necessary. On the other hand, for residues whose variant is selected automatically, this function will only add hydrogens. It will never remove ones that are already present in the model, regardless of the specified pH.

nitroamos commented 6 years ago

Ah, OpenMM's behavior for automatically selected variants is intentional then -- not a bug. Sorry, I could have found and read that doc myself. 😊

To get the behavior I want, it looks like I could bypass PDBFixer to use the variants parameter, assign it all the current variant values (does PDBFixer/OpenMM have code anywhere to determine/assign residue names based on the current variant?), and then it'll only add hydrogens for residues which are missing them. Only remaining glitch is that it'll still minimize them all, but that's not a big deal to me.

peastman commented 6 years ago

it looks like I could bypass PDBFixer to use the variants parameter

Correct. addMissingHydrogens() is just a trivial wrapper around Modeller.addHydrogens(). Here's the complete implementation of it: https://github.com/pandegroup/pdbfixer/blob/master/pdbfixer/pdbfixer.py#L1019-L1022. (Though really it ought to expose the variants option too. I think that option may have been added after PDBFixer was created, and I just forgot to add it to the corresponding PDBFixer method.)

does PDBFixer/OpenMM have code anywhere to determine/assign residue names based on the current variant?

Residue names don't depend on the variant. They remain the same regardless of protonation state. Or do you mean determine the variant names based on which hydrogens are currently present? There's no routine specifically for that, though the logic is pretty simple.

nitroamos commented 6 years ago

Ok, here's a code snippet. I believe this should be included in OpenMM because a) to my knowledge there's no accepted standard for the names of these variants, b) these names are required to use variants which is a public API, and c) it relies on internals in modeler.py. But do what you like with it. 😄

p.s., I noticed that modeller.py never sets _hasLoadedStandardHydrogens = True.

def getTag(element):
   if isinstance(element,Residue):
      r = element
      a = None
   elif isinstance(element,Atom):
      r = element.residue
      a = element
   text =  "%2s%-4s %3s"%(r.chain.id,r.id+r.insertionCode,r.name)
   if a:
      text += " %4s"%a.name
   return text

def getOpenMMRestypes(fixer):
   if not app.Modeller._hasLoadedStandardHydrogens:
      app.Modeller.loadHydrogenDefinitions(os.path.join(os.path.dirname(app.__file__), 'data', 'hydrogens.xml'))
      app.Modeller._hasLoadedStandardHydrogens = True

   residueHydrogens = app.Modeller._residueHydrogens
   restypes = {}
   for r in fixer.topology.residues():
      key = getTag(r)
      if r.name not in residueHydrogens: continue
      data = residueHydrogens[r.name]
      if len(data.variants) == 0: continue
      possibilities = set(data.variants)
      hydrogens = set([a.name for a in r.atoms() if a.element == element.hydrogen])

      if len(hydrogens) == 0:
         continue

      for h in data.hydrogens:
         if h.variants is None: continue
         if h.name in hydrogens:
            possibilities = set(h.variants) & possibilities
         else:
            possibilities = possibilities - set(h.variants)

      restypes[key] = None
      if len(possibilities) == 1:
         restypes[key] = next(iter(possibilities))
      elif len(possibilities) == 0:
         print("Unknown restype variant:",key,hydrogens)
      else:
         print("Ambiguous restype variant:",key,possibilities)

   return restypes

peastman commented 6 years ago

p.s., I noticed that modeller.py never sets _hasLoadedStandardHydrogens = True.

Oops! Thanks for pointing that out.

BJWiley233 commented 2 years ago

Should HIS residues be protonated on ND1 at the default pH of 7? I thought only this was weird. I don't think it should be and so HD1 shouldn't be here?? pdb2pqr does this as well

ATOM   1039  N   HIS A  67      15.454  -4.133   8.820  1.00  0.00           N  
ATOM   1040  CA  HIS A  67      15.017  -5.334   9.509  1.00  0.00           C  
ATOM   1041  C   HIS A  67      13.787  -6.005   8.885  1.00  0.00           C  
ATOM   1042  O   HIS A  67      13.592  -7.209   9.064  1.00  0.00           O  
ATOM   1043  CB  HIS A  67      14.731  -5.060  11.013  1.00  0.00           C  
ATOM   1044  CG  HIS A  67      14.421  -6.333  11.745  1.00  0.00           C  
ATOM   1045  ND1 HIS A  67      15.333  -7.362  11.814  1.00  0.00           N  
ATOM   1046  CD2 HIS A  67      13.219  -6.848  12.177  1.00  0.00           C  
ATOM   1047  CE1 HIS A  67      14.728  -8.477  12.267  1.00  0.00           C  
ATOM   1048  NE2 HIS A  67      13.435  -8.164  12.505  1.00  0.00           N  
ATOM   1049  H   HIS A  67      15.371  -3.246   9.335  1.00  0.00           H  
ATOM   1050  HA  HIS A  67      15.836  -6.091   9.402  1.00  0.00           H  
ATOM   1051  HB2 HIS A  67      15.604  -4.556  11.490  1.00  0.00           H  
ATOM   1052  HB3 HIS A  67      13.878  -4.344  11.101  1.00  0.00           H  
ATOM   1053  HD1 HIS A  67      16.300  -7.326  11.456  1.00  0.00           H  
ATOM   1054  HD2 HIS A  67      12.244  -6.379  12.164  1.00  0.00           H  
ATOM   1055  HE1 HIS A  67      15.178  -9.454  12.353  1.00  0.00           H  
ATOM   1056  HE2 HIS A  67      12.714  -8.839  12.798  1.00  0.00           H

I would think default from pdb2pqr would be this from their docs HIE Neutral HIS, proton HE2 present. GROMACs and assigns HIS in PDB files as HIE (Really HISE) which is the same as these AMBER docs here as HIE (HIS is same thing)

EDIT: I removed all hydrogens and ran again and it looks like pdbfixer at ph=7 defaults to having the HID protonation state having HD1 rather than HE2 and no HD1 which I think is not correct? Doc I posted indicates this is "*atypical". Should be like this:

jchodera commented 2 years ago

I believe the Maestro prep wizard places the proton on whichever nitrogen better satisfies hydrogen bonds. From here:

BJWiley233 commented 2 years ago

Thanks for the resource @jchodera! I would def agree with whatever Schrodinger does 😄. Looks like most of the time at physiological pH it's going to try to make as many hydrogen-bond donors as possible. I just tested all the players on one PDB and Tinker pdbxyz to xyzpdb actually adds both HD1 and HE2, pdb2pqr only protonates ND1 (adds HD1), and pdbfixer follows same as pdb2pqr.

openmm / pdbfixer

best way to insert ACE/NME caps? #173