Open nitroamos opened 6 years ago
Could you clarify exactly what the unwanted behavior is? Is it just that it does an energy minimization that modifies the positions of all hydrogens, including ones that already existed? Or is the problem that it's actually adding more hydrogens to residues you don't want?
If it's the former, a simple and reasonable possibility would be to change the behavior of addHydrogens()
so it only minimizes the hydrogens it just added itself, not ones that already existed.
I would definitely not want it to change the protonation state of residues that already have hydrogens, but I didn't investigate if that's happening or not. Maybe that's implied by your proposal, which sounds great to me.
That's my question: which problem are you encountering? That it's adding hydrogens you don't want, or that it's changing the positions of existing hydrogens?
I wrote some sample code like this:
pp = pprint.PrettyPrinter(indent=3,width=200,compact=False)
def displayStatistics(fixer,summary):
summary["chains"] = []
for c in fixer.topology.chains():
numResidues = len(list(c.residues()))
numAtoms = len(list(c.atoms()))
composition = defaultdict(int)
for r in c.residues():
atoms = set()
for a in r.atoms():
atoms.add(a.name)
restype = r.name
if restype == "HIS":
if set(['HD1', 'HE1', 'HE2']).issubset(atoms):
restype = 'HIP'
elif 'HD1' in atoms:
restype = 'HID'
elif 'HE2' in atoms:
restype = 'HIE'
else:
continue
atoms = sorted(list(atoms))
key = ','.join(atoms)
key = restype+":"+key
composition[key] += 1
summary["chains"].append({
"id":c.id,
"numResidues":numResidues,
"composition":dict(composition),
"numAtoms":numAtoms
})
pp.pprint(summary)
summary = {}
fixer.findMissingResidues()
fixer.findNonstandardResidues()
fixer.replaceNonstandardResidues()
fixer.removeHeterogens(True)
fixer.findMissingAtoms()
fixer.addMissingAtoms()
idx = 1
displayStatistics(fixer,summary)
for ph in [7,7,14,0,7]:
idx += 1
print(idx,"pH:",ph)
fixer.addMissingHydrogens(ph)
displayStatistics(fixer,summary)
and here's the output:
{ 'chains': [ { 'composition': { 'HID:C,CA,CB,CD2,CE1,CG,H,HA,HB2,HB3,HD1,HD2,HE1,N,ND1,NE2,O': 5,
'HIE:C,CA,CB,CD2,CE1,CG,H,HA,HB2,HB3,HD2,HE1,HE2,N,ND1,NE2,O': 2,
'HIP:C,CA,CB,CD2,CE1,CG,H,HA,HB2,HB3,HD1,HD2,HE1,HE2,N,ND1,NE2,O': 2},
'id': 'A',
'numAtoms': 3960,
'numResidues': 248},
{'composition': {}, 'id': ' ', 'numAtoms': 120, 'numResidues': 40}]}
2 pH: 7
{ 'chains': [ { 'composition': { 'HID:C,CA,CB,CD2,CE1,CG,H,HA,HB2,HB3,HD1,HD2,HE1,N,ND1,NE2,O': 5,
'HIE:C,CA,CB,CD2,CE1,CG,H,HA,HB2,HB3,HD2,HE1,HE2,N,ND1,NE2,O': 2,
'HIP:C,CA,CB,CD2,CE1,CG,H,HA,HB2,HB3,HD1,HD2,HE1,HE2,N,ND1,NE2,O': 2},
'id': 'A',
'numAtoms': 3969,
'numResidues': 248},
{'composition': {}, 'id': ' ', 'numAtoms': 120, 'numResidues': 40}]}
3 pH: 7
{ 'chains': [ { 'composition': { 'HID:C,CA,CB,CD2,CE1,CG,H,HA,HB2,HB3,HD1,HD2,HE1,N,ND1,NE2,O': 5,
'HIE:C,CA,CB,CD2,CE1,CG,H,HA,HB2,HB3,HD2,HE1,HE2,N,ND1,NE2,O': 2,
'HIP:C,CA,CB,CD2,CE1,CG,H,HA,HB2,HB3,HD1,HD2,HE1,HE2,N,ND1,NE2,O': 2},
'id': 'A',
'numAtoms': 3969,
'numResidues': 248},
{'composition': {}, 'id': ' ', 'numAtoms': 120, 'numResidues': 40}]}
4 pH: 14
{ 'chains': [ { 'composition': { 'HID:C,CA,CB,CD2,CE1,CG,H,HA,HB2,HB3,HD1,HD2,HE1,N,ND1,NE2,O': 5,
'HIE:C,CA,CB,CD2,CE1,CG,H,HA,HB2,HB3,HD2,HE1,HE2,N,ND1,NE2,O': 2,
'HIP:C,CA,CB,CD2,CE1,CG,H,HA,HB2,HB3,HD1,HD2,HE1,HE2,N,ND1,NE2,O': 2},
'id': 'A',
'numAtoms': 3969,
'numResidues': 248},
{'composition': {}, 'id': ' ', 'numAtoms': 120, 'numResidues': 40}]}
5 pH: 0
{ 'chains': [ {'composition': {'HIP:C,CA,CB,CD2,CE1,CG,H,HA,HB2,HB3,HD1,HD2,HE1,HE2,N,ND1,NE2,O': 9}, 'id': 'A', 'numAtoms': 4008, 'numResidues': 248},
{'composition': {}, 'id': ' ', 'numAtoms': 120, 'numResidues': 40}]}
6 pH: 7
{ 'chains': [ {'composition': {'HIP:C,CA,CB,CD2,CE1,CG,H,HA,HB2,HB3,HD1,HD2,HE1,HE2,N,ND1,NE2,O': 9}, 'id': 'A', 'numAtoms': 4008, 'numResidues': 248},
{'composition': {}, 'id': ' ', 'numAtoms': 120, 'numResidues': 40}]}
This pdb had hydrogens in the starting structure. Here are the undesirable things I'm seeing:
In my thinking, the ideal approach would be to only add hydrogens to residues that don't have any hydrogens: i.e., either adds all hydrogens or adds none. If the user wants to change the pH, then you could offer to remove all hydrogens first.
the residue assignment in my starting structure is not preserved.
What do you mean by residue assignment?
Residues which were initially assigned to be HIE or HID are becoming HIP.
It only does that when you specify pH 0. On each of the first four calls, the assignments remain fixed. But on the fifth call it changes them all to HIP, because that's the correct variant for them to be at pH 0. Is that not what you were expecting?
Also see the API documentation, especially the following paragraph which is relevant:
A special case is when the model already contains a hydrogen that should not be present in the desired variant. If you explicitly specify a variant using the ‘variants’ parameter, the residue will be modified to match the desired variant, removing hydrogens if necessary. On the other hand, for residues whose variant is selected automatically, this function will only add hydrogens. It will never remove ones that are already present in the model, regardless of the specified pH.
Ah, OpenMM's behavior for automatically selected variants is intentional then -- not a bug. Sorry, I could have found and read that doc myself. 😊
To get the behavior I want, it looks like I could bypass PDBFixer to use the variants
parameter, assign it all the current variant values (does PDBFixer/OpenMM have code anywhere to determine/assign residue names based on the current variant?), and then it'll only add hydrogens for residues which are missing them. Only remaining glitch is that it'll still minimize them all, but that's not a big deal to me.
it looks like I could bypass PDBFixer to use the variants parameter
Correct. addMissingHydrogens()
is just a trivial wrapper around Modeller.addHydrogens()
. Here's the complete implementation of it: https://github.com/pandegroup/pdbfixer/blob/master/pdbfixer/pdbfixer.py#L1019-L1022. (Though really it ought to expose the variants
option too. I think that option may have been added after PDBFixer was created, and I just forgot to add it to the corresponding PDBFixer method.)
does PDBFixer/OpenMM have code anywhere to determine/assign residue names based on the current variant?
Residue names don't depend on the variant. They remain the same regardless of protonation state. Or do you mean determine the variant names based on which hydrogens are currently present? There's no routine specifically for that, though the logic is pretty simple.
Ok, here's a code snippet. I believe this should be included in OpenMM
because a) to my knowledge there's no accepted standard for the names of these variants, b) these names are required to use variants
which is a public API, and c) it relies on internals in modeler.py
. But do what you like with it. 😄
p.s., I noticed that modeller.py
never sets _hasLoadedStandardHydrogens = True
.
def getTag(element):
if isinstance(element,Residue):
r = element
a = None
elif isinstance(element,Atom):
r = element.residue
a = element
text = "%2s%-4s %3s"%(r.chain.id,r.id+r.insertionCode,r.name)
if a:
text += " %4s"%a.name
return text
def getOpenMMRestypes(fixer):
if not app.Modeller._hasLoadedStandardHydrogens:
app.Modeller.loadHydrogenDefinitions(os.path.join(os.path.dirname(app.__file__), 'data', 'hydrogens.xml'))
app.Modeller._hasLoadedStandardHydrogens = True
residueHydrogens = app.Modeller._residueHydrogens
restypes = {}
for r in fixer.topology.residues():
key = getTag(r)
if r.name not in residueHydrogens: continue
data = residueHydrogens[r.name]
if len(data.variants) == 0: continue
possibilities = set(data.variants)
hydrogens = set([a.name for a in r.atoms() if a.element == element.hydrogen])
if len(hydrogens) == 0:
continue
for h in data.hydrogens:
if h.variants is None: continue
if h.name in hydrogens:
possibilities = set(h.variants) & possibilities
else:
possibilities = possibilities - set(h.variants)
restypes[key] = None
if len(possibilities) == 1:
restypes[key] = next(iter(possibilities))
elif len(possibilities) == 0:
print("Unknown restype variant:",key,hydrogens)
else:
print("Ambiguous restype variant:",key,possibilities)
return restypes
p.s., I noticed that modeller.py never sets _hasLoadedStandardHydrogens = True.
Oops! Thanks for pointing that out.
Should HIS
residues be protonated on ND1
at the default pH of 7? I thought only this was weird. I don't think it should be and so HD1
shouldn't be here?? pdb2pqr does this as well
ATOM 1039 N HIS A 67 15.454 -4.133 8.820 1.00 0.00 N
ATOM 1040 CA HIS A 67 15.017 -5.334 9.509 1.00 0.00 C
ATOM 1041 C HIS A 67 13.787 -6.005 8.885 1.00 0.00 C
ATOM 1042 O HIS A 67 13.592 -7.209 9.064 1.00 0.00 O
ATOM 1043 CB HIS A 67 14.731 -5.060 11.013 1.00 0.00 C
ATOM 1044 CG HIS A 67 14.421 -6.333 11.745 1.00 0.00 C
ATOM 1045 ND1 HIS A 67 15.333 -7.362 11.814 1.00 0.00 N
ATOM 1046 CD2 HIS A 67 13.219 -6.848 12.177 1.00 0.00 C
ATOM 1047 CE1 HIS A 67 14.728 -8.477 12.267 1.00 0.00 C
ATOM 1048 NE2 HIS A 67 13.435 -8.164 12.505 1.00 0.00 N
ATOM 1049 H HIS A 67 15.371 -3.246 9.335 1.00 0.00 H
ATOM 1050 HA HIS A 67 15.836 -6.091 9.402 1.00 0.00 H
ATOM 1051 HB2 HIS A 67 15.604 -4.556 11.490 1.00 0.00 H
ATOM 1052 HB3 HIS A 67 13.878 -4.344 11.101 1.00 0.00 H
ATOM 1053 HD1 HIS A 67 16.300 -7.326 11.456 1.00 0.00 H
ATOM 1054 HD2 HIS A 67 12.244 -6.379 12.164 1.00 0.00 H
ATOM 1055 HE1 HIS A 67 15.178 -9.454 12.353 1.00 0.00 H
ATOM 1056 HE2 HIS A 67 12.714 -8.839 12.798 1.00 0.00 H
I would think default from pdb2pqr would be this from their docs HIE Neutral HIS, proton HE2 present
. GROMACs and assigns HIS in PDB files as HIE (Really HISE) which is the same as these AMBER docs here as HIE (HIS is same thing)
EDIT: I removed all hydrogens and ran again and it looks like pdbfixer at ph=7 defaults to having the HID protonation state having HD1
rather than HE2
and no HD1
which I think is not correct? Doc I posted indicates this is "*atypical". Should be like this:
I believe the Maestro prep wizard places the proton on whichever nitrogen better satisfies hydrogen bonds. From here:
Thanks for the resource @jchodera! I would def agree with whatever Schrodinger does 😄. Looks like most of the time at physiological pH it's going to try to make as many hydrogen-bond donors as possible. I just tested all the players on one PDB and Tinker pdbxyz to xyzpdb
actually adds both HD1 and HE2, pdb2pqr only protonates ND1 (adds HD1), and pdbfixer follows same as pdb2pqr.
I've got code that does something like this:
where my
displayMissingResidues
has a section that looks like this:this seems to work pretty well except that when I call
addMissingHydrogens
to add hydrogens to my new ACE and NME residues it's changing/minimizing all my previously placed hydrogens. This is a bit of chicken-and-egg for my default workflow which looks partially like this:That is, I want to do add ACE/NME after I've placed hydrogens. One approach could be to grab code from modeler.addHydrogens and add the hydrogens to the topology myself. However, there are other more elegant approaches if you were willing to implement them:
addHydrogens
in OpenMM could have a more conservative mode and only add hydrogens to residues which don't have any (exposing an appropriate kw in PDBFixer'saddMissingAtoms
)heavyAtomsOnly
option in _addAtomsToTopology toaddMissingAtoms
.Any suggestions?
As an aside, I'm not sure if
residues = ["NME","ACE"]
is valid for OpenMM because there's a chain break implied by that sequence. This approach is ok for me in PDBFixer because it seems like it can save the PDB ok.Another aside, I'll comment here about my
getChainWithGapsFromRemark465
when I've tested it better.