Open sukritsingh opened 1 year ago
Lots of tests are failing with errors like this:
Exception: Multiple non-identical matching templates found for residue 4 (ASP): ASH, ASP.
With the original version, we manually removed the duplicates in #132 (see also #131). We could try to add logic to the script to automatically remove duplicates, but just doing it by hand again is probably simplest.
I looked at the failure log, and it looks like only four residues recognized as duplicates are causing the failures: LYN, CYM, GLH, and ASH.
I've deleted these four residue templates, as was done in #132 (though, interestingly, a different set of residues was deleted in #132 than here).
Hopefully this fixes things!
There are a few others that need to be removed too: CYX, HID, HIE. Also rename HIP to HIS. (HID, HIE, and HIP are all variants of histidine which are identical after removing hydrogens.) Then for each residue you've removed, there are separate templates corresponding to N- and C- terminal residues. Their names are the same but with N or C at the beginning. The same changes need to be made to them. If you're not sure about something, take a look at the diff of #132.
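If doing this by hand ever becomes tedious, the cleanup could be scripted along these lines. This is only a sketch, not what was done in #132: it assumes the templates are `<Residue name="...">` elements under a `<Residues>` parent, and the helper names and the toy input are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Residue names taken from this thread; terminal variants use an N or C prefix.
REMOVE = ["LYN", "CYM", "GLH", "ASH", "CYX", "HID", "HIE"]
RENAME = {"HIP": "HIS"}
PREFIXES = ("", "N", "C")


def dedupe_residues(root):
    """Drop duplicate templates and rename HIP -> HIS in place (hypothetical helper)."""
    to_remove = {p + name for name in REMOVE for p in PREFIXES}
    to_rename = {p + old: p + new for old, new in RENAME.items() for p in PREFIXES}
    for parent in root.iter("Residues"):
        # Copy the list first so removing elements does not disturb iteration.
        for residue in list(parent.findall("Residue")):
            name = residue.get("name")
            if name in to_remove:
                parent.remove(residue)
            elif name in to_rename:
                residue.set("name", to_rename[name])
    return root


# Toy input standing in for soft.xml (the real file has full atom and bond lists).
sample = ET.fromstring(
    "<ForceField><Residues>"
    '<Residue name="ASP"/><Residue name="ASH"/><Residue name="NASH"/>'
    '<Residue name="HIP"/><Residue name="CHIP"/><Residue name="HIE"/>'
    "</Residues></ForceField>"
)
dedupe_residues(sample)
remaining = [r.get("name") for r in sample.iter("Residue")]
print(remaining)
```

Running it against a copy of the real file and diffing the result against #132 would be a reasonable sanity check before committing.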
Ah I see! Didn't want to mess around and delete established residues so I wasn't sure. I've gone ahead and deleted CYX, HID, HIE, and renamed HIP->HIS. I then did the same but for N- and C-terminal versions of any of these residues (or the ones I deleted in the previous commit).
I went through the diff at #132 and I think all the residues deleted there are now correspondingly removed from soft.xml here. I can't think of anything else that would cause issues?
This all looks good now. I assume you've tested it on a file containing those residues to make sure it works? I would suggest adding one to test_build_and_simulate.py, but it would just replace them with standard residues since that's still the default behavior for them.
Oh I haven't yet! I do have a structure of a kinase (attached below) and I'll test it on that...the only test I can think of at the moment is to load it in, protonate it, and then simulate it...anything else I should try out?
That's fine. Just to make sure it's able to build missing residues of the new types correctly.
Hm, looks like protonation is an issue...
I just ran:
fixer = pdbfixer.PDBFixer("/Users/singhs15/Downloads/pser-ripk2-conf1.pdb")
fixer.findMissingAtoms()
fixer.addMissingAtoms()
pH = 7.0
ionic = 50.0 * unit.millimolar
box = 10.0 * unit.angstrom
positiveIon = 'Na+'
negativeIon = 'Cl-'
fixer.addMissingHydrogens(pH)
app.PDBFile.writeFile(fixer.topology, fixer.positions)
and it wrote out a PDB file without complaint, but SEP isn't protonated (every other residue IS protonated though). In retrospect this isn't surprising since I don't think I ever specified anything about protonation of SEP residues?
Should the template of SEP I include just have the protons included by default? I didn't include hydrogens in SEP.pdb because I thought none of the template PDB files should be protonated?
Adding hydrogens is done by OpenMM's Modeller class. It has its own internal specification of what hydrogens to add to what residues, and of course it doesn't include these two. You can call loadHydrogenDefinitions() to load a file with additional residues that you create yourself. Here's some relevant documentation.
http://docs.openmm.org/latest/userguide/application/03_model_building_editing.html#adding-hydrogens http://docs.openmm.org/latest/api-python/generated/openmm.app.modeller.Modeller.html#openmm.app.modeller.Modeller.loadHydrogenDefinitions
And here's the file with the built in definitions. The format should hopefully be clear.
https://github.com/openmm/openmm/blob/master/wrappers/python/openmm/app/data/hydrogens.xml
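For orientation, a definition for SEP in that format might look roughly like the text below. Treat it as a sketch under assumptions: the hydrogen names (H, HA, HB2, HB3) are guesses that would need to match whatever force field is used, no hydrogens are placed on the phosphate, and terminal variants are left out entirely. The Python around it only checks that the XML is well formed and that every parent is one of the SEP heavy atoms.

```python
import xml.etree.ElementTree as ET

# Heavy atoms of SEP as they appear in the template discussed in this thread.
SEP_HEAVY_ATOMS = {"N", "CA", "CB", "OG", "P", "O1P", "O2P", "O3P", "C", "O"}

# Assumed hydrogen names; verify them against the target force field.
SEP_HYDROGENS_XML = """\
<Residues>
  <Residue name="SEP">
    <H name="H" parent="N"/>
    <H name="HA" parent="CA"/>
    <H name="HB2" parent="CB"/>
    <H name="HB3" parent="CB"/>
  </Residue>
</Residues>
"""

root = ET.fromstring(SEP_HYDROGENS_XML)
hydrogens = [(h.get("name"), h.get("parent")) for h in root.iter("H")]
unknown_parents = [parent for _, parent in hydrogens if parent not in SEP_HEAVY_ATOMS]
print(hydrogens)
print("parents not found in SEP:", unknown_parents)

# To use it, save the text to a file (for example sep-hydrogens.xml) and pass
# that file to Modeller.loadHydrogenDefinitions() before calling addHydrogens().
```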
This does raise an issue: this feature can't be fully implemented just by changes to PDBFixer. If you just need it for your own scripts, of course, it's fine to put a call to loadHydrogenDefinitions() at the start. But it will have limited value for other people.
Ah! I see! Makes sense --- let me try using it and see if I can get it working using either the amber/phosaa14 forcefield or if I need to make a custom .xml file. Good to know at the very least that it is possible to use!
This does raise an issue: this feature can't be fully implemented just by changes to PDBFixer. If you just need it for your own scripts, of course, it's fine to put a call to loadHydrogenDefinitions() at the start. But it will have limited value for other people.
Hrmm the intention of this whole addition was to have it be useful for some other folks besides me as well. Also, it is rather nice to be able to auto-build in the heavy atoms of phosphorylated residues into protein chains rather than using other closed-source tools like Maestro where atom names/orders may be incompatible, so I see no reason not to at least be able to include these new residues.
How about a middle ground solution: Rather than modifying OpenMM perhaps we can just include an appropriate .xml file and some instructions in one of the directories (maybe just in pdbfixer) for how to include and use PTM residues? Happy to try creating/testing a set out and including a script + README.
If I understand correctly, @sukritsingh's goal here was to permit users to use pdbfixer to introduce phosphorylations---a very common form of post-translational modification---via the mutation capability. To support that, we should be able to simply have pdbfixer load the appropriate hydrogen definitions as needed for the mutation residues supported.
Would pdbfixer write out the CONECT records for these residues since they are non-standard? If so, since amber/phosaa14 supports these residues, we wouldn't need to change anything in OpenMM---would we?
More generally, we might want to make pdbfixer aware of all the chemical components available in the PDB via their three-letter codes using the Chemical Component Dictionary, which we could automatically process and use to (1) fill in missing bonds, (2) fill in missing atoms, and (3) allow user-specified mutations to any of these known three-letter codes.
we should be able to simply have pdbfixer load the appropriate hydrogen definitions as needed for the mutation residues supported.
True, we could make it automatically load its own definitions.
Update: I've gotten it to load the hydrogen definitions (I think) ok, but I'm encountering an issue about bonds being different?
Mutation from SER to SEP works when I run the following:
import pdbfixer.pdbfixer as pdbf
fixer = pdbf.PDBFixer("test-ser-noPhos.pdb")
fixer.applyMutations(["SER-173-SEP"], "A")
fixer.findMissingResidues()
fixer.findMissingAtoms()
fixer.addMissingAtoms()
I then load in a sep-hydrogens.xml file that seems to load up without complaint:
from openmm import app
model = app.modeller.Modeller(fixer.topology, fixer.positions)
model.loadHydrogenDefinitions("sep-hydrogens.xml")
forcefield = app.ForceField('amber/ff14SB.xml', 'tip3p.xml', 'amber/phosaa14SB.xml')
model.addHydrogens(forcefield)
However, when I run the final model.addHydrogens(forcefield) line above, the following error occurs:
ValueError: No template found for residue 173 (SEP). The set of atoms matches SEP, but the bonds are different.
Correct me if I'm wrong, but does this mean that the SEP listed in the amber/phosaa14SB.xml force field does not match the template I downloaded from online? If so, should I just copy over the SEP listed there as my template? Edit: Actually, that doesn't make sense, right? I generated the soft.xml using that SEP template, so why would it complain about bonds being different now?
Ah! I think it's related to this comment above:
Would pdbfixer write out the CONECT records for these residues since they are non-standard? If so, since amber/phosaa14 supports these residues, we wouldn't need to change anything in OpenMM---would we?
TLDR: Any advice on what to change so PDBFixer writes out the full CONECT records?
In the example file I'm using for all this (attached below), if you mutate SER->SEP using:
fixer = pdbf.PDBFixer("test-ser-noPhos.pdb")
fixer.applyMutations(["SER-173-SEP"], "A")
fixer.findMissingResidues()
fixer.findMissingAtoms()
fixer.addMissingAtoms()
It does NOT write out the complete CONECT records for the residue. What it writes out is (found in mutate-ser-sep-wrong-conect.pdb):
CONECT 1392 1398
CONECT 1398 1392 1399
CONECT 1399 1400 1402 1398
CONECT 1400 1408 1399 1401 1408
CONECT 1401 1400
CONECT 1402 1399 1403
CONECT 1403 1402
CONECT 1408 1400 1400
What it should write out is (Edit: Found in mutate-ser-sep-correct-conect.pdb):
CONECT 1392 1398
CONECT 1398 1392 1399
CONECT 1399 1398 1400 1402
CONECT 1400 1399 1401 1408
CONECT 1401 1400
CONECT 1402 1399 1403
CONECT 1403 1402 1404
CONECT 1404 1403 1405 1406 1407
CONECT 1405 1404
CONECT 1406 1404
CONECT 1407 1404
CONECT 1408 1400
I think because it's non-standard it's only writing out the conect records for the standard part of the SEP residue.
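To make the difference between the two listings explicit, here is a small sketch that parses both sets of CONECT records and reports which bonds are absent from the first. The helper `parse_conect` is written for this comparison only; it splits on whitespace, which is fine for the four-digit serials here but would not handle columns that run together in very large files.

```python
def parse_conect(text):
    """Return the bonds in CONECT records as a set of unordered serial pairs."""
    bonds = set()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0] != "CONECT":
            continue
        first, *partners = (int(f) for f in fields[1:])
        for partner in partners:
            if partner != first:
                bonds.add(frozenset((first, partner)))
    return bonds


WRONG = """\
CONECT 1392 1398
CONECT 1398 1392 1399
CONECT 1399 1400 1402 1398
CONECT 1400 1408 1399 1401 1408
CONECT 1401 1400
CONECT 1402 1399 1403
CONECT 1403 1402
CONECT 1408 1400 1400
"""

CORRECT = """\
CONECT 1392 1398
CONECT 1398 1392 1399
CONECT 1399 1398 1400 1402
CONECT 1400 1399 1401 1408
CONECT 1401 1400
CONECT 1402 1399 1403
CONECT 1403 1402 1404
CONECT 1404 1403 1405 1406 1407
CONECT 1405 1404
CONECT 1406 1404
CONECT 1407 1404
CONECT 1408 1400
"""

missing = sorted(tuple(sorted(b)) for b in parse_conect(CORRECT) - parse_conect(WRONG))
extra = sorted(tuple(sorted(b)) for b in parse_conect(WRONG) - parse_conect(CORRECT))
print("missing from the written file:", missing)
print("only in the written file:", extra)
```

The only bonds that differ are the four around atom 1404 (the phosphorus) plus its link to 1403 (OG), which matches the observation that the phosphate group is the part left unconnected.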
On the bright side, as soon as you use the correct CONECT records, mutation + protonation becomes trivial (no need for a separate .xml file).
Any insight on what changes to make to allow the correct CONECT records to be written?
PDBFile decides which bonds to write based on the name of the residue:
for atom1, atom2 in topology.bonds():
    if atom1.residue.name not in PDBFile._standardResidues or atom2.residue.name not in PDBFile._standardResidues:
        conectBonds.append((atom1, atom2))
If those bonds aren't being written, that suggests they aren't present in the topology.
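The selection rule in that snippet can be tried out in isolation. The sketch below is a stand-in, not OpenMM code: `Atom` is a plain namedtuple, and `STANDARD_RESIDUES` holds only three names for illustration, whereas the real `PDBFile._standardResidues` list is longer.

```python
from collections import namedtuple

# Minimal stand-in for an atom: just its name and the name of its residue.
Atom = namedtuple("Atom", ["name", "residue_name"])

# Illustrative subset only; the real list contains all standard residues.
STANDARD_RESIDUES = {"SER", "LEU", "GLN"}


def conect_bonds(bonds):
    """Keep bonds where at least one atom belongs to a non-standard residue."""
    return [
        (a, b)
        for a, b in bonds
        if a.residue_name not in STANDARD_RESIDUES
        or b.residue_name not in STANDARD_RESIDUES
    ]


bonds = [
    (Atom("N", "LEU"), Atom("CA", "LEU")),   # standard-standard: skipped
    (Atom("C", "LEU"), Atom("N", "SEP")),    # standard-HET: written
    (Atom("OG", "SEP"), Atom("P", "SEP")),   # within HET residue: written
]
selected = [(a.name, b.name) for a, b in conect_bonds(bonds)]
print(selected)
```

The filter can only pass along bonds it is given, so a bond that never made it into the topology cannot show up as a CONECT record.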
Ah ok! Just to clarify the code snippet, it seems like anything outside the standard residue set (i.e. things involving HETATM records) would be written out as CONECT records, right? Based on what's being written, it seems like the phosphate group's CONECT records aren't being written out, even though the phosphate atoms are being written out in the .pdb file....
Would the topology information be contained within the residue record for SEP in soft.xml? I'm looking at the residue record now for SEP and the bonds seem to connect correctly....
<Residue name="SEP">
<Atom name="N" type="N"/>
<Atom name="CA" type="CX"/>
<Atom name="CB" type="2C"/>
<Atom name="OG" type="OZ"/>
<Atom name="P" type="P"/>
<Atom name="O1P" type="OX"/>
<Atom name="O2P" type="OX"/>
<Atom name="O3P" type="OX"/>
<Atom name="C" type="C"/>
<Atom name="O" type="O"/>
<Bond from="0" to="1"/>
<Bond from="1" to="8"/>
<Bond from="1" to="2"/>
<Bond from="2" to="3"/>
<Bond from="3" to="4"/>
<Bond from="4" to="5"/>
<Bond from="4" to="7"/>
<Bond from="4" to="6"/>
<Bond from="8" to="9"/>
<ExternalBond from="0"/>
<ExternalBond from="8"/>
</Residue>
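One quick way to double check the record above is to translate the index-based `<Bond>` entries into atom names. The snippet below is a small sketch using only the standard library; the XML is copied from the SEP record as pasted in this thread.

```python
import xml.etree.ElementTree as ET

SEP_XML = """\
<Residue name="SEP">
<Atom name="N" type="N"/>
<Atom name="CA" type="CX"/>
<Atom name="CB" type="2C"/>
<Atom name="OG" type="OZ"/>
<Atom name="P" type="P"/>
<Atom name="O1P" type="OX"/>
<Atom name="O2P" type="OX"/>
<Atom name="O3P" type="OX"/>
<Atom name="C" type="C"/>
<Atom name="O" type="O"/>
<Bond from="0" to="1"/>
<Bond from="1" to="8"/>
<Bond from="1" to="2"/>
<Bond from="2" to="3"/>
<Bond from="3" to="4"/>
<Bond from="4" to="5"/>
<Bond from="4" to="7"/>
<Bond from="4" to="6"/>
<Bond from="8" to="9"/>
<ExternalBond from="0"/>
<ExternalBond from="8"/>
</Residue>
"""

residue = ET.fromstring(SEP_XML)
# Bond indices refer to the position of each <Atom> within the residue.
names = [atom.get("name") for atom in residue.findall("Atom")]
named_bonds = [
    (names[int(bond.get("from"))], names[int(bond.get("to"))])
    for bond in residue.findall("Bond")
]
for pair in named_bonds:
    print("-".join(pair))
```

This lists OG-P and the three P-O bonds alongside the backbone bonds, so the force field record itself does describe the phosphate connectivity.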
Could there be confusion somehow because half the SEP topology is a standard residue, and then there's a few more atoms attached, or that the topology isn't being updated properly? Behaviorally, it's almost like it created CONECT records for residues that are standard, but ignored the bonds that occur at the post-translational modification...
There's no such thing as a "half standard residue". A residue is either a heterogen or it isn't. Here's what the spec says about CONECT records:
CONECT records are present for:
- Intra-residue connectivity within non-standard (HET) residues (excluding water).
- Inter-residue connectivity of HET groups to standard groups (including water) or to other HET groups.
- Disulfide bridges specified in the SSBOND records have corresponding records.
Since SEP is not a standard amino acid, the entire residue is a heterogen and all bonds in it need to be listed. PDBFile decides what to write based on the residue name. Since 'SEP' is not contained in PDBFile._standardResidues, it's a heterogen and all bonds within it will be written out. I think the problem must be at an earlier point. The Topology being written to the file doesn't contain those bonds.
Ahh ok thanks for clarifying! Sorry for all the questions - just trying to parse what's going on....
It looks like, just as you mentioned, the entire residue (pre-protonation) is being written out as heterogens (with HETATM labels):
HETATM 1398 N SEP A 173 -66.676 -3.794 -15.914 1.00 0.00 N
HETATM 1399 CA SEP A 173 -68.120 -3.850 -16.128 1.00 0.00 C
HETATM 1400 C SEP A 173 -68.716 -2.462 -16.158 1.00 0.00 C
HETATM 1401 O SEP A 173 -69.923 -2.345 -16.568 1.00 0.00 O
HETATM 1402 CB SEP A 173 -68.801 -4.739 -15.058 1.00 0.00 C
HETATM 1403 OG SEP A 173 -68.721 -4.186 -13.740 1.00 0.00 O
HETATM 1404 P SEP A 173 -68.939 -5.268 -12.397 1.00 0.00 P
HETATM 1405 O1P SEP A 173 -70.229 -5.944 -12.659 1.00 0.00 O
HETATM 1406 O2P SEP A 173 -68.036 -6.189 -11.434 1.00 0.00 O
HETATM 1407 O3P SEP A 173 -69.213 -3.851 -11.684 1.00 0.00 O
It looks like all the atoms do get added, but the bonds aren't rebuilt, which would be happening in _addAtomsToTopology, right? I guess I should first check where it's drawing the bond information from. Does it draw the bond connectivity from the template PDB file? Or soft.xml?
Does it draw the bond connectivity from the template PDB file?
It's from the template. That's the problem: your template doesn't have CONECT records. The existing templates don't need them because they're standard residues. SEP and PTR aren't, so they do need them.
Ok great! Let me try adding the correct CONECT records into the template
Thanks so much for all your help!
Update: I added in CONECT records to the template for SEP.pdb, including explicit CONECT records for the phosphate group:
HETATM 1398 N SEP A 1 -66.676 -3.794 -15.914 1.00 0.00 N
HETATM 1399 CA SEP A 1 -68.120 -3.850 -16.128 1.00 0.00 C
HETATM 1400 C SEP A 1 -68.716 -2.462 -16.158 1.00 0.00 C
HETATM 1401 O SEP A 1 -69.923 -2.345 -16.568 1.00 0.00 O
HETATM 1402 CB SEP A 1 -68.801 -4.739 -15.058 1.00 0.00 C
HETATM 1403 OG SEP A 1 -68.721 -4.186 -13.740 1.00 0.00 O
HETATM 1404 P SEP A 1 -68.939 -5.268 -12.397 1.00 0.00 P
HETATM 1405 O1P SEP A 1 -70.229 -5.944 -12.659 1.00 0.00 O
HETATM 1406 O2P SEP A 1 -68.036 -6.189 -11.434 1.00 0.00 O
HETATM 1407 O3P SEP A 1 -69.213 -3.851 -11.684 1.00 0.00 O
TER 1408 SEP A 1
CONECT 1398 1392 1399
CONECT 1399 1398 1400 1402
CONECT 1400 1399 1401 1408
CONECT 1401 1400
CONECT 1402 1399 1403
CONECT 1403 1402 1404
CONECT 1404 1403 1405 1406 1407
CONECT 1405 1404
CONECT 1406 1404
CONECT 1407 1404
END
and it looks good when I open it up in pymol:
However, if I run the following commands to introduce a mutation:
import pdbfixer.pdbfixer as pdbf
fixer = pdbf.PDBFixer("../sep-testing/test-ser-noPhos.pdb")
fixer.applyMutations(["SER-173-SEP"], "A")
fixer.findMissingResidues()
fixer.findMissingAtoms()
fixer.addMissingAtoms()
from openmm import app
app.PDBFile.writeFile(fixer.topology, fixer.positions, open("./test-mutate.pdb", "w"))
then the same behavior shows up, where the atoms are added in but the bonds for the phosphate aren't written out:
I feel like I'm missing some simple thing? I'm trying to run through addMissingAtoms and _addAtomsToTopology to see what I may be missing that's causing this.
You can print out all the bonds involving the residue with something like this:
print([(a1, a2) for a1, a2 in fixer.topology.bonds() if 'SEP' in [a1.residue.name, a2.residue.name]])
Print that at each point in the script and see whether the residue contains the bonds you expect it to at that point.
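To avoid retyping that comprehension after every step, it could be wrapped in a small helper. The sketch below is an illustration only: `bonds_touching` is a made-up name, it relies on nothing beyond `topology.bonds()`, `atom.name`, and `atom.residue.name`, and the stand-in topology built from `SimpleNamespace` exists purely so the example runs without OpenMM installed.

```python
from types import SimpleNamespace


def bonds_touching(topology, residue_name):
    """List (atom1 name, atom2 name) for bonds with an atom in residue_name."""
    return [
        (a1.name, a2.name)
        for a1, a2 in topology.bonds()
        if residue_name in (a1.residue.name, a2.residue.name)
    ]


def make_atom(name, residue_name):
    # Stand-in for an OpenMM Atom: only the attributes the helper reads.
    return SimpleNamespace(name=name, residue=SimpleNamespace(name=residue_name))


fake_bonds = [
    (make_atom("C", "LEU"), make_atom("N", "SEP")),
    (make_atom("CB", "SEP"), make_atom("OG", "SEP")),
    (make_atom("N", "GLN"), make_atom("CA", "GLN")),
]
fake_topology = SimpleNamespace(bonds=lambda: iter(fake_bonds))

sep_bonds = bonds_touching(fake_topology, "SEP")
print(len(sep_bonds), "bonds involving SEP:", sep_bonds)
```

With a real fixer, calling `bonds_touching(fixer.topology, "SEP")` after `applyMutations`, `findMissingAtoms`, and `addMissingAtoms` should show the step at which the expected bonds fail to appear.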
Finally returning to this (so sorry for the delay - I got buried in K99 grant writing right after that last post and it was my first time writing an NIH grant!)
So I went through and ran through the above commands line by line and then ran:
print([(a1, a2) for a1, a2 in fixer.topology.bonds() if 'SEP' in [a1.residue.name, a2.residue.name]])
to write out the bonds after each step. The missing bonds first show up right after I run:
fixer.applyMutations(["SER-173-SEP"], "A")
I've pasted the bonds below for completeness, but to summarize, it only has the bonds of the original SER residue that was in that position. It does not seem to have any record of the added bonds when I run applyMutations, even though the template for SEP (pasted above) contains the appropriate CONECT records.
Should the bonds be recognized right after applying the mutation? I had assumed so, but looking at applyMutations within pdbfixer.py now, all it does is remap the residues based on the mutation list and then delete unneeded atoms. I then figured that bonds would be added in fixer.addMissingAtoms, but nothing appears to alter the output of the print statement to append the appropriate bonds.
Any advice as to what to try next?
Bonds output from the print statement pasted above (same across all lines after running applyMutations):
[(<Atom 1391 (C) of chain 0 residue 171 (LEU)>, <Atom 1397 (N) of chain 0 residue 172 (SEP)>), (<Atom 1399 (C) of chain 0 residue 172 (SEP)>, <Atom 1398 (CA) of chain 0 residue 172 (SEP)>), (<Atom 1399 (C) of chain 0 residue 172 (SEP)>, <Atom 1400 (O) of chain 0 residue 172 (SEP)>), (<Atom 1398 (CA) of chain 0 residue 172 (SEP)>, <Atom 1401 (CB) of chain 0 residue 172 (SEP)>), (<Atom 1398 (CA) of chain 0 residue 172 (SEP)>, <Atom 1397 (N) of chain 0 residue 172 (SEP)>), (<Atom 1401 (CB) of chain 0 residue 172 (SEP)>, <Atom 1402 (OG) of chain 0 residue 172 (SEP)>), (<Atom 1399 (C) of chain 0 residue 172 (SEP)>, <Atom 1403 (N) of chain 0 residue 173 (GLN)>)]
This is the bonds output from running addMissingAtoms(). It also does not add any of the sidechain bonds, but weirdly shuffles around the order in which the bonds are printed?
[(<Atom 1399 (C) of chain 0 residue 172 (SEP)>, <Atom 1407 (N) of chain 0 residue 173 (GLN)>), (<Atom 1391 (C) of chain 0 residue 171 (LEU)>, <Atom 1397 (N) of chain 0 residue 172 (SEP)>), (<Atom 1399 (C) of chain 0 residue 172 (SEP)>, <Atom 1398 (CA) of chain 0 residue 172 (SEP)>), (<Atom 1399 (C) of chain 0 residue 172 (SEP)>, <Atom 1400 (O) of chain 0 residue 172 (SEP)>), (<Atom 1398 (CA) of chain 0 residue 172 (SEP)>, <Atom 1401 (CB) of chain 0 residue 172 (SEP)>), (<Atom 1398 (CA) of chain 0 residue 172 (SEP)>, <Atom 1397 (N) of chain 0 residue 172 (SEP)>), (<Atom 1401 (CB) of chain 0 residue 172 (SEP)>, <Atom 1402 (OG) of chain 0 residue 172 (SEP)>), (<Atom 1399 (C) of chain 0 residue 172 (SEP)>, <Atom 1407 (N) of chain 0 residue 173 (GLN)>)]
Small update: Dug around a bit more and noticed that if I run:
newTopology, newPositions, newAtoms, existingAtomMap = fixer._addAtomsToTopology(True, True)
Then the newTopology object also does not contain the SEP bonds, even though I can verify the bonds are in the templates using
template = fixer.templates["SEP"]
print([(a1, a2) for a1, a2 in template.topology.bonds() if 'SEP' in [a1.residue.name, a2.residue.name]])
Interestingly, I also noticed that the existingAtomMap created by _addAtomsToTopology does not contain any Atoms mapped from the template (although I would expect it to, based on the docstring).
The section for residue SEP in existingAtomMap looks like:
<Atom 1397 (N) of chain 0 residue 172 (SEP)>: <Atom 1397 (N) of chain 0 residue 172 (SEP)>,
<Atom 1398 (CA) of chain 0 residue 172 (SEP)>: <Atom 1398 (CA) of chain 0 residue 172 (SEP)>,
<Atom 1399 (C) of chain 0 residue 172 (SEP)>: <Atom 1399 (C) of chain 0 residue 172 (SEP)>,
<Atom 1400 (O) of chain 0 residue 172 (SEP)>: <Atom 1400 (O) of chain 0 residue 172 (SEP)>,
<Atom 1401 (CB) of chain 0 residue 172 (SEP)>: <Atom 1401 (CB) of chain 0 residue 172 (SEP)>,
<Atom 1402 (OG) of chain 0 residue 172 (SEP)>: <Atom 1402 (OG) of chain 0 residue 172 (SEP)>,
<Atom 1403 (N) of chain 0 residue 173 (GLN)>: <Atom 1407 (N) of chain 0 residue 173 (GLN)>,
with no mention of the new atoms. Given that existingAtomMap is used for the creation of new bonds in _addAtomsToTopology, could this be why it's never building the CONECT records?
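Whatever the cause turns out to be, the bonds a fix would need to create can be worked out from atom names alone. The sketch below is not PDBFixer's implementation; `missing_template_bonds` is a hypothetical helper, the template bonds are transcribed from the SEP record in soft.xml, and the residue bonds are the intra-residue ones printed after applyMutations.

```python
def missing_template_bonds(template_bonds, residue_bonds, residue_atoms):
    """Template bonds absent from the residue whose two atoms both exist in it."""
    have = {frozenset(bond) for bond in residue_bonds}
    wanted = {frozenset(bond) for bond in template_bonds}
    atoms = set(residue_atoms)
    return sorted(tuple(sorted(bond)) for bond in wanted - have if bond <= atoms)


# Bonds from the SEP record in soft.xml, by atom name.
TEMPLATE_BONDS = [
    ("N", "CA"), ("CA", "C"), ("CA", "CB"), ("CB", "OG"), ("OG", "P"),
    ("P", "O1P"), ("P", "O3P"), ("P", "O2P"), ("C", "O"),
]

# Intra-residue bonds reported for SEP 172 after applyMutations.
RESIDUE_BONDS = [("C", "CA"), ("C", "O"), ("CA", "CB"), ("CA", "N"), ("CB", "OG")]

# Atoms present in the residue once addMissingAtoms has run.
RESIDUE_ATOMS = ["N", "CA", "C", "O", "CB", "OG", "P", "O1P", "O2P", "O3P"]

to_add = missing_template_bonds(TEMPLATE_BONDS, RESIDUE_BONDS, RESIDUE_ATOMS)
print(to_add)
```

Every bond it reports involves the phosphorus, which is consistent with the new atoms being added without their template connectivity.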
This PR provides the following changes in reference to #259:
- pdbfixer/pdbfixer.py is updated to add these residues as well.
- devtools/createSoftForcefield.py for the current state of atomType objects
- pdbfixer/soft.xml file using the newly updated script devtools/createSoftForcefield.py