openmm / pdbfixer

PDBFixer fixes problems in PDB files
Other
461 stars 115 forks source link

PDBFixer not writing out complete CONECT records for downloaded Templates #297

Closed sukritsingh closed 1 month ago

sukritsingh commented 1 month ago

With #296 being merged in, I wanted to revisit trying to model in a serine to phosphoserine mutation. For OpenMM simulations, my understanding is that it's the CONECT records should be written out for the entirety of a non-standard amino acid to ensure appropriate template matching occurs with a given forcefield (please correct me if I'm wrong!). However, it seems like the full CONECT records for a non-standard SEP residue aren't written out. Detailed below:

Starting with the following starting file: test-ser-noPhos.pdb.zip

I downloaded the latest version of this release from github and ran the following code:

from pdbfixer import PDBFixer
from openmm.app import PDBFile
fixer.downloadTemplate('SEP')
fixer = PDBFixer(filename='test-ser-noPhos.pdb')
fixer.downloadTemplate('SEP')
fixer.downloadTemplate('SEP')
fixer.applyMutations(["SER-173-SEP"], "A")
fixer.findMissingResidues()
fixer.findMissingAtoms()
fixer.addMissingAtoms()
PDBFile.writeFile(fixer.topology, fixer.positions, open('test-sep-output.pdb', 'w'))

When I visualize the structure, I noticed that, as happening in #284, the atoms were added to the PDB file, but the CONECT records weren't being explicitly added for the phosphate group. The current output for the CONECT records at the bottom of the test-sep-output.pdb is:

TER    2519      LYS A 310
CONECT 1392 1398
CONECT 1398 1392 1399
CONECT 1399 1400 1402 1398
CONECT 1400 1408 1399 1401 1408
CONECT 1401 1400
CONECT 1402 1399 1403
CONECT 1403 1402
CONECT 1408 1400 1400
END

Visualized below is the SEP residue in pymol, the atoms are present (pink dots) but aren't being rendered with bonds due to lack of CONECT records: image

What I would expect for a complete set of CONECT records for SEP would be something like this:

CONECT 1392 1398
CONECT 1398 1392 1399
CONECT 1399 1398 1400 1402
CONECT 1400 1399 1401 1408
CONECT 1401 1400
CONECT 1402 1399 1403
CONECT 1403 1402 1404
CONECT 1404 1403 1405 1406 1407
CONECT 1405 1404
CONECT 1406 1404
CONECT 1407 1404
CONECT 1408 1400

Which yields this in Pymol: image

This might just be a point of clarification for me I suppose - should I expect complete CONECT records written out for a non-standard residue? If not, then I guess this is expected behavior, but I was under the impression that I would want the full set of a CONECT records for a residue like SEP since the phosphate group are written out as HETATM.

sukritsingh commented 1 month ago

Update: This code does work for Tyrosine -> Phosphotyrosine mutations! 🎉

When I take the same starting file above and run the following code:

tyrfixer = PDBFixer(filename='../sep-testing/test-tyr-noPhos.pdb')
tyrfixer.downloadTemplate("PTR")
tyrfixer.applyMutations(["TYR-189-PTR"], "A")
tyrfixer.findMissingResidues()
tyrfixer.findMissingAtoms()
tyrfixer.addMissingAtoms()
PDBFile.writeFile(tyrfixer.topology, tyrfixer.positions, open('../sep-testing/test-ptr-output.pdb', 'w'))

The CONECT records and visualization for the output are both consistent with my expectations:

TER    2519      LYS A 310
CONECT 1503 1509
CONECT 1509 1503 1510
CONECT 1510 1511 1513 1509
CONECT 1511 1525 1510 1512 1525
CONECT 1512 1511
CONECT 1513 1510 1514
CONECT 1514 1513 1515 1516
CONECT 1515 1517 1514
CONECT 1516 1518 1514
CONECT 1517 1515 1519
CONECT 1518 1516 1519
CONECT 1519 1517 1518 1520
CONECT 1520 1519
CONECT 1525 1511 1511
END

Visualized PTR residue in Pymol: image

This led me to wonder if there was an issue with the SEP template being downloaded and used, so I went and downloaded from the ligand-expo, specifically the "ideal coordinates" version. Weirdly, the 'ideal coordinates" version of SEP (SEP_ideal.pdb) does have the CONECT records I expect:

CONECT    1    2   12   13
CONECT    2    1    3    5   14
CONECT    3    2    4   15   16
CONECT    4    3    8
CONECT    5    2    6    7
CONECT    6    5
CONECT    7    5   17
CONECT    8    4    9   10   11
CONECT    9    8
CONECT   10    8   18
CONECT   11    8   19
CONECT   12    1
CONECT   13    1
CONECT   14    2
CONECT   15    3
CONECT   16    3
CONECT   17    7
CONECT   18   10
CONECT   19   11
END   

Summary points:

  1. PTR worked fine but SEP did not for some reason with the same block of code. This points to an issue in the SEP template vs the PTR template. But the SEP template on the ligand expo does have the CONECT records.
  2. Is here a way to verify what the template looks like upon downloading? It might be useful to check if there's another SEP template being downloaded (which might be missing the CONECT records)

Edit: Added image of "as expected" PTR residue output

peastman commented 1 month ago

Strange, I have no idea why it would work for PTR but not SEP. The file it downloads is

https://github.com/openmm/pdbfixer/blob/92044f3c524d21ee1e79f53c54882201ff2b5b34/pdbfixer/pdbfixer.py#L402

where name is the three letter residue code.

I'll take a look and see if I can figure out what's happening.

peastman commented 1 month ago

I think the problem is in these lines:

https://github.com/openmm/pdbfixer/blob/92044f3c524d21ee1e79f53c54882201ff2b5b34/pdbfixer/pdbfixer.py#L579-L583

existingAtomMap is a dict mapping from atoms in the original Topology to atoms in the new Topology being created. The intent here is to preserve any bonds that were already present in heterogens. But the effect is that it doesn't add any bonds between atoms that weren't already present. That was never an issue before, because we only ever added new heavy atoms to standard residues. Let me see what I can do to fix it.

peastman commented 1 month ago

The fix is in #298. Give it a try and see if it works now.

peastman commented 1 month ago

Does the thumbs up mean you checked and it works, or just that you're going to test? Emoji can be unclear. 😀

sukritsingh commented 1 month ago

Hah sorry for the lack of clarity - that was just an "acknowledgement" that I've seen it (had to step out for meetings all afternoon afterwards). Repeating tests with #298 now - will update!

peastman commented 1 month ago

And I bet you're wondering why I posted an emoji of someone opening their mouth to have dental work done! Or are they about to bite down on a carrot? Really hard to tell.

sukritsingh commented 1 month ago

Can't wait till the day github lets us insert gifs into posts - then we'll have even more ways to create confusion (sorry again!)

sukritsingh commented 1 month ago

I'll cross post into #298 but looks like the fix worked! I ran the following commands:

from pdbfixer import PDBFixer
from openmm.app import PDBFile
fixer = PDBFixer(filename='../sep-testing/test-ser-noPhos.pdb')
fixer.downloadTemplate('SEP')
fixer.applyMutations(["SER-173-SEP"], "A")
fixer.findMissingResidues()
fixer.findMissingAtoms()
fixer.addMissingAtoms()
PDBFile.writeFile(fixer.topology, fixer.positions, open('../sep-testing/test-sep-output.pdb', 'w'))

And the visualization appears as expected! image

The CONECT records are more explicitly complete as well, which is what I would have expected:

TER    2519      LYS A 310
CONECT 1392 1398
CONECT 1398 1392 1399
CONECT 1399 1400 1402 1398
CONECT 1400 1408 1399 1401 1408
CONECT 1401 1400
CONECT 1402 1399 1403
CONECT 1403 1402 1404
CONECT 1404 1403 1405 1406 1407
CONECT 1405 1404
CONECT 1406 1404
CONECT 1407 1404
CONECT 1408 1400 1400

I can also confirm that the TYR -> PTR still works as before with no issues. Huzzah!

peastman commented 1 month ago

Thanks!