Open Yoshanuikabundi opened 1 month ago
That's an interesting corner case I hadn't thought about. PDBFixer.addSolvent()
is mostly just a thin wrapper around Modeller.addSolvent()
:
Notice that Modeller lets you pass in a force field to use for identifying charges, but PDBFixer generates its own force field, which of course only knows about standard residues. Here are a few possibilities.
PDBFixer.addSolvent()
, but pass your own force field instead of having it generate one.PDBFixer.addSolvent()
to let you pass in a force field to use._createForceField()
generate charges for nonstandard residues based on their formal charges.A complication with the third one is that it often doesn't know what the formal charges are. Registering a template is only required if you're going to add new atoms to a residue. If all you're doing is preserving a molecule from the input file, it will happily copy it over even if it has no idea what that molecule is. It doesn't need to know anything about it... except in this case, it kind of does because it affects how many ions get added.
Thanks @peastman, that clarifies the issues really well for me! I might take a crack at option 3 in a fork - would you accept a PR if I can come up with a good solution?
If you can come up with a good solution, sure, but let's discuss it first. What do you have in mind for figuring out the charges?
If we can't figure out anything robust, I'd vote for the second one. It's easy to implement, and if you plan to simulate the system with OpenMM, you presumably have a force field that can parameterize it.
I'd be happy to try using @Yoshanuikabundi's fork as a way to evaluate implementing Peter's option 3 via either taking formal charges from the CCD, or using the MDAnalysis chemical information guesser at scale. These would both depend on the user knowing their protonation state beforehand, so users could either use the already-implemented protonation functionality, or they could come with fully-protonated inputs.
Unfortunately, Peter's options 1 and 2 would hit a circular dependency in SMIRNOFF force field land. We can't make a FF for a molecule without already knowing its formal charges.
I think that's orthogonal to this issue? This is only about how to determine the total charge for purposes of adding counter ions. Parameterizing the system is much more complicated. And if you have a solution for parameterizing, then allowing a force field to be passed to addSolvent()
should handle this case.
Option 2 would be fine if we had an OpenMM force field that could charge, say, anything in the PDB... but unfortunately we don't. And as you point out, if we did have an appropriate force field we could just use Modeller to add solvent. The createForceField
machinery in PDBFixer is part of what makes it such a flexible tool, which is why I think the best path forward is to improve its ability to assign a wide variety of correct formal charges (option 3). I'll also point out that PDBFixer is a tool that can be used for preparing systems for any MD engine, and it would be a shame to tie its full functionality to any specific force field format - in the current API, this relationship is entirely implicit.
I'd propose adding a method to the PDBFixer
class that uses the existing machinery for collecting CCD templates to assign formal charges to all atoms. Formal charges could be stored as a list[int|None]
on the PDBFixer
object, or as a dict[int, int]
if we expect formal charges to be sparsely populated and the size of the list to be an issue. This would also let users manually set formal charges for unknown ligands or alternate protonation states.
Then createForceField
could use those formal charges when assigning its charges. This would avoid any unnecessary CCD lookup for users who do not require formal charges or solvation. When the CCD is looked up, formal charges could be set at that time to avoid a future duplicate lookup. createForceField
could issue a warning if it has any ligands with unknown formal charges, or it could call the new API point to determine charges.
I think @j-wags point is that OpenFF does have a solution for parametrizing, but (a) it isn't in the form of an OpenMM force field, and (b) it requires formal charges.
I'm not familiar enough with the new code to know off the top of my head how much of a refactor the above is, but I'm happy to experiment in a fork!
Feel free to give it a try and see if you can work out something robust. Doing it based on the templates will have two main issues you have to overcome. The first, as noted above, is that we often don't have a template. Ligands may not be in the CCD, or the name may be nonstandard so we don't know what it is. The other is that formal charges depend on protonation. For example, here's the CCD template for LYS. Formal charge is the 5th column.
LYS N N N 0 1 N N N Y Y N 37.577 40.385 -3.968 1.422 1.796 0.198 N LYS 1
LYS CA CA C 0 1 N N S Y N N 38.631 39.459 -4.356 1.394 0.355 0.484 CA LYS 2
LYS C C C 0 1 N N N Y N Y 38.094 38.304 -5.212 2.657 -0.284 -0.032 C LYS 3
LYS O O O 0 1 N N N Y N Y 36.873 38.235 -5.490 3.316 0.275 -0.876 O LYS 4
LYS CB CB C 0 1 N N N N N N 39.374 38.919 -3.139 0.184 -0.278 -0.206 CB LYS 5
LYS CG CG C 0 1 N N N N N N 38.523 38.111 -2.181 -1.102 0.282 0.407 CG LYS 6
LYS CD CD C 0 1 N N N N N N 39.164 36.749 -1.903 -2.313 -0.351 -0.283 CD LYS 7
LYS CE CE C 0 1 N N N N N N 38.106 35.761 -1.382 -3.598 0.208 0.329 CE LYS 8
LYS NZ NZ N 1 1 N N N N N N 37.176 36.546 -0.539 -4.761 -0.400 -0.332 NZ LYS 9
LYS OXT OXT O 0 1 N Y N Y N Y 38.961 37.678 -5.886 3.050 -1.476 0.446 OXT LYS 10
LYS H H H 0 1 N N N Y Y N 37.933 41.152 -3.399 1.489 1.891 -0.804 H LYS 11
LYS H2 HN2 H 0 1 N Y N Y Y N 36.812 39.900 -3.498 0.521 2.162 0.464 H2 LYS 12
LYS HA HA H 0 1 N N N Y N N 39.352 40.037 -4.979 1.322 0.200 1.560 HA LYS 13
LYS HB2 1HB H 0 1 N N N N N N 40.262 38.326 -3.460 0.210 -0.047 -1.270 HB2 LYS 14
LYS HB3 2HB H 0 1 N N N N N N 39.882 39.750 -2.596 0.211 -1.359 -0.068 HB3 LYS 15
LYS HG2 1HG H 0 1 N N N N N N 38.317 38.670 -1.238 -1.128 0.050 1.471 HG2 LYS 16
LYS HG3 2HG H 0 1 N N N N N N 37.474 38.007 -2.546 -1.130 1.363 0.269 HG3 LYS 17
LYS HD2 1HD H 0 1 N N N N N N 39.701 36.351 -2.795 -2.287 -0.120 -1.348 HD2 LYS 18
LYS HD3 2HD H 0 1 N N N N N N 40.034 36.831 -1.210 -2.285 -1.432 -0.145 HD3 LYS 19
LYS HE2 1HE H 0 1 N N N N N N 37.593 35.194 -2.194 -3.625 -0.023 1.394 HE2 LYS 20
LYS HE3 2HE H 0 1 N N N N N N 38.544 34.882 -0.854 -3.626 1.289 0.192 HE3 LYS 21
LYS HZ1 1HZ H 0 1 N N N N N N 36.474 35.891 -0.193 -4.736 -0.185 -1.318 HZ1 LYS 22
LYS HZ2 2HZ H 0 1 N N N N N N 37.644 37.064 0.203 -4.735 -1.400 -0.205 HZ2 LYS 23
LYS HZ3 3HZ H 0 1 N N N N N N 36.774 37.350 -1.021 -5.609 -0.031 0.071 HZ3 LYS 24
LYS HXT HXT H 0 1 N Y N Y N Y 38.628 36.963 -6.415 3.861 -1.886 0.115 HXT LYS 25
It lists a +1 charge on NZ, and no charges anywhere else. But at high pH NZ can be neutral, and terminal residues can be charged. So you can't assume the charges listed in the template are correct for the current model.
The
addSolvent
method uses a force field to assign charges, which it then uses to add ions that neutralise the net charge of the box. Unfortunately, the force field seems to assign formal charges of zero to many charged groups in small molecules; for example the citrate ion, which has a -3 formal charge:The
citrate.pdb
file generated above is available here: citrate.pdb.zipWith the current
main
branch (commit 92044f3), you can also see this behavior with a PDB file like 1PO0, which has 2 citrate ions (each with -3 formal charge).It would be great if the correct formal charges could be stored when the template is downloaded! Then they could be used when neutralising the system. This might also give an opportunity for users to manually specify the formal charges of
UNK
residues.Thanks!
(also the ACE and NME residues in the second example above don't seem to be getting hydrogens - let me know if you'd like me to open a second issue for that!)