mosdef-hub / foyer

A package for atom-typing as well as applying and disseminating forcefields
https://foyer.mosdef.org
MIT License
118 stars 77 forks source link

OPLS molecule net charge #430

Closed mphoward closed 3 years ago

mphoward commented 3 years ago

Bug summary

I tried to apply OPLS-AA to a molecule that looks similar to PEGDA:

CC(=O)OCCOC(=O)C

I also attach a PDB file of the structure that I created pegdaish.zip.

Initially, I obtained the error

FoyerError: Found multiple types for atom 4 (6): ['opls_182', 'opls_468'].

when I used the builtin OPLS force field (#427). When I corrected the XML file as proposed there, the atoms successfully typed, but the molecule had a net charge. This likely indicates some of the types were not assigned correctly.

Code to reproduce the behavior

mol = mb.load('pegdaish.pdb')
ff = foyer.Forcefield(name='oplsaa')
s = ff.apply(mol)
net_charge = sum([a.charge for a in s.atoms])
print(net_charge)

Software versions

rsdefever commented 3 years ago

I think the problem that you are getting opls_182 instead of opls_468 on the methoxy C. AFAIK, something strange is happening here. The atom types I'm getting are:

['opls_135', 'opls_465', 'opls_466', 'opls_467', 'opls_182', 'opls_182', 'opls_467', 'opls_465', 'opls_466', 'opls_135', 'opls_140', 'opls_140', 'opls_140', 'opls_185', 'opls_185', 'opls_185', 'opls_185', 'opls_140', 'opls_140', 'opls_140']

But the definition of opls_182 is:

Type name="opls_182" class="CT" element="C" mass="12.01100" def="[C;X4]([O;%opls_180])(H)(H)"

This should need to be bonded to opls_180; however, it clearly isn't.

@justinGilmer @daico007 any ideas?

mphoward commented 3 years ago

Thanks! I agree that this seems funny based on the SMARTS strings and was where I was poking around. If opls_468 is a methoxy C, that would need to be in a CH3 bonded to an O, right? This molecule should not have any of those, since the end group methyls are bonded to carbons. Perhaps could the problem be the other way around, and the carbon is being typed correctly but the others are not (e.g., opls_465, opls_466, and opls_467)?

CalCraven commented 3 years ago

The carbon opls_182 has to be incorrect based on its smarts string, since opls_180 is not present in the molecule. Since that smarts string for 468 was so general, there's a chance it was defined in a way that covered a carbon of the above type regardless of being technically a "methoxy" and that was just used as the naming convention. Here's the source for where that type is defined gromacs and we should go look and see the paper that is sourced from there to see the precise application of that type. If opls_468 is only used as a methoxy carbon, then we'll need to add a new type that covers this case to foyer's xml. The smarts string should be easy, [C;X4]([O;%opls_467])([O;%opls_467])(H)(H), but making sure we find the right opls type is the trickier part.

Beyond all of that, there is still the issue of how the heck is opls_182 even being matched to this in the first place.

daico007 commented 3 years ago

I worked with Justin on this issue for a little bit and I think the non-zero charge is because we are missing the atomtype for the ethoxy carbon. We just made a commit to #429 to add the smarts string for opls_490. opls_490 has a weird description in the gromacs atp file , but its parameters matched with that of ethoxy carbon from Table 1 in this paper. This addition solved the non-zero charge issue of those ester molecules we tried locally and the pegda molecule from above. Can you give that PR a try and see if it works as expected for your system?

mphoward commented 3 years ago

Thank you both for looking into this!! I definitely agree based on the paper and gromacs description that the ethoxy C should be opls_490, and that would fix the charge issue. However, I think some of the atoms in the pegda test case are still not typed quite right?

By the same paper, I think you want to use 469 ("methoxy H in esters") instead of 185 for any alpha-alkoxy H (468, 490, and probably 491 and 492 too)? The gromacs parameters match the ones that are reported in the paper. The charges of 185 and 469 are the same, but their LJ parameters are different.

Also, I am wondering if the H on the CH3 group should be 282 rather than 140? The gromacs notes for 465 say to use 280-282 for R groups on C=O. 280 says to use 135-139 for C-alpha (so 135 is correct for C), but 282 is for H on the C-alpha. It seems like that should take precedence over 140, and again, same charges but slightly different LJ parameters.

I did run this on my system for the pegda molecule and confirmed the neutral charge and types, but I will hold off on running on my full system until I hear back with your opinion on these two atom types.

daico007 commented 3 years ago

Good catch @mphoward and I am totally agree with you on this! I have updated the SMARTS string of the opls_490 in PR #429 and it seems to fix this issue. I will try to add a test or two for these chemistries in PR #429, and it will ready to be reviewed/merged.

daico007 commented 3 years ago

For the case of opls_282, I still working on how to encode the info to its SMARTS string, but I will let you all know when I figure that part out.

daico007 commented 3 years ago

@rsdefever I think we will need to do some digging on the issue you found. It might have to do with how we parse SMARTS grammar and the order we evaluate each rule(?).

justinGilmer commented 3 years ago

I think the problem that you are getting opls_182 instead of opls_468 on the methoxy C. AFAIK, something strange is happening here. The atom types I'm getting are:

['opls_135', 'opls_465', 'opls_466', 'opls_467', 'opls_182', 'opls_182', 'opls_467', 'opls_465', 'opls_466', 'opls_135', 'opls_140', 'opls_140', 'opls_140', 'opls_185', 'opls_185', 'opls_185', 'opls_185', 'opls_140', 'opls_140', 'opls_140']

But the definition of opls_182 is:

Type name="opls_182" class="CT" element="C" mass="12.01100" def="[C;X4]([O;%opls_180])(H)(H)"

This should need to be bonded to opls_180; however, it clearly isn't.

@justinGilmer @daico007 any ideas?

Im not entirely sure what might have led to that issue, but I wonder if it is related to the number of elements to match to?

[C;X4]([O;%opls_180])(H)(H) is missing its 4th neighbor to match to. Normally this should default to a wildcard match iirc. Im not sure if that would have still matched, but my current hypothesis is that since opls_180 is very generic, just [O;X2](C)C, and the %atom_type matching simply checks the whitelist for each node to determine if there is a match. As we build up the system, we iterate through the atom typing process multiple times, with each time using the new state of the topology graph.

So I think it is quite possible that opls_180 was in the atomtype whitelists at some point, but as we iterate through the matching process, we slowly build up both our white and blacklists. It might have just been an unlucky coincidence that 182 matched at some point, but there was a blacklist that removed opls_180 at some point. But i am a bit shaky on saying that definitively since I am not 100% sure.

justinGilmer commented 3 years ago

@mphoward we have made some additional edits to #429 which should handle these cases you have brought. Let us know how those work for your test molecules!

mphoward commented 3 years ago

I ran this forcefield on my actual polymer, and it is (1) charge neutral and (2) appears to be typed as I would have done by hand!

justinGilmer commented 3 years ago

That is excellent to hear! Also, if you run the apply method with the references_file option set to a filename you want to write to, it should generate a bibtex file citing the parameters you used as well.