openforcefield / smirnoff99Frosst

A general small molecule force field descended from AMBER99 and parm@Frosst, available in the SMIRNOFF format
Creative Commons Attribution 4.0 International

*-O-O-* Torsion (not in parm99/parm@Frosst) #23

bannanc closed this issue 7 years ago

bannanc commented 7 years ago

I'm pulling the relevant torsion parameters from GAFF2:

X -oh-os-X    1    1.600         0.000           2.000
X -os-os-X    1    1.000         0.000           1.000
c3-os-oh-ho   1    1.010         0.000           2          c178 SS AUE=0.2810 RMSE=0.3796 TorType=3
c3-os-os-c3   1    0.380         0.000           1          c187 SS AUE=0.4838 RMSE=0.6593 TorType=3
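For reference, the GAFF2 lines above follow the AMBER frcmod DIHE layout: atom types, then IDIVF, PK (kcal/mol), PHASE (degrees) and PN (periodicity). Each term contributes E(phi) = (PK/IDIVF) * (1 + cos(PN*phi - PHASE)). The Python sketch below was written for this comparison and is not taken from AMBER or the SMIRNOFF tooling. It evaluates those four terms; the function names and the 1-degree scan are illustrative assumptions.

```python
import math
from typing import Dict, Tuple

# (IDIVF, PK in kcal/mol, PHASE in degrees, PN), in frcmod DIHE column order.
TorsionTerm = Tuple[int, float, float, int]

# The GAFF2 lines quoted above; PN is stored as an integer periodicity here.
GAFF2_TERMS: Dict[str, TorsionTerm] = {
    "X -oh-os-X": (1, 1.600, 0.0, 2),
    "X -os-os-X": (1, 1.000, 0.0, 1),
    "c3-os-oh-ho": (1, 1.010, 0.0, 2),
    "c3-os-os-c3": (1, 0.380, 0.0, 1),
}


def torsion_energy(phi_deg: float, idivf: int, pk: float, phase_deg: float, pn: int) -> float:
    """One AMBER-style Fourier term: (PK / IDIVF) * (1 + cos(PN*phi - PHASE)), in kcal/mol."""
    phi = math.radians(phi_deg)
    phase = math.radians(phase_deg)
    return (pk / idivf) * (1.0 + math.cos(pn * phi - phase))


def barrier_height(idivf: int, pk: float, phase_deg: float, pn: int) -> float:
    """Max minus min of a single term, scanned in 1-degree steps over 0-359 degrees."""
    energies = [torsion_energy(float(p), idivf, pk, phase_deg, pn) for p in range(360)]
    return max(energies) - min(energies)


for name, term in GAFF2_TERMS.items():
    print(f"{name:12s} barrier = {barrier_height(*term):.3f} kcal/mol")
```

With PHASE = 0, each term's barrier is simply 2*PK/IDIVF. The scan therefore mainly makes the gap concrete between c3-os-os-c3 (0.76 kcal/mol) and the other three terms, whose barriers are roughly 2 kcal/mol or more.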

My vote is that we keep the SMIRKS patterns as generic as possible so that the added parameters can cover the "weird" chemical space included in DrugBank, such as C-O-O-O-H, N-O-O-H, etc. With that in mind, this is my suggestion:

[*:1]-[#8:2]-[#8:3]-[*:4]    1    0.4         0.000           1      
[*:1]-[#8:2]-[#8:3]-[#1:4]   1    1.0         0.000           2          

@davidlmobley Thoughts? I don't think we can keep the hydrogen inside the generic...
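The following minimal pure-Python sketch shows how the two proposed patterns would divide up peroxide torsions. It relies on two assumptions. First, it follows the SMIRNOFF convention that, within a parameter section, the last pattern that matches a torsion is the one applied. Second, it replaces real SMIRKS matching with hand-written predicates (oo_generic, oo_hydrogen) over hand-built molecules with explicit hydrogens. All of those names, and the helper functions, are hypothetical.

```python
# Hypothetical stand-ins for the two proposed SMIRKS patterns. A real SMIRNOFF
# implementation matches the SMIRKS strings with a cheminformatics toolkit;
# here each pattern is a plain Python predicate on element symbols.
from typing import Callable, Dict, List, Set, Tuple

Torsion = Tuple[int, int, int, int]
Molecule = Tuple[List[str], List[Tuple[int, int]]]  # (element symbols, bonds)


def neighbor_sets(elements: List[str], bonds: List[Tuple[int, int]]) -> Dict[int, Set[int]]:
    nbrs: Dict[int, Set[int]] = {a: set() for a in range(len(elements))}
    for a, b in bonds:
        nbrs[a].add(b)
        nbrs[b].add(a)
    return nbrs


def proper_torsions(elements: List[str], bonds: List[Tuple[int, int]]) -> List[Torsion]:
    """Every i-j-k-l path around each listed bond j-k (each bond listed once)."""
    nbrs = neighbor_sets(elements, bonds)
    torsions = []
    for j, k in bonds:
        for i in sorted(nbrs[j] - {k}):
            for l in sorted(nbrs[k] - {j}):
                if i != l:  # skip three-membered rings
                    torsions.append((i, j, k, l))
    return torsions


def oo_generic(el: List[str], t: Torsion) -> bool:
    # Stand-in for [*:1]-[#8:2]-[#8:3]-[*:4]
    return el[t[1]] == "O" and el[t[2]] == "O"


def oo_hydrogen(el: List[str], t: Torsion) -> bool:
    # Stand-in for [*:1]-[#8:2]-[#8:3]-[#1:4]
    return oo_generic(el, t) and el[t[3]] == "H"


# Generic first, specific last; values are (k in kcal/mol, periodicity, phase in degrees).
PARAMETERS: List[Tuple[str, Callable[[List[str], Torsion], bool], Tuple[float, int, float]]] = [
    ("[*:1]-[#8:2]-[#8:3]-[*:4]", oo_generic, (0.4, 1, 0.0)),
    ("[*:1]-[#8:2]-[#8:3]-[#1:4]", oo_hydrogen, (1.0, 2, 0.0)),
]


def assign(elements: List[str], bonds: List[Tuple[int, int]]) -> Dict[Torsion, str]:
    """Map each matched torsion to the SMIRKS of the LAST matching parameter."""
    assigned: Dict[Torsion, str] = {}
    for t in proper_torsions(elements, bonds):
        for smirks, matches, _ in PARAMETERS:
            # A torsion pattern may match the path in either direction.
            if matches(elements, t) or matches(elements, t[::-1]):
                assigned[t] = smirks  # later entries overwrite earlier ones
    return assigned


DIMETHYL_PEROXIDE: Molecule = (
    ["C", "O", "O", "C", "H", "H", "H", "H", "H", "H"],
    [(0, 1), (1, 2), (2, 3), (0, 4), (0, 5), (0, 6), (3, 7), (3, 8), (3, 9)],
)
METHYL_HYDROPEROXIDE: Molecule = (
    ["C", "O", "O", "H", "H", "H", "H"],
    [(0, 1), (1, 2), (2, 3), (0, 4), (0, 5), (0, 6)],
)
HYDROGEN_PEROXIDE: Molecule = (["H", "O", "O", "H"], [(0, 1), (1, 2), (2, 3)])

for name, mol in [("CH3OOCH3", DIMETHYL_PEROXIDE),
                  ("CH3OOH", METHYL_HYDROPEROXIDE),
                  ("HOOH", HYDROGEN_PEROXIDE)]:
    print(name, assign(*mol))
```

Under these assumptions, C-O-O-C keeps the generic 0.4 term. Any O-O torsion with a hydrogen on either end is overridden by the hydrogen-specific term, which is the split between the two proposed lines.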

davidlmobley commented 7 years ago

@bannanc - can you clarify "I don't think we can keep the hydrogen inside the generic"? Do you just mean that you don't think hydrogen can continue being treated as part of a generic?

bannanc commented 7 years ago

Do you just mean that you don't think hydrogen can continue being treated as part of a generic?

Yes. In GAFF2, C-O-O-H and C-O-O-C look like two completely separate torsions, so I don't think we can reasonably merge them into one generic *-O-O-*.

davidlmobley commented 7 years ago

But, @bannanc, I think I agree with your proposal. Chris did a lot of rough "binning" of things based on size, and here I'd say there are roughly two bins: "small" (around 0.4) and "not big" (around 1). Unless we generate our own data to say otherwise, we should just keep oh and os lumped together in the generic case. That leads me to something similar to where you ended up, with one style question:

Style: I tend not to like reading torsions that look like [*:1]-[#8:2]-[#8:3]-[#1:4], because they look to me like, er, half a generic. That is to say, it's a generic on one end but hydrogen-specific on the other, which bothers me aesthetically. (I'm used to AMBER style, which has either X on both ends or specific atom types on both ends; there is never an X on just one end.) Do you have a strong preference for this over [*:1]-[#8:2]-[#8H1:3]-[*:4], which is basically equivalent? However, I don't feel that strongly about it, so if you have a strong preference or think there's a chemical reason to keep things the way you have them, I can go with it. :)

But wait, aren't you missing X -os-os-X? Particularly, GAFF has this with a "not big" barrier (1.0), whereas in yours it has the "small" barrier (0.4). On the other hand, maybe these are good to merge unless we have the data to split them. Is that what you're thinking?

I do agree that the hydrogen can't be merged. For one thing, there is different multiplicity for the ones containing hydrogen...

(Note to self: At first I thought you were missing something like the X -oh-os-X case, but then I realized that the first X has to be hydrogen of some type, since oh is an oxygen attached to a hydrogen, so really you've subsumed this into your second SMIRKS pattern. Clever.)

bannanc commented 7 years ago

But wait, aren't you missing X -os-os-X? Particularly, GAFF has this with a "not big" barrier (1.0), whereas in yours it has the "small" barrier (0.4). On the other hand, maybe these are good to merge unless we have the data to split them. Is that what you're thinking?

Yes, I chose to skip that one. The Xs are almost always going to be carbon, and I didn't want to introduce more parameters than we need to.

bannanc commented 7 years ago

Also, I'm happy to use the mirrored generic in this case, though we definitely have angles and torsions with hydrogens on only one side. The H1 form only works here because oxygen is divalent; for anything else you have to specify #1 explicitly.

[*:1]-[#8:2]-[#8:3]-[*:4]    1    0.4         0.000           1      
[*:1]-[#8:2]-[#8H1:3]-[*:4]   1    1.0         0.000           2   
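The divalence argument can be checked mechanically. The sketch below is hypothetical. It hand-codes [*:1]-[X:2]-[X:3]-[#1:4] and [*:1]-[X:2]-[XH1:3]-[*:4] as Python predicates, assumes SMARTS H1 means "exactly one explicit hydrogen neighbor" in these all-explicit-H graphs, and compares which torsions each form selects. It runs on methyl hydroperoxide and on 1,2-dimethylhydrazine, a non-divalent example chosen for illustration that does not come from the thread.

```python
# Hypothetical check of when [X:3]-[#1:4] and [XH1:3]-[*:4] select the same torsions.
from typing import Callable, Dict, List, Set, Tuple

Torsion = Tuple[int, int, int, int]
Pred = Callable[[List[str], Dict[int, Set[int]], Torsion], bool]


def neighbor_sets(elements: List[str], bonds: List[Tuple[int, int]]) -> Dict[int, Set[int]]:
    nbrs: Dict[int, Set[int]] = {a: set() for a in range(len(elements))}
    for a, b in bonds:
        nbrs[a].add(b)
        nbrs[b].add(a)
    return nbrs


def proper_torsions(bonds: List[Tuple[int, int]], nbrs: Dict[int, Set[int]]) -> List[Torsion]:
    return [(i, j, k, l) for j, k in bonds
            for i in sorted(nbrs[j] - {k}) for l in sorted(nbrs[k] - {j}) if i != l]


def h_count(el: List[str], nbrs: Dict[int, Set[int]], atom: int) -> int:
    # With explicit hydrogens, SMARTS "H1" corresponds to exactly one H neighbor.
    return sum(1 for n in nbrs[atom] if el[n] == "H")


def explicit_h_form(center: str) -> Pred:
    # [*:1]-[center:2]-[center:3]-[#1:4]
    def pred(el, nbrs, t):
        return el[t[1]] == center and el[t[2]] == center and el[t[3]] == "H"
    return pred


def h1_form(center: str) -> Pred:
    # [*:1]-[center:2]-[centerH1:3]-[*:4]
    def pred(el, nbrs, t):
        return el[t[1]] == center and el[t[2]] == center and h_count(el, nbrs, t[2]) == 1
    return pred


def matched(elements: List[str], bonds: List[Tuple[int, int]], pred: Pred) -> Set[Torsion]:
    """Torsions matched by pred in either direction."""
    nbrs = neighbor_sets(elements, bonds)
    return {t for t in proper_torsions(bonds, nbrs)
            if pred(elements, nbrs, t) or pred(elements, nbrs, t[::-1])}


# Divalent oxygen: methyl hydroperoxide, CH3-O-O-H.
CH3OOH = (["C", "O", "O", "H", "H", "H", "H"],
          [(0, 1), (1, 2), (2, 3), (0, 4), (0, 5), (0, 6)])
# Trivalent nitrogen: 1,2-dimethylhydrazine, CH3-NH-NH-CH3 (illustrative only).
CH3NHNHCH3 = (["C", "N", "N", "C", "H", "H", "H", "H", "H", "H", "H", "H"],
              [(0, 1), (1, 2), (2, 3), (1, 4), (2, 5),
               (0, 6), (0, 7), (0, 8), (3, 9), (3, 10), (3, 11)])

oxygen_same = matched(*CH3OOH, explicit_h_form("O")) == matched(*CH3OOH, h1_form("O"))
nitrogen_extra = matched(*CH3NHNHCH3, h1_form("N")) - matched(*CH3NHNHCH3, explicit_h_form("N"))
print("O: both forms select the same torsions?", oxygen_same)
print("N: torsions selected only by the H1 form:", nitrogen_extra)
```

For the peroxide, both forms pick out the same C-O-O-H torsion. For the trivalent nitrogen, the H1 form also picks up the all-heavy-atom C-N-N-C torsion (atoms 0-1-2-3), which is the case where #1 has to be spelled out.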

davidlmobley commented 7 years ago

Yes, I agree that in general we can't avoid asymmetric generics. I'll just have to get used to things not looking AMBER-ish.

bannanc commented 7 years ago

Added in pull request #43