Closed bannanc closed 7 years ago
@bannanc - can you clarify "I don't think we can keep the hydrogen inside the generic"? Do you just mean that you don't think hydrogen can continue being treated as part of a generic?
Do you just mean that you don't think hydrogen can continue being treated as part of a generic?
Yes. C-O-O-H and C-O-O-C look like two completely separate torsions in GAFF2, so I don't think we can reasonably merge them into one generic *-O-O-*.
But, @bannanc, I think I agree with your proposal -- Chris did a lot of rough "binning" of things based on size, and here I'd say there are roughly two bins -- "small" (around 0.4) and "not big" (around 1), and, unless we generate our own data to say otherwise, we should just keep oh and os lumped together in the generic case. So that leads me to something similar to where you ended up, with this one style question:
Style: I tend not to like reading torsions which look like [*:1]-[#8:2]-[#8:3]-[#1:4], as it looks to me like, er, half a generic -- that is to say, it's generic on one end but hydrogen-specific on the other, which bothers me aesthetically (I'm used to AMBER style, which either has X on both ends or specific atom types on both ends; there is never an X on just one end). Do you have a strong preference for this over [*:1]-[#8:2]-[#8H1:3]-[*:4], which is basically equivalent? However, I don't feel that strongly about it, so if you have a strong preference or you think there's a chemical reason to have things the way you have them, I can go with it. :)
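To illustrate why the two spellings pick out the same torsion, here is a minimal pure-Python sketch. It does not use a real SMARTS engine: the toy molecule (methyl hydroperoxide), the graph representation, and the two `matches_*` predicates are all hypothetical stand-ins written by hand for this example, and every bond in the toy is a single bond.

```python
# Toy molecular graph for methyl hydroperoxide, CH3-O-O-H (illustrative molecule).
# Atom index -> element symbol.
elements = {0: "C", 1: "O", 2: "O", 3: "H", 4: "H", 5: "H", 6: "H"}
bonds = [(0, 1), (1, 2), (2, 3), (0, 4), (0, 5), (0, 6)]

# Build an adjacency map from the bond list.
neighbors = {i: set() for i in elements}
for a, b in bonds:
    neighbors[a].add(b)
    neighbors[b].add(a)

def h_count(i):
    """Number of hydrogens bonded to atom i (what the SMARTS H<n> primitive counts)."""
    return sum(elements[j] == "H" for j in neighbors[i])

def torsions():
    """Yield every ordered proper torsion (a, b, c, d), in both directions."""
    for b in elements:
        for c in neighbors[b]:
            for a in neighbors[b] - {c}:
                for d in neighbors[c] - {b, a}:
                    yield (a, b, c, d)

def matches_explicit_h(t):
    # Hand-coded stand-in for [*:1]-[#8:2]-[#8:3]-[#1:4]
    a, b, c, d = t
    return elements[b] == "O" and elements[c] == "O" and elements[d] == "H"

def matches_h1(t):
    # Hand-coded stand-in for [*:1]-[#8:2]-[#8H1:3]-[*:4]
    a, b, c, d = t
    return elements[b] == "O" and elements[c] == "O" and h_count(c) == 1

explicit = {t for t in torsions() if matches_explicit_h(t)}
implicit = {t for t in torsions() if matches_h1(t)}
print(explicit == implicit, sorted(explicit))
```

In this toy case both predicates select only the C-O-O-H torsion. The oxygen tagged :3 has two neighbors, one of which is the other oxygen, so requiring one attached hydrogen forces atom :4 to be that hydrogen.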
But wait, aren't you missing X -os-os-X? Particularly, GAFF has this with a "not big" barrier (1.0), whereas in yours it has the "small" barrier (0.4). On the other hand, maybe these are good to merge unless we have the data to split them. Is that what you're thinking?
I do agree that the hydrogen can't be merged. For one thing, there is different multiplicity for the ones containing hydrogen...
(Note to self: At first I thought you were missing something like the X -oh-os-X case, but then I realized that the first X has to be hydrogen of some type, since oh is an oxygen attached to a hydrogen, so really you've subsumed this into your second SMIRKS pattern. Clever.)
But wait, aren't you missing X -os-os-X? Particularly, GAFF has this with a "not big" barrier (1.0), whereas in yours it has the "small" barrier (0.4). On the other hand, maybe these are good to merge unless we have the data to split them. Is that what you're thinking?
Yes, I chose to skip that one. The Xs are almost always going to be carbon, and I didn't want to introduce more parameters than we have to.
Also, I'm happy to do the mirrored generic in this case, though we definitely have angles and torsions with hydrogens on one side. The H1 only works because the oxygen is divalent; for anything else you have to specify the #1 explicitly.
[*:1]-[#8:2]-[#8:3]-[*:4] 1 0.4 0.000 1
[*:1]-[#8:2]-[#8H1:3]-[*:4] 1 1.0 0.000 2
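As a rough sketch of what these two lines mean numerically, the snippet below parses them and evaluates a cosine torsion term. It assumes the four numbers follow the AMBER dihedral column order (divider IDIVF, barrier PK, phase in degrees, periodicity PN) and the AMBER functional form; the thread does not state the column meaning, so treat both as assumptions. The names `parse` and `torsion_energy` are made up for this example.

```python
import math

# The two parameter lines proposed in the thread.
LINES = [
    "[*:1]-[#8:2]-[#8:3]-[*:4] 1 0.4 0.000 1",
    "[*:1]-[#8:2]-[#8H1:3]-[*:4] 1 1.0 0.000 2",
]

def parse(line):
    """Split a line into a SMIRKS string plus four numbers.

    ASSUMPTION: columns are IDIVF, PK, PHASE (degrees), PN, as in AMBER frcmod files.
    """
    smirks, idivf, pk, phase, pn = line.split()
    return {
        "smirks": smirks,
        "idivf": int(idivf),
        "k": float(pk),
        "phase": float(phase),
        "periodicity": int(pn),
    }

def torsion_energy(p, phi_deg):
    """AMBER-style cosine term: (k / idivf) * (1 + cos(n * phi - phase))."""
    phi = math.radians(phi_deg)
    phase = math.radians(p["phase"])
    return p["k"] / p["idivf"] * (1.0 + math.cos(p["periodicity"] * phi - phase))

params = [parse(line) for line in LINES]
for p in params:
    # Energy at three dihedral angles, in the same units as the barrier column.
    profile = [round(torsion_energy(p, phi), 3) for phi in (0, 90, 180)]
    print(p["smirks"], profile)
```

Under these assumptions the generic term (periodicity 1) has a single maximum at 0 degrees and a minimum at 180 degrees, while the hydrogen-specific term (periodicity 2) has maxima at 0 and 180 degrees and a minimum at 90 degrees, which is one way to see why the two cannot share a single parameter.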
Yes, I agree that in general we can't avoid asymmetric generics. I'll just have to get used to things not looking AMBER-ish.
Added in pull request #43
I pulled the relevant torsion parameters from GAFF2:
My vote is that we keep the SMIRKS patterns as generic as possible so that the added parameters can cover the "weird" chemical space included in DrugBank, such as C-O-O-O-H, N-O-O-H, etc. That being said, this is my suggestion:
@davidlmobley Thoughts? I don't think we can keep the hydrogen inside the generic...