Closed yudongqiu closed 2 years ago
Well... that's an issue. Ugh. When you say "many" don't work, how many is it?
Can you see what happens if you use antechamber
on this?
It'd also be very helpful if you put SMILES strings for molecules in such issues, rather than a 3D-ish graphic. I can't see bond orders or formal charges in your 3D graphics so I don't know what molecule we're talking about without downloading the dataset, whereas a SMILES string makes it immediately obvious.
Among the ~500 optimizations I was able to download from the server, there are 42 of them currently having trouble loading. Only a few of them have the same "ERROR: Unable to assign charges", but many others have a different error message "ProperTorsionHandler was not able to find parameters for the following valence terms"
Is there an example I can follow to use antechamber
, for the partial charge assignment I guess?
The canonical SMILES string for the above molecule is CC(=O)O[O-]
Great bug report, @yudongqiu! It's actually kind of awesome that SMIRNOFF is no longer the limiting factor in chemical coverage, but presumably the MMFF types in Omega or the BCCs.
Is there an example I can follow to use antechamber, for the partial charge assignment I guess?
This is a bit of a hack---there's a better-controlled way to do this that involves constructing a ToolkitRegistry
and passing it to each of the functions you call---but this will likely work for testing:
from openforcefield.utils.toolkits import GLOBAL_TOOLKIT_REGISTRY
GLOBAL_TOOLKIT_REGISTRY._toolkits.pop(0)
You should see
<openforcefield.utils.toolkits.OpenEyeToolkitWrapper object at 0x11046a518>
as the OpenEye toolkit is removed.
Subsequent calls should only use RDKit/AmberTools.
On the bright side, we still got 484/504 parameters covered with the currently available OptimizationDatasets for the Roche set and the coverage set.
For the record, @jmaat commented that this is about 9 of the 70ish molecules (before protonation state/tautomer enumeration) with this issue.
@yudongqiu can you provide a list of the ones (parameters) which are missing, ideally (but not necessarily) with the molecules in which they were supposed to occur? I'd be interested to take a manual look at them -- it's possible that the missing BCCs are not necessarily missing for exactly the same chemical reasons so I could find supplemental molecules to cover them without triggering the missing BCCs. I'm uncertain.
Still, 484/504 is quite good. :)
@yudongqiu :
but many others have a different error message "ProperTorsionHandler was not able to find parameters for the following valence terms"
Can I get these examples and the missing valence terms? It seems likely that after protonation state/tautomer enumeration, we've managed to hit some chemistry we do not cover and I (and #striketeam) should look at it to see what it is/whether we should be covering it.
It would be great if you could spin this out into a separate issue as it's a different problem.
@cbayly13 notes that there may be a way to be "permissive" in assigning BCCs, which is to say that if there is no BCC available, it sticks with just AM1
. That's what he had originally intended, and there may be a low level option in the charge engines to allow it. Probably this is what we would want to utilize, if present -- it would allow us to proceed to cover otherwise unusual chemistry and, eventually, learn new BCCs, while at present we'd just have to skip it.
Ah! He says it can be set. Pulling in info from him in Slack. See the "setPermissiveBCCs" option here: https://docs.eyesopen.com/toolkits/python/quacpactk/OEProtonClasses/OEAM1BCCCharges.html
That's what we want
But... we may have to ask support about this, because it's an option to OEAM1BCCCharges
but we're doing ELF10 charges, which is a composite of engines. Chris suggests e-mailing support. I'll e-mail support.
I wonder if a similar approach to #346 could work here. There, we caught a specific error thrown by OEAM1BCCELF10
charge calculation, and would downgrade to OEAM1BCC
in the case that the specific problem arose. That PR is still open, so I could also catch this error. What do you think @davidlmobley and @yudongqiu?
The PR for the above is #385
@j-wags let's do that for now, and then I'll get back to you if support has a better option. Chris Bayly thinks there should be a way to get OEAM1BCCELF10 charges that utilize this fallback approach, but it's undocumented/will take some wrangling and so we need to wait for support.
But your approach should be great as a first pass solution and will get these cases working.
So I haven't actually found a case among Yudong's yet where there is "good chemistry" but a missing BCC. So maybe we should put this on hold until we have an example with good chemistry to test on.
However, I did hear back from OpenEye support as to the right way to do this:
Unfortunately there is not method in the OEAM1BCCELF10Charges charge engine that allows for setting permissive BCCs. However, there is a workaround. Instead of using the convenient above-mentioned charge engine, you can take a more round about way (see snippet below). This workaround uses the OEELFCharges() charge engine, which is instantiated with the OEAM1BCCCharges() charge engine that has SetPermissiveBCCs(True). However, with OEELFCharges(), the default conditions for setting the number of diverse conformers that are selected based on the total number of conformers in the input molecule are not quite the same as with OEAM1BCCELF10Charges(). So the code snippet reflects making those adjustments to the OEELFCharges() charge engine as well. Let me know if you have any further questions or issues.
Best regards,
Luke OpenEye Customer Support
am1bcc = oequacpac.OEAM1BCCCharges() am1bcc.SetPermissiveBCCs(True)
chargeEngine = oequacpac.OEELFCharges(am1bcc) chargeEngine.SetPercentage(10) chargeEngine.SetLimit(10)
oequacpac.OEAssignCharges(mol, chargeEngine)
@j-wags I've tagged this as "priority:low" until we have an example where there is good chemistry but a missing BCC (I believe I've seen such examples in the past, but they are rare). I'm not that enthusiastic about removing barriers to people setting up molecules with bad chemistry.
I expect we WILL come across such examples eventually but I don't think there are any in the coverage set.
I'm not that enthusiastic about removing barriers to people setting up molecules with bad chemistry.
We certainly want to be able to parameterize these molecules, though! And "nearly useful coverage" is one of our main selling points, so we don't want to handicap ourself here.
We certainly want to be able to parameterize these molecules, though! And "nearly useful coverage" is one of our main selling points, so we don't want to handicap ourself here.
@jchodera what I'm saying is, if someone tries to simulate a molecule and/or protonation state which could never exist, we have at least two options: 1) Tell them it's chemistry which could never exist and stop, or 2) Extend the toolkit so it will allow people to blindly plunge ahead and produce useless results
The first of these sounds more useful to me.
Thus, until we have an example where we ACTUALLY HAVE missing BCCs for chemistry which is reasonable/should exist, it's more important that we have "sanity checks" for incoming molecules (see #61 ) than that we fix this issue.
Right now, we do not have a test case to test a fix to this issue -- that is to say, we don't have a molecule which we SHOULD be able to assign parameters to which we cannot parameterize because of a missing BCC. All we have are molecules which cannot exist and therefore we don't care about parameterizing.
To say that another way, I see our goal is to be able to parameterize all (organic) compounds anyone might reasonably want to consider. We aren't particularly concerned about our coverage of chemical space that could never possibly exist -- we have our hands full enough covering chemical space which does or could reasonably exist without having to worry about, say, carbon with six bonds, or nitrogen with five bonds, or...
Do we want to have a "safe mode" for default application, and an "unsafe mode" for maximum coverage?
Do you have a use case example of where you would want the unsafe mode?
This molecule can be assigned charges in the current infrastructure, either by using the ELF10 variant with OpenEye or the RDKit/AmberTools code path. The specific issue of using non-ELF10 AM1BCC (i.e. oequacpac.OEAM1BCCCharges
) remains and seems to be an actual behavior issue with the underlying toolkit (?).
from openff.toolkit.typing.engines.smirnoff import ForceField
from openff.toolkit.topology import Molecule
from openff.toolkit.topology import Topology
from openff.toolkit.utils.exceptions import UnassignedBondParameterException
from openff.toolkit.utils import (
ToolkitRegistry,
RDKitToolkitWrapper,
AmberToolsToolkitWrapper,
OpenEyeToolkitWrapper,
)
molecule = Molecule.from_file("27_C2H3O3.mol2")
forcefield = ForceField("smirnoff99Frosst_experimental.offxml")
try:
system = forcefield.create_openmm_system(molecule.to_topology())
except UnassignedBondParameterException:
print("Still can't parameterize that O-O bond")
# These cases each work
molecule.assign_partial_charges(
partial_charge_method="am1bccelf10",
toolkit_registry=OpenEyeToolkitWrapper(),
)
molecule.assign_partial_charges(
partial_charge_method="am1bcc",
toolkit_registry=ToolkitRegistry([RDKitToolkitWrapper(), AmberToolsToolkitWrapper()]),
)
# This one still fails - but is somewhat obtuse
molecule.assign_partial_charges(
partial_charge_method="am1bcc",
toolkit_registry=OpenEyeToolkitWrapper(),
)
Still can't parameterize that O-O bond
Traceback (most recent call last):
File "reproduce.py", line 22, in <module>
molecule.assign_partial_charges(
File "/Users/mwt/software/openforcefield/openff/toolkit/topology/molecule.py", line 3648, in assign_partial_charges
toolkit.assign_partial_charges(
File "/Users/mwt/software/openforcefield/openff/toolkit/utils/openeye_wrapper.py", line 2089, in assign_partial_charges
raise ChargeCalculationError(
openff.toolkit.utils.exceptions.ChargeCalculationError: Unable to assign charges: Warning: BCIChargeCorrector: BCI charge corrections for mol
Warning: missing bcc parameter for the O3-O5 bond (BCIType 310931)
Warning: Unable to assign all BCCs in
This issue is old without movement and advocating for some out-of-spec behavior so I'm going to close it. Presumably this idea can be revived if somebody wants to pitch this behavior (falling back from AM1BCC to just AM1 in cases of strange enough chemistry that no BCCs are found) as a spec change.
Describe the bug Many molecules in the coverage set are not able to be loaded, probably because the AN1BCC model has missing types.
To Reproduce Files: report.tar.gz
Output
Computing environment (please complete the following information):
conda list
conda_list.txtAdditional context For this molecule, the O-O bond is causing the error if I understand correctly. However, many other molecules also have errors when loading. Do we want to fix them one by one or there is a better way to fix them all?
Screenshot for the reported molecule: