openforcefield / openff-toolkit

The Open Forcefield Toolkit provides implementations of the SMIRNOFF format, parameterization engine, and other tools. Documentation available at http://open-forcefield-toolkit.readthedocs.io
http://openforcefield.org
MIT License
318 stars 92 forks source link

Unable to assign all BCCs in mol - switch to `setPermissiveBCCs` to fall back to AM1 when no BCC available #375

Closed yudongqiu closed 2 years ago

yudongqiu commented 5 years ago

Describe the bug Many molecules in the coverage set are not able to be loaded, probably because the AN1BCC model has missing types.

To Reproduce Files: report.tar.gz

Output

$ python reproduce.py 
Warning: BCIChargeCorrector: BCI charge corrections for mol 
Warning:                     missing bcc parameter for the O3-O5 bond (BCIType 310931)
Warning: OEBCCPartialCharges: Unable to assign all BCCs in mol 
Warning: OESEmpBCCPartialCharges: Stopping the charging process at the first stage for mol 
Traceback (most recent call last):
  File "reproduce.py", line 10, in <module>
    system = forcefield.create_openmm_system(topology)
  File "/home/qyd/codes/anaconda3/envs/openff/lib/python3.7/site-packages/openforcefield/typing/engines/smirnoff/forcefield.py", line 1139, in create_openmm_system
    parameter_handler.create_force(system, topology, **kwargs)
  File "/home/qyd/codes/anaconda3/envs/openff/lib/python3.7/site-packages/openforcefield/typing/engines/smirnoff/parameters.py", line 2466, in create_force
    temp_mol.compute_partial_charges_am1bcc(toolkit_registry=toolkit_registry)
  File "/home/qyd/codes/anaconda3/envs/openff/lib/python3.7/site-packages/openforcefield/topology/molecule.py", line 2014, in compute_partial_charges_am1bcc
    self
  File "/home/qyd/codes/anaconda3/envs/openff/lib/python3.7/site-packages/openforcefield/utils/toolkits.py", line 3006, in call
    return method(*args, **kwargs)
  File "/home/qyd/codes/anaconda3/envs/openff/lib/python3.7/site-packages/openforcefield/utils/toolkits.py", line 1258, in compute_partial_charges_am1bcc
    raise Exception('Unable to assign charges')
Exception: Unable to assign charges

Computing environment (please complete the following information):

Additional context For this molecule, the O-O bond is causing the error if I understand correctly. However, many other molecules also have errors when loading. Do we want to fix them one by one or there is a better way to fix them all?

Screenshot for the reported molecule: image

davidlmobley commented 5 years ago

Well... that's an issue. Ugh. When you say "many" don't work, how many is it?

Can you see what happens if you use antechamber on this?

It'd also be very helpful if you put SMILES strings for molecules in such issues, rather than a 3D-ish graphic. I can't see bond orders or formal charges in your 3D graphics so I don't know what molecule we're talking about without downloading the dataset, whereas a SMILES string makes it immediately obvious.

yudongqiu commented 5 years ago

Among the ~500 optimizations I was able to download from the server, there are 42 of them currently having trouble loading. Only a few of them have the same "ERROR: Unable to assign charges", but many others have a different error message "ProperTorsionHandler was not able to find parameters for the following valence terms"

Is there an example I can follow to use antechamber, for the partial charge assignment I guess?

The canonical SMILES string for the above molecule is CC(=O)O[O-]

jchodera commented 5 years ago

Great bug report, @yudongqiu! It's actually kind of awesome that SMIRNOFF is no longer the limiting factor in chemical coverage, but presumably the MMFF types in Omega or the BCCs.

jchodera commented 5 years ago

Is there an example I can follow to use antechamber, for the partial charge assignment I guess?

This is a bit of a hack---there's a better-controlled way to do this that involves constructing a ToolkitRegistry and passing it to each of the functions you call---but this will likely work for testing:

from openforcefield.utils.toolkits import GLOBAL_TOOLKIT_REGISTRY
GLOBAL_TOOLKIT_REGISTRY._toolkits.pop(0)

You should see

<openforcefield.utils.toolkits.OpenEyeToolkitWrapper object at 0x11046a518>

as the OpenEye toolkit is removed.

Subsequent calls should only use RDKit/AmberTools.

yudongqiu commented 5 years ago

On the bright side, we still got 484/504 parameters covered with the currently available OptimizationDatasets for the Roche set and the coverage set.

davidlmobley commented 5 years ago

For the record, @jmaat commented that this is about 9 of the 70ish molecules (before protonation state/tautomer enumeration) with this issue.

davidlmobley commented 5 years ago

@yudongqiu can you provide a list of the ones (parameters) which are missing, ideally (but not necessarily) with the molecules in which they were supposed to occur? I'd be interested to take a manual look at them -- it's possible that the missing BCCs are not necessarily missing for exactly the same chemical reasons so I could find supplemental molecules to cover them without triggering the missing BCCs. I'm uncertain.

Still, 484/504 is quite good. :)

davidlmobley commented 5 years ago

@yudongqiu :

but many others have a different error message "ProperTorsionHandler was not able to find parameters for the following valence terms"

Can I get these examples and the missing valence terms? It seems likely that after protonation state/tautomer enumeration, we've managed to hit some chemistry we do not cover and I (and #striketeam) should look at it to see what it is/whether we should be covering it.

It would be great if you could spin this out into a separate issue as it's a different problem.

davidlmobley commented 5 years ago

@cbayly13 notes that there may be a way to be "permissive" in assigning BCCs, which is to say that if there is no BCC available, it sticks with just AM1. That's what he had originally intended, and there may be a low level option in the charge engines to allow it. Probably this is what we would want to utilize, if present -- it would allow us to proceed to cover otherwise unusual chemistry and, eventually, learn new BCCs, while at present we'd just have to skip it.

davidlmobley commented 5 years ago

Ah! He says it can be set. Pulling in info from him in Slack. See the "setPermissiveBCCs" option here: https://docs.eyesopen.com/toolkits/python/quacpactk/OEProtonClasses/OEAM1BCCCharges.html

That's what we want

davidlmobley commented 5 years ago

But... we may have to ask support about this, because it's an option to OEAM1BCCCharges but we're doing ELF10 charges, which is a composite of engines. Chris suggests e-mailing support. I'll e-mail support.

j-wags commented 5 years ago

I wonder if a similar approach to #346 could work here. There, we caught a specific error thrown by OEAM1BCCELF10 charge calculation, and would downgrade to OEAM1BCC in the case that the specific problem arose. That PR is still open, so I could also catch this error. What do you think @davidlmobley and @yudongqiu?

j-wags commented 5 years ago

The PR for the above is #385

davidlmobley commented 5 years ago

@j-wags let's do that for now, and then I'll get back to you if support has a better option. Chris Bayly thinks there should be a way to get OEAM1BCCELF10 charges that utilize this fallback approach, but it's undocumented/will take some wrangling and so we need to wait for support.

But your approach should be great as a first pass solution and will get these cases working.

davidlmobley commented 5 years ago

So I haven't actually found a case among Yudong's yet where there is "good chemistry" but a missing BCC. So maybe we should put this on hold until we have an example with good chemistry to test on.

However, I did hear back from OpenEye support as to the right way to do this:

Unfortunately there is not method in the OEAM1BCCELF10Charges charge engine that allows for setting permissive BCCs. However, there is a workaround. Instead of using the convenient above-mentioned charge engine, you can take a more round about way (see snippet below). This workaround uses the OEELFCharges() charge engine, which is instantiated with the OEAM1BCCCharges() charge engine that has SetPermissiveBCCs(True). However, with OEELFCharges(), the default conditions for setting the number of diverse conformers that are selected based on the total number of conformers in the input molecule are not quite the same as with OEAM1BCCELF10Charges(). So the code snippet reflects making those adjustments to the OEELFCharges() charge engine as well. Let me know if you have any further questions or issues.

Best regards,

Luke OpenEye Customer Support


am1bcc = oequacpac.OEAM1BCCCharges()
am1bcc.SetPermissiveBCCs(True)

chargeEngine = oequacpac.OEELFCharges(am1bcc) chargeEngine.SetPercentage(10) chargeEngine.SetLimit(10)

oequacpac.OEAssignCharges(mol, chargeEngine)

davidlmobley commented 5 years ago

@j-wags I've tagged this as "priority:low" until we have an example where there is good chemistry but a missing BCC (I believe I've seen such examples in the past, but they are rare). I'm not that enthusiastic about removing barriers to people setting up molecules with bad chemistry.

I expect we WILL come across such examples eventually but I don't think there are any in the coverage set.

jchodera commented 5 years ago

I'm not that enthusiastic about removing barriers to people setting up molecules with bad chemistry.

We certainly want to be able to parameterize these molecules, though! And "nearly useful coverage" is one of our main selling points, so we don't want to handicap ourself here.

davidlmobley commented 5 years ago

We certainly want to be able to parameterize these molecules, though! And "nearly useful coverage" is one of our main selling points, so we don't want to handicap ourself here.

@jchodera what I'm saying is, if someone tries to simulate a molecule and/or protonation state which could never exist, we have at least two options: 1) Tell them it's chemistry which could never exist and stop, or 2) Extend the toolkit so it will allow people to blindly plunge ahead and produce useless results

The first of these sounds more useful to me.

Thus, until we have an example where we ACTUALLY HAVE missing BCCs for chemistry which is reasonable/should exist, it's more important that we have "sanity checks" for incoming molecules (see #61 ) than that we fix this issue.

Right now, we do not have a test case to test a fix to this issue -- that is to say, we don't have a molecule which we SHOULD be able to assign parameters to which we cannot parameterize because of a missing BCC. All we have are molecules which cannot exist and therefore we don't care about parameterizing.

davidlmobley commented 5 years ago

To say that another way, I see our goal is to be able to parameterize all (organic) compounds anyone might reasonably want to consider. We aren't particularly concerned about our coverage of chemical space that could never possibly exist -- we have our hands full enough covering chemical space which does or could reasonably exist without having to worry about, say, carbon with six bonds, or nitrogen with five bonds, or...

jchodera commented 5 years ago

Do we want to have a "safe mode" for default application, and an "unsafe mode" for maximum coverage?

davidlmobley commented 5 years ago

Do you have a use case example of where you would want the unsafe mode?

mattwthompson commented 2 years ago

This molecule can be assigned charges in the current infrastructure, either by using the ELF10 variant with OpenEye or the RDKit/AmberTools code path. The specific issue of using non-ELF10 AM1BCC (i.e. oequacpac.OEAM1BCCCharges) remains and seems to be an actual behavior issue with the underlying toolkit (?).

from openff.toolkit.typing.engines.smirnoff import ForceField
from openff.toolkit.topology import Molecule
from openff.toolkit.topology import Topology
from openff.toolkit.utils.exceptions import UnassignedBondParameterException
from openff.toolkit.utils import (
    ToolkitRegistry,
    RDKitToolkitWrapper,
    AmberToolsToolkitWrapper,
    OpenEyeToolkitWrapper,
)

molecule = Molecule.from_file("27_C2H3O3.mol2")

forcefield = ForceField("smirnoff99Frosst_experimental.offxml")

try:
    system = forcefield.create_openmm_system(molecule.to_topology())
except UnassignedBondParameterException:
    print("Still can't parameterize that O-O bond")

# These cases each work
molecule.assign_partial_charges(
    partial_charge_method="am1bccelf10",
    toolkit_registry=OpenEyeToolkitWrapper(),
)
molecule.assign_partial_charges(
    partial_charge_method="am1bcc",
    toolkit_registry=ToolkitRegistry([RDKitToolkitWrapper(), AmberToolsToolkitWrapper()]),
)

# This one still fails - but is somewhat obtuse
molecule.assign_partial_charges(
    partial_charge_method="am1bcc",
    toolkit_registry=OpenEyeToolkitWrapper(),
)
Still can't parameterize that O-O bond
Traceback (most recent call last):
  File "reproduce.py", line 22, in <module>
    molecule.assign_partial_charges(
  File "/Users/mwt/software/openforcefield/openff/toolkit/topology/molecule.py", line 3648, in assign_partial_charges
    toolkit.assign_partial_charges(
  File "/Users/mwt/software/openforcefield/openff/toolkit/utils/openeye_wrapper.py", line 2089, in assign_partial_charges
    raise ChargeCalculationError(
openff.toolkit.utils.exceptions.ChargeCalculationError: Unable to assign charges: Warning: BCIChargeCorrector: BCI charge corrections for mol
Warning:                     missing bcc parameter for the O3-O5 bond (BCIType 310931)
Warning: Unable to assign all BCCs in

This issue is old without movement and advocating for some out-of-spec behavior so I'm going to close it. Presumably this idea can be revived if somebody wants to pitch this behavior (falling back from AM1BCC to just AM1 in cases of strange enough chemistry that no BCCs are found) as a spec change.