Closed tarungog closed 2 years ago
Hi @tarungog,
Thanks for the reproducing example -- I can generate the same error on my computer.
When I run this, antechamber prints out a particular message to the terminal:
Info: Bond types are assigned for valence state (1) with penalty (1).
Info: Total number of electrons: 713; net charge: 0
Info: The number of electrons is odd (713).
Please check the total charge (-nc flag) and spin multiplicity (-m flag).
My first guess based on this is that some atoms in the SMILES should have a formal charge. My next guess is that this is just a particularly large molecule, and antechamber doesn't like it.
The following would be good next steps:
@j-wags I just checked and there is no formal charge on this molecule.
Tried a much smaller molecule and it still failed:
Exception: Antechamber/sqm partial charge calculation failed on molecule molecule (SMILES [H][O][C]([H])([H])[C]([H])=[C]([H])[C]1=[C]([H])[C]([O][C]([H])([H])[H])=[C]2[O][C]([H])([C]3=[C]([H])[C]([H])=[C]([OH])[C]([O][C]([H])([H])[H])=[C]3[H])[C]([H])([C]([H])([H])[O][H])[C]2=[C]1[H])
Here is the original smiles string of the smaller molecule if you want to try and reproduce it:
molOrig = Chem.MolFromSmiles("COc1cc(C2Oc3c(OC)cc(C=CCO)cc3C2CO)ccc1[O]")
Hi @tarungog,
Interesting. There's something extremely subtle happening here, and it will take some time to put my finger on it. The root cause is probably related to #566, so I'd like to move discussion of long-term solutions to there.
Short-term, I think the following workaround may work for you:
ofmol2 = Molecule.from_smiles("COc1cc(C2Oc3c(OC)cc(C=CCO)cc3C2CO)ccc1[O]",
allow_undefined_stereo=True)
ofmol2 = Molecule.from_smiles(ofmol2.to_smiles(), allow_undefined_stereo=True)
ofmol2.generate_conformers(n_conformers=10)
ofmol2.compute_partial_charges_am1bcc()
ofmol2.to_file('test.sdf', file_format='sdf')
The SMILES round trip forces all the hydrogens to become "really" explicit (I'm not sure that AddHs
made them explicit in exactly the way we want, but I'll continue looking into this in #566)
It'll be very important to check out the resulting SDF and ensure that it yielded the protomer that you intended it to -- Different hydrogen addition methods disagree about the protomeric state of the oxygen on the aromatic ring.
Please let me know if this resolves the issue for you.
Does this mean anything to you?
NameError: name 'topologyobjects' is not defined
when running create_system
Hmm, I haven't seen that before. The Open Force Field Toolkit has Topology
and TopologyMolecule
classes, but I've never seen topologyobject
s. Googling makes me think it's a ParmEd thing: https://parmed.github.io/ParmEd/html/topobj/parmed.topologyobjects.Atom.html
Can I assume that the above workaround resolved the original issue? I'd like to close the issue if so.
@j-wags hard to say if the original question has been 'solved' until I figure out this parmed bug. I'd prefer to leave it open for the time being. I'll get back to this soon!
@j-wags two questions before we can close this topic. Your workaround appears to be performing ok for now (thank you!!!), but it makes a couple of things difficult for my workflow.
(1) Rather than generating conformers on the openff side, my pipeline will generate conformers in RDKit that need to be minimized. With the smiles "gateway", the original RDKit conformers are lost. Can you give me a workaround for this workaround, without sacrificing too much performance?
(2) The first time this line is run on a new molecule of reasonable size (~100 atoms), it takes upwards of 2 minutes walltime and I'm sure it'll grow even longer for my molecules of interest (500-1000 atoms). Why is that? Will I have to simply preload and cache all molecules I'm interested in?
%time system = system_generator.create_system(ofmol.to_topology().to_openmm())
#CPU times: user 772 ms, sys: 47.1 ms, total: 820 ms
#Wall time: 2min 8s
@tarungog I'll speak just to the speed issue and leave the others for Jeff.
Partial charges come from a semiempirical QM calculation (AM1-BCC). While fast, like all QM it doesn't scale particularly well with size, so as you get to larger systems charge assignment takes particularly long. I don't think you'll be able to use it for larger systems (500-1000 atoms).
That means you have three main options as you pass the size of molecules which can be handled with AM1-BCC, I think: 1) Preassign your own charges and bring them in rather than having the toolkit assign them; 2) Define library charges for the building blocks of your large molecules so the toolkit does not have to assign charges or 3) Wait for us to get to support for polymers/biopolymers (months to a year) where we'd have a procedure which identifies building blocks, separates them, caps them, charges them individually and then splices them back together
(Or you could do your own version of (3) and then bring in the charges via option (1).)
Hmm, so double-checking the input, it seems that there's a particular oxygen in the input SMILES that can either be considered to have an implicit hydrogen, or to have a negative charge. It's this one at the end:
COc1cc(C2Oc3c(OC)cc(C=CCO)cc3C2CO)ccc1 [O]
The square brackets around it have a special meaning. Per the SMILES specification (sec 3.2.1):
Within brackets, any attached hydrogens and formal charges must always be specified.
So, we put RDKit in a pickle by running AddHs
on this. The input SMILES specifies "this is an oxygen with one single bond to it, no hydrogens, and no formal charge". Thus, RDKit fails to turn this into a chemically sensible molecule. Removing the "no hydrogens" or "no charge" constraint yields a molecule that runs in OpenFF without issue:
I can fix this either by specifying that the O should have an H added to fill its valence requirement, by removing the brackets:
COc1cc(C2Oc3c(OC)cc(C=CCO)cc3C2CO)ccc1 O
or specifying that the O really has no hydrogens, and instead has a negative formal charge:
COc1cc(C2Oc3c(OC)cc(C=CCO)cc3C2CO)ccc1 [O-1]
Thank you both for responses!
@davidlmobley Can we get cruder partial charges using a different method? For example RDKit's native method (Gasteiger) seems to work pretty rapidly. Here is an example: https://sourceforge.net/p/rdkit/mailman/message/34306459/
How would I go about transferring these Gasteiger charges to the GAFF OpenMM side? Is GAFF validated to use such charges or no?
@j-wags Thank you for this discovery! We were using someone else's module to produce these biopolymers, and attracting the Smiles string via the RDKit Chem.MolToSmiles. We didn't realize the SMILES output was doing this strange thing. We have fixed this, but I'd still like your input on my questions above regarding (2). Since we don't need the smiles gateway workaround, I think (1) is more or less answered.
@tarungog :
You are certainly welcome to provide whatever charges you like via the molecules you load (e.g. place them in a mol2 or -- coming soon if it's not already ready, an sdf file -- and load them into the toolkit). However, Gasteiger charges have a number of well-documented problems that you can find in the MD literature and are generally considered unsuitable for simulations in our field, so I really don't recommend those. If you insist on using them, though, you can consult our docs on Molecules and just make sure the molecules in the system you're parameterizing have charges when you go to parameterize the system, and you ask the parameter assignment routine to use the provided charges.
In terms of whether GAFF and OpenFF are validated to use such charges, the answer is "no" (see above); Gasteiger charges are considered poor for these purposes. If you want to be able to use reasonable quality charges for larger molecules you have a more painful road ahead of you.
@davidlmobley What is the nature of that painful road? What would it take to get reasonable charges that don't take forever? If it's really that bad, I may just revert back to using RDKit's MMFF94. I don't know too much personally about force fields (I'm a machine learning guy), so I'm happy with any input you may have. I came here to find faster force field calculations at a similar level of accuracy. Molecules of size 400 atoms or so are taking around 1 second to minimize in RDKit, which puts a serious damper on my ML training. If GAFF with Gasteiger charges is problematic but no more problematic than RDKit's built in MMFF, then I'm fine with using it. Is that the case?
@tarungog We are not here focused on faster FFs at a similar level of accuracy than MMFF94, but (a) more accurate FFs even if slower, and (b) infrastructure to continue improving FFs.
That said, RDKit's built-in MMFF may not truly be MMFF94 (especially when it pertains to electrostatics) and I am not sure anyone knows how accurate it is in general.
In terms of what the "more painful road" would be if you want to use reasonable charges for larger molecules -- basically someone needs to implement a procedure which involves fragmenting the larger molecules, capping them, then charging the individual component sub-moleucles, removing the caps, redistributing the charges, and linking them back together. This is doable and on the roadmap, but not something we are doing yet. It is not that hard, though, but would take a reasonable amount of developer time and is not urgent for us as it seems to be for you.
Hello,
I am getting an error when trying to calculate energy on a molecule via the openff and openmm packages.
Anybody have any idea what's wrong?