Open kcreemer opened 4 years ago
Hi @kcreemer,
Thanks for the thoroughly-written issue.
Are there alternative ways to fragmentate the polymer so charge calculations do not have to be performed for the molecule itself but for the units it consists of?
We're planning to develop a workflow to automatically fragment and charge polymers like this. This workflow would allow users to define a protein force field, a residue separator SMARTS, and a method for treating nonstandard amino acids and capping groups. Unfortunately, we have several more pressing features on our roadmap, so I can't say whether this will be ready within a year. In the short-term, you're probably best off defining LibraryCharges
for your polymer, using the 10-mer approach that you mentioned above.
Can we define a new library residue so this can be recognized within our molecule?
Yes, and this is the approach I would suggest. Use examples for the LibraryCharges
tag can be found in the SMIRNOFF spec documentation. This will require writing three SMARTS strings by hand (which can be a bit painful), but once it works, it should suffice for polymers of arbitrary length. As you mentioned, the three LibraryCharge
elements that would need to be defined are the initiator, terminator, and monomer unit. If writing the SMARTS is prohibitively difficult, we provide functionality in Chemper for machine-generating SMARTS patterns, however it is not as geared for external use as the OFF Toolkit, so the installation/documentation may not be as mature.
Alternatively use: create_openmm_system(topology, charge_from_molecules=molecule_list)?
This would also work, though you'd need to manually update offmol.partial_charges
with your desired charges ahead of time. I'd be cautious about this, just because the workflow would need to assign a large number of charges in exactly the right order to match up with the atom indexing. If, for any reason, the molecule atom indexing changed or was misinterpreted, the partial charges assigned to the resulting system would silently be scrambled. For this reason, I'd recommend using LibraryCharges
, since it ensures that charges are assigned based on the connectivity of the model.
Let's leave this issue open until we get a solution working for you. Please let me know if there's anything else I can do to help.
Dear Mr. Wagner,
Thank you for your fast reply and clear explanation.
Other questions arose now. Where can the LibraryCharges be defined? Does this require smirks notation of each part of the polymer as in the example you provided? Or does SMARTS suffice, as you suggest in your mail.
Thanks in advance.
Kind regards, Karolien Creemers
Op di 25 feb. 2020 om 17:44 schreef Jeff Wagner notifications@github.com:
Hi @kcreemer https://github.com/kcreemer,
Thanks for the thoroughly-written issue.
Are there alternative ways to fragmentate the polymer so charge calculations do not have to be performed for the molecule itself but for the units it consists of?
We're planning to develop a workflow to automatically fragment and charge polymers like this. This workflow would allow users to define a protein force field, a residue separator SMARTS, and a method for treating nonstandard amino acids and capping groups. Unfortunately, we have several more pressing features on our roadmap, so I can't say whether this will be ready within a year. In the short-term, you're probably best off defining LibraryCharges for your polymer, using the 10-mer approach that you mentioned above.
Can we define a new library residue so this can be recognized within our molecule?
Yes, and this is the approach I would suggest. Use examples for the LibraryCharges tag can be found in the SMIRNOFF spec documentation https://open-forcefield-toolkit.readthedocs.io/en/0.6.0/smirnoff.html#librarycharges-library-charges-for-polymeric-residues-and-special-solvent-models. This will require writing three SMARTS strings by hand (which can be a bit painful), but once it works, it should suffice for polymers of arbitrary length. As you mentioned, the three LibraryCharge elements that would need to be defined are the initiator, terminator, and monomer unit. If writing the SMARTS is prohibitively difficult, we provide functionality in Chemper https://github.com/mobleylab/chemper for machine-generating SMARTS patterns, however it is not as geared for external use as the OFF Toolkit, so the installation/documentation may not be as mature.
Alternatively use: create_openmm_system(topology, charge_from_molecules=molecule_list)?
This would also work, though you'd need to manually update offmol.partial_charges with your desired charges ahead of time. I'd be cautious about this, just because the workflow would need to assign a large number of charges in exactly the right order to match up with the atom indexing. If, for any reason, the molecule atom indexing changed or was misinterpreted, the partial charges assigned to the resulting system would silently be scrambled. For this reason, I'd recommend using LibraryCharges, since it ensures that charges are assigned based on the connectivity of the model.
Let's leave this issue open until we get a solution working for you. Please let me know if there's anything else I can do to help.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/openforcefield/openforcefield/issues/528?email_source=notifications&email_token=ANQHY5VHQBOS5LEDUMV3LCTREVDFTA5CNFSM4K3JBCG2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEM4T4VA#issuecomment-590954068, or unsubscribe https://github.com/notifications/unsubscribe-auth/ANQHY5QRT5F6HVUTGUWGCKLREVDFTANCNFSM4K3JBCGQ .
Where can the LibraryCharges be defined?
LibraryCharges can be defined in a force field file (OFFXML format). Here's an example of what one looks like.
To use this in practice, you'll want to load a complete force field (with bonds and angles and such), and then ALSO load the LibraryCharges OFFXML file (or string) that provides charges for the molecule of interest.
Here is an example of how to use LibraryCharges. A ForceField object can load multiple data sources, so the example below first loads our "Parsley" force field ("openff-1.0.0.offxml"), and then the LibraryCharges FF that I linked above.
from openforcefield.topology import Molecule
from openforcefield.typing.engines.smirnoff import ForceField
xml_ethanol_library_charges_by_atom_ff = '''
<SMIRNOFF version="0.3" aromaticity_model="OEAroModel_MDL">
<LibraryCharges version="0.3">
<LibraryCharge smirks="[#1:1]-[#6]" charge1="-0.02*elementary_charge" />
<LibraryCharge smirks="[#6X4:1]" charge1="-0.2*elementary_charge" />
<LibraryCharge smirks="[#1:1]-[#6]-[#8]" charge1="-0.01*elementary_charge" />
<LibraryCharge smirks="[#6X4:1]-[#8]" charge1="-0.1*elementary_charge" />
<LibraryCharge smirks="[#8X2:1]" charge1="0.3*elementary_charge" />
<LibraryCharge smirks="[#1:1]-[#8]" charge1="0.08*elementary_charge" />
</LibraryCharges>
</SMIRNOFF>
'''
ethanol = Molecule.from_smiles('CCO')
topology = ethanol.to_topology()
ff = ForceField('openff-1.0.0.offxml', xml_ethanol_library_charges_by_atom_ff)
system = ff.create_openmm_system(topology)
Note that the charge values here are just used for testing -- They don't make any physical sense. But we can check over the system this creates and verify that the charges were assigned to the different atoms correctly:
for index in range(ethanol.n_atoms):
print('mass:', system.getParticleMass(index), 'charge:', system.getForce(2).getParticleParameters(index)[0])
mass: 12.01078 Da charge: -0.2 e mass: 12.01078 Da charge: -0.1 e mass: 15.99943 Da charge: 0.3 e mass: 1.007947 Da charge: -0.02 e mass: 1.007947 Da charge: -0.02 e mass: 1.007947 Da charge: -0.02 e mass: 1.007947 Da charge: -0.01 e mass: 1.007947 Da charge: -0.01 e mass: 1.007947 Da charge: 0.08 e
Does this require smirks notation of each part of the polymer as in the example you provided? Or does SMARTS suffice, as you suggest in your mail.
We use SMARTS. Somehow we made a typo early on and said "SMIRKS" when we meant to say "SMARTS", and then the whole joke with "SMIRNOFF" got started, and "SMARNOFF" wouldn't be very funny, so some of our older documentation says "SMIRKS" when we really mean "SMARTS". We're trying to correct this everywhere we can. To the best of my knowledge, our software only ever uses SMARTS.
Dear Mr. Wagner,
We managed to compute the charges for the species (monomer, initiator and terminator) and create an offxml file with the LibraryCharges. It consists out of three lines: one line for the initiator, one line for the monomer and one line for the terminator after which the charges are defined, as can be seen below.
xml_pethoxtot_library_charges_ff = '''
'''
For our application, the initiator and terminator have the same structure but different partial charges. Hence we named each line in the offxml file. In the molecule, we assigned a residue name to the atoms (column 'resName' in the topology). We successfully could create a system with 10 monomers and check the partial charges afterwards as you mentioned above. Unfortunately, the system fails for 50 monomers. So once the number of atoms exceeds a certain limit (which I believe is 300 for both antechambers and openeye), the 'Unable to assign charges' error appears again. This is however strange as the charges are taken from the LibraryCharges offxml file. It is however possible to change the maximum allowed number of atoms in the case of open eye (oequacpac.OEAssignCharges(oemol, oequacpac. OEAM1BCCCharges(maxAtoms=1400))) but as this method should not be called in the first place we wonder which is causing the error.
As a suggestion, maybe an extra argument can be added to the ff.create_openmm_system function, which indicates the maximum number of atoms?
Furthermore, we are unsure about how the partial charges are assigned upon creation of the system with the command ff.create_openmm_system(topology). Because there does not seem to be a link with the assigned residue names, but with the SMIRKS notation instead. Is it possible to link the calculated LibraryCharges with the molecule by the 'resName' in the topology?
Thank you for your help.
Kind regards, Karolien Creemers
CODE: off_forcefield = ForceField('openff-1.0.0.offxml', 'xml_pethoxtot_library_chargesff.offxml') polymer = app.PDBFile(f"polymer{number_of_units}units_res.pdb") uniq_molecules = [Molecule.from_smiles(polymer_smiles)] off_polymer_topology = Topology.from_openmm(polymer.topology,unique_molecules=uniq_molecules) off_polymer_system = off_forcefield.create_openmm_system(off_polymer_topology)
ERROR: Exception Traceback (most recent call last)
Thanks for reporting back!
One note: Your initiator and terminator SMARTS expressions are the same, so only the second one will ever match (so "INE" is never applied, "CTR" applies to both ends). You can fix this by expanding the initiator and terminator SMARTS to include some non-tagged atoms, but which include enough chemical environment to distinguish the beginning from the end.
As for the charge assignment problem: I'm optimistic that #509 will have fixed the max-size issue. This fix will come out in our 0.7.0 release, scheduled for early April. Basically, both RDKit and OpenEye toolkits have a maximum number of SMARTS matches they'll return, which we've found we need to increase when dealing with larger molecules. You can apply a manual patch right now to see if that fixes the issue for you: In your~/anaconda3/envs/myenv/lib/python3.7/site-packages/openforcefield/utils/toolkits.py
, apply the following change:
Please let me know if that helps, otherwise we can try to figure out what else might be happening.
Dear Mr. Wagner,
I created a unique SMARTS notation for the three different residues and added them together with the corresponding charges to the LibraryCharges offxml file. By adapting the toolkits.py file, the above mentioned error did disappear. So we were able to parameterize the polymer correctly! Many thanks for your help and patience. We are currently setting up a few test systems. Hopefully everything will run smoothly from now on.
Kind regards, Karolien Creemers
Op do 12 mrt. 2020 om 16:56 schreef Jeff Wagner notifications@github.com:
Thanks for reporting back!
One note: Your initiator and terminator SMARTS expressions are the same, so only the second one will ever match (so "INE" is never applied, "CTR" applies to both ends). You can fix this by expanding the initiator and terminator SMARTS to include some non-tagged atoms, but which include enough chemical environment to distinguish the beginning from the end.
As for the charge assignment problem: I'm optimistic that #509 https://github.com/openforcefield/openforcefield/pull/509 will have fixed the max-size issue. This fix will come out in our 0.7.0 release, scheduled for early April. Basically, both RDKit and OpenEye toolkits have a maximum number of SMARTS matches they'll return, which we've found we need to increase when dealing with larger molecules. You can apply a manual patch right now to see if that fixes the issue for you: In your ~/anaconda3/envs/myenv/lib/python3.7/site-packages/openforcefield/utils/toolkits.py, apply the following change:
Please let me know if that helps, otherwise we can try to figure out what else might be happening.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/openforcefield/openforcefield/issues/528#issuecomment-598266694, or unsubscribe https://github.com/notifications/unsubscribe-auth/ANQHY5TJQT6CMANHDCFFJXDRHEA35ANCNFSM4K3JBCGQ .
Describe the bug To study the interaction between a polymer and an API, we want to parameterize polymers ranging from 50 to 600 monomer units. This to perform a molecular dynamics calculation with openff. However from the moment we reach +300 atoms, calculations of the partial charges is no longer possible.
To Reproduce The setup of the system is done using smiles codes and openbabel (and pybel). It is implemented via the following way: off_polymer_system = off_forcefield.create_openmm_system(off_polymer_topology)
Output Exception Traceback (most recent call last)