`create_openmm_system` very slow for large (MW > 500) small molecules

pyeguy commented 5 years ago

When creating an OpenMM system via ForceField.create_openmm_system, the time required seems to increase drastically with MW. I'm currently still waiting for a MW ~900 molecule to finish after 30+ min.

To Reproduce

import openforcefield as off
from rdkit import Chem
from rdkit.Chem import AllChem
from simtk import openmm, unit
from simtk.openmm import app
from openforcefield.topology import Topology
from openforcefield.topology import Molecule
from openforcefield.typing.engines.smirnoff import ForceField
# loaded from smirnoff99Frosst package
ff = ForceField('smirnoff99Frosst-1.0.9.offxml')

# smiles for venetoclax
rdmol = Chem.MolFromSmiles("CC1(CCC(=C(C1)CN2CCN(CC2)C3=CC=C(C=C3)C(=O)NS(=O)(=O)C4=CC(=C(C=C4)N[C@H](CCN5CCOCC5)CSC6=CC=CC=C6)S(=O)(=O)C(F)(F)F)C7=CC=C(C=C7)Cl)C")

ofmol = Molecule.from_rdkit(rdmol)
topology = ofmol.to_topology()
org_system = ff.create_openmm_system(topology)

Output

In AmberToolsToolkitwrapper.computer_partial_charges_am1bcc: Molecule '' has more than one conformer, but this function will only generate charges for the first one.

This warning gets thrown after ~10 seconds, but the parameterization is still running...

Computing environment (please complete the following information):

Operating system: Ubuntu 18.04

Output of running conda list


# packages in environment at /home/cpye/anaconda3/envs/openmm:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main  
alabaster                 0.7.12                     py_0    conda-forge
ambermini                 16.16.0                       7    omnia
asn1crypto                0.24.0                py37_1003    conda-forge
attrs                     19.1.0                     py_0    conda-forge
babel                     2.7.0                      py_0    conda-forge
backcall                  0.1.0                      py_0    conda-forge
bleach                    3.1.0                      py_0    conda-forge
blosc                     1.17.0               he1b5a44_0    conda-forge
bson                      0.5.8                      py_0    conda-forge
bzip2                     1.0.8                h516909a_0    conda-forge
ca-certificates           2019.6.16            hecc5488_0    conda-forge
cairo                     1.16.0            h18b612c_1001    conda-forge
certifi                   2019.6.16                py37_1    conda-forge
cffi                      1.12.3           py37h8022711_0    conda-forge
chardet                   3.0.4                 py37_1003    conda-forge
cryptography              2.7              py37h72c5cf5_0    conda-forge
cycler                    0.10.0                     py_1    conda-forge
cython                    0.29.13          py37he1b5a44_0    conda-forge
dbus                      1.13.6               he372182_0    conda-forge
decorator                 4.4.0                      py_0    conda-forge
defusedxml                0.5.0                      py_1    conda-forge
docutils                  0.15.2                   py37_0    conda-forge
entrypoints               0.3                   py37_1000    conda-forge
expat                     2.2.5             he1b5a44_1003    conda-forge
fftw3f                    3.3.4                         2    omnia
fontconfig                2.13.1            he4413a7_1000    conda-forge
freetype                  2.10.0               he983fc9_0    conda-forge
gettext                   0.19.8.1          hc5be6a0_1002    conda-forge
glib                      2.58.3            h6f030ca_1002    conda-forge
gst-plugins-base          1.14.5               h0935bb2_0    conda-forge
gstreamer                 1.14.5               h36ae1b5_0    conda-forge
hdf5                      1.10.5          nompi_h3c11f04_1100    conda-forge
icu                       58.2              hf484d3e_1000    conda-forge
idna                      2.8                   py37_1000    conda-forge
imagesize                 1.1.0                      py_0    conda-forge
ipykernel                 5.1.1            py37h5ca1d4c_0    conda-forge
ipymol                    0.5                      pypi_0    pypi
ipython                   7.7.0            py37h5ca1d4c_0    conda-forge
ipython_genutils          0.2.0                      py_1    conda-forge
ipywidgets                7.5.1                      py_0    conda-forge
jedi                      0.14.1                   py37_0    conda-forge
jinja2                    2.10.1                     py_0    conda-forge
jpeg                      9c                h14c3975_1001    conda-forge
json5                     0.8.5                      py_0    conda-forge
jsonschema                3.0.2                    py37_0    conda-forge
jupyter                   1.0.0                      py_2    conda-forge
jupyter_client            5.3.1                      py_0    conda-forge
jupyter_console           6.0.0                      py_0    conda-forge
jupyter_core              4.4.0                      py_0    conda-forge
jupyterlab                1.0.4                    py37_0    conda-forge
jupyterlab_server         1.0.0                      py_1    conda-forge
kiwisolver                1.1.0            py37hc9558a2_0    conda-forge
libblas                   3.8.0               11_openblas    conda-forge
libboost                  1.67.0               h46d08c1_4  
libcblas                  3.8.0               11_openblas    conda-forge
libffi                    3.2.1             he1b5a44_1006    conda-forge
libgcc                    7.2.0                h69d50b8_2    conda-forge
libgcc-ng                 9.1.0                hdf63c60_0  
libgfortran-ng            7.3.0                hdf63c60_0  
libiconv                  1.15              h516909a_1005    conda-forge
liblapack                 3.8.0               11_openblas    conda-forge
libopenblas               0.3.6                h6e990d7_6    conda-forge
libpng                    1.6.37               hed695b0_0    conda-forge
libsodium                 1.0.17               h516909a_0    conda-forge
libstdcxx-ng              9.1.0                hdf63c60_0  
libtiff                   4.0.10            h57b8799_1003    conda-forge
libuuid                   2.32.1            h14c3975_1000    conda-forge
libxcb                    1.13              h14c3975_1002    conda-forge
libxml2                   2.9.9                h13577e0_2    conda-forge
line_profiler             2.1.2           py37h516909a_1003    conda-forge
lz4-c                     1.8.3             he1b5a44_1001    conda-forge
lzo                       2.10              h14c3975_1000    conda-forge
markupsafe                1.1.1            py37h14c3975_0    conda-forge
matplotlib                3.1.1                    py37_0    conda-forge
matplotlib-base           3.1.1            py37hfd891ef_0    conda-forge
mdtraj                    1.9.3            py37h00575c5_0    conda-forge
mistune                   0.8.4           py37h14c3975_1000    conda-forge
mock                      3.0.5                    py37_0    conda-forge
msgpack-python            0.6.1            py37h6bb024c_0    conda-forge
nbconvert                 5.5.0                      py_0    conda-forge
nbformat                  4.4.0                      py_1    conda-forge
ncurses                   6.1               hf484d3e_1002    conda-forge
networkx                  2.3                        py_0    conda-forge
nose                      1.3.7                 py37_1002    conda-forge
notebook                  6.0.0                    py37_0    conda-forge
numexpr                   2.6.9           py37h637b7d7_1000    conda-forge
numpy                     1.17.0           py37h95a1406_0    conda-forge
numpydoc                  0.9.1                      py_0    conda-forge
olefile                   0.46                       py_0    conda-forge
openforcefield            0.4.1                    py37_0    omnia
openmm                    7.3.1           py37_cuda92_rc_2    omnia
openmoltools              0.8.3                    py37_1    omnia
openssl                   1.1.1c               h516909a_0    conda-forge
packaging                 19.0                       py_0    conda-forge
pandas                    0.25.0           py37hb3f55d8_0    conda-forge
pandoc                    2.7.3                         0    conda-forge
pandocfilters             1.4.2                      py_1    conda-forge
parmed                    3.2.0                    py37_0    omnia
parso                     0.5.1                      py_0    conda-forge
patsy                     0.5.1                      py_0    conda-forge
pcre                      8.41              hf484d3e_1003    conda-forge
pexpect                   4.7.0                    py37_0    conda-forge
pickleshare               0.7.5                 py37_1000    conda-forge
pillow                    6.1.0            py37h6b7be26_1    conda-forge
pip                       19.2.1                   py37_0    conda-forge
pixman                    0.38.0            h516909a_1003    conda-forge
prometheus_client         0.7.1                      py_0    conda-forge
prompt_toolkit            2.0.9                      py_0    conda-forge
pthread-stubs             0.4               h14c3975_1001    conda-forge
ptyprocess                0.6.0                   py_1001    conda-forge
py-boost                  1.67.0           py37h04863e7_4  
pycparser                 2.19                     py37_1    conda-forge
pygments                  2.4.2                      py_0    conda-forge
pyopenssl                 19.0.0                   py37_0    conda-forge
pyparsing                 2.4.2                      py_0    conda-forge
pyqt                      5.9.2            py37hcca6a23_2    conda-forge
pyrsistent                0.15.4           py37h516909a_0    conda-forge
pysocks                   1.7.0                    py37_0    conda-forge
pytables                  3.5.2            py37h9f153d1_2    conda-forge
python                    3.7.3                h33d41f4_1    conda-forge
python-dateutil           2.8.0                      py_0    conda-forge
pytz                      2019.2                     py_0    conda-forge
pyyaml                    5.1.2            py37h516909a_0    conda-forge
pyzmq                     18.0.2           py37h1768529_2    conda-forge
qt                        5.9.7                h52cfd70_2    conda-forge
qtconsole                 4.5.2                      py_0    conda-forge
rdkit                     2019.03.3.0      py37hc20afe1_1    rdkit
readline                  8.0                  hf8c457e_0    conda-forge
requests                  2.22.0                   py37_1    conda-forge
scipy                     1.3.0            py37h921218d_1    conda-forge
seaborn                   0.9.0                      py_1    conda-forge
send2trash                1.5.0                      py_0    conda-forge
setuptools                41.0.1                   py37_0    conda-forge
sip                       4.19.8          py37hf484d3e_1000    conda-forge
six                       1.12.0                py37_1000    conda-forge
smirnoff99frosst          1.1.0                    py37_1    omnia
snowballstemmer           1.9.0                      py_0    conda-forge
sphinx                    2.1.2                      py_0    conda-forge
sphinxcontrib-applehelp   1.0.1                      py_0    conda-forge
sphinxcontrib-devhelp     1.0.1                      py_0    conda-forge
sphinxcontrib-htmlhelp    1.0.2                      py_0    conda-forge
sphinxcontrib-jsmath      1.0.1                      py_0    conda-forge
sphinxcontrib-qthelp      1.0.2                      py_0    conda-forge
sphinxcontrib-serializinghtml 1.1.1                      py_0    conda-forge
sqlite                    3.29.0               hcee41ef_0    conda-forge
statsmodels               0.10.1           py37hc1659b7_0    conda-forge
terminado                 0.8.2                    py37_0    conda-forge
testpath                  0.4.2                   py_1001    conda-forge
tk                        8.6.9             hed695b0_1002    conda-forge
toml                      0.10.0                     py_0    conda-forge
tornado                   6.0.3            py37h516909a_0    conda-forge
traitlets                 4.3.2                 py37_1000    conda-forge
urllib3                   1.25.3                   py37_0    conda-forge
wcwidth                   0.1.7                      py_1    conda-forge
webencodings              0.5.1                      py_1    conda-forge
wheel                     0.33.4                   py37_0    conda-forge
widgetsnbextension        3.5.1                    py37_0    conda-forge
xmltodict                 0.12.0                     py_0    conda-forge
xorg-kbproto              1.0.7             h14c3975_1002    conda-forge
xorg-libice               1.0.10               h516909a_0    conda-forge
xorg-libsm                1.2.3             h84519dc_1000    conda-forge
xorg-libx11               1.6.8                h516909a_0    conda-forge
xorg-libxau               1.0.9                h14c3975_0    conda-forge
xorg-libxdmcp             1.1.3                h516909a_0    conda-forge
xorg-libxext              1.3.4                h516909a_0    conda-forge
xorg-libxrender           0.9.10            h516909a_1002    conda-forge
xorg-renderproto          0.11.1            h14c3975_1002    conda-forge
xorg-xextproto            7.3.0             h14c3975_1002    conda-forge
xorg-xproto               7.0.31            h14c3975_1007    conda-forge
xz                        5.2.4             h14c3975_1001    conda-forge
yaml                      0.1.7             h14c3975_1001    conda-forge
zeromq                    4.3.2                he1b5a44_2    conda-forge
zlib                      1.2.11            h516909a_1005    conda-forge
zstd                      1.4.0                h3b9ef0a_0    conda-forge

pyeguy commented 5 years ago

Finished w/ Wall time: 50min 57s

j-wags commented 5 years ago

Thanks for the detailed issue report, @pyeguy . This is unfortunately expected behavior. The AM1 semiempirical quantum calculations are computationally expensive, and they scale poorly. On the backend, our open-source stack uses sqm from the AmberTools suite. Depending on your situation, you may be able to get an academic license for the OpenEye toolkits, which offer a higher-performance semiempirical quantum chemistry package.

Also, if you already have a desired set of partial charges calculated for your atoms, you can skip the charge generation step using the charge_from_molecules kwarg to create_openmm_system.
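
As a rough illustration of the charge_from_molecules route, here is a minimal sketch. It assumes the 0.4.x-era `openforcefield` / `simtk` imports used elsewhere in this thread; the function names `total_charge_is_integral` and `create_system_with_precomputed_charges` are hypothetical helpers, not part of the toolkit, and the charges are assumed to be listed in the same atom order as the `Molecule` (explicit hydrogens included).

```python
def total_charge_is_integral(charges, tol=1e-3):
    """Return True if the partial charges sum to (nearly) an integer net charge."""
    total = sum(charges)
    return abs(total - round(total)) < tol


def create_system_with_precomputed_charges(
    smiles, charges, offxml="smirnoff99Frosst-1.0.9.offxml"
):
    """Build an OpenMM System while reusing caller-supplied partial charges.

    `charges` holds one float per atom (units of elementary charge), in the
    atom order of the Molecule built from `smiles` (assumption of this sketch).
    """
    # Imported lazily so the sanity-check helper above works without the toolkit.
    import numpy as np
    from simtk import unit
    from openforcefield.topology import Molecule
    from openforcefield.typing.engines.smirnoff import ForceField

    molecule = Molecule.from_smiles(smiles)
    if len(charges) != molecule.n_atoms:
        raise ValueError("expected one charge per atom, hydrogens included")
    if not total_charge_is_integral(charges):
        raise ValueError("partial charges do not sum to an integer net charge")

    # Attach the charges to the molecule; the toolkit expects a unit-bearing array.
    molecule.partial_charges = unit.Quantity(
        np.array(charges, dtype=float), unit.elementary_charge
    )

    forcefield = ForceField(offxml)
    # Molecules listed here keep their own charges instead of triggering AM1-BCC.
    return forcefield.create_openmm_system(
        molecule.to_topology(), charge_from_molecules=[molecule]
    )
```

The integer-sum check is only a cheap guard against mismatched or truncated charge lists; it says nothing about whether the charges themselves are any good.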

davidlmobley commented 5 years ago

100% agree with Jeff here. Until we have a general fragmentation scheme that can break larger molecules up into pieces and parameterize them consistently before stitching them back together, or an alternative charging scheme (ML-based, perhaps) which is adequate for larger molecules, we are stuck in this world. We're still running a QM calculation on the whole molecule so it's going to be slow.

jchodera commented 5 years ago

We are working on several strategies to accelerate this, but it will likely be a few months before we can replace toolkit AM1-BCC charges with something significantly faster.

pyeguy commented 5 years ago

Thanks for the quick replies and all the good work here!

I think @davidlmobley's approach would probably work great for my application, where I have a series of highly related molecules I would like to parameterize and then simulate.

In the meantime I can use the Gasteiger approximations from RDKit, but I assume those are rather dreadful...
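
For reference, a minimal sketch of what the RDKit Gasteiger route could look like. The function name `gasteiger_charges` is a hypothetical helper; it returns the hydrogen-complete RDKit molecule together with the charges so that the atom ordering stays attached to the values. Whether these charges are adequate for simulation is a separate question that this sketch does not address.

```python
def gasteiger_charges(smiles):
    """Return (rdkit_mol_with_hydrogens, per-atom Gasteiger charges)."""
    # Imported lazily so the module can be loaded without RDKit installed.
    from rdkit import Chem
    from rdkit.Chem import AllChem

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError("RDKit could not parse SMILES: %r" % smiles)

    # Charges are needed for every atom, so make the hydrogens explicit first.
    mol = Chem.AddHs(mol)

    # Stores the result on each atom as the '_GasteigerCharge' property.
    AllChem.ComputeGasteigerCharges(mol)
    charges = [atom.GetDoubleProp("_GasteigerCharge") for atom in mol.GetAtoms()]
    return mol, charges
```

The returned molecule could then be converted with Molecule.from_rdkit, given these charges, and passed through the charge_from_molecules kwarg mentioned above, which keeps the atom order consistent between the two toolkits.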

I'm sure this would be a lot of work as well, but would switching to a more parallel open-source QM stack help with performance, e.g. CP2K interfaced via pycp2k?

jaimergp commented 5 years ago

Once I had to parametrize a ~400 atom ligand, so sqm obviously choked on that for hours and hours, trying to minimize the structure. In the end, I minimized the ligand using Gaussian with a semiempirical method, and then supplied those coordinates to the Antechamber stack.

Maybe you can use ase to minimize the structure with some other program before passing it to sqm via openforcefield?

j-wags commented 5 years ago

@pyeguy Thanks for putting CP2K on our radar. Right now we need to stick to dependencies which are conda-installable to make our deployment fast and easy, but I'll keep an eye on that to see if it becomes easier to install. Based on some work last week, I've found that conda does offer access to gfortran and gcc, but IIRC they were version 4.5.X on Mac, which falls short of CP2K's requirements.

Again, thanks for the feedback. If CP2K gets into a conda package, I'd love to try out their implementation!

openforcefield / openff-toolkit #395