peastman closed this 10 months ago
I haven't tried using TorchMD-Net from C++, so I don't know. You could try breaking the model down and exporting its submodules one at a time to narrow down the issue. It may also be worth trying a small pyg message passing example on its own to see whether that is where the problem lies. The implementation uses fairly basic PyTorch functionality, apart from pyg's message passing implementation and custom kernels.
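One way to act on the suggestion of exporting submodules separately is to try each top-level part in turn and record which ones fail. The sketch below is only an illustration: `find_failing_parts` and its arguments are hypothetical names, not part of TorchMD-Net or PyTorch. It also assumes the failure surfaces as a Python exception; a hard abort of the whole process would require running each attempt in its own process instead.

```python
from typing import Any, Callable, Dict, Iterable, Tuple


def find_failing_parts(
    parts: Iterable[Tuple[str, Any]],
    export: Callable[[Any], Any],
) -> Dict[str, str]:
    """Apply `export` to each named part and collect the ones that raise.

    Returns a dict mapping the part name to a short error description.
    """
    failures: Dict[str, str] = {}
    for name, part in parts:
        try:
            export(part)
        except Exception as exc:  # keep going so every part gets tried
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


# Intended use with PyTorch (assumes `model` is a torch.nn.Module):
#   import torch
#   failures = find_failing_parts(model.named_children(), torch.jit.script)
#   for name, err in failures.items():
#       print(name, "->", err)
```

An empty result would suggest that the individual submodules export cleanly and that the problem lies elsewhere, for example in how the exported file is loaded.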
For what it's worth, gdb shows the error happens inside torch_cluster.
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1 0x00007ffff6e567f1 in __GI_abort () at abort.c:79
#2 0x00007fffc228e036 in __gnu_cxx::__verbose_terminate_handler ()
at ../../../../libstdc++-v3/libsupc++/vterminate.cc:95
#3 0x00007fffc228c524 in __cxxabiv1::__terminate (handler=<optimized out>)
at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:48
#4 0x00007fffc228c576 in std::terminate () at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:58
#5 0x00007fffc228c768 in __cxxabiv1::__cxa_throw (obj=0x5555597a3f50,
tinfo=0x7fffc2380278 <typeinfo for std::bad_alloc>,
dest=0x7fffc228b0e4 <std::bad_alloc::~bad_alloc()>)
at ../../../../libstdc++-v3/libsupc++/eh_throw.cc:98
#6 0x00007fffc228cb95 in operator new (sz=140734623278748)
at /home/conda/feedstock_root/build_artifacts/gcc_compilers_1652324151713/work/build/x86_64-conda-linux-gnu/libstdc++-v3/libsupc++/new:64
#7 0x00007fffb9dc442e in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char*> ()
from /home/peastman/miniconda3/envs/torchmd-net/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so
#8 0x00007fffb9dc52e6 in c10::RegisterOperators::inferSchemaFromKernels_(c10::OperatorName const&, c10::RegisterOperators::Options const&) ()
from /home/peastman/miniconda3/envs/torchmd-net/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so
#9 0x00007fffb9dc8152 in c10::RegisterOperators::checkSchemaAndRegisterOp_(c10::RegisterOperators::Options&&) ()
from /home/peastman/miniconda3/envs/torchmd-net/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so
#10 0x00007fff553a469d in std::enable_if<c10::guts::is_function_type<long ()>::value&&(!std::is_same<long (), void (c10::OperatorHandle const&, std::vector<c10::IValue, std::allocator<c10::IValue> >*)>::value), c10::RegisterOperators&&>::type c10::RegisterOperators::op<long ()>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, long (*)(), c10::RegisterOperators::Options&&) && ()
from /home/peastman/miniconda3/envs/torchmd-net/lib/python3.9/site-packages/torch_cluster/_version_cuda.so
#11 0x00007fff553a217d in ?? ()
from /home/peastman/miniconda3/envs/torchmd-net/lib/python3.9/site-packages/torch_cluster/_version_cuda.so
#12 0x00007ffff7de38d3 in call_init (env=0x555557f25be0, argv=0x7fffffffbb48, argc=2,
l=<optimized out>) at dl-init.c:72
#13 _dl_init (main_map=main_map@entry=0x5555598792d0, argc=2, argv=0x7fffffffbb48,
env=0x555557f25be0) at dl-init.c:119
#14 0x00007ffff7de839f in dl_open_worker (a=a@entry=0x7fffffff1a70) at dl-open.c:522
#15 0x00007ffff6f7d16f in __GI__dl_catch_exception (exception=0x7fffffff1a50,
operate=0x7ffff7de7f60 <dl_open_worker>, args=0x7fffffff1a70) at dl-error-skeleton.c:196
#16 0x00007ffff7de796a in _dl_open (
file=0x7fff553b9440 "/home/peastman/miniconda3/envs/torchmd-net/lib/python3.9/site-packages/torch_cluster/_version_cuda.so", mode=-2147483646, caller_dlopen=0x7ffff7e7df3e <py_dl_open+142>,
nsid=<optimized out>, argc=2, argv=<optimized out>, env=0x555557f25be0) at dl-open.c:605
(continuing on up to stack frame #491)
Sorry, it looks like I misdiagnosed what the problem is. The error actually occurs as soon as I import torchmdnet, and it's caused by upgrading to PyTorch 1.12. I create an environment like this:
mamba env create -f environment.yml
conda activate torchmd-net
pip install -e .
At that point things work correctly. So now upgrade PyTorch:
mamba install -c conda-forge pytorch=1.12
and execute the command
python -c "from torchmdnet.models.model import load_model"
It fails with
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Aborted
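Because an uncaught `std::bad_alloc` terminates the interpreter, a `try`/`except` around the import cannot catch it. A minimal sketch for probing such an import from a parent process follows. `probe_import` is a hypothetical helper name, and reporting a signal as a negative exit code is POSIX-specific behaviour of `subprocess`.

```python
import subprocess
import sys


def probe_import(module: str) -> int:
    """Import `module` in a child interpreter and return its exit code.

    On POSIX, subprocess reports death by signal N as return code -N,
    so an abort (SIGABRT) shows up as a negative value.
    """
    result = subprocess.run(
        [sys.executable, "-c", f"import {module}"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode


if __name__ == "__main__":
    # Module names taken from the thread; adjust as needed.
    for name in ("torch", "torch_cluster", "torchmdnet"):
        code = probe_import(name)
        status = "ok" if code == 0 else f"failed (exit code {code})"
        print(f"{name}: {status}")
```

Probing the modules one by one in this way can help show which import is the first to bring the process down.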
This is on Ubuntu 20.04. Here's the complete environment.
# packages in environment at /home/peastman/miniconda3/envs/torchmd-net:
#
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 2_gnu conda-forge
absl-py 1.2.0 pyhd8ed1ab_0 conda-forge
aiohttp 3.8.3 py39hb9d737c_0 conda-forge
aiosignal 1.2.0 pyhd8ed1ab_0 conda-forge
alsa-lib 1.2.3.2 h166bdaf_0 conda-forge
async-timeout 4.0.2 pyhd8ed1ab_0 conda-forge
attrs 22.1.0 pyh71513ae_1 conda-forge
blinker 1.4 py_1 conda-forge
brotli 1.0.9 h166bdaf_7 conda-forge
brotli-bin 1.0.9 h166bdaf_7 conda-forge
brotlipy 0.7.0 py39hb9d737c_1004 conda-forge
bzip2 1.0.8 h7f98852_4 conda-forge
c-ares 1.18.1 h7f98852_0 conda-forge
ca-certificates 2022.9.14 ha878542_0 conda-forge
cached-property 1.5.2 hd8ed1ab_1 conda-forge
cached_property 1.5.2 pyha770c72_1 conda-forge
cachetools 5.2.0 pyhd8ed1ab_0 conda-forge
certifi 2022.9.14 pyhd8ed1ab_0 conda-forge
cffi 1.15.1 py39he91dace_0 conda-forge
charset-normalizer 2.1.1 pyhd8ed1ab_0 conda-forge
click 8.1.3 py39hf3d152e_0 conda-forge
colorama 0.4.5 pyhd8ed1ab_0 conda-forge
contourpy 1.0.5 py39hf939315_0 conda-forge
coverage 6.4.4 py39hb9d737c_0 conda-forge
cryptography 37.0.1 py39h9ce1e76_0
cudatoolkit 11.7.0 hd8887f6_10 conda-forge
cudnn 8.4.1.50 hed8a83a_0 conda-forge
cycler 0.11.0 pyhd8ed1ab_0 conda-forge
dbus 1.13.18 hb2f20db_0
expat 2.4.9 h27087fc_0 conda-forge
flake8 5.0.4 pyhd8ed1ab_0 conda-forge
font-ttf-dejavu-sans-mono 2.37 hab24e00_0 conda-forge
font-ttf-inconsolata 3.000 h77eed37_0 conda-forge
font-ttf-source-code-pro 2.038 h77eed37_0 conda-forge
font-ttf-ubuntu 0.83 hab24e00_0 conda-forge
fontconfig 2.14.0 hc2a2eb6_1 conda-forge
fonts-conda-ecosystem 1 0 conda-forge
fonts-conda-forge 1 0 conda-forge
fonttools 4.37.3 py39hb9d737c_0 conda-forge
freetype 2.12.1 hca18f0e_0 conda-forge
frozenlist 1.3.1 py39hb9d737c_0 conda-forge
fsspec 2022.8.2 pyhd8ed1ab_0 conda-forge
future 0.18.2 py39hf3d152e_5 conda-forge
gettext 0.21.0 hf68c758_0
glib 2.72.1 h6239696_0 conda-forge
glib-tools 2.72.1 h6239696_0 conda-forge
google-auth 2.11.0 pyh6c4a22f_0 conda-forge
google-auth-oauthlib 0.4.1 py_2 conda-forge
googledrivedownloader 0.4 pyhd3deb0d_1 conda-forge
grpc-cpp 1.48.1 hc2bec63_1 conda-forge
grpcio 1.48.1 py39hfaff5cf_1 conda-forge
gst-plugins-base 1.20.2 hcf0ee16_0 conda-forge
gstreamer 1.20.3 hd4edc92_2 conda-forge
h5py 3.7.0 nompi_py39hd51670d_101 conda-forge
hdf5 1.12.2 nompi_h4df4325_100 conda-forge
html5lib 1.1 pyh9f0ad1d_0 conda-forge
icu 69.1 h9c3ff4c_0 conda-forge
idna 3.4 pyhd8ed1ab_0 conda-forge
importlib-metadata 4.11.4 py39hf3d152e_0 conda-forge
importlib_metadata 4.11.4 hd8ed1ab_0 conda-forge
iniconfig 1.1.1 pyh9f0ad1d_0 conda-forge
intel-openmp 2022.1.0 h9e868ea_3769
isodate 0.6.1 pyhd8ed1ab_0 conda-forge
jinja2 3.1.2 pyhd8ed1ab_1 conda-forge
joblib 1.2.0 pyhd8ed1ab_0 conda-forge
jpeg 9e h166bdaf_2 conda-forge
keyutils 1.6.1 h166bdaf_0 conda-forge
kiwisolver 1.4.4 py39hf939315_0 conda-forge
krb5 1.19.3 h08a2579_0 conda-forge
lark-parser 0.12.0 pyhd8ed1ab_0 conda-forge
lcms2 2.12 hddcbb42_0 conda-forge
ld_impl_linux-64 2.38 h1181459_1
lerc 4.0.0 h27087fc_0 conda-forge
libabseil 20220623.0 cxx17_h48a1fff_4 conda-forge
libblas 3.9.0 16_linux64_mkl conda-forge
libbrotlicommon 1.0.9 h166bdaf_7 conda-forge
libbrotlidec 1.0.9 h166bdaf_7 conda-forge
libbrotlienc 1.0.9 h166bdaf_7 conda-forge
libcblas 3.9.0 16_linux64_mkl conda-forge
libclang 13.0.1 default_hc23dcda_0 conda-forge
libcurl 7.83.1 h2283fc2_0 conda-forge
libdeflate 1.14 h166bdaf_0 conda-forge
libedit 3.1.20210910 h7f8727e_0
libev 4.33 h516909a_1 conda-forge
libevent 2.1.10 h28343ad_4 conda-forge
libffi 3.4.2 h7f98852_5 conda-forge
libgcc-ng 12.1.0 h8d9b700_16 conda-forge
libgfortran-ng 12.1.0 h69a702a_16 conda-forge
libgfortran5 12.1.0 hdcd56e2_16 conda-forge
libglib 2.72.1 h2d90d5f_0 conda-forge
libgomp 12.1.0 h8d9b700_16 conda-forge
libiconv 1.16 h516909a_0 conda-forge
liblapack 3.9.0 16_linux64_mkl conda-forge
libllvm13 13.0.1 hf817b99_2 conda-forge
libnghttp2 1.47.0 hff17c54_1 conda-forge
libnsl 2.0.0 h7f98852_0 conda-forge
libogg 1.3.5 h27cfd23_1
libopus 1.3.1 h7f98852_1 conda-forge
libpng 1.6.38 h753d276_0 conda-forge
libpq 14.5 he2d8382_0 conda-forge
libprotobuf 3.20.1 h6239696_4 conda-forge
libsqlite 3.39.3 h753d276_0 conda-forge
libssh2 1.10.0 hf14f497_3 conda-forge
libstdcxx-ng 12.1.0 ha89aaad_16 conda-forge
libtiff 4.4.0 h55922b4_4 conda-forge
libuuid 2.32.1 h14c3975_1000 conda-forge
libvorbis 1.3.7 he1b5a44_0 conda-forge
libwebp-base 1.2.4 h166bdaf_0 conda-forge
libxcb 1.13 h7f98852_1004 conda-forge
libxkbcommon 1.0.3 he3ba5ed_0 conda-forge
libxml2 2.9.12 h885dcf4_1 conda-forge
libzlib 1.2.12 h166bdaf_3 conda-forge
magma 2.5.4 h6103c52_2 conda-forge
markdown 3.4.1 pyhd8ed1ab_0 conda-forge
markupsafe 2.1.1 py39hb9d737c_1 conda-forge
matplotlib 3.6.0 py39hf3d152e_0 conda-forge
matplotlib-base 3.6.0 py39hf9fd14e_0 conda-forge
mccabe 0.7.0 pyhd8ed1ab_0 conda-forge
mkl 2022.1.0 hc2b9512_224
multidict 6.0.2 py39hb9d737c_1 conda-forge
munkres 1.1.4 pyh9f0ad1d_0 conda-forge
mysql-common 8.0.30 h26416b9_1 conda-forge
mysql-libs 8.0.30 hbc51c84_1 conda-forge
nccl 2.14.3.1 h0800d71_0 conda-forge
ncurses 6.3 h27087fc_1 conda-forge
networkx 2.8.6 pyhd8ed1ab_0 conda-forge
ninja 1.11.0 h924138e_0 conda-forge
nnpops 0.2 cuda112py39hcdac82f_5 conda-forge
nspr 4.33 h295c915_0
nss 3.78 h2350873_0 conda-forge
numpy 1.23.3 py39hba7629e_0 conda-forge
oauthlib 3.2.1 pyhd8ed1ab_0 conda-forge
openjpeg 2.5.0 h7d73246_1 conda-forge
openssl 3.0.5 h166bdaf_2 conda-forge
packaging 21.3 pyhd8ed1ab_0 conda-forge
pandas 1.5.0 py39h4661b88_0 conda-forge
pcre 8.45 h9c3ff4c_0 conda-forge
pillow 9.2.0 py39hd5dbb17_2 conda-forge
pip 22.2.2 pyhd8ed1ab_0 conda-forge
pluggy 1.0.0 py39hf3d152e_3 conda-forge
protobuf 3.20.1 py39h5a03fae_0 conda-forge
psutil 5.9.2 py39hb9d737c_0 conda-forge
pthread-stubs 0.4 h36c2ea0_1001 conda-forge
py 1.11.0 pyh6c4a22f_0 conda-forge
pyasn1 0.4.8 py_0 conda-forge
pyasn1-modules 0.2.8 py_0
pycodestyle 2.9.1 pyhd8ed1ab_0 conda-forge
pycparser 2.21 pyhd8ed1ab_0 conda-forge
pydeprecate 0.3.2 pyhd8ed1ab_0 conda-forge
pyflakes 2.5.0 pyhd8ed1ab_0 conda-forge
pyjwt 2.5.0 pyhd8ed1ab_0 conda-forge
pyopenssl 22.0.0 pyhd8ed1ab_1 conda-forge
pyparsing 3.0.9 pyhd8ed1ab_0 conda-forge
pyqt 5.12.3 py39hf3d152e_8 conda-forge
pyqt-impl 5.12.3 py39hde8b62d_8 conda-forge
pyqt5-sip 4.19.18 py39he80948d_8 conda-forge
pyqtchart 5.12 py39h0fcd23e_8 conda-forge
pyqtwebengine 5.12.1 py39h0fcd23e_8 conda-forge
pysocks 1.7.1 py39hf3d152e_5 conda-forge
pytest 7.1.3 py39hf3d152e_0 conda-forge
pytest-cov 3.0.0 pyhd8ed1ab_0 conda-forge
python 3.9.13 h2660328_0_cpython conda-forge
python-dateutil 2.8.2 pyhd8ed1ab_0 conda-forge
python-louvain 0.15 pyhd8ed1ab_1 conda-forge
python_abi 3.9 2_cp39 conda-forge
pytorch 1.12.1 cuda112py39ha0cca9b_200 conda-forge
pytorch-gpu 1.12.1 cuda112py39h1894f8f_200 conda-forge
pytorch-lightning 1.6.3 pyhd8ed1ab_0 conda-forge
pytorch_cluster 1.5.9 py39hbba90f3_0 conda-forge
pytorch_geometric 2.0.3 pyhd8ed1ab_0 conda-forge
pytorch_scatter 2.0.9 cuda112py39h83a068c_0 conda-forge
pytorch_sparse 0.6.15 py39h83a068c_0 conda-forge
pytz 2022.2.1 pyhd8ed1ab_0 conda-forge
pyu2f 0.1.5 pyhd8ed1ab_0 conda-forge
pyyaml 6.0 py39hb9d737c_4 conda-forge
qt 5.12.9 h1304e3e_6 conda-forge
rdflib 6.2.0 pyhd8ed1ab_0 conda-forge
re2 2022.06.01 h27087fc_0 conda-forge
readline 8.1.2 h0f457ee_0 conda-forge
requests 2.28.1 pyhd8ed1ab_1 conda-forge
requests-oauthlib 1.3.1 pyhd8ed1ab_0 conda-forge
rsa 4.9 pyhd8ed1ab_0 conda-forge
scikit-learn 1.1.2 py39he5e8d7e_0 conda-forge
scipy 1.9.1 py39h8ba3f38_0 conda-forge
setuptools 59.5.0 py39hf3d152e_0 conda-forge
setuptools-scm 6.3.2 pyhd8ed1ab_0 conda-forge
setuptools_scm 6.3.2 hd8ed1ab_0 conda-forge
six 1.16.0 pyh6c4a22f_0 conda-forge
sleef 3.5.1 h28343ad_2 conda-forge
sqlite 3.39.3 h4ff8645_0 conda-forge
tensorboard 2.6.0 py_0
tensorboard-plugin-wit 1.8.1 pyhd8ed1ab_0 conda-forge
threadpoolctl 3.1.0 pyh8a188c0_0 conda-forge
tk 8.6.12 h27826a3_0 conda-forge
toml 0.10.2 pyhd8ed1ab_0 conda-forge
tomli 2.0.1 pyhd8ed1ab_0 conda-forge
torchani 2.2.2 cuda112py39h527ec63_6 conda-forge
torchmd-net 0.2.4 dev_0 <develop>
torchmetrics 0.8.2 pyhd8ed1ab_0 conda-forge
tornado 6.2 py39hb9d737c_0 conda-forge
tqdm 4.64.1 pyhd8ed1ab_0 conda-forge
typing-extensions 4.3.0 hd8ed1ab_0 conda-forge
typing_extensions 4.3.0 pyha770c72_0 conda-forge
tzdata 2022c h191b570_0 conda-forge
unicodedata2 14.0.0 py39hb9d737c_1 conda-forge
urllib3 1.26.11 pyhd8ed1ab_0 conda-forge
webencodings 0.5.1 py_1 conda-forge
werkzeug 2.2.2 pyhd8ed1ab_0 conda-forge
wheel 0.37.1 pyhd8ed1ab_0 conda-forge
xorg-libxau 1.0.9 h14c3975_0 conda-forge
xorg-libxdmcp 1.1.3 h516909a_0 conda-forge
xz 5.2.6 h166bdaf_0 conda-forge
yaml 0.2.5 h7f98852_2 conda-forge
yarl 1.8.1 py39h5eee18b_0
zipp 3.8.1 pyhd8ed1ab_0 conda-forge
zlib 1.2.12 h166bdaf_3 conda-forge
zstd 1.5.2 h6239696_4 conda-forge
Torch-cluster is indeed the source of the problem. I replaced the conda-forge build with one from PyPI with
pip install --force torch_cluster
and the crash went away.
Do you install the same version of torch_cluster with conda and pip?
They're slightly different versions. The PyPI version is 1.6.0, but the most recent version on conda-forge is 1.5.9.
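To compare what is actually installed before and after swapping builds, the versions can be read from the package metadata. This is a small sketch using only the standard library (Python 3.8 or later). `installed_versions` is a hypothetical helper, and the distribution names passed to it are assumptions that may differ between the conda-forge and PyPI builds.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Return the installed version of each distribution, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Distribution names are assumed; adjust to match the environment.
    for name, version in installed_versions(["torch", "torch_cluster"]).items():
        print(f"{name}: {version or 'not installed'}")
```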
Closing this since torch_cluster is not a dependency anymore.
I have a model created with TorchMD-Net. I want to use it for running a simulation in OpenMM. That involves compiling to TorchScript, saving to a file, and loading it with the PyTorch C++ API. When I try to do that, it crashes with a bad_alloc down inside libtorch. Is this expected to work? Or do some of the packages like pyg and torch-cluster not support that workflow? If it's known not to work right now, what would need to happen to make it work?