Closed mikemhenry closed 1 month ago
Also sorry @peastman, I didn't realize you were still debugging this!
Feel free to help debug it if you want! At the moment I'm looking into the Mac builds, which fail with a linker error. otool -l _openmm.cpython-310-darwin.so reveals
Load command 16
cmd LC_RPATH
cmdsize 48
path /Users/runner/openmm-install/lib (offset 12)
That was the path to the libraries on the build machine. delocate should have replaced it...
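To make this kind of check repeatable, here is a minimal sketch that pulls every LC_RPATH entry out of otool -l output and flags the ones that are absolute paths rather than @loader_path / @rpath / @executable_path references. The function names and the flagging rule are assumptions for illustration, not part of delocate or the build scripts; the sample text is the load command quoted above.

```python
import re

def lc_rpaths(otool_output):
    """Return the path of every LC_RPATH load command in `otool -l` output."""
    paths = []
    in_rpath = False
    for line in otool_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "cmd":
            # A new load command starts; remember whether it is an LC_RPATH.
            in_rpath = len(fields) > 1 and fields[1] == "LC_RPATH"
        elif in_rpath and fields[0] == "path":
            # Lines look like: "path /some/dir (offset 12)"
            match = re.match(r"\s*path\s+(.*?)\s+\(offset \d+\)\s*$", line)
            if match:
                paths.append(match.group(1))
    return paths

def suspicious_rpaths(paths):
    """Assumed rule: an absolute rpath probably points at the build machine."""
    return [p for p in paths if p.startswith("/")]

SAMPLE = """\
Load command 16
          cmd LC_RPATH
      cmdsize 48
         path /Users/runner/openmm-install/lib (offset 12)
"""

rpaths = lc_rpaths(SAMPLE)
print(rpaths)
print(suspicious_rpaths(rpaths))
```

On a real wheel the input would come from something like subprocess.run(["otool", "-l", path], capture_output=True, text=True).stdout.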
I fixed the Mac problem. I'll look at Linux next.
Here are the commands I executed:
mamba create -c conda-forge --name test python=3.10
conda activate test
pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ openmm[cuda12]
python -m openmm.testInstallation
It finds all four platforms, and they all work correctly. Possibly that's because it's linking to the libraries installed globally rather than in the environment?
$ ldd ~/miniconda3/envs/test/lib/python3.10/site-packages/OpenMM.libs/lib/plugins/libOpenMMCUDA.so
linux-vdso.so.1 (0x00007ffd5ebe8000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fec1fe3c000)
libOpenMM.so => not found
libcuda.so.1 => /lib/x86_64-linux-gnu/libcuda.so.1 (0x00007fec1dc9e000)
libcufft.so.11 => /usr/local/cuda/lib64/libcufft.so.11 (0x00007fec0cc00000)
libnvrtc.so.12 => /usr/local/cuda/lib64/libnvrtc.so.12 (0x00007fec08e00000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fec1dc97000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fec1dc92000)
libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fec08bd4000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fec1dbab000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fec1db8b000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fec089ab000)
/lib64/ld-linux-x86-64.so.2 (0x00007fec200ac000)
I'm never sure how to interpret that. The Python interpreter alters library linking, so the ones found within that process may not be the ones found by ldd.
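One way to sidestep the ldd question on Linux is to ask the running process which files it actually mapped, by reading /proc/self/maps after openmm has been imported. The sketch below only shows the parsing step; the function name is hypothetical, and the addresses and inode numbers in the sample text are made up for illustration.

```python
import posixpath

def loaded_libraries(maps_text, names):
    """Return sorted paths of mapped files whose basename starts with one of `names`."""
    found = set()
    for line in maps_text.splitlines():
        # Format: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping with no backing file
        path = fields[5].strip()
        if posixpath.basename(path).startswith(tuple(names)):
            found.add(path)
    return sorted(found)

# Made-up sample in the /proc/<pid>/maps layout.
SAMPLE_MAPS = """\
7fec1dc9e000-7fec1dd00000 r-xp 00000000 08:01 1234 /lib/x86_64-linux-gnu/libcuda.so.1
7fec0cc00000-7fec0cd00000 r-xp 00000000 08:01 5678 /usr/local/cuda/lib64/libcufft.so.11
7fec0cc00000-7fec0cd00000 r--p 00100000 08:01 5678 /usr/local/cuda/lib64/libcufft.so.11
7fec0cd00000-7fec0ce00000 rw-p 00000000 00:00 0
"""

print(loaded_libraries(SAMPLE_MAPS, ["libcufft", "libnvrtc", "libOpenMM"]))

# In a real session (Linux only), after `import openmm`:
#     with open("/proc/self/maps") as f:
#         print(loaded_libraries(f.read(), ["libcufft", "libnvrtc", "libOpenMM"]))
```

This reports what the interpreter process loaded, which may differ from what ldd predicts for the same file.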
It looks like there's no RPATH specified in libOpenMMCUDA.so. The CUDA libraries installed with pip get put in a bunch of folders: site-packages/nvidia/cufft/lib, site-packages/nvidia/cuda_nvrtc/lib, etc. If we want that to work, we need to make it search all of those locations.
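As a rough illustration of what "search all of those locations" could look like, the sketch below builds a $ORIGIN-relative search path from the plugin directory to each folder. The helper name and the /env prefix are hypothetical, and this is not necessarily how the build sets it; the resulting string would still have to be applied by the linker (-Wl,-rpath) or a tool such as patchelf --set-rpath.

```python
import posixpath

def origin_runpath(plugin_dir, search_dirs):
    """Build a colon-separated RUNPATH with each entry relative to $ORIGIN.

    $ORIGIN expands at load time to the directory containing the library,
    so relative entries keep working wherever site-packages ends up.
    """
    entries = []
    for target in search_dirs:
        rel = posixpath.relpath(target, start=plugin_dir)
        entries.append("$ORIGIN/" + rel)
    return ":".join(entries)

# Hypothetical environment prefix; only the layout below site-packages matters.
SITE = "/env/lib/python3.10/site-packages"
PLUGINS = SITE + "/OpenMM.libs/lib/plugins"

runpath = origin_runpath(
    PLUGINS,
    [
        SITE + "/OpenMM.libs/lib",        # libOpenMM.so
        SITE + "/nvidia/cufft/lib",       # libcufft from pip
        SITE + "/nvidia/cuda_nvrtc/lib",  # libnvrtc from pip
    ],
)
print(runpath)
```

Any additional pip-installed CUDA component would need its own entry in the list.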
Can you test the latest build from #13? It sets the RPATH, and I verified that it's present in the installed library:
$ objdump -x ~/miniconda3/envs/test/lib/python3.10/site-packages/OpenMM.libs/lib/plugins/libOpenMMCUDA.so | grep PATH
RUNPATH $ORIGIN/..:$ORIGIN/../../../nvidia/cufft/lib:$ORIGIN/../../../nvidia/cuda_nvrtc/lib
When I run ldd on it, it reports
libOpenMM.so => /home/peastman/miniconda3/envs/test/lib/python3.10/site-packages/OpenMM.libs/lib/plugins/../libOpenMM.so (0x00007fb1c77c9000)
libcuda.so.1 => /lib/x86_64-linux-gnu/libcuda.so.1 (0x00007fb1c562b000)
libcufft.so.11 => /usr/local/cuda/lib64/libcufft.so.11 (0x00007fb1b4600000)
libnvrtc.so.12 => /usr/local/cuda/lib64/libnvrtc.so.12 (0x00007fb1b0800000)
It's successfully finding libOpenMM.so, but it's linking to globally installed versions of libcufft.so and libnvrtc.so. Presumably it's because that location is higher in the search path, and it would use the ones from the environment if there weren't global ones?
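For what it's worth, the ld.so man page lists LD_LIBRARY_PATH ahead of DT_RUNPATH in the search order, with the ld.so cache and default directories after it, so a global CUDA directory on LD_LIBRARY_PATH would be one possible explanation; that is a guess about this machine, not something the output proves. A small sketch for spotting which dependencies resolve outside the environment, using the ldd lines above as sample input (the function name is hypothetical):

```python
import posixpath

def resolved_outside(ldd_output, prefix):
    """Map library name -> resolved path for entries that fall outside `prefix`."""
    root = prefix.rstrip("/") + "/"
    outside = {}
    for line in ldd_output.splitlines():
        if "=>" not in line:
            continue  # e.g. linux-vdso or the dynamic loader itself
        name, _, rest = line.partition("=>")
        fields = rest.split()
        if not fields or fields[0] == "not":
            continue  # "not found" has no path to classify
        path = posixpath.normpath(fields[0])  # collapse "plugins/../"
        if not path.startswith(root):
            outside[name.strip()] = path
    return outside

SAMPLE_LDD = """\
libOpenMM.so => /home/peastman/miniconda3/envs/test/lib/python3.10/site-packages/OpenMM.libs/lib/plugins/../libOpenMM.so (0x00007fb1c77c9000)
libcuda.so.1 => /lib/x86_64-linux-gnu/libcuda.so.1 (0x00007fb1c562b000)
libcufft.so.11 => /usr/local/cuda/lib64/libcufft.so.11 (0x00007fb1b4600000)
libnvrtc.so.12 => /usr/local/cuda/lib64/libnvrtc.so.12 (0x00007fb1b0800000)
"""

for name, path in sorted(resolved_outside(SAMPLE_LDD, "/home/peastman/miniconda3/envs/test").items()):
    print(name, "->", path)
```

libcuda.so.1 is expected to show up here, since it ships with the driver rather than with pip; the entries worth a second look are libcufft and libnvrtc.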
How do I delete the existing wheel from the test server and replace it with the fixed one? If I try to delete the existing one, it warns me I won't be able to upload a new file with the same name.
PyPI doesn't let you "overwrite" existing releases, so we need to change the version to something like 8.2.0rc1, or really whatever convention you like from https://packaging.python.org/en/latest/specifications/version-specifiers/#examples-of-compliant-version-schemes
Also #13 worked!
OpenMM Version: 8.2
Git Revision: f5fc52ffd757a86aa1d05bd35a21108deff9eda1
There are 4 Platforms available:
1 Reference - Successfully computed forces
2 CPU - Successfully computed forces
3 CUDA - Successfully computed forces
4 OpenCL - Successfully computed forces
Median difference in forces between platforms:
Reference vs. CPU: 6.29719e-06
Reference vs. CUDA: 6.74822e-06
CPU vs. CUDA: 7.47712e-07
Reference vs. OpenCL: 6.75018e-06
CPU vs. OpenCL: 7.6531e-07
CUDA vs. OpenCL: 1.78763e-07
All differences are within tolerance.
I made an empty env with just python=3.10 and pip:
$ micromamba create -n openmm82-pypi-cuda pip python=3.10
I then activated the environment and installed the cuda package variant:
$ pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ openmm[cuda12]
Running test installation didn't show the cuda platform:
So I checked for plugin loading failures:
The file is there, but there seems to be a linking issue?
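For anyone reproducing this, a check of this kind can be done from Python. The sketch below is not necessarily the command that was run here; it assumes only the Platform methods getNumPlatforms, getPlatform, getName and getPluginLoadFailures from the openmm Python API, and it returns None when openmm is not importable so the script stays runnable in a bare environment. The function name is hypothetical.

```python
def platform_report():
    """Return (platform names, plugin load failures), or None without openmm."""
    try:
        import openmm
    except ImportError:
        return None
    names = [
        openmm.Platform.getPlatform(i).getName()
        for i in range(openmm.Platform.getNumPlatforms())
    ]
    # Each entry is an error message for a plugin that failed to load,
    # typically naming the shared library that could not be resolved.
    failures = list(openmm.Platform.getPluginLoadFailures())
    return names, failures

report = platform_report()
if report is None:
    print("openmm is not importable in this environment")
else:
    names, failures = report
    print("Platforms:", ", ".join(names))
    for message in failures:
        print("Load failure:", message)
```

If CUDA is missing from the platform list, the failure messages usually say which dependency of libOpenMMCUDA.so could not be found.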