OpenCL tests failing on build from source

jharrymoore commented 7 months ago

I am attempting to build openmm-torch from source. Compilation proceeds OK following Stephen's notes here, but the OpenCL tests all segfault for me (the reference and CUDA tests all pass). In the GitHub CI, all the recent runs seem to fail before running the C++ tests. Could I check whether you can reproduce this on your end, or whether it's an issue with my build? I have attached the ctest logs and the build cache; let me know if there's anything else that would be helpful.

CMakeCache.txt LastTest.log

peastman commented 7 months ago

Segfaults usually indicate some sort of a binary incompatibility. You need OpenMM, OpenMM-Torch, and PyTorch to all be compiled in compatible ways (same ABI, etc.). You said you're compiling OpenMM-Torch from source. What about the others? Where do you install them from?

If it's only the OpenCL tests that segfault, that also suggests it may be an incompatibility with the OpenCL implementation. Which one do you use?

Try running one of the tests inside gdb. Let it run until it segfaults, then type bt to get a stack trace for where it happened. Post the full trace here. It might provide a clue.
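The gdb steps above can also be run non-interactively. The following is a minimal sketch, assuming gdb is on the PATH; the test binary name `./TestOpenCLTorchForce` and the helper `gdb_backtrace_command` are placeholders for illustration, so substitute whichever test executable segfaults in your build directory.

```python
import shlex


def gdb_backtrace_command(test_binary, extra_args=()):
    """Build a gdb command line that runs the test and prints a backtrace.

    -batch   : exit gdb once the listed commands have finished
    -ex run  : start the program and let it run until it stops (e.g. on SIGSEGV)
    -ex bt   : print the stack trace at the point where it stopped
    --args   : everything after this is the program and its arguments
    """
    return [
        "gdb", "-batch",
        "-ex", "run",
        "-ex", "bt",
        "--args", test_binary, *extra_args,
    ]


# Hypothetical test executable name; replace with the failing test.
cmd = gdb_backtrace_command("./TestOpenCLTorchForce")
print(shlex.join(cmd))
```

The printed command can be pasted into a shell, or the list can be passed to `subprocess.run(cmd)` to capture the trace for posting.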

jharrymoore commented 7 months ago

Right you are - I was using the OpenCL that was installed with conda. Switching to the compiler-provided OpenCL.so does the trick. Many thanks for the pointer!
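For anyone hitting the same problem on Linux, one way to see which OpenCL library a binary resolves to is to inspect its `ldd` output. The sketch below only parses that output; the helper name `opencl_libraries` and the sample text (including the conda path) are made up for illustration and are not taken from the attached logs.

```python
def opencl_libraries(ldd_output):
    """Return (name, resolved_path) pairs for OpenCL entries in `ldd` output."""
    found = []
    for line in ldd_output.splitlines():
        # Only consider dependency lines of the form "name => path (address)".
        if "OpenCL" not in line or "=>" not in line:
            continue
        name, _, rest = line.strip().partition(" => ")
        # Drop the trailing load address, e.g. " (0x00007f...)".
        path = rest.split(" (")[0].strip()
        found.append((name, path))
    return found


# Illustrative sample only; real output comes from running `ldd` on the
# OpenCL plugin library or test executable in your own build.
sample = (
    "    libOpenCL.so.1 => /home/user/miniconda3/envs/omm/lib/libOpenCL.so.1 (0x00007f0000000000)\n"
    "    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0000100000)\n"
)

for name, path in opencl_libraries(sample):
    print(f"{name} resolves to {path}")
```

To use it on a real binary, the output can be obtained with `subprocess.run(["ldd", path_to_binary], capture_output=True, text=True).stdout`. A path inside a conda environment would point to the situation described above.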

openmm / openmm-torch

OpenCL tests failing on build from source #139