openmm / openmm-plumed

OpenMM plugin to interface with PLUMED

Error building #60

Closed by k2o0r 1 year ago

k2o0r commented 2 years ago

Hello, I have successfully built and installed both PLUMED and OpenMM into a conda environment, but I am now running into difficulties building this plugin. The error message is quite long, so I have included just the start and end of it below.

[Screenshot: start of the build error output] [Screenshot: end of the build error output]

The ccmake settings used were as below:

[Screenshot: ccmake settings]

Any help is much appreciated, thanks!

raimis commented 2 years ago

Is there any reason why you want to build from source rather than installing with conda?

k2o0r commented 2 years ago

Yes, I need to use PLUMED modules that are not included in the conda build.

peastman commented 2 years ago

Can you copy the full output of the build into a text file and attach it? I suspect there may have been some additional information earlier in the build pointing to the root cause of the problem.

What version of OpenMM are you building against?

k2o0r commented 2 years ago

build_output.txt Like this?

I'm using openmm-7.7.0 from https://github.com/openmm/openmm/releases

peastman commented 2 years ago

Thanks! That's a really confusing error. It claims not to know about the Lepton namespace. But right at the top of that file are several includes,

#include "lepton/CustomFunction.h"
#include "lepton/ExpressionTreeNode.h"
#include "lepton/ParsedExpression.h"

I expected the log would contain some earlier message about not being able to find those headers. But it doesn't, which leaves me confused about what could be the cause of the error.

k2o0r commented 2 years ago

Could it be that the builds of OpenMM and PLUMED are not compatible somehow?

I did build OpenMM (and attempted this build) on a compute node, and PLUMED on a login node. Could that be the source of the error?

peastman commented 2 years ago

I thought about it, but I can't see how that would cause this error. It's an error parsing the source code, not something in binary files. The error also doesn't seem to be anything specific to the PLUMED plugin. It's down in the OpenMM headers.

Is it possible your OpenMM install is somehow corrupt? Could some of the header files be missing or truncated?

k2o0r commented 2 years ago

When I run make test in the OpenMM build directory everything passes (though it does seem quite slow for some reason), but I'm not sure whether that means it's definitely not corrupt.

peastman commented 2 years ago

That's looking at compiled binaries. The important thing here is the header files that get installed when you build. Try looking at the specific files mentioned in the build output:

In file included from /home/kor20/miniconda3/envs/build_openmm/include/openmm/opencl/OpenCLExpressionUtilities.h:30,
                 from /home/kor20/miniconda3/envs/build_openmm/include/openmm/opencl/OpenCLContext.h:56,
                 from /home/kor20/openmm/openmm-plumed-1.0/platforms/opencl/src/OpenCLPlumedKernels.h:37,
                 from /home/kor20/openmm/openmm-plumed-1.0/platforms/opencl/src/OpenCLPlumedKernelFactory.cpp:35:

Compare those files to the ones that came from the source repository and make sure they're identical. Also check for the Lepton headers included at the top of OpenCLExpressionUtilities.h. Make sure they're present and not corrupt.
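One way to do that comparison is a byte-for-byte check of each installed header against its counterpart in the source tree. The following is only a minimal sketch: the function name compare_headers is hypothetical, and the caller has to supply the (installed, source) path pairs, because the install layout does not necessarily mirror the source repository layout.

```python
import filecmp
from pathlib import Path


def compare_headers(pairs):
    """Compare installed headers against their source-tree counterparts.

    `pairs` is an iterable of (installed_path, source_path) tuples that the
    caller provides. Returns a dict mapping each installed path to one of
    "identical", "different", or "missing".
    """
    report = {}
    for installed, source in pairs:
        installed_path, source_path = Path(installed), Path(source)
        if not installed_path.is_file() or not source_path.is_file():
            # Either copy is absent, so there is nothing to compare.
            report[str(installed)] = "missing"
        elif filecmp.cmp(installed_path, source_path, shallow=False):
            # shallow=False forces a comparison of the file contents.
            report[str(installed)] = "identical"
        else:
            report[str(installed)] = "different"
    return report
```

Any entry reported as "different" or "missing" would point to a truncated or absent header, for example for the OpenCLExpressionUtilities.h and OpenCLContext.h files named in the build output above.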

k2o0r commented 2 years ago

Sorry for post spam, but I've had some progress now.

Similar to issue https://github.com/openmm/openmm-plumed/issues/41, it seems there was some path confusion between include/lepton and include/plumed/lepton. As I understand it, this comes from having OpenMM and PLUMED installed in the same conda environment? Renaming the latter to include/plumed/lepton.bak solved the Lepton header errors above, and the build now reaches 100%. However, some errors still appear and the plugin doesn't seem to be working. I've attached the new build output and the output from python import openmmplumed below; both contain errors.
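A quick way to spot this kind of clash is to list every directory with the same name under the environment's include directory. This is a rough sketch only: the function name find_header_dirs is hypothetical, and the include root to scan (for example the include directory of the conda environment) is supplied by the caller.

```python
from pathlib import Path


def find_header_dirs(include_root, dir_name="lepton"):
    """Return all directories called `dir_name` under `include_root`.

    Paths are returned relative to `include_root`, sorted, with forward
    slashes. More than one result means two packages ship headers under
    the same directory name, which can confuse #include resolution.
    """
    root = Path(include_root)
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob(dir_name)
        if path.is_dir()
    )
```

For the environment described here, a result listing both lepton and plumed/lepton would correspond to the conflict above.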

build_output_2.txt

py_output.txt

Thanks!

Edit: This output from ld -lcuda --verbose may be useful.

ld_verbose.txt

peastman commented 2 years ago

The linker output indicates it can't find libcuda.so. That library comes with the driver, not the toolkit, so make sure it's installed. For example, if you're working on a cluster that has GPUs on the compute nodes but not the login nodes, it's possible the driver hasn't been installed on the login nodes. If so, you won't be able to compile CUDA programs on them.

If that's not the problem, it may have to do with how you're compiling against a system tree installed inside your conda environment. It only seems to be looking in directories that are inside /home/kor20/miniconda3/envs/build_openmm/x86_64-conda-linux-gnu/sysroot. That isn't a location where the driver would have installed anything. For example, on my computer it's installed in /usr/lib/x86_64-linux-gnu. Can you find where it is on your computer? Set the CMake variable CUDA_cuda_driver_LIBRARY (found under advanced options) to point to it.
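To locate the driver library, one option is to scan a few candidate directories for files whose names start with libcuda.so. This is only a sketch: find_libcuda is a hypothetical helper, and the directories passed to it are guesses that depend on the distribution and on how the driver was installed.

```python
from pathlib import Path


def find_libcuda(search_dirs):
    """Return paths of files named libcuda.so* in the given directories.

    Directories that do not exist are skipped. The pattern does not match
    libcudart.so, which belongs to the toolkit rather than the driver.
    """
    hits = []
    for directory in map(Path, search_dirs):
        if directory.is_dir():
            hits.extend(sorted(str(p) for p in directory.glob("libcuda.so*")))
    return hits


if __name__ == "__main__":
    # Example directory taken from the comment above; adjust for your system.
    for hit in find_libcuda(["/usr/lib/x86_64-linux-gnu"]):
        print(hit)
```

If nothing is printed on a given node, the driver is probably not installed there, or it lives in a directory that was not searched.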

k2o0r commented 2 years ago

Huh, this is a bit weird. So it seems you are right and that compilation can't find libcuda.so. When I open up the ccmake GUI this is what I see.

CUDA_CUDART_LIBRARY /home/kor20/miniconda3/envs/build_openmm/lib/libcudart.so

CUDA_CUDA_LIBRARY /usr/lib64/libcuda.so

So it's trying to use the libcuda.so in the cluster's /usr/lib64, which is not from the version of CUDA that this was compiled with. I looked for a libcuda.so in my conda environment and found /home/kor20/miniconda3/envs/build_openmm/lib/stubs/libcuda.so, so I thought simply setting CUDA_CUDA_LIBRARY to this in the ccmake GUI would work. But every time I configure after updating the path (i.e. changing to CUDA_CUDA_LIBRARY /home/kor20/miniconda3/envs/build_openmm/lib/stubs/libcuda.so), it resets to /usr/lib64/libcuda.so, which then cannot be found during compilation.

peastman commented 2 years ago

I'm not sure exactly how it selects the value. https://cmake.org/cmake/help/latest/module/FindCUDA.html describes the logic it goes through to locate libraries, but CUDA_CUDA_LIBRARY is not one of the variables they list as being set, so I'm not sure what's going on. Still, that might help.
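To see what CMake has actually stored after a configure step, the CUDA-related entries can be read from the CMakeCache.txt file in the build directory, where each entry is written as NAME:TYPE=VALUE. The sketch below is a diagnostic aid only, not a fix: cuda_cache_entries is a hypothetical helper, and it does not explain why a value gets reset.

```python
def cuda_cache_entries(cache_text):
    """Extract CUDA-related entries from the text of a CMakeCache.txt file.

    Returns a dict mapping variable names to their cached values. Comment
    lines (starting with "#" or "//") and the bookkeeping entries whose
    names end in "-ADVANCED" are ignored.
    """
    entries = {}
    for raw_line in cache_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", "//")):
            continue
        name_and_type, separator, value = line.partition("=")
        if not separator:
            continue
        name = name_and_type.split(":", 1)[0]
        if "CUDA" in name.upper() and not name.endswith("-ADVANCED"):
            entries[name] = value
    return entries
```

Running this on the cache before and after pressing configure in ccmake would show whether CUDA_CUDA_LIBRARY is being overwritten and which other CUDA paths are in play.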

ndonyapour commented 1 year ago

Hi, Removing cuda from this line fixed the issue for me.