maccallumlab / meld

Modeling with limited data
http://meldmd.org

can't find CUB library #158

Open ccccclw opened 1 year ago

ccccclw commented 1 year ago

Hi @jlmaccal, I was able to successfully build the latest MELD against OpenMM 8.0, but "make test" failed with

exception: Error compiling program: default_program(689): catastrophic error: could not open source file "cub/cub.cuh" (no directories in search list)

#include <cub/cub.cuh>

It seems like you also ran into this issue before. I searched around but couldn't solve it. Can you give some tips? Here is the setup:

cuda/12.2.2 gcc/12.2.0 openmpi/4.1.5 python/3.11 cmake/3.26.4 swig/3.0.8 doxygen/1.8.3.1 netcdf/4.2

I can see the include path already seems to be present in the build folder /../plugin/build_cuda12/platforms/cuda/CMakeFiles/MeldPluginCUDA.dir/flags.make:

compile CXX with /apps/mpi/cuda/12.2.2/gcc/12.2.0/openmpi/4.1.5/bin/mpicxx

CXX_DEFINES = -DMeldPluginCUDA_EXPORTS

CXX_INCLUDES = ... -I/apps/compilers/cuda/12.2.2/include/cub -I/apps/compilers/cuda/12.2.2/include -isystem /apps/eigen3/3.3.3/include/eigen3 ...

CXX_FLAGS = -std=gnu++11 -fPIC -DOPENMM_BUILDING_SHARED_LIBRARY

jlmaccal commented 1 year ago

From what I remember, we need to force the use of the command line compiler by specifying a path on the command line. I can't remember if it's CUDA_COMPILER or OPENMM_CUDA_COMPILER, but it's something like that.
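
If it's the OpenMM one, it would look something like this (the nvcc path here is a guess based on the CUDA module listed above):

# assuming it is the OpenMM variable; adjust the nvcc path for your install
export OPENMM_CUDA_COMPILER=/apps/compilers/cuda/12.2.2/bin/nvcc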

Alternatively, we could consider just packing the relevant cub files at the top of our kernel. In the OpenMM project, Peter Eastman suggests using the preprocessor to expand all of the relevant headers. These could then be inserted at the top of the kernel source. This could happen during the build, so that it uses whatever version of cub is installed on the build machine.
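
A rough sketch of such a build step (the file names here are placeholders, not part of the current build):

# expand the cub headers once, on the build machine
echo '#include <cub/cub.cuh>' > cub_includes.cu
nvcc -E cub_includes.cu -o cub_expanded.h
# prepend the expanded headers to the kernel source before it is JIT-compiled
cat cub_expanded.h computeMeld.cu > computeMeld_expanded.cu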

Honestly, the whole CUDA kernel could use some attention. There are some limitations on the number of restraints, etc. These could be worked around, but I don't have the energy to really focus on the code, so I'm reluctant to change anything.

ccccclw commented 1 year ago

Could you give more hints on how to do this: "Alternatively, we could consider just packing the relevant cub files at the top of our kernel. In the OpenMM project, Peter Eastman suggests using the preprocessor to expand all of the relevant headers. These could then be inserted at the top of the kernel source. This could happen during the build, so that it uses whatever version of cub is installed on the build machine."?

I tried to find all the headers that MELD's sorting function needs and append them in createModule, but no matter which header is added first, the headers they include internally still can't be located (presumably because each cub header contains its own #include <cub/...> lines):

// LoadHeaderFile is a local helper that reads one header file into a string.
// The paths are listed in the order they were concatenated; cub/cub.cuh itself
// was also loaded but never added to the concatenation.
std::vector<std::string> cubHeaderPaths = {
    "/apps/compilers/cuda/12.2.2/include/cub/config.cuh",
    "/apps/compilers/cuda/12.2.2/include/cub/block/block_reduce.cuh",
    "/apps/compilers/cuda/12.2.2/include/cub/block/specializations/block_reduce_raking.cuh",
    "/apps/compilers/cuda/12.2.2/include/cub/block/specializations/block_reduce_raking_commutative_only.cuh",
    "/apps/compilers/cuda/12.2.2/include/cub/block/specializations/block_reduce_warp_reductions.cuh",
    "/apps/compilers/cuda/12.2.2/include/cub/util_ptx.cuh",
    "/apps/compilers/cuda/12.2.2/include/cub/util_type.cuh",
    "/apps/compilers/cuda/12.2.2/include/cub/thread/thread_operators.cuh",
    "/apps/compilers/cuda/12.2.2/include/cub/util_cpp_dialect.cuh",
    "/apps/compilers/cuda/12.2.2/include/cub/detail/uninitialized_copy.cuh",
    "/apps/compilers/cuda/12.2.2/include/cub/util_arch.cuh",
    "/apps/compilers/cuda/12.2.2/include/cub/util_compiler.cuh",
    "/apps/compilers/cuda/12.2.2/include/cub/util_deprecated.cuh",
    "/apps/compilers/cuda/12.2.2/include/cub/util_macro.cuh",
    "/apps/compilers/cuda/12.2.2/include/cub/util_namespace.cuh",
    "/apps/compilers/cuda/12.2.2/include/cub/util_debug.cuh",
    "/apps/compilers/cuda/12.2.2/include/cub/warp/warp_reduce.cuh",
    "/apps/compilers/cuda/12.2.2/include/cub/thread/thread_reduce.cuh",
    "/apps/compilers/cuda/12.2.2/include/cub/detail/type_traits.cuh",
    "/apps/compilers/cuda/12.2.2/include/cub/block/block_raking_layout.cuh",
};

// Concatenate the header contents and prepend them to the kernel source.
std::string cubHeaderContents;
for (const auto& path : cubHeaderPaths)
    cubHeaderContents += LoadHeaderFile(path);

CUmodule module = cu.createModule(cu.replaceStrings(cubHeaderContents
                                                    + CudaMeldKernelSources::vectorOps
                                                    + CudaMeldKernelSources::computeMeld, replacements), defines);
jlmaccal commented 1 year ago

To be honest, I'm not totally sure how to do this.

But, what does running cpp on #include <cub/cub.cuh> give? Does cpp need to be invoked recursively?

ccccclw commented 1 year ago

I tried cpp and it gave a header-not-found error, which I think is because it's a CUDA header? Using nvcc -E instead generates a large file containing the definitions needed. BTW, MELD now works for us (which is a little surprising to me, since I haven't incorporated the expanded headers into createModule), except for some initial Slurm complaints that don't affect the running simulations.
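
Roughly, with test.cu containing only that include line (the file name is arbitrary):

cpp test.cu                  # header-not-found: cub is not on cpp's default include path
nvcc -E test.cu -o test.i    # works: nvcc adds the CUDA include directory itself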

mselensky commented 6 months ago

Hi @ccccclw, I've found myself encountering the same issue you describe. Just to clarify, did you preprocess computeMeld.cu via nvcc -E without modifying CUmodule in MeldCudaKernels.cpp? Or did you end up doing both?

I ask because when I simply preprocess computeMeld.cu and run a test, I'm at least able to get past the initial "cub/cub.cuh" (no directories in search list) problem, but then I get the following kind of error at runtime:

error: "cudaErrorLaunchFileScopedSurf" has already been declared in the current scope

I imagine it's because I didn't modify CUmodule, and there is some kind of duplication it takes care of that I'm ignoring. The reason I didn't change CUmodule following your example is that I get error: ‘LoadHeaderFile’ was not declared in this scope when I attempt to include similar headers in my build; would you mind sharing how you defined that? Sorry if that is a naive question; I'm new-ish to CUDA and very new to compiling CUDA-enabled code. Thanks in advance for any insight, and thanks @jlmaccal for providing a great library!
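
(For what it's worth, my guess is that it's just a small file-to-string helper along these lines, though that's only a guess since the definition wasn't shared:)

#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

// Plausible sketch: read an entire header file into a string.
static std::string LoadHeaderFile(const std::string& path) {
    std::ifstream file(path);
    if (!file)
        throw std::runtime_error("Could not open header file: " + path);
    std::ostringstream contents;
    contents << file.rdbuf();
    return contents.str();
}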

ccccclw commented 6 months ago

Hi @mselensky, I never ran into that error when compiling MELD on our local cluster. As for the error I got: as mentioned above, I didn't modify CUmodule and it somehow worked, so I didn't pay more attention after that. I was able to track down which part of the code causes it, but I'm not sure what you would need to modify in your case. If you can provide more details about how you compiled the program, what system it was compiled on, the compiler versions, etc., I might have a better idea of where the issue is.

mselensky commented 5 months ago

Hi @ccccclw, thanks so much for your response. Here are the details you requested:

cuda_inc_prefix=$CUDA_ROOT_DIR/include/
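# rewrite the cub include in the kernel source to an absolute path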
sed -i "s|#include <cub/cub.cuh|#include <"$cuda_inc_prefix"/cub/cub.cuh|" ../platforms/cuda/src/kernels/computeMeld.cu 

So to reiterate, you ran nvcc -E to preprocess computeMeld.cu, and then things worked? If so, that would make sense to me: all of the CUDA code would then be directly accessible in that file and wouldn't require any JIT compilation or modification of the CUmodule. The puzzling thing in my case is that I can preprocess the code without an issue, but then I get a ton of 'error: "THING" has already been declared in the current scope' messages when trying to run the application. That suggests some kind of duplication in the scope, but I'm struggling to see where it's coming from, and why I'm seeing those errors when you did not.

Thanks in advance, I really appreciate your help and look forward to hearing your thoughts! Let me know if I can send you anything else that might be helpful.

ccccclw commented 5 months ago

Hi @mselensky, thanks for providing the detailed build process. What I did was only run nvcc -E on #include <cub/cub.cuh>, which printed all the dependent headers. I think it's a similar issue in your case, so a possible solution is the one suggested by @jlmaccal:

"Alternatively, we could consider just packing the relevant cub files at the top of our kernel. In the OpenMM Peter Eastman suggests using the preprocessor to expand all of the relevant headers. These could then be inserted at the top of the kernel source. This could happen during build so that it uses whatever version of cub installed on the build machine."

mselensky commented 5 months ago

Hi @ccccclw, thanks very much for your response and suggestions. To clarify, when you say you only preprocessed #include <cub/cub.cuh>, did you do something like this? If not, would you kindly share the command you used?

nvcc -E -Dinclude="#include <cub/cub.cuh>" kernels/computeMeld.cu ...

I unfortunately still get the fatal compilation error '"THING" has already been declared in the current scope' when trying this.

Thanks again in advance!

ccccclw commented 4 months ago

Yes, I can't remember the exact command I used back then, but it should be the same as what you have here. I didn't get this error, and the output was the paths of all the headers <cub/cub.cuh> depends on.
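
In other words, something along these lines (from memory, so treat it as a sketch rather than the exact command):

echo '#include <cub/cub.cuh>' > just_cub.cu
nvcc -E just_cub.cu > cub_expanded.cu
grep 'cub/' cub_expanded.cu | sort -u   # the preprocessor line markers list every dependent header path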