maccallumlab / meld

Modeling with limited data
http://meldmd.org

can't find CUB library #158

Open ccccclw opened 1 year ago

ccccclw commented 1 year ago

Hi @jlmaccal, I was able to successfully build the latest MELD against OpenMM 8.0, but "make test" failed with:

exception: Error compiling program: default_program(689): catastrophic error: could not open source file "cub/cub.cuh" (no directories in search list)

#include <cub/cub.cuh>

It seems like you ran into this issue before as well. I searched around but couldn't solve it. Can you give some tips? Here is the setup:

cuda/12.2.2 gcc/12.2.0 openmpi/4.1.5 python/3.11 cmake/3.26.4 swig/3.0.8 doxygen/1.8.3.1 netcdf/4.2

I can see that the path already seems to be included in the build folder, in /../plugin/build_cuda12/platforms/cuda/CMakeFiles/MeldPluginCUDA.dir/flags.make:

compile CXX with /apps/mpi/cuda/12.2.2/gcc/12.2.0/openmpi/4.1.5/bin/mpicxx

CXX_DEFINES = -DMeldPluginCUDA_EXPORTS

CXX_INCLUDES = ... -I/apps/compilers/cuda/12.2.2/include/cub -I/apps/compilers/cuda/12.2.2/include -isystem /apps/eigen3/3.3.3/include/eigen3 ...

CXX_FLAGS = -std=gnu++11 -fPIC -DOPENMM_BUILDING_SHARED_LIBRARY

jlmaccal commented 1 year ago

From what I remember, we need to force the use of the command-line compiler by specifying its path explicitly. I can't remember if it's CUDA_COMPILER or OPENMM_CUDA_COMPILER, but it's something like that.
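
If it's the latter, setting it would look something like this (the nvcc path is a guess based on your module list, and treat the exact variable name as a guess too):

export OPENMM_CUDA_COMPILER=/apps/compilers/cuda/12.2.2/bin/nvcc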

Alternatively, we could consider just packing the relevant cub files at the top of our kernel. In the OpenMM project, Peter Eastman suggests using the preprocessor to expand all of the relevant headers. These could then be inserted at the top of the kernel source. This could happen during the build, so that it uses whatever version of cub is installed on the build machine.
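
Roughly, that build step could look something like this (an untested sketch; the file names are made up, and the include paths would come from the detected toolkit):

echo '#include <cub/cub.cuh>' > cub_probe.cu
nvcc -E cub_probe.cu -o cub_expanded.h                     # expand cub with whatever toolkit the build machine has
cat cub_expanded.h computeMeld.cu > computeMeld_full.cu    # prepend the expansion to the kernel source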

Honestly, the whole CUDA kernel could use some attention. There are some limitations on the number of restraints, etc. These could be worked around, but I don't have the energy to really focus on the code right now, so I'm reluctant to change anything.

ccccclw commented 1 year ago

Could you give more hints on how to do this: "Alternatively, we could consider just packing the relevant cub files at the top of our kernel. In the OpenMM project, Peter Eastman suggests using the preprocessor to expand all of the relevant headers. These could then be inserted at the top of the kernel source. This could happen during the build, so that it uses whatever version of cub is installed on the build machine."?

I tried to find all of the headers that MELD's sorting function needs and prepend them in createModule, but no matter which header is added first, the headers they include internally still can't be located:

// Load each cub header the kernel depends on, ordered so that their
// internal dependencies are satisfied before they are used.
const char* cubHeaders[] = {
    "/apps/compilers/cuda/12.2.2/include/cub/config.cuh",
    "/apps/compilers/cuda/12.2.2/include/cub/block/block_reduce.cuh",
    "/apps/compilers/cuda/12.2.2/include/cub/block/specializations/block_reduce_raking.cuh",
    "/apps/compilers/cuda/12.2.2/include/cub/block/specializations/block_reduce_raking_commutative_only.cuh",
    "/apps/compilers/cuda/12.2.2/include/cub/block/specializations/block_reduce_warp_reductions.cuh",
    "/apps/compilers/cuda/12.2.2/include/cub/util_ptx.cuh",
    "/apps/compilers/cuda/12.2.2/include/cub/util_type.cuh",
    "/apps/compilers/cuda/12.2.2/include/cub/thread/thread_operators.cuh",
    "/apps/compilers/cuda/12.2.2/include/cub/util_cpp_dialect.cuh",
    "/apps/compilers/cuda/12.2.2/include/cub/detail/uninitialized_copy.cuh",
    "/apps/compilers/cuda/12.2.2/include/cub/util_arch.cuh",
    "/apps/compilers/cuda/12.2.2/include/cub/util_compiler.cuh",
    "/apps/compilers/cuda/12.2.2/include/cub/util_deprecated.cuh",
    "/apps/compilers/cuda/12.2.2/include/cub/util_macro.cuh",
    "/apps/compilers/cuda/12.2.2/include/cub/util_namespace.cuh",
    "/apps/compilers/cuda/12.2.2/include/cub/util_debug.cuh",
    "/apps/compilers/cuda/12.2.2/include/cub/warp/warp_reduce.cuh",
    "/apps/compilers/cuda/12.2.2/include/cub/thread/thread_reduce.cuh",
    "/apps/compilers/cuda/12.2.2/include/cub/detail/type_traits.cuh",
    "/apps/compilers/cuda/12.2.2/include/cub/block/block_raking_layout.cuh",
};

// Concatenate all of the cub headers ahead of the MELD kernel source,
// then JIT-compile the whole thing as one module.
std::string cubHeaderContents;
for (const char* header : cubHeaders)
    cubHeaderContents += LoadHeaderFile(header);   // LoadHeaderFile reads a file into a string

CUmodule module = cu.createModule(cu.replaceStrings(cubHeaderContents
                                                    + CudaMeldKernelSources::vectorOps
                                                    + CudaMeldKernelSources::computeMeld, replacements), defines);
jlmaccal commented 1 year ago

To be honest, I'm not totally sure how to do this.

But what does running cpp on #include <cub/cub.cuh> give? Does cpp need to be invoked recursively?
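
For example, something like this (the include path is a guess based on your module list):

echo '#include <cub/cub.cuh>' | cpp -I/apps/compilers/cuda/12.2.2/include -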

ccccclw commented 1 year ago

I tried cpp and it gave a header-not-found error, which I think is because it's a CUDA header? Using the nvcc -E preprocessor instead generates a large file containing the definitions needed. BTW, MELD now works for us (which is a little surprising to me, since I haven't incorporated the headers into createModule), except for some initial Slurm complaints that don't affect the running simulation.

mselensky commented 6 months ago

Hi @ccccclw, I've found myself encountering the same issue you describe. Just to clarify: did you preprocess computeMeld.cu via nvcc -E without modifying the CUmodule in MeldCudaKernels.cpp? Or did you end up doing both?

I ask because when I simply preprocess computeMeld.cu and run a test, I'm at least able to get past the initial "cub/cub.cuh" (no directories in search list) problem, but then I get the following kind of error at runtime:

error: "cudaErrorLaunchFileScopedSurf" has already been declared in the current scope

I imagine it's because I didn't modify the CUmodule, and there is some kind of duplication it takes care of that I'm ignoring. The reason I didn't change the CUmodule following your example is that I get error: ‘LoadHeaderFile’ was not declared in this scope when I attempt to include similar headers in my build. Would you mind sharing how you defined that? Sorry if that's a naive question; I'm new-ish to CUDA and very new to compiling CUDA-enabled code. Thanks in advance for any insight, and thanks @jlmaccal for providing a great library!

ccccclw commented 5 months ago

Hi @mselensky, I never ran into that error when compiling MELD on our local cluster. As for the error I got: as mentioned above, I didn't modify the CUmodule and it somehow worked, so I didn't pay more attention to it after that. I was able to track down which part of the code causes it, but I'm not sure what you should modify in your case. If you can provide more details about how you compiled the program, what system it was compiled on, compiler versions, etc., I might have a better idea of where the issue is.

mselensky commented 5 months ago

Hi @ccccclw, thanks so much for your response. Here are the details you requested:

cuda_inc_prefix=$CUDA_ROOT_DIR/include
# Rewrite the cub include in the kernel source to an absolute path:
sed -i "s|#include <cub/cub.cuh|#include <${cuda_inc_prefix}/cub/cub.cuh|" ../platforms/cuda/src/kernels/computeMeld.cu

So to reiterate: you ran nvcc -E to preprocess computeMeld.cu, and then things worked? If so, that would make sense to me, as all of the CUDA code would then be directly accessible in that file and wouldn't require any JIT compilation or modification of the CUmodule. The puzzling thing in my case is that I can preprocess the code without issue, but then I get a ton of 'error: "THING" has already been declared in the current scope' messages when trying to run the application. That suggests some kind of duplication in scope, but I'm struggling to see where it's coming from, and why I see these errors when you did not.

Thanks in advance, I really appreciate your help and look forward to hearing your thoughts! Let me know if I can send you anything else that might be helpful.

ccccclw commented 4 months ago

Hi @mselensky, thanks for providing the detailed build process. What I did was only run nvcc -E on #include <cub/cub.cuh>, which printed all of the dependent headers. I think your case is a similar issue, so a possible solution is the one suggested by @jlmaccal:

"Alternatively, we could consider just packing the relevant cub files at the top of our kernel. In the OpenMM Peter Eastman suggests using the preprocessor to expand all of the relevant headers. These could then be inserted at the top of the kernel source. This could happen during build so that it uses whatever version of cub installed on the build machine."

mselensky commented 4 months ago

Hi @ccccclw, thanks very much for your response and suggestions. To clarify, when you say you only preprocessed #include <cub/cub.cuh>, did you do something like this? If not, would you kindly share the command you used?

nvcc -E -Dinclude="#include <cub/cub.cuh>" kernels/computeMeld.cu ...

I unfortunately still get the fatal compilation error '"THING" has already been declared in the current scope' when trying this.

Thanks again in advance!

ccccclw commented 4 months ago

Yes, I can't remember the exact command I used back then, but it should be the same as what you have here. I didn't get this error, and the output was the paths of all dependent headers of <cub/cub.cuh>.
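
From memory, a minimal probe along these lines should show those paths (the file names here are made up):

echo '#include <cub/cub.cuh>' > cub_probe.cu
nvcc -E cub_probe.cu > cub_expanded.cu
grep '^# 1 "' cub_expanded.cu | sort -u    # the preprocessor line markers list every header pulled in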