Open svenevs opened 2 years ago
We are working on updating CMake; first, we are trying to get it to work with GitHub actions. It looks to me that the installation instructions are not quite right. I've removed the use of find_package(CUDA), but it still has an issue finding/linking the cuda libraries. I'm wondering, though, if we should be using FindCUDAToolkit instead.
Currently, we are experimenting with it at, no successful build yet: https://github.com/brtnfld/vfd-gds
We are working on updating CMake; first, we are trying to get it to work with GitHub actions.
Always an adventure :slightly_smiling_face: Are CMake changes needed to get them running at all? It could be an environment problem, I don't see anything overtly wrong with the current system -- it's just the old cmake style.
I've removed the use of find_package(CUDA), but it still has an issue finding/linking the cuda libraries.
The process is a bit more involved, partly because you have to adopt some target-based approaches related to the packaging.
From what I can see this library needs to be able to (1) compile and link against the cuda runtime and (2) specifically cufile
. In the previous versions of CMake tactics you'd accumulate a list of include directories, libraries to link, and compiler definitions and store them all in a list. Since
is going to ultimately include / link against
where CUDA_INCLUDE_DIRS
and CUDA_LIBRARIES
came from find_package(CUDA)
. So to replace it using the target-based approach, you instead do (untested not on cuda machine)
find_package(CUDAToolkit)
# ... at some point after `add_library` aka this target exists ...
# note: you can `add_library(target-name "")` at the beginning of your CMake with no sources
# so that the target exists immediately and then do `target_sources` later on, mentioning because
# your dependencies search happens before the `add_subdirectory`
# https://cmake.org/cmake/help/latest/module/FindCUDAToolkit.html#cuda-toolkit-rt-lib
target_link_libraries(hdf5_vfd_gds PUBLIC CUDA::cudart) # or CUDA::cudart_static but less common
Similarly for cufile
you can CUDA::cuFile
(but this was added in 3.25 so that is your minimum version). If that's too recent you can keep the find_library
there but instead of
https://github.com/hpc-io/vfd-gds/blob/4906c76616ac1742f66176721052746181777828/CMakeLists.txt#L73
just do it after finding CUDAToolkit
and use CUDAToolkit_LIBRARY_DIR
for the search path (CUDA_TOOLKIT_ROOT_DIR
also comes from find_package(CUDA)
). That option (set(...CACHE)
) can probably just be removed?
Note that once you start linking against targets in your build system, consumers need it in your config file. find_dependency(CUDAToolkit)
, more info here.
Hope that helps clarify some!
Indeed, if I don't use PR #5 CMake changes, it compiles successfully. But when I try to run the tests it fails because it is looking for libcuda.so.1, but what is installed is:
libOpenCL.so libOpenCL.so.1 libOpenCL.so.1.0 libOpenCL.so.1.0.0 libcudadevrt.a libcudart.so libcudart.so.11.0 libcudart.so.11.8.89 libcudart_static.a libcufile.so libcufile.so.0 libcufile.so.1.4.0 libcufile_rdma.so libcufile_rdma.so.1 libcufile_rdma.so.1.4.0 libcufile_rdma_static.a libcufile_static.a libculibos.a
But when I try to run the tests it fails because it is looking for libcuda.so.1
Hmm, that does seem like an installation problem. That's a pretty critical library to be missing (that's the driver api, as opposed to the runtime api cudart
).
I'm not sure about the ubuntu packaging and the -nvcc
package specifically, the docs say to install the meta package. I think cuda-11-8
will also be available (worth testing in a docker image rather than debugging package names via GH actions :wink:).
Alternatively if you wanted to get fancy you might be able to download the runfile and cache the download rather than reinstalling the cuda packages each time. Or possibly use one of the cuda docker images?
it compiles successfully. But when I try to run the tests it fails
All that said, even if libcuda.so
ends up being installed / that is resolved -- can you run GPU / CUDA anything in the GH hosted runners? I don't think they have GPUs attached, and if a GPU is required to actually run the tests you'd need to have a CI setup that has one or host it.
Right, I don't think it works without a GPU attached. So we decided to leave GitHub actions to test if it builds only.
I'm not aware of any ECP machines which support GPU storage direct. Is there currently a system that E4S will be installed on that supports it that you know of?
I'm not aware of any ECP machines which support GPU storage direct. Is there currently a system that E4S will be installed on that supports it that you know of?
I am so sorry, it turns out this package was more in the "nice to have" category. I mistakenly thought one of the sites did have GDS enabled which is why I got so excited about just stamping a new release w/o any cmake changes :upside_down_face:
Redirect from https://github.com/hpc-io/vfd-gds/issues/8#issuecomment-1284521787
This project should consider
find_package(CUDA)
. You only need the cuda runtime for this project from a quick glance (so no need to haveproject(... LANGUAGES CUDA)
since no CUDA source files. The update process may be painful depending on what you were or were not using fromfind_package(CUDA)
. You should only need to link againstCUDA::cudart
from a cursory glance.hdf5-shared
shows up in the generated cmake config files that are installed. That may be problematic for an application that is consuminghdf5-static
? Unclear what the right solution is.NOTE 4
at the bottom of the linked code.