hpc-io / vfd-gds


cmake updates worth considering #9

Open svenevs opened 2 years ago

svenevs commented 2 years ago

Redirect from https://github.com/hpc-io/vfd-gds/issues/8#issuecomment-1284521787

This project should consider

  1. Stop using the deprecated find_package(CUDA). From a quick glance you only need the CUDA runtime for this project (so no need for project(... LANGUAGES CUDA) since there are no CUDA source files). The update process may be painful depending on what you were or were not using from find_package(CUDA), but you should only need to link against CUDA::cudart.
  2. Right now a hard dependency on hdf5-shared shows up in the generated cmake config files that are installed. That may be problematic for an application that is consuming hdf5-static? Unclear what the right solution is.
  3. The header file directory is not included in the interface include directories of the library in the generated cmake config files that are installed -- see NOTE 4 at the bottom of the linked code. A rough sketch of a possible fix follows this list.
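For item 3, an untested sketch of what the fix might look like, assuming the target is the hdf5_vfd_gds library from src/CMakeLists.txt and that the public headers end up in the usual install include directory (paths are guesses, adjust to match the project):

```cmake
# Sketch only: attach the header directory to the target's interface so it is
# recorded in the exported/installed config files. Paths are assumptions.
include(GNUInstallDirs)
target_include_directories(hdf5_vfd_gds PUBLIC
  $<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}>      # headers for in-tree consumers
  $<INSTALL_INTERFACE:${CMAKE_INSTALL_INCLUDEDIR}>    # headers for installed consumers
)
```
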
brtnfld commented 2 years ago

We are working on updating CMake; first, we are trying to get it to work with GitHub actions. It looks to me that the installation instructions are not quite right. I've removed the use of find_package(CUDA), but it still has an issue finding/linking the cuda libraries. I'm wondering, though, if we should be using FindCUDAToolkit instead.

We are currently experimenting with it here (no successful build yet): https://github.com/brtnfld/vfd-gds

svenevs commented 2 years ago

We are working on updating CMake; first, we are trying to get it to work with GitHub actions.

Always an adventure :slightly_smiling_face: Are CMake changes needed to get them running at all? It could be an environment problem; I don't see anything overtly wrong with the current system -- it's just the old CMake style.

I've removed the use of find_package(CUDA), but it still has an issue finding/linking the cuda libraries.

The process is a bit more involved, partly because you have to adopt some target-based approaches related to the packaging.

From what I can see this library needs to (1) compile and link against the CUDA runtime and (2) link against cufile specifically. With the old-style CMake approach you'd accumulate include directories, libraries to link, and compiler definitions in variables and pass those around. Since

https://github.com/hpc-io/vfd-gds/blob/4906c76616ac1742f66176721052746181777828/src/CMakeLists.txt#L36

is going to ultimately include / link against

https://github.com/hpc-io/vfd-gds/blob/4906c76616ac1742f66176721052746181777828/CMakeLists.txt#L85-L93

where CUDA_INCLUDE_DIRS and CUDA_LIBRARIES came from find_package(CUDA), that is the part to replace. Using the target-based approach you would instead do (untested, not on a cuda machine):

```cmake
find_package(CUDAToolkit)
# ... at some point after `add_library`, i.e. once the target exists ...
# note: you can `add_library(target-name "")` at the beginning of your CMake with no sources
# so that the target exists immediately and then do `target_sources` later on; mentioning this
# because your dependency search happens before the `add_subdirectory`
# https://cmake.org/cmake/help/latest/module/FindCUDAToolkit.html#cuda-toolkit-rt-lib
target_link_libraries(hdf5_vfd_gds PUBLIC CUDA::cudart)  # or CUDA::cudart_static, but that is less common
```

Similarly, for cufile you can link CUDA::cuFile (but this target was added in CMake 3.25, so that becomes your minimum version). If that's too recent you can keep the find_library, but instead of

https://github.com/hpc-io/vfd-gds/blob/4906c76616ac1742f66176721052746181777828/CMakeLists.txt#L73

just do it after finding CUDAToolkit and use CUDAToolkit_LIBRARY_DIR as the search hint (CUDA_TOOLKIT_ROOT_DIR also comes from the deprecated find_package(CUDA)). The set(... CACHE ...) option there can probably just be removed?
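
Untested sketch of that fallback (target name taken from src/CMakeLists.txt; the CUFILE_LIBRARY variable name is just illustrative):

```cmake
find_package(CUDAToolkit REQUIRED)
if(TARGET CUDA::cuFile)
  # CMake >= 3.25 ships an imported target for cufile.
  target_link_libraries(hdf5_vfd_gds PUBLIC CUDA::cuFile)
else()
  # Older CMake: look for libcufile next to the rest of the toolkit libraries.
  find_library(CUFILE_LIBRARY cufile HINTS "${CUDAToolkit_LIBRARY_DIR}")
  target_link_libraries(hdf5_vfd_gds PUBLIC "${CUFILE_LIBRARY}")
endif()
```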

Note that once you start linking against imported targets in your build system, consumers need those targets to exist too, so your installed config file has to find them as well: add find_dependency(CUDAToolkit) there.
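Roughly, in whatever *-config.cmake(.in) file gets installed (file and target names below are guesses about this repo's layout):

```cmake
# Sketch of the installed package config file; names are hypothetical.
include(CMakeFindDependencyMacro)
find_dependency(CUDAToolkit)  # makes CUDA::cudart / CUDA::cuFile available to consumers
include("${CMAKE_CURRENT_LIST_DIR}/hdf5_vfd_gds-targets.cmake")  # hypothetical exported targets file
```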

Hope that helps clarify some!

brtnfld commented 2 years ago

Indeed, if I don't use the PR #5 CMake changes, it compiles successfully. But when I try to run the tests, it fails because it is looking for libcuda.so.1; what is installed is:

```
libOpenCL.so  libOpenCL.so.1  libOpenCL.so.1.0  libOpenCL.so.1.0.0  libcudadevrt.a
libcudart.so  libcudart.so.11.0  libcudart.so.11.8.89  libcudart_static.a
libcufile.so  libcufile.so.0  libcufile.so.1.4.0
libcufile_rdma.so  libcufile_rdma.so.1  libcufile_rdma.so.1.4.0
libcufile_rdma_static.a  libcufile_static.a  libculibos.a
```

svenevs commented 2 years ago

But when I try to run the tests it fails because it is looking for libcuda.so.1

Hmm, that does seem like an installation problem. That's a pretty critical library to be missing (that's the driver api, as opposed to the runtime api cudart).

https://github.com/hpc-io/vfd-gds/blob/69d0c16e7c09704724bf5a6d9e8452baf52e5371/.github/workflows/main.yml#L25

I'm not sure about the Ubuntu packaging and the -nvcc package specifically; the docs say to install the meta package. I think cuda-11-8 should also be available (worth testing in a docker image rather than debugging package names via GH actions :wink:).

Alternatively if you wanted to get fancy you might be able to download the runfile and cache the download rather than reinstalling the cuda packages each time. Or possibly use one of the cuda docker images?

it compiles successfully. But when I try to run the tests it fails

All that said, even if libcuda.so ends up installed and that is resolved -- can you run anything GPU / CUDA in the GitHub-hosted runners? I don't think they have GPUs attached, and if a GPU is required to actually run the tests you'd need a CI setup that has one, e.g. a self-hosted runner.

brtnfld commented 2 years ago

Right, I don't think it works without a GPU attached, so we decided to have GitHub Actions only test that it builds.

I'm not aware of any ECP machines which support GPUDirect Storage. Is there currently a system that E4S will be installed on that supports it that you know of?

svenevs commented 2 years ago

I'm not aware of any ECP machines which support GPUDirect Storage. Is there currently a system that E4S will be installed on that supports it that you know of?

I am so sorry, it turns out this package was more in the "nice to have" category. I mistakenly thought one of the sites did have GDS enabled which is why I got so excited about just stamping a new release w/o any cmake changes :upside_down_face: