rapidsai / rmm

RAPIDS Memory Manager
https://docs.rapids.ai/api/rmm/stable/
Apache License 2.0

[FEA] rmm should stop prescribing cudart linking #1320

Open vyasr opened 1 year ago

vyasr commented 1 year ago

Is your feature request related to a problem? Please describe.
Currently, when librmm is built, it accepts a CMake option, CUDA_STATIC_RUNTIME, that determines whether to link cudart statically or dynamically. However, librmm is header-only, so what this option really controls is the behavior of the CMake link interface that dependent libraries consume. The flag is sufficient if a user is building their entire stack locally, but if they are building a library that depends on librmm and want to use a prepackaged librmm, the current setup constrains them to librmm's choice unless they manually patch the rmm target in their own CMake.
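To make the problem concrete, here is a minimal consumer sketch (the `mylib` target and file names are hypothetical):

```cmake
# Hypothetical consumer that prefers the shared CUDA runtime but uses a
# prebuilt librmm package that was configured with CUDA_STATIC_RUNTIME=ON.
find_package(rmm REQUIRED)
find_package(CUDAToolkit REQUIRED)

add_library(mylib SHARED mylib.cpp)

# rmm::rmm's link interface drags CUDA::cudart_static into mylib's link
# line, even though mylib never asked for a static runtime.
target_link_libraries(mylib PRIVATE rmm::rmm)
```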

Describe the solution you'd like
Since librmm is header-only, there is no reason for it to define a linking behavior here. That logic should be removed from librmm entirely and left up to consumers.

Describe alternatives you've considered
None

Additional context
This change will break consumers who were previously not specifying any cudart linking behavior and were relying on librmm's settings. Before making this change, we should audit potential consumers for that issue. Likely candidates are xgboost and ucx.

jakirkham commented 1 year ago

cc @robertmaynard @hcho3 (as we discussed this previously)

trivialfis commented 1 year ago

From a packaging standpoint, cudart is an input that RMM propagates to downstream projects. Since cudart is part of RMM's dependencies, RMM should specify it as such, in my opinion. For instance, numpy is a dependency of XGBoost; we won't remove it from the TOML project specification even if some users want a customized numpy build that conflicts with it. Python scripts can be considered "header-only" too, since the source is distributed and can be re-compiled to bytecode. IMHO, this is more an issue that should be resolved by the dependency manager, which decides which dependency to use or prioritize by default in the software distribution.

That said, I'm fine with whatever decision improves RMM's usability. It's just a small conceptual issue that I wanted to raise before the change is merged.

robertmaynard commented 1 year ago

I have been looking into whether we could propagate the cudart requirement only if the consuming library has not specified which version it wants by linking to CUDA::cudart or CUDA::cudart_static. It looks possible!

That would allow us to effectively state a default (I vote for static) while also allowing consumers to override it. Would we want to go with this approach instead?

vyasr commented 1 year ago

Since cudart is part of RMM's dependencies, RMM should specify it as such, in my opinion.

I mostly agree with this. However:

IMHO, this is more an issue that should be resolved by the dependency manager, which decides which dependency to use or prioritize by default in the software distribution.

The problem we're trying to solve is that an approach like that is not viable here. If rmm specifies static linking, that propagates unconditionally to every consuming library. To allow the dependency manager to resolve this, you would need to build two separate rmm packages so that consuming libraries could request that the correct one be downloaded by a package manager during dependency resolution before the CMake build starts (at which point it's already too late).

Another option would be to provide multiple targets, such as rmm::rmm_cuda_static and rmm::rmm_cuda_dynamic, so that rmm still encapsulates the necessary information but passes the choice on to the user. If we want to default to static, we could make rmm::rmm static and then only need to add one extra target, rmm::rmm_cuda_dynamic. Thoughts?
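Under that alternative, consumption might look like this (the rmm::rmm_cuda_dynamic target is hypothetical; it does not exist today):

```cmake
find_package(rmm REQUIRED)

# Default: rmm::rmm carries the static cudart requirement.
add_executable(app_static main.cu)
target_link_libraries(app_static PRIVATE rmm::rmm)

# Opt-in: the hypothetical rmm::rmm_cuda_dynamic target would link the
# shared runtime (CUDA::cudart) instead.
add_executable(app_shared main.cu)
target_link_libraries(app_shared PRIVATE rmm::rmm_cuda_dynamic)
```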

I have been looking into whether we could propagate the cudart requirement only if the consuming library has not specified which version it wants by linking to CUDA::cudart or CUDA::cudart_static. It looks possible!

That sounds even better. Is there any ordering imposed there though? i.e. would it matter whether find_package(rmm) was executed before or after the consuming library specified target_link_libraries(CUDA::cudart)?

jakirkham commented 1 year ago

That would allow us to effectively state a default (I vote for static) while also allowing consumers to override it. Would we want to go with this approach instead?

How would this look? What do consumers need to do to override? Also which consumers (other CMake users, package builders, etc.)?

trivialfis commented 1 year ago

To allow the dependency manager to resolve this, you would need to build two separate rmm packages so that consuming libraries could request that the correct one be downloaded by a package manager during dependency resolution before the CMake build starts (at which point it's already too late).

Thank you for the reply. Actually, this seems reasonable to me. It's not uncommon for package distributions to have multiple compile-time variants. You can optionally enable components in LLVM during compilation, including whether to build MLIR, whether to include runtime type information, etc. A distribution (like Debian or Nix) would usually choose a default configuration for all other dependencies within that distribution. Emacs has tons of variants.

For the case of RMM, I think it can keep the compile-time CMake option CUDA_STATIC_RUNTIME for specifying which cudart to use, and conda packagers can choose a default (like static) for all downstream projects within the conda-forge and rapidsai channels. A variant with shared cudart can be built and uploaded if necessary, but as a leaf package not used by any of the default conda packages (like cuDF in the rapidsai channel).

robertmaynard commented 1 year ago

That sounds even better. Is there any ordering imposed there though? i.e. would it matter whether find_package(rmm) was executed before or after the consuming library specified target_link_libraries(CUDA::cudart)?

No ordering issues, as the evaluation occurs entirely in CMake generator expressions. The one issue is that we can only scan for direct usage of CUDA::cudart (or CUDA::cudart_static).

How would this look? What do consumers need to do to override? Also which consumers (other CMake users, package builders, etc.)?

Any consumer that uses CMake would get this behavior, since the logic would be encoded in the rmm::rmm CMake target. Basically, we can tell CMake that if a consuming library uses rmm::rmm and doesn't also use CUDA::cudart or CUDA::cudart_static, it should get the 'default' cudart version that rmm::rmm was built with.

Approach

  find_package(rmm REQUIRED)
  find_package(CUDAToolkit REQUIRED) # order doesn't matter

  add_executable(uses_cudart_static memory_test.cu)
  target_link_libraries(uses_cudart_static PRIVATE rmm::rmm  CUDA::cudart_static) # order doesn't matter

  add_executable(uses_cudart_shared memory_test.cu)
  target_link_libraries(uses_cudart_shared PRIVATE rmm::rmm CUDA::cudart) # CUDA::cudart is the shared runtime; order doesn't matter

Since the executables link to both rmm::rmm and a CUDA::cudart* target, the generator expression embedded in rmm::rmm detects this and ensures rmm doesn't add its own CUDA::cudart (or CUDA::cudart_static) dependency.
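For concreteness, here is a condensed sketch of how rmm's installed CMake could encode that default. It relies on the fact that `$<TARGET_PROPERTY:LINK_LIBRARIES>` with no target argument is evaluated in the context of the consuming target; the exact expression is illustrative, not the actual implementation:

```cmake
# Evaluated per consumer of rmm::rmm: "does the consumer already link a
# cudart target directly?"
set(_links "$<TARGET_PROPERTY:LINK_LIBRARIES>")
set(_has_cudart "$<IN_LIST:CUDA::cudart,${_links}>")
set(_has_cudart_static "$<IN_LIST:CUDA::cudart_static,${_links}>")

# Only inject the default (static) runtime when the consumer linked neither.
target_link_libraries(rmm INTERFACE
  "$<$<AND:$<NOT:${_has_cudart}>,$<NOT:${_has_cudart_static}>>:CUDA::cudart_static>")
```

This is also why the proxy-target case below fails: the check inspects only the consumer's direct LINK_LIBRARIES property, not its transitive closure.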

Issue with this approach

The issue with this approach is that it only works with direct usage of CUDA::cudart or CUDA::cudart_static. By that I mean that the following use case won't work: rmm::rmm would think no linking to CUDA::cudart or CUDA::cudart_static is occurring and would inject its own.


add_library(cudart_via_proxy INTERFACE)
target_link_libraries(cudart_via_proxy INTERFACE CUDA::cudart)

add_executable(uses_cudart memory_test.cu)
target_link_libraries(uses_cudart PRIVATE rmm::rmm cudart_via_proxy)

robertmaynard commented 1 month ago

I have been reviewing this again. The biggest issue I see is how dynamic_load_runtime.hpp optimizes the function binding so that, in static builds, we can remove a level of function indirection.

This is a big deal, as a purely static build of a project won't have a libcudart.so, so we currently use compile-time defines to determine which code paths to take. I believe that to properly drop CUDA_STATIC_RUNTIME we first need to refactor dynamic_load_runtime.hpp to work without any defines, figuring out at runtime whether libcudart.a was used by checking for already-loaded CUDA runtime symbols.

vyasr commented 4 weeks ago

Presumably the idea would be to use dlsym(..., RTLD_DEFAULT) to check whether symbols are already available in the image without any dlopen?

I'm not sure how limiting the proxy issue you described above will be in practice. I don't think anything in RAPIDS currently relies on that kind of usage, but it does seem like a fairly common pattern.

vyasr commented 4 weeks ago

Presumably the idea would be to use dlsym(..., RTLD_DEFAULT) to check whether symbols are already available in the image without any dlopen?

#1667 tells me the answer is yes 😄