Open: Akshay-Venkatesh opened this issue 4 years ago
@Akshay-Venkatesh can you point me to the intercepting code in these libs? I could not find it.
Probably the CUDA runtime function symbol names were mangled by the C++ compiler in these libs. UCX might have failed to intercept these mangled symbol names with dlsym(). Can you check whether the cuda* symbol names are mangled in these libraries?
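For context, here is a minimal sketch of the dlsym()-style interception being discussed; this is not UCX's actual hook code, and the file name and build line are just assumptions for illustration. The override can only fire if the target library calls an unmangled, dynamically resolved cudaMalloc symbol, which is why C++ name mangling or a statically linked CUDA runtime would defeat it.

```cpp
// interception_sketch.cc: illustration only, not UCX code.
// One way to build it as a preload shim (assuming the CUDA include path is set):
//   g++ -shared -fPIC interception_sketch.cc -o libintercept.so -ldl
// then run the application with LD_PRELOAD=./libintercept.so.
#ifndef _GNU_SOURCE
#define _GNU_SOURCE              /* for RTLD_NEXT on glibc */
#endif
#include <dlfcn.h>
#include <cstdio>
#include <cuda_runtime_api.h>

// extern "C" keeps the symbol unmangled so the dynamic linker can interpose it
// over the runtime's cudaMalloc; a C++-mangled name would never match.
extern "C" cudaError_t cudaMalloc(void **devPtr, size_t size)
{
    typedef cudaError_t (*cuda_malloc_fn)(void **, size_t);
    // Resolve the real cudaMalloc that comes next in the lookup order.
    static cuda_malloc_fn real_cuda_malloc =
        reinterpret_cast<cuda_malloc_fn>(dlsym(RTLD_NEXT, "cudaMalloc"));
    if (real_cuda_malloc == nullptr) {
        return cudaErrorUnknown;  // no dynamically visible runtime to forward to
    }

    cudaError_t status = real_cuda_malloc(devPtr, size);
    if (status == cudaSuccess) {
        // A pointer/memtype cache would record the new device range here.
        std::fprintf(stderr, "intercepted cudaMalloc: %p, %zu bytes\n", *devPtr, size);
    }
    return status;
}
```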
@marsaev can you point to the interception code?
Will try to find it and reply back here.
Can you also check if these libs are statically linked against libcudart_static.a? I see the following in the instructions:
Build and install librmm using cmake & make. CMake depends on the nvcc executable being on your path or defined in $CUDACXX.
AFAIK, nvcc statically links the CUDA runtime by default. If that is the case here, we have known issues with intercepting allocations in these libraries.
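As a hedged way to see that difference (again, not something from this thread), the snippet below asks the dynamic linker which loaded object provides cudaMalloc: built with nvcc's default static runtime it should report the executable itself, while a build against the shared runtime should report libcudart.so. For a prebuilt library such as libamgx.so, inspecting it with ldd or nm answers the same question.

```cpp
// cudart_origin_check.cc: hypothetical diagnostic, not part of UCX or AMGX.
// One way to build it for comparison (flags are assumptions, adjust as needed):
//   nvcc cudart_origin_check.cc -o check -ldl                    # static runtime (default)
//   nvcc --cudart shared cudart_origin_check.cc -o check -ldl    # shared runtime
#ifndef _GNU_SOURCE
#define _GNU_SOURCE              /* for dladdr() on glibc */
#endif
#include <dlfcn.h>
#include <cstdio>
#include <cuda_runtime_api.h>

int main()
{
    // Take the address of the runtime entry point through an explicit
    // function-pointer type so there is no ambiguity about which overload is meant.
    typedef cudaError_t (*cuda_malloc_fn)(void **, size_t);
    cuda_malloc_fn fn = &cudaMalloc;

    Dl_info info;
    if (dladdr(reinterpret_cast<void *>(fn), &info) && info.dli_fname != nullptr) {
        // libcudart.so.* here means the shared runtime; the program's own path
        // means the runtime was linked in statically.
        std::printf("cudaMalloc resolves from: %s\n", info.dli_fname);
    } else {
        std::printf("could not determine which object provides cudaMalloc\n");
    }
    return 0;
}
```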
@Akshay-Venkatesh https://github.com/NVIDIA/AMGX/blob/b3101ffaaddee71c32ad53a151cf0e87a31b59a8/base/include/global_thread_handle.h#L232 Just to be clear - we intercept calls at the source level in a separate namespace; internally we call the same cudaMalloc from the global namespace, which is intended to call the CUDA runtime (like this: https://github.com/NVIDIA/AMGX/blob/b3101ffaaddee71c32ad53a151cf0e87a31b59a8/base/src/global_thread_handle.cu#L892). We do not redefine symbols at the global level or use LD_PRELOAD. The static CUDA runtime is linked by default (handled by cmake).
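A simplified sketch of that pattern (not the actual AMGX sources; the amgx_like namespace and pool_allocate helper are made up for illustration): the wrapper shares the runtime function's name inside the library's own namespace and forwards to ::cudaMalloc in the global namespace.

```cpp
// namespace_wrapper_sketch.cc: simplified sketch of the pattern, not AMGX code.
#include <cstddef>
#include <cstdio>
#include <cuda_runtime_api.h>

namespace amgx_like {  // hypothetical namespace standing in for the library's own

// The wrapper has the same name as the runtime function, so unqualified calls
// inside the library's own sources resolve to it by ordinary name lookup.
inline cudaError_t cudaMalloc(void **ptr, std::size_t size)
{
    std::fprintf(stderr, "pool wrapper: requesting %zu bytes\n", size);
    // Forward to the real runtime in the global namespace.  With a statically
    // linked CUDA runtime this call is resolved inside the binary, so external
    // dlsym()/preload hooks never observe it.
    return ::cudaMalloc(ptr, size);
}

// Hypothetical pool entry point: the unqualified call below picks up
// amgx_like::cudaMalloc, not ::cudaMalloc.
inline void *pool_allocate(std::size_t size)
{
    void *p = nullptr;
    if (cudaMalloc(&p, size) != cudaSuccess) {
        return nullptr;
    }
    return p;
}

} // namespace amgx_like

int main()
{
    void *dev = amgx_like::pool_allocate(1u << 20);
    if (dev != nullptr) {
        cudaFree(dev);
    }
    return 0;
}
```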
I tried to link AMGX with the shared CUDA Runtime and can confirm that this is indeed the issue. When the static CUDA Runtime is avoided, the UCX pointer cache no longer causes issues with AMGX.
@bureddy As @jirikraus confirmed, this does fall into the known issue case. Can you link that github issue here so that we can close both when the issue gets fixed?
related to https://github.com/openucx/ucx/issues/3210
@bureddy please bring this up during the f2f. The issue of static builds is not going to disappear...
@bureddy @yosefe
Users of applications that have their own memory pools (which internally intercept cudaMalloc, for example) have seen crashes with the memtype cache. Some examples include users of rapidsai/rmm and AMGX. I know that the recent changes to the memtype cache are supposed to handle allocations made before ucp_init, but are interceptions by other agents also handled?
cc @abellina
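To illustrate the memory-pool pattern described in the report, here is a minimal, hypothetical sub-allocating pool (not rmm or AMGX code; the DevicePool class is invented for this sketch). Only the single base cudaMalloc is ever visible to an interceptor, so if that one call is missed (for example because of a statically linked CUDA runtime), every pointer the pool hands out can end up misclassified by the memtype cache.

```cpp
// suballoc_pool_sketch.cc: minimal hypothetical pool, not rmm or AMGX code.
#include <cstddef>
#include <cstdio>
#include <cuda_runtime_api.h>

class DevicePool {
public:
    explicit DevicePool(std::size_t bytes) {
        // The only cudaMalloc an interceptor could ever observe.
        if (cudaMalloc(&base_, bytes) == cudaSuccess) {
            size_ = bytes;
        }
    }
    ~DevicePool() {
        if (base_ != nullptr) {
            cudaFree(base_);
        }
    }

    // Hand out sub-ranges of the base allocation; no further runtime calls,
    // so these pointers are invisible to allocation hooks by construction.
    void *allocate(std::size_t bytes) {
        const std::size_t aligned = (bytes + 255) & ~std::size_t(255);  // 256-byte alignment
        if (base_ == nullptr || offset_ + aligned > size_) {
            return nullptr;
        }
        void *p = static_cast<char *>(base_) + offset_;
        offset_ += aligned;
        return p;
    }

private:
    void        *base_   = nullptr;
    std::size_t  size_   = 0;
    std::size_t  offset_ = 0;
};

int main()
{
    DevicePool pool(64u << 20);              // one 64 MiB device allocation
    void *buf = pool.allocate(1u << 20);     // a device pointer UCX never saw allocated
    std::printf("sub-allocated device pointer: %p\n", buf);
    return 0;
}
```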