J-StrawHat closed this issue 3 months ago
@J-StrawHat such an asymmetric configuration is currently not supported: the client does not support CUDA memory, so it cannot determine the right response to the RTR message. We aim to improve this in future releases.
Thank you for your response. Additionally, I would like to ask about best practices for handling this asymmetric configuration (host memory -> CUDA managed memory) in the current release. I tested the Stream API and it seems to support this configuration. Any further recommendations or insights would be greatly appreciated.
I'd suggest trying to set UCX_RNDV_SCHEME=get_zcopy or UCX_RNDV_THRESH=inf
Thank you so much
@J-StrawHat just to clarify, did any of the suggestions help, and if yes, which one?
After conducting several tests, I found that setting UCX_RNDV_SCHEME=get_zcopy still results in the same error. However, setting UCX_RNDV_THRESH=inf allows the program to run correctly. Additionally, compared to the Stream API, it demonstrates lower latency for large data transfers.
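For anyone applying the same workaround, here is a minimal sketch of how the setting could be applied before launching the example. It assumes the variable is exported in the shell that starts the process on each node, and the launch line in the comment is only a placeholder for whatever arguments you already use; neither detail is taken from this thread.

```shell
# Workaround reported above: raise the rendezvous threshold to infinity
# so that large messages never switch to the rendezvous protocol.
export UCX_RNDV_THRESH=inf

# Confirm the variable is present in the environment the example will inherit.
env | grep '^UCX_RNDV_THRESH='

# Then start the server and the client from this shell as before, e.g.:
#   ./ucp_client_server <your usual server or client arguments>
```

Exporting the variable rather than hard-coding it keeps the change easy to revert once asymmetric configurations are supported.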
Describe the bug
Is it possible to use the rendezvous protocol to transfer data from host memory on one node (without GPUs) to CUDA managed memory on another node (with GPUs)?
Steps to Reproduce
Run the example code: examples/ucp_client_server.c
Server (with GPUs)
Client (without GPUs)
UCX version used (release v1.16.0) + UCX configure flags
without setting UCX_TLS
Setup and versions
Ubuntu 20.04.6 LTS
Linux xfusion5 5.4.0-174-generic #193-Ubuntu SMP Thu Mar 7 14:29:28 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
MLNX_OFED_LINUX-24.01-0.3.3.1
Tesla V100-PCIE-32GB
cuda_11.7.r11.7/compiler.31442593_0
550.54.15