openucx / ucx

Unified Communication X (mailing list - https://elist.ornl.gov/mailman/listinfo/ucx-group)
http://www.openucx.org

UCX Perftest doesn't support CUDA memory even with UCX_PROTO_ENABLE=y #6571

Open Seth5141 opened 3 years ago

Seth5141 commented 3 years ago

Describe the bug

This is admittedly an enhancement request rather than a bug. On UCX 1.10, where put/get operations support CUDA memory when UCX_PROTO_ENABLE=y is set, ucx_perftest still fails to run with CUDA memory.

Steps to Reproduce

./src/tools/perf/ucx_perftest [addr] -t ucp_put_bw -c 0 -m cuda -s 262144

Setup and versions

UCX 1.10
CUDA 11.2

Additional information (depending on the issue)

Apologies on the weird formatting here. Not sure what's causing it.

There appears to be a two-pronged issue here. First, there is a check in the code itself that prevents you from trying to run with CUDA memory.

    if ((params->api == UCX_PERF_API_UCP) &&
        ((params->send_mem_type != UCS_MEMORY_TYPE_HOST) ||
         (params->recv_mem_type != UCS_MEMORY_TYPE_HOST)) &&
        ((params->command == UCX_PERF_CMD_PUT) ||
         (params->command == UCX_PERF_CMD_GET) ||
         (params->command == UCX_PERF_CMD_ADD) ||
         (params->command == UCX_PERF_CMD_FADD) ||
         (params->command == UCX_PERF_CMD_SWAP) ||
         (params->command == UCX_PERF_CMD_CSWAP))) {
        /* TODO: remove when support for non-HOST memory types will be added */
        if (params->flags & UCX_PERF_TEST_FLAG_VERBOSE) {
            ucs_error("UCP doesn't support RMA/AMO for \"%s\"<->\"%s\" memory types",
                      ucs_memory_type_names[params->send_mem_type],
                      ucs_memory_type_names[params->recv_mem_type]);
        }
        return UCS_ERR_INVALID_PARAM;
    }

When this check is removed, I end up with the following error:

    +--------------+--------------+-----------------------------+---------------------+-----------------------+
    |              |              |       overhead (usec)       |   bandwidth (MB/s)  |  message rate (msg/s) |
    +--------------+--------------+---------+---------+---------+----------+----------+-----------+-----------+
    |    Stage     | # iterations | typical | average | overall |  average |  overall |  average  |  overall  |
    +--------------+--------------+---------+---------+---------+----------+----------+-----------+-----------+
    [gc02:20940:0:20940] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x8)
    ==== backtrace (tid: 20940) ====
     0 /home/showell/ucx/src/ucs/.libs/libucs.so.0(ucs_handle_error+0x10c) [0x7f96af8c2f2c]
     1 /home/showell/ucx/src/ucs/.libs/libucs.so.0(+0x282ac) [0x7f96af8c32ac]
     2 /home/showell/ucx/src/ucs/.libs/libucs.so.0(+0x28524) [0x7f96af8c3524]
     3 /lib/x86_64-linux-gnu/libpthread.so.0(+0x12890) [0x7f96af68e890]
     4 /home/showell/ucx/src/ucp/.libs/libucp.so.0(ucp_rkey_pack+0x55) [0x7f96afd46895]
     5 /home/showell/ucx/src/tools/perf/.libs/ucx_perftest(+0xb6c1) [0x5591c589f6c1]
     6 /home/showell/ucx/src/tools/perf/.libs/ucx_perftest(+0xc1be) [0x5591c58a01be]
     7 /home/showell/ucx/src/tools/perf/.libs/ucx_perftest(+0x72ad) [0x5591c589b2ad]
     8 /home/showell/ucx/src/tools/perf/.libs/ucx_perftest(+0x4e7f) [0x5591c5898e7f]
     9 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7) [0x7f96af07db97]
    10 /home/showell/ucx/src/tools/perf/.libs/ucx_perftest(+0x540a) [0x5591c589940a]

bureddy commented 3 years ago

RMA with CUDA memory is not fully supported. Can you try with UCX master using the flags -x UCX_ZCOPY_THRESH=0 -x UCX_TLS=rc?

Seth5141 commented 3 years ago

@bureddy Thanks for your quick response.

Just a couple of follow-up questions:

  1. Can you clarify a little bit more about -x UCX_ZCOPY_THRESH=0 -x UCX_TLS=rc? Are those arguments to ucx_perftest or to configure?
  2. Can you elaborate on "RMA with Cuda memory is not fully supported"? While not completely germane to this specific issue, I am really curious to know to what extent I should expect RMA with CUDA memory to work with UCX_PROTO_ENABLE=y on both master and v1.10.0. My end goal is to create a UCX transport for NVSHMEM that relies on ucp_put_nbx and ucp_get_nbx supporting CUDA memory.

bureddy commented 3 years ago

> 1. Can you clarify a little bit more about -x UCX_ZCOPY_THRESH=0 -x UCX_TLS=rc? Are those arguments to ucx_perftest or to configure?

These are workarounds that skip the inline/bcopy protocols for RMA put/get and use only IB WRITE and READ.

> 2. Can you elaborate on "RMA with Cuda memory is not fully supported"? While not completely cogent to this specific issue, I am really curious to know to what extent I should expect RMA with CUDA memory to work with UCX_PROTO_ENABLE=y on both master and v1.10.0. My end goal is to create a UCX transport for NVSHMEM that relies on ucp_put_nbx and ucp_get_nbx supporting CUDA memory.

@yosefe?

yosefe commented 3 years ago

UCX_PROTO_ENABLE=y activates the experimental implementation of the new protocols, which will eventually support CUDA memory without specific tweaks. However, this is still in development, so not everything is supported yet. I would expect it to materialize towards the end of this year.