openucx / ucc

Unified Collective Communication Library
https://openucx.github.io/ucc/
BSD 3-Clause "New" or "Revised" License

UCC compilation fails on the master branch #1031

Status: Open (opened by shashank-parsi 2 weeks ago)

shashank-parsi commented 2 weeks ago

Hello all, I see a compilation issue in UCC on the master branch.

Steps followed:

  1. git clone https://github.com/openucx/ucc.git
  2. cd ucc
  3. ./autogen.sh
  4. ./configure --prefix= --with-ucx= --with-rocm= --enable-gtest
  5. make -j

Issue seen:

    make[3]: Entering directory '/home/master/rastra/rocm_tests/hipmpi/ucc/src/components/tl/ucp'
    CC libucc_tl_ucp_la-tl_ucp_dpu_offload.lo
    tl_ucp_dpu_offload.c: In function ‘ucc_tl_ucp_allreduce_sliding_window_register’:
    tl_ucp_dpu_offload.c:18:35: error: ‘UCP_MEM_MAP_PARAM_FIELD_EXPORTED_MEMH_BUFFER’ undeclared (first use in this function)
       18 | params.field_mask = UCP_MEM_MAP_PARAM_FIELD_EXPORTED_MEMH_BUFFER;
          | ^~~~~~~~~~~~
    tl_ucp_dpu_offload.c:18:35: note: each undeclared identifier is reported only once for each function it appears in
    tl_ucp_dpu_offload.c:19:11: error: ‘ucp_mem_map_params_t’ {aka ‘struct ucp_mem_map_params’} has no member named ‘exported_memh_buffer’
       19 | params.exported_memh_buffer = packed_memh;
          | ^
    make[3]: [Makefile:1242: libucc_tl_ucp_la-tl_ucp_dpu_offload.lo] Error 1
    make[3]: Leaving directory '/home/master/rastra/rocm_tests/hipmpi/ucc/src/components/tl/ucp'
    make[2]: [Makefile:1592: install-recursive] Error 1
    make[2]: Leaving directory '/home/master/rastra/rocm_tests/hipmpi/ucc/src/components/tl/ucp'
    make[1]: [Makefile:1409: install-recursive] Error 1
    make[1]: Leaving directory '/home/master/rastra/rocm_tests/hipmpi/ucc/src'
    make: [Makefile:576: install-recursive] Error 1

NOTE: the issue is not seen with the v1.3.x branch.

Test environment:

  1. Distro: RHEL 9.4/SLES 15 SP5
  2. OS: Linux
  3. AMD ROCm stack installed

Sergei-Lebedev commented 2 weeks ago

What version of UCX are you using? @nsarka, I think we don't check for UCP mem_map param features when building the DPU plugin; can you please check? cc @janjust
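
The missing feature check described above can be sketched as a compile-time guard. This is a minimal, self-contained sketch under stated assumptions, not the fix from the PR: `demo_mem_map_params_t`, `demo_register_exported`, `DEMO_FIELD_EXPORTED_MEMH_BUFFER` and the `DEMO_` status codes are hypothetical stand-ins so it compiles without UCX headers. The `HAVE_DECL_UCP_MEM_MAP_PARAM_FIELD_EXPORTED_MEMH_BUFFER` macro is assumed to come from an autoconf `AC_CHECK_DECLS` test against `<ucp/api/ucp.h>`, which defines such a macro to 1 or 0. With the macro at 0, as it would be for UCX v1.13.x, the exported-memh code is compiled out and the call reports "not supported" instead of breaking the build.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: in a real build this comes from configure (e.g. AC_CHECK_DECLS
 * against <ucp/api/ucp.h>). Defaulting to 0 models an older UCX such as v1.13.x. */
#ifndef HAVE_DECL_UCP_MEM_MAP_PARAM_FIELD_EXPORTED_MEMH_BUFFER
#define HAVE_DECL_UCP_MEM_MAP_PARAM_FIELD_EXPORTED_MEMH_BUFFER 0
#endif

/* Hypothetical stand-in for UCP_MEM_MAP_PARAM_FIELD_EXPORTED_MEMH_BUFFER. */
#define DEMO_FIELD_EXPORTED_MEMH_BUFFER (UINT64_C(1) << 0)

/* Hypothetical stand-ins for ucc_status_t values like UCC_OK / UCC_ERR_NOT_SUPPORTED. */
typedef enum {
    DEMO_OK            = 0,
    DEMO_NOT_SUPPORTED = -1
} demo_status_t;

/* Hypothetical stand-in for ucp_mem_map_params_t: the member only exists
 * when the UCX headers declare it. */
typedef struct {
    uint64_t field_mask;
#if HAVE_DECL_UCP_MEM_MAP_PARAM_FIELD_EXPORTED_MEMH_BUFFER
    const void *exported_memh_buffer;
#endif
} demo_mem_map_params_t;

/* Fills the exported-memh fields when available; otherwise leaves params
 * untouched and reports that the feature is unsupported. */
demo_status_t demo_register_exported(demo_mem_map_params_t *params,
                                     const void *packed_memh)
{
    assert(params != NULL);
#if HAVE_DECL_UCP_MEM_MAP_PARAM_FIELD_EXPORTED_MEMH_BUFFER
    params->field_mask           = DEMO_FIELD_EXPORTED_MEMH_BUFFER;
    params->exported_memh_buffer = packed_memh;
    return DEMO_OK;
#else
    (void)packed_memh; /* feature absent in the UCX headers */
    return DEMO_NOT_SUPPORTED;
#endif
}
```

Compiling the code out keeps the rest of TL/UCP buildable against an older UCX; disabling the whole DPU offload path at configure time would be an equally valid design.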

shashank-parsi commented 2 weeks ago

I'm cloning UCX as follows: git clone https://github.com/openucx/ucx.git -b v1.13.x

Sergei-Lebedev commented 2 weeks ago

Thanks, we will work on a fix. Meanwhile, you can use UCX v1.15 or newer.
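
The workaround implies a minimum UCX version for the DPU offload path. A minimal sketch of such a version gate follows; the helper names `demo_ucx_at_least` and `demo_dpu_offload_supported` are hypothetical, and the 1.15 threshold is taken from the advice in this comment, not from UCX documentation. In a real build the version numbers would come from UCX itself, for example the `UCP_API_MAJOR`/`UCP_API_MINOR` macros in `<ucp/api/ucp_version.h>` at compile time or `ucp_get_version()` at run time; here they are plain parameters so the sketch runs without UCX installed.

```c
#include <stdbool.h>

/* Assumed minimum UCX version for the DPU offload path ("v1.15 or newer"). */
#define DEMO_MIN_UCX_MAJOR 1u
#define DEMO_MIN_UCX_MINOR 15u

/* Returns true if major.minor >= req_major.req_minor.
 * The major number decides first; minor only matters when majors are equal. */
static bool demo_ucx_at_least(unsigned major, unsigned minor,
                              unsigned req_major, unsigned req_minor)
{
    if (major != req_major) {
        return major > req_major;
    }
    return minor >= req_minor;
}

/* Convenience wrapper against the assumed 1.15 threshold. */
static bool demo_dpu_offload_supported(unsigned major, unsigned minor)
{
    return demo_ucx_at_least(major, minor,
                             DEMO_MIN_UCX_MAJOR, DEMO_MIN_UCX_MINOR);
}
```

For the v1.13.x checkout reported here, `demo_dpu_offload_supported(1, 13)` is false, while `(1, 15)` and `(2, 0)` are true. Comparing numeric fields rather than version strings avoids ordering "1.9" after "1.13".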

nsarka commented 2 weeks ago

A PR to fix this issue is open here: https://github.com/openucx/ucc/pull/1032

shashank-parsi commented 1 week ago

Hello @nsarka, may I know when this PR will be merged?