openucx / xccl

NVCC cannot find UCS header files #117

Closed kingchc closed 3 years ago

kingchc commented 3 years ago

Error during make

...
nvcc -c xccl_cuda_reduce.cu -I/home/chchu/xccl-exp/src -I/home/chchu/xccl-exp/src/core --compiler-options -fno-rtti,-fno-exceptions -arch=sm_50 -gencode=arch=compute_37,code=sm_37 -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_70,code=compute_70 -Xcompiler -fPIC -o .libs/xccl_cuda_reduce.o
In file included from /home/chchu/xccl-exp/src/api/xccl.h:11:0,
                 from xccl_cuda_reduce.cu:1:
/home/chchu/xccl-exp/src/api/xccl_tls.h:9:30: fatal error: ucs/config/types.h: No such file or directory
 #include <ucs/config/types.h>
                              ^
compilation terminated.
make[4]: *** [xccl_cuda_reduce.lo] Error 1

Command for building XCCL

$ ./autogen.sh && ./configure --prefix=$PWD/install \
    --with-ucx=${UCX_INSTALL_PATH}/install --with-cuda=/usr/local/cuda \
    CFLAGS="-I${UCX_INSTALL_PATH}/include" \
    CPPFLAGS="-I${UCX_INSTALL_PATH}/include" \
    CXXFLAGS="-I${UCX_INSTALL_PATH}/include" \
    && make

UCX was installed in a local directory, configured with: ../configure --disable-logging --disable-debug --disable-assertions --disable-params-check --prefix=/home/chchu/tools/ucx/build/install --with-cuda=/usr/local/cuda
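To narrow down whether the header is simply absent from the include path that nvcc sees, a quick check like the following may help. It is only a sketch: `check_ucs_header` is a hypothetical helper, not part of XCCL or UCX, and it assumes the headers were installed under `<prefix>/include`.

```shell
#!/bin/sh
# Sketch: report whether the UCS header that nvcc failed to find exists
# under a given install prefix. check_ucs_header is a hypothetical helper.
check_ucs_header() {
    prefix="$1"
    if [ -f "$prefix/include/ucs/config/types.h" ]; then
        # This is the -I flag nvcc would need on its command line.
        echo "found: -I$prefix/include"
    else
        echo "missing under $prefix/include"
        return 1
    fi
}
```

For the install described above, the call would be `check_ucs_header /home/chchu/tools/ucx/build/install`. If it reports the header as found, the failure is most likely that the `-I` flag never reaches nvcc rather than a broken UCX install.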

Workaround: add ${UCX_CPPFLAGS} to NVCCFLAGS in Makefile.am and recompile.

diff --git a/src/utils/cuda/kernels/Makefile.am b/src/utils/cuda/kernels/Makefile.am
index c0ec059..f91a7d9 100644
--- a/src/utils/cuda/kernels/Makefile.am
+++ b/src/utils/cuda/kernels/Makefile.am
@@ -8,7 +8,7 @@
 #

 NVCC = nvcc
-NVCCFLAGS = "-I${XCCL_TOP_SRCDIR}/src -I${XCCL_TOP_SRCDIR}/src/core" --compiler-options -fno-rtti,-fno-exceptions
+NVCCFLAGS = "-I${XCCL_TOP_SRCDIR}/src -I${XCCL_TOP_SRCDIR}/src/core" --compiler-options -fno-rtti,-fno-exceptions ${UCX_CPPFLAGS}
 NV_ARCH_FLAGS = -arch=sm_50 \
                -gencode=arch=compute_37,code=sm_37 \
                -gencode=arch=compute_50,code=sm_50 \

Is it because UCX is not installed in a default path? For such scenarios, should we apply the patch shown here, or perhaps add a configure-time option that lets users specify additional NVCCFLAGS?
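Short of patching, make's command-line variable override might also serve as a stopgap. The sketch below demonstrates the mechanism on a throwaway file (`demo.mk` and `show_nvccflags` are hypothetical names, not XCCL's real Makefile or tooling) and assumes `make` is on the PATH.

```shell
#!/bin/sh
# Sketch: a variable given on the make command line overrides the
# assignment inside the Makefile. demo.mk is a throwaway file used only
# for illustration; it is not XCCL's Makefile.am.
show_nvccflags() {
    # $1: optional replacement value for NVCCFLAGS
    dir=$(mktemp -d)
    printf 'NVCCFLAGS = --compiler-options -fno-rtti,-fno-exceptions\nshow: ; @echo $(NVCCFLAGS)\n' > "$dir/demo.mk"
    if [ -n "$1" ]; then
        make -s -f "$dir/demo.mk" show NVCCFLAGS="$1"
    else
        make -s -f "$dir/demo.mk" show
    fi
    rm -rf "$dir"
}
```

Note that the override replaces the whole value rather than appending to it, so the default flags would have to be repeated alongside the extra `-I` path. That is one reason a dedicated configure-time option could be the cleaner fix.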

Sergei-Lebedev commented 3 years ago

Hi @kingchc, you are right, but the CUDA component depends on UCS (not UCX). Can you please try the following patch?

diff --git a/src/utils/cuda/kernels/Makefile.am b/src/utils/cuda/kernels/Makefile.am
index c0ec059..c8014d2 100644
--- a/src/utils/cuda/kernels/Makefile.am
+++ b/src/utils/cuda/kernels/Makefile.am
@@ -8,7 +8,7 @@
 #

 NVCC = nvcc
-NVCCFLAGS = "-I${XCCL_TOP_SRCDIR}/src -I${XCCL_TOP_SRCDIR}/src/core" --compiler-options -fno-rtti,-fno-exceptions
+NVCCFLAGS = ${CPPFLAGS} --compiler-options -fno-rtti,-fno-exceptions
 NV_ARCH_FLAGS = -arch=sm_50 \
                -gencode=arch=compute_37,code=sm_37 \
                -gencode=arch=compute_50,code=sm_50 \

kingchc commented 3 years ago

@Sergei-Lebedev - Thanks for the quick response and explanation. I can confirm the patch you posted works for me.

kingchc commented 3 years ago

#125 should resolve this.