KimBioInfoStudio closed this issue 2 years ago.
No. torch_ccl should not link against the MKL library. The link options may have been polluted when linking against the IPEX library.
Can you share details of your setup? Which version of IPEX are you using, and which DPC++ toolchain?
Python: intelpython_core python=3.7.11
PyTorch: 1.7.0a0+14820ce
IPEX GPU version: 0.2.0gpu
IPEX GPU git SHA: 7f7dce58d62d38750fe271a8d70827a144a7e165
MKL: 2021.3
DPC++ (dpcpp): 2021.6.19
oneDNN: the submodule copy was used
If I use ldd to check the libraries torch_ccl links against, the MKL libraries show up among them:
ldd _C.cpython-37m-x86_64-linux-gnu.so
linux-vdso.so.1 (0x00007f3bbc97e000)
libtorch_ccl.so => /home/kim/miniconda3/envs/exdpcpp/lib/python3.7/site-packages/torch_ccl-1.2.0+b0f9d1e-py3.7-linux-x86_64.egg/torch_ccl/./lib/libtorch_ccl.so (0x00007f3bbc90f000)
libc10.so => not found
libtorch.so => not found
libtorch_cpu.so => not found
libtorch_python.so => not found
libstdc++.so.6 => /home/kim/miniconda3/envs/exdpcpp/lib/libstdc++.so.6 (0x00007f3bbc788000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f3bbc639000)
libgcc_s.so.1 => /home/kim/miniconda3/envs/exdpcpp/lib/libgcc_s.so.1 (0x00007f3bbc625000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f3bbc602000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f3bbc410000)
libccl.so => /home/kim/miniconda3/envs/exdpcpp/lib/python3.7/site-packages/torch_ccl-1.2.0+b0f9d1e-py3.7-linux-x86_64.egg/torch_ccl/./lib/libccl.so (0x00007f3bbc0f2000)
libtorch.so => not found
libc10.so => not found
libtorch_cpu.so => not found
libmkl_intel_ilp64.so.1 => /home/kim/intel/oneapi/mkl/latest/lib/intel64/libmkl_intel_ilp64.so.1 (0x00007f3bbb3f8000)
libmkl_intel_thread.so.1 => /home/kim/intel/oneapi/mkl/latest/lib/intel64/libmkl_intel_thread.so.1 (0x00007f3bb7ade000)
/lib64/ld-linux-x86-64.so.2 (0x00007f3bbc980000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f3bb7ad6000)
libfabric.so.1 => /home/kim/miniconda3/envs/exdpcpp/lib/python3.7/site-packages/torch_ccl-1.2.0+b0f9d1e-py3.7-linux-x86_64.egg/torch_ccl/./lib/libfabric.so.1 (0x00007f3bb788f000)
libmpi.so.12 => /home/kim/miniconda3/envs/exdpcpp/lib/python3.7/site-packages/torch_ccl-1.2.0+b0f9d1e-py3.7-linux-x86_64.egg/torch_ccl/./lib/libmpi.so.12 (0x00007f3bb5ef7000)
libsycl.so.5 => /home/kim/intel/oneapi/compiler/2021.6.19/compiler/linux/lib/libsycl.so.5 (0x00007f3bb5c3e000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f3bb5c31000)
libOpenCL.so.1 => /home/kim/intel/oneapi/compiler/2021.6.19/compiler/linux/lib/libOpenCL.so.1 (0x00007f3bb5c22000)
libsvml.so => /home/kim/intel/oneapi/compiler/2021.6.19/compiler/linux/compiler/lib/intel64_lin/libsvml.so (0x00007f3bb40a1000)
libirng.so => /home/kim/intel/oneapi/compiler/2021.6.19/compiler/linux/compiler/lib/intel64_lin/libirng.so (0x00007f3bb3d37000)
libimf.so => /home/kim/intel/oneapi/compiler/2021.6.19/compiler/linux/compiler/lib/intel64_lin/libimf.so (0x00007f3bb36af000)
libintlc.so.5 => /home/kim/intel/oneapi/compiler/2021.6.19/compiler/linux/compiler/lib/intel64_lin/libintlc.so.5 (0x00007f3bb3437000)
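To confirm whether a rebuilt extension still pulls in MKL, the ldd listing can be filtered for `libmkl_*` entries. This is a minimal sketch that assumes ldd's usual `name => path (address)` line format on stdin; the function name `mkl_deps_from_ldd` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print the MKL libraries (libmkl_*) named in ldd output read from stdin.
# The function name is hypothetical; it only inspects ldd's text format.
mkl_deps_from_ldd() {
  # awk splits on whitespace, so the leading tab is ignored and $1 is the
  # soname, e.g. "libmkl_intel_ilp64.so.1" in "libmkl_intel_ilp64.so.1 => ...".
  awk '$1 ~ /^libmkl_/ { print $1 }' | sort -u
}

# Usage, from the directory holding the extension:
#   ldd _C.cpython-37m-x86_64-linux-gnu.so | mkl_deps_from_ldd
```

Empty output means no MKL library appears in the dependency list; for the listing above it would print `libmkl_intel_ilp64.so.1` and `libmkl_intel_thread.so.1`.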
I also ran into a similar issue (the build could not find the MKL libraries). The build looks for an MKL config for the caffe2 dependencies and links against MKL whenever CMake can find MKLConfig.cmake. As a workaround, exporting the variable below before building torch-ccl disables linking with MKL:
export CMAKE_DISABLE_FIND_PACKAGE_MKL=TRUE
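A minimal sketch of applying this workaround, assuming torch-ccl is built from a source checkout with `python setup.py install` (the .egg install path in the ldd output suggests this, but it is an assumption) and that the build passes the exported variable through to CMake, as reported above. `CMAKE_DISABLE_FIND_PACKAGE_<PackageName>` is a standard CMake variable that makes `find_package(<PackageName>)` skip the lookup; the function name `build_without_mkl` and the `BUILD_CMD` override are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: rebuild torch-ccl with CMake's MKL lookup disabled.
# Assumptions (hypothetical, not from torch-ccl docs): run from a torch-ccl
# source checkout; the default build command is `python setup.py install`;
# BUILD_CMD may be set to override it.
build_without_mkl() {
  (
    # CMAKE_DISABLE_FIND_PACKAGE_MKL=TRUE makes CMake skip find_package(MKL),
    # so MKLConfig.cmake is never loaded. Exporting inside a subshell keeps
    # the setting from leaking into the caller's environment.
    export CMAKE_DISABLE_FIND_PACKAGE_MKL=TRUE
    # Unquoted on purpose so a multi-word command splits into arguments.
    ${BUILD_CMD:-python setup.py install}
  )
}

# Usage, from the torch-ccl source directory after removing any old build dir:
#   build_without_mkl
```

The CMake documentation notes that this variable should be set on the initial configure run, since values cached by an earlier run remain in CMakeCache.txt, so starting from a clean build directory is the safer choice.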
Thanks for the hint. We need to check whether caffe2 should avoid exporting its internal dependencies down the dependency chain, since torch-ccl depends only on torch.
Hi @chengjunlu, we hit an error when using ccl_torch1.7 with DPC++ as the compute backend. Have you seen this before?