intel / torch-ccl

oneCCL Bindings for Pytorch*
BSD 3-Clause "New" or "Revised" License

mkl undefined symbol #29

Closed KimBioInfoStudio closed 2 years ago

KimBioInfoStudio commented 3 years ago

Hi @chengjunlu, we hit an error when using ccl_torch1.7 with dpcpp as the compute backend. Have you seen this before?

Traceback (most recent call last):
  File "./demo.py", line 1, in <module>
    import torch_ccl
  File "/home/kim/miniconda3/envs/exdpcpp/lib/python3.7/site-packages/torch_ccl-1.2.0+b0f9d1e-py3.7-linux-x86_64.egg/torch_ccl/__init__.py", line 12, in <module>
    from . import _C as ccl_lib
ImportError: /home/kim/intel/oneapi/mkl/latest/lib/intel64/libmkl_intel_thread.so.1: undefined symbol: __kmpc_reduce_nowait

chengjunlu commented 3 years ago

No. torch_ccl should not link against the MKL library. The link options may have been polluted by linking against the ipex library.

Can you share the details of your setup? Which version of ipex are you using, and which DPC++ toolchain?

KimBioInfoStudio commented 3 years ago

python: intelpython_core python=3.7.11
pytorch: 1.7.0a0+14820ce
ipex gpu version: 0.2.0gpu
ipex gpu git sha: 7f7dce58d62d38750fe271a8d70827a144a7e165
mkl: 2021.3
dpcpp: 2021.6.19
submodule oneDNN was used

If I use ldd to check which libraries torch_ccl links against, we can see that the MKL libraries are linked in:

ldd _C.cpython-37m-x86_64-linux-gnu.so
    linux-vdso.so.1 (0x00007f3bbc97e000)
    libtorch_ccl.so => /home/kim/miniconda3/envs/exdpcpp/lib/python3.7/site-packages/torch_ccl-1.2.0+b0f9d1e-py3.7-linux-x86_64.egg/torch_ccl/./lib/libtorch_ccl.so (0x00007f3bbc90f000)
    libc10.so => not found
    libtorch.so => not found
    libtorch_cpu.so => not found
    libtorch_python.so => not found
    libstdc++.so.6 => /home/kim/miniconda3/envs/exdpcpp/lib/libstdc++.so.6 (0x00007f3bbc788000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f3bbc639000)
    libgcc_s.so.1 => /home/kim/miniconda3/envs/exdpcpp/lib/libgcc_s.so.1 (0x00007f3bbc625000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f3bbc602000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f3bbc410000)
    libccl.so => /home/kim/miniconda3/envs/exdpcpp/lib/python3.7/site-packages/torch_ccl-1.2.0+b0f9d1e-py3.7-linux-x86_64.egg/torch_ccl/./lib/libccl.so (0x00007f3bbc0f2000)
    libtorch.so => not found
    libc10.so => not found
    libtorch_cpu.so => not found
    libmkl_intel_ilp64.so.1 => /home/kim/intel/oneapi/mkl/latest/lib/intel64/libmkl_intel_ilp64.so.1 (0x00007f3bbb3f8000)
    libmkl_intel_thread.so.1 => /home/kim/intel/oneapi/mkl/latest/lib/intel64/libmkl_intel_thread.so.1 (0x00007f3bb7ade000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f3bbc980000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f3bb7ad6000)
    libfabric.so.1 => /home/kim/miniconda3/envs/exdpcpp/lib/python3.7/site-packages/torch_ccl-1.2.0+b0f9d1e-py3.7-linux-x86_64.egg/torch_ccl/./lib/libfabric.so.1 (0x00007f3bb788f000)
    libmpi.so.12 => /home/kim/miniconda3/envs/exdpcpp/lib/python3.7/site-packages/torch_ccl-1.2.0+b0f9d1e-py3.7-linux-x86_64.egg/torch_ccl/./lib/libmpi.so.12 (0x00007f3bb5ef7000)
    libsycl.so.5 => /home/kim/intel/oneapi/compiler/2021.6.19/compiler/linux/lib/libsycl.so.5 (0x00007f3bb5c3e000)
    librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f3bb5c31000)
    libOpenCL.so.1 => /home/kim/intel/oneapi/compiler/2021.6.19/compiler/linux/lib/libOpenCL.so.1 (0x00007f3bb5c22000)
    libsvml.so => /home/kim/intel/oneapi/compiler/2021.6.19/compiler/linux/compiler/lib/intel64_lin/libsvml.so (0x00007f3bb40a1000)
    libirng.so => /home/kim/intel/oneapi/compiler/2021.6.19/compiler/linux/compiler/lib/intel64_lin/libirng.so (0x00007f3bb3d37000)
    libimf.so => /home/kim/intel/oneapi/compiler/2021.6.19/compiler/linux/compiler/lib/intel64_lin/libimf.so (0x00007f3bb36af000)
    libintlc.so.5 => /home/kim/intel/oneapi/compiler/2021.6.19/compiler/linux/compiler/lib/intel64_lin/libintlc.so.5 (0x00007f3bb3437000)
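A listing this long is easy to misread, so here is a minimal Python sketch that scans ldd output for MKL entries and unresolved libraries. The helper names (`parse_ldd`, `mkl_libs`, `run_ldd`) are hypothetical and not part of torch_ccl, and the sample text is a shortened copy of the output above.

```python
import re
import subprocess


def parse_ldd(output):
    """Split ldd text output into resolved libraries and missing ones."""
    resolved, missing = {}, []
    for line in output.splitlines():
        # Lines without "=>" (vdso, the dynamic loader) are skipped.
        match = re.match(r"(\S+) => (not found|\S+)", line.strip())
        if not match:
            continue
        name, target = match.groups()
        if target == "not found":
            if name not in missing:
                missing.append(name)
        else:
            resolved[name] = target
    return resolved, missing


def mkl_libs(resolved):
    """Return the names of resolved libraries that belong to MKL."""
    return sorted(name for name in resolved if name.startswith("libmkl_"))


def run_ldd(path):
    """Run ldd on a shared object and return its text output."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True, check=True)
    return result.stdout


# Shortened copy of the ldd output posted in this thread.
SAMPLE = """
    linux-vdso.so.1 (0x00007f3bbc97e000)
    libc10.so => not found
    libtorch.so => not found
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f3bbc639000)
    libtorch.so => not found
    libmkl_intel_ilp64.so.1 => /home/kim/intel/oneapi/mkl/latest/lib/intel64/libmkl_intel_ilp64.so.1 (0x00007f3bbb3f8000)
    libmkl_intel_thread.so.1 => /home/kim/intel/oneapi/mkl/latest/lib/intel64/libmkl_intel_thread.so.1 (0x00007f3bb7ade000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f3bbc980000)
"""

if __name__ == "__main__":
    resolved, missing = parse_ldd(SAMPLE)
    print("MKL libraries:", mkl_libs(resolved))
    print("Not found:", missing)
```

To check a real build, pass `run_ldd("_C.cpython-37m-x86_64-linux-gnu.so")` to `parse_ldd` instead of `SAMPLE`. An empty `mkl_libs` result would mean the extension no longer pulls in MKL.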

ddkalamk commented 2 years ago

I also faced a similar issue (unable to find the MKL libraries while building). I found that the build looks for the MKL config for the caffe2 dependencies and links against MKL if CMake can find MKLConfig.cmake. I found a workaround to disable it: export the variable below before building torch-ccl, and it disables linking with MKL.

export CMAKE_DISABLE_FIND_PACKAGE_MKL=TRUE
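For builds driven from a script, the same workaround can be sketched in Python. This assumes, as reported above, that the torch-ccl build picks the variable up from the environment. The `build_env` and `build` helpers are hypothetical, and the default `python setup.py install` command is an assumption about how the package is built.

```python
import os
import subprocess


def build_env(base=None):
    """Return a copy of the environment with MKL discovery disabled."""
    env = dict(os.environ if base is None else base)
    # Same setting as the export line above.
    env["CMAKE_DISABLE_FIND_PACKAGE_MKL"] = "TRUE"
    return env


def build(source_dir, command=("python", "setup.py", "install")):
    """Run the build in source_dir with the modified environment.

    The default command is an assumption; replace it with whatever
    command you normally use to build torch-ccl.
    """
    return subprocess.run(list(command), cwd=source_dir, env=build_env(), check=True)


if __name__ == "__main__":
    # Show the variable that would be passed to the build.
    print(build_env()["CMAKE_DISABLE_FIND_PACKAGE_MKL"])
```

Copying the environment instead of mutating `os.environ` keeps the setting scoped to the build subprocess, so other tools launched from the same script are unaffected.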

chengjunlu commented 2 years ago

Thanks for the hint. We need to check whether caffe2 should stop exporting its internal dependencies into the dependency chain, since torch-ccl only depends on torch.