oneapi-src / oneMKL

oneAPI Math Kernel Library (oneMKL) Interfaces
Apache License 2.0

Compilation error when building cuRAND or cuBLAS tests #91

Closed sbalint98 closed 3 years ago

sbalint98 commented 3 years ago

Summary

When compiling oneMKL with tests and either the cuRAND or the cuBLAS backend enabled, a compilation error occurs.

Version

The current head of the oneMKL develop branch is used, e.g. 1ed12c7270a68b78c29178ccc582d0239a4bd050

Environment

Steps to reproduce

git clone https://github.com/oneapi-src/oneMKL.git
cd oneMKL
mkdir build && cd build

LD_LIBRARY_PATH=/root/hipSYCL-main/dpc++-hand/llvm/build/install/lib/:$LD_LIBRARY_PATH \
CXX=/root/hipSYCL-main/dpc++-hand/llvm/build/install/bin/clang++ \
CC=/root/hipSYCL-main/dpc++-hand/llvm/build/install/bin/clang \
cmake -G Ninja \
-DCMAKE_BUILD_TYPE=Debug \
-DTBB_ROOT=/root/hipSYCL-main/dpc++/tbb/latest \
-DMKL_ROOT=/root/hipSYCL-main/dpc++/mkl/latest \
-DREF_BLAS_ROOT=/root/spack/opt/spack/linux-centos7-skylake_avx512/gcc-9.3.1/openblas-0.3.14-npb5lv7dhfygc3lgh6zx3x6chlyt4kth/ \
-DENABLE_CUBLAS_BACKEND=OFF \
-DENABLE_CURAND_BACKEND=ON \
-DENABLE_MKLGPU_BACKEND=OFF ..

LD_LIBRARY_PATH=/root/hipSYCL-main/dpc++-hand/llvm/build/install/lib/:$LD_LIBRARY_PATH ninja

Observed behavior

When either ENABLE_CURAND_BACKEND or ENABLE_CUBLAS_BACKEND is enabled, the compilation fails. I believe this can be traced back to the following issues:

Expected behavior

All combinations should compile successfully. I believe a possible fix might be to use a single CUDA backend selector instead of separate cuBLAS and cuRAND selectors?

vrpascuzzi commented 3 years ago

All combinations should compile successfully. I believe a possible fix might be to use a single CUDA backend selector instead of separate cuBLAS and cuRAND selectors?

This is what I had in mind as well, some time ago. I think the decision was made to keep the RNG and BLAS domains separate. Indeed, I found issues similar to the ones you've reported: when building cuRAND alone without BLAS, a BLAS header was included in an installed file. I needed to tweak this by hand.

In short, this can be fixed up a bit.

mmeterel commented 3 years ago

@sbalint98 Hmmm, I cannot reproduce what you are seeing. I tried several combinations locally and here is what I observe.

You are using the latest develop branch, right?

| TARGET_DOMAIN | ENABLE_CUBLAS_BACKEND | ENABLE_CURAND_BACKEND | Build PASS/FAIL |
|---------------|-----------------------|-----------------------|-----------------|
| blas          | ON                    | OFF                   | PASS |
| rng           | ON                    | OFF                   | PASS |
| blas, rng     | ON                    | OFF                   | PASS |
| rng           | OFF                   | ON                    | PASS |
| blas          | OFF                   | ON                    | PASS |
| blas, rng     | OFF                   | ON                    | PASS |
| blas          | ON                    | ON                    | BLAS test is being compiled for cuRAND backend ==> Fail |
| rng           | ON                    | ON                    | RNG test is being compiled for cuBLAS backend ==> Fail |
| blas, rng     | ON                    | ON                    | RNG test is being compiled for cuBLAS backend ==> Fail |

sbalint98 commented 3 years ago

Thank you for your investigation. Unfortunately, I still have the same issue as described above. I am using the current HEAD of develop.

I will try to do some more detailed exploration over the weekend. Could you provide the details of how you have executed cmake?

mmeterel commented 3 years ago

Thank you for your investigation. Unfortunately, I still have the same issue as described above. I am using the current HEAD of develop.

I will try to do some more detailed exploration over the weekend. Could you provide the details of how you have executed cmake?

Hello @sbalint98 , sorry for the delay. Here are my steps to build. Please let me know if that helps.

llvm=llvm_version
lnx_cuda=/project/mmeterel/tools/dpc++/opensource/${llvm}/lnx_cuda
export CXX=${lnx_cuda}/compiler/bin/clang++
export LD_LIBRARY_PATH=${lnx_cuda}/compiler/lib:${LD_LIBRARY_PATH}

export REF_BLAS_ROOT=/project/mmeterel/home/.conan/data/lapack/3.7.1/conan/stable/package/9a51b86f574d44d4c0a85f83edb89eaaf9159b86
export NETLIB_ROOT=/project/mmeterel/home/.conan/data/lapack/3.7.1/conan/stable/package/9a51b86f574d44d4c0a85f83edb89eaaf9159b86
export BUILD_SHARED_LIB=ON
export BUILD_DOC=OFF
export ENABLE_CUBLAS_BACKEND=ON
export ENABLE_CURAND_BACKEND=OFF
export ENABLE_MKLGPU_BACKEND=OFF
export ENABLE_MKLCPU_BACKEND=OFF
export ENABLE_NETLIB_BACKEND=OFF
export TARGET_DOMAINS=blas

export CMAKE_ROOT=/project/mmeterel/tools/cmake-3.16.0-Linux-x86_64
export PATH="${CMAKE_ROOT}/bin:${PATH}"

mkdir build_nvidia
cd build_nvidia

cmake .. \
-DMKL_ROOT=${MKL_ROOT} \
-DREF_BLAS_ROOT=${REF_BLAS_ROOT} \
-DBUILD_SHARED_LIBS=${BUILD_SHARED_LIB} \
-DENABLE_CUBLAS_BACKEND=${ENABLE_CUBLAS_BACKEND} \
-DENABLE_MKLGPU_BACKEND=${ENABLE_MKLGPU_BACKEND} \
-DENABLE_MKLCPU_BACKEND=${ENABLE_MKLCPU_BACKEND} \
-DENABLE_NETLIB_BACKEND=${ENABLE_NETLIB_BACKEND} \
-DTARGET_DOMAINS=${TARGET_DOMAINS} \
-DENABLE_CURAND_BACKEND=${ENABLE_CURAND_BACKEND}
cmake --build . -j8

mmeterel commented 3 years ago

@sbalint98 Did you get a chance to try the commands I sent? Do you still see the issue?

sbalint98 commented 3 years ago

Sorry for the very long delay. I will try the commands very soon and get back to you.

sbalint98 commented 3 years ago

My problem was that I didn't use the -DTARGET_DOMAINS flag, and therefore all the domains were enabled. Using that flag has solved my problem.
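For readers hitting the same error, the fix described here can be sketched as a small shell helper that emits the CMake flags for a single-domain CUDA build. The helper name `onemkl_cuda_flags` is hypothetical and not part of oneMKL. The pairing of `blas` with cuBLAS and `rng` with cuRAND is an assumption based on the passing rows in mmeterel's table, and the `-D` options are the ones already used in this thread.

```shell
# Hypothetical helper (not part of oneMKL): print the CMake flags for a
# single-domain CUDA build, one flag per line.
# Assumption: blas pairs with the cuBLAS backend, rng with the cuRAND backend.
onemkl_cuda_flags() {
    domain="$1"
    case "$domain" in
        blas) cublas=ON;  curand=OFF ;;
        rng)  cublas=OFF; curand=ON  ;;
        *)
            echo "unsupported domain: $domain" >&2
            return 1
            ;;
    esac
    printf '%s\n' \
        "-DTARGET_DOMAINS=$domain" \
        "-DENABLE_CUBLAS_BACKEND=$cublas" \
        "-DENABLE_CURAND_BACKEND=$curand" \
        "-DENABLE_MKLGPU_BACKEND=OFF"
}

# Possible usage from a build directory inside the oneMKL checkout:
#   cmake -G Ninja $(onemkl_cuda_flags rng) ..
```

Setting TARGET_DOMAINS explicitly is the key point: without it, every domain is enabled and tests for domains the chosen backend does not cover are compiled as well.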

The issue I see right now, which is probably the one I encountered previously and which you also noted in your table, arises when both rng and blas are targeted and either the cuRAND or the cuBLAS backend is enabled. In that case, the compilation failed for me with every combination of cuRAND and cuBLAS backends enabled. If this is not a legal configuration, would you agree that it would be nice to have the error shown during configuration?
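Until such a check exists in the build system itself, a wrapper script could reject the combinations reported as failing in this thread before CMake is invoked. The following is only a sketch: `check_cuda_config` is a hypothetical function, not something oneMKL provides, and the rules it encodes are taken solely from the reports above (both CUDA backends ON, or both blas and rng targeted with a CUDA backend ON). An empty domain list is treated as "all domains", matching the behavior described when -DTARGET_DOMAINS is omitted.

```shell
# Hypothetical pre-configure check (not part of oneMKL).
# Arguments: TARGET_DOMAINS value, ENABLE_CUBLAS_BACKEND, ENABLE_CURAND_BACKEND
# Returns 0 if the combination was not reported as failing in this thread.
check_cuda_config() {
    domains="$1"
    cublas="$2"
    curand="$3"

    # Reported failing in mmeterel's table: both CUDA backends enabled.
    if [ "$cublas" = "ON" ] && [ "$curand" = "ON" ]; then
        echo "error: cuBLAS and cuRAND backends cannot both be ON when building tests" >&2
        return 1
    fi

    # Reported failing by sbalint98: blas and rng both targeted while a
    # CUDA backend is enabled. An empty list means all domains are built.
    if [ "$cublas" = "ON" ] || [ "$curand" = "ON" ]; then
        case "$domains" in
            ""|*blas*rng*|*rng*blas*)
                echo "error: set TARGET_DOMAINS to a single domain for CUDA backends" >&2
                return 1
                ;;
        esac
    fi
    return 0
}

# Possible usage before calling cmake:
#   check_cuda_config "rng" OFF ON || exit 1
```

Failing early with a clear message is cheaper than waiting for a compiler error deep inside the test build, which is the motivation for asking for a configure-time error.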

Michoumichmich commented 3 years ago

You can try the setup script I use there, but the combination is not supported yet for testing

sbalint98 commented 3 years ago

I understand, thanks for the link and your help :)