Closed sbalint98 closed 3 years ago
All combinations should compile successfully. I believe a possible fix might be to use a single CUDA backend selector instead of separate cuBLAS and cuRAND selectors?
This is what I had in mind as well, some time ago. I think the decision was made to keep the RNG and BLAS domains separate. Indeed, I found issues similar to the ones you've reported: when building cuRAND alone without BLAS, a BLAS header was included in an installed file. I needed to tweak this by hand.
In short, this can be fixed up a bit.
@sbalint98 Hmmm, I cannot reproduce what you are seeing. I tried several combinations locally and here is what I observe.
You are using the latest develop branch, right?
TARGET_DOMAIN | ENABLE_CUBLAS_BACKEND | ENABLE_CURAND_BACKEND | Build PASS/FAIL |
---|---|---|---|
blas | ON | OFF | PASS |
rng | ON | OFF | PASS |
blas, rng | ON | OFF | PASS |
rng | OFF | ON | PASS |
blas | OFF | ON | PASS |
blas, rng | OFF | ON | PASS |
blas | ON | ON | BLAS test is being compiled for cuRAND backend ==> Fail |
rng | ON | ON | RNG test is being compiled for cuBLAS backend ==> Fail |
blas, rng | ON | ON | RNG test is being compiled for cuBLAS backend ==> Fail |
Thank you for your investigation. Unfortunately, I still have the same issue as described above. I am using the current HEAD of develop.
I will try to do some more detailed exploration over the weekend. Could you provide the details of how you have executed cmake?
Hello @sbalint98, sorry for the delay. Here are my steps to build. Please let me know if that helps.
llvm=llvm_version
lnx_cuda=/project/mmeterel/tools/dpc++/opensource/${llvm}/lnx_cuda
export CXX=${lnx_cuda}/compiler/bin/clang++
export LD_LIBRARY_PATH=${lnx_cuda}/compiler/lib:${LD_LIBRARY_PATH}
export REF_BLAS_ROOT=/project/mmeterel/home/.conan/data/lapack/3.7.1/conan/stable/package/9a51b86f574d44d4c0a85f83edb89eaaf9159b86
export NETLIB_ROOT=/project/mmeterel/home/.conan/data/lapack/3.7.1/conan/stable/package/9a51b86f574d44d4c0a85f83edb89eaaf9159b86
export BUILD_SHARED_LIB=ON
export BUILD_DOC=OFF
export ENABLE_CUBLAS_BACKEND=ON
export ENABLE_CURAND_BACKEND=OFF
export ENABLE_MKLGPU_BACKEND=OFF
export ENABLE_MKLCPU_BACKEND=OFF
export ENABLE_NETLIB_BACKEND=OFF
export TARGET_DOMAINS=blas
export CMAKE_ROOT=/project/mmeterel/tools/cmake-3.16.0-Linux-x86_64
export PATH="${CMAKE_ROOT}/bin:${PATH}"
mkdir build_nvidia
cd build_nvidia
cmake .. -DMKL_ROOT=${MKL_ROOT} -DREF_BLAS_ROOT=${REF_BLAS_ROOT} -DBUILD_SHARED_LIBS=${BUILD_SHARED_LIB} -DENABLE_CUBLAS_BACKEND=${ENABLE_CUBLAS_BACKEND} -DENABLE_MKLGPU_BACKEND=${ENABLE_MKLGPU_BACKEND} -DENABLE_MKLCPU_BACKEND=${ENABLE_MKLCPU_BACKEND} -DENABLE_NETLIB_BACKEND=${ENABLE_NETLIB_BACKEND} -DTARGET_DOMAINS=${TARGET_DOMAINS} -DENABLE_CURAND_BACKEND=${ENABLE_CURAND_BACKEND}
cmake --build . -j8
@sbalint98 Did you get a chance to try the commands I sent? Do you still see the issue?
Sorry for the very long delay. I will try the commands very soon and get back to you.
My problem was that I didn't use the -DTARGET_DOMAINS flag, so all the domains were added. Using that flag has solved my problem.
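For anyone hitting the same thing, here is a minimal sketch of a helper that pairs a single target domain with its matching CUDA backend, following the blas/cuBLAS and rng/cuRAND rows that passed in the table above. The function name cuda_configure_args is hypothetical and not part of oneMKL; it only prints the flags and does not run cmake.

```shell
#!/bin/sh
# Hypothetical helper (not part of oneMKL): print the cmake flags for
# building exactly one domain with its matching CUDA backend.
# The flag names themselves are the ones used earlier in this thread.
cuda_configure_args() {
    domain=$1
    case "$domain" in
        blas) cublas=ON;  curand=OFF ;;   # blas pairs with cuBLAS
        rng)  cublas=OFF; curand=ON  ;;   # rng pairs with cuRAND
        *)
            echo "unknown domain: $domain" >&2
            return 1
            ;;
    esac
    echo "-DTARGET_DOMAINS=$domain -DENABLE_CUBLAS_BACKEND=$cublas -DENABLE_CURAND_BACKEND=$curand"
}
```

It could be used as `cmake .. $(cuda_configure_args blas)` together with whatever other options the build needs.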
The issue I see right now, which is probably the one I encountered previously and which you also noted in your table, arises when both rng and blas are targeted and either the cuRAND or the cuBLAS backend is enabled. In that case, the compilation failed for me with every combination of the cuRAND and cuBLAS backends enabled. If this is not a legal configuration, would you agree that it would be nice to have the error shown during configuration?
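To illustrate what such an early error could look like, here is a rough sketch of a preflight check that could run before cmake. It is only a sketch: the function name check_cuda_config is hypothetical, the domains are assumed to be passed as a space-separated string, and the rule it encodes (both domains targeted while at least one CUDA backend is ON) is just the failing condition described above, not a confirmed list of unsupported configurations.

```shell
#!/bin/sh
# Hypothetical preflight check (not part of oneMKL).
# Usage: check_cuda_config "<space-separated domains>" <cublas ON|OFF> <curand ON|OFF>
# Assumption: the only rejected case is the one reported in this thread,
# i.e. both blas and rng targeted with at least one CUDA backend enabled.
check_cuda_config() {
    domains=$1
    cublas=$2
    curand=$3

    # Detect whether each domain appears as a whole word in the list.
    case " $domains " in *" blas "*) has_blas=1 ;; *) has_blas=0 ;; esac
    case " $domains " in *" rng "*)  has_rng=1  ;; *) has_rng=0  ;; esac

    if [ "$has_blas" -eq 1 ] && [ "$has_rng" -eq 1 ]; then
        if [ "$cublas" = "ON" ] || [ "$curand" = "ON" ]; then
            echo "error: blas and rng targeted together with a CUDA backend enabled; this combination is not supported yet for testing" >&2
            return 1
        fi
    fi
    return 0
}
```

For example, `check_cuda_config "blas rng" ON OFF || exit 1` would stop a build script before configuration, while `check_cuda_config "blas" ON OFF` passes.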
You can try the setup script I use there, but the combination is not supported yet for testing
I understand, thanks for the link and your help :)
Summary
When compiling oneMKL with tests enabled and either the cuRAND or the cuBLAS backend enabled, a compilation error occurs.
Version
The current head of the oneMKL develop branch is used, e.g. 1ed12c7270a68b78c29178ccc582d0239a4bd050
Environment
HW you use: Intel Gold 6130 CPU with Nvidia GTX 1080 GPUs
Backend library version: CUDA 10.0; MKL and TBB obtained via the Intel installer, version 2021.1.1
OS name and version: Ubuntu 20.04
Compiler version: dpc++ compiler cloned from develop with hash 4e26734cb87c451e0562559d5d6f83b7eabcaea3
CMake: cmake_log_cublas.log, cmake_log_curand.log
Steps to reproduce
Observed behavior
When either ENABLE_CURAND_BACKEND or ENABLE_CUBLAS_BACKEND is defined, the compilation fails. I believe this can be traced back to the following issues:

In case ENABLE_CURAND_BACKEND=ON ENABLE_CUBLAS_BACKEND=OFF
The compilation terminates with an error. I suspect that this is caused by the code in test_helper.hpp, lines 70-81. When ENABLE_CURAND_BACKEND is defined, the compilation will fail since TEST_RUN_NVIDIAGPU_CURAND_SELECT will be defined with the backend selector oneapi::mkl::backend::curand, and there are no such blas functions defined in blas_ct_backends.hpp. See compile_error_curand.log.

In case ENABLE_CURAND_BACKEND=OFF ENABLE_CUBLAS_BACKEND=ON
The compilation fails since the cuRAND tests are compiled with the cublas backend selector. See compile_error_cublas.log. A possible workaround, if only cuBLAS is of interest, is to comment out the line that adds the rng domain in the root-level CMakeLists.txt. In that case, the compilation is successful; only a few SYCL 2020 deprecation warnings are displayed.

Expected behavior
All combinations should compile successfully. I believe a possible fix might be to use a single CUDA backend selector instead of separate cuBLAS and cuRAND selectors?