Summary
When level-1 and higher-level BLAS routines are submitted to the same queue, the cuBLAS backend fails at runtime with CUDA_ERROR_ILLEGAL_ADDRESS.
I believe this happens because some of the level-1 functions set the pointer mode to CUBLAS_POINTER_MODE_DEVICE and never restore the default, CUBLAS_POINTER_MODE_HOST. The device setting therefore stays active for all subsequent calls made with that cuBLAS handle, which appears to cause the failure. Adding the line cublasSetPointerMode(handle, CUBLAS_POINTER_MODE_HOST); to the affected functions resolves the issue.
The tests create a separate queue for every BLAS function, so the issue has not surfaced there. It can, however, be triggered with a simple test program.
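To make the suspected mechanism concrete, here is a simplified, self-contained C++ model of it. It assumes, as described above, that the pointer mode is state stored on the handle, and that a level-3 routine given alpha/beta in host memory misbehaves when the handle is left in device mode (plausibly what surfaces as the illegal address). `FakeHandle`, `PointerMode`, `level1_leaky`, `level1_guarded`, `PointerModeGuard` and `level3_host_scalars_ok` are hypothetical names, not cuBLAS or oneMKL API. The guard sketches an RAII alternative to a bare reset call: it saves the current mode on entry and restores it when the scope exits, including on early return or exception. In the real backend the same idea would presumably be built on cublasGetPointerMode and cublasSetPointerMode.

```cpp
#include <cassert>

// Simplified model, NOT the real cuBLAS API: all names are hypothetical.
// The pointer mode lives on the handle, so it persists across calls.
enum class PointerMode { Host, Device };

struct FakeHandle {
  PointerMode mode = PointerMode::Host;  // mirrors cuBLAS's default (host mode)
};

// Models a level-1 routine (e.g. dot) that writes its scalar result to device
// memory: it switches the handle to device mode and never switches back.
void level1_leaky(FakeHandle& h) {
  h.mode = PointerMode::Device;
  // ... the library call would be issued here ...
}

// RAII guard: sets a pointer mode for one scope, restores the previous mode on exit.
class PointerModeGuard {
 public:
  PointerModeGuard(FakeHandle& h, PointerMode m) : h_(h), saved_(h.mode) {
    h_.mode = m;
  }
  ~PointerModeGuard() { h_.mode = saved_; }
  PointerModeGuard(const PointerModeGuard&) = delete;
  PointerModeGuard& operator=(const PointerModeGuard&) = delete;

 private:
  FakeHandle& h_;
  PointerMode saved_;
};

// Same level-1 routine, but the mode change is scoped by the guard.
void level1_guarded(FakeHandle& h) {
  PointerModeGuard guard(h, PointerMode::Device);
  assert(h.mode == PointerMode::Device);  // device mode only inside this scope
  // ... the library call would be issued here ...
}

// Models a level-3 routine (e.g. gemm) called with alpha/beta in host memory:
// it is only correct if the handle is in host mode when it runs.
bool level3_host_scalars_ok(const FakeHandle& h) {
  return h.mode == PointerMode::Host;
}
```

With a fresh handle per call, as in the tests, `level3_host_scalars_ok` returns true. After `level1_leaky` on a shared handle it returns false. After `level1_guarded` the handle is back in host mode and it returns true again.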
Version
The current oneMKL develop head is used, e.g. commit 1ed12c7.
Environment
HW you use
Intel Xeon Gold 6130 CPU with NVIDIA GTX 1080 GPUs
Backend library version
CUDA 10.0
MKL and TBB obtained via the Intel installer, version 2021.2.0
OS name and version
Ubuntu 20.04 (fakeroot Singularity container)
Compiler version
DPC++ compiler cloned from develop, commit 4e26734cb87c451e0562559d5d6f83b7eabcaea3
compiled with:
buildbot/configure.py --cuda and buildbot/compile.py
Running the test program inside the container produces the following output:
Singularity> LD_LIBRARY_PATH=/home/sbalint/hipSYCL-main/dpc++-hand/llvm/build/install/lib/:/opt/hipSYCL/cuda/lib64:/home/sbalint/hipSYCL-main/oneMKL-install/lib/:$LD_LIBRARY_PATH ./a.out
Hello
PI CUDA ERROR:
Value: 700
Name: CUDA_ERROR_ILLEGAL_ADDRESS
Description: an illegal memory access was encountered
Function: cuda_piEnqueueMemBufferRead
Source Location: /root/hipSYCL-main/dpc++-hand/llvm/sycl/plugins/cuda/pi_cuda.cpp:2199
PI CUDA ERROR:
Value: 700
Name: CUDA_ERROR_ILLEGAL_ADDRESS
Description: an illegal memory access was encountered
Function: wait
Source Location: /root/hipSYCL-main/dpc++-hand/llvm/sycl/plugins/cuda/pi_cuda.cpp:447
PI CUDA ERROR:
Value: 700
Name: CUDA_ERROR_ILLEGAL_ADDRESS
Description: an illegal memory access was encountered
Function: wait
Source Location: /root/hipSYCL-main/dpc++-hand/llvm/sycl/plugins/cuda/pi_cuda.cpp:447
PI CUDA ERROR:
Value: 700
Name: CUDA_ERROR_ILLEGAL_ADDRESS
Description: an illegal memory access was encountered
Function: wait
Source Location: /root/hipSYCL-main/dpc++-hand/llvm/sycl/plugins/cuda/pi_cuda.cpp:447
PI CUDA ERROR:
Value: 700
Name: CUDA_ERROR_ILLEGAL_ADDRESS
Description: an illegal memory access was encountered
Function: enqueueEventWait
Source Location: /root/hipSYCL-main/dpc++-hand/llvm/sycl/plugins/cuda/pi_cuda.cpp:473
PI CUDA ERROR:
Value: 700
Name: CUDA_ERROR_ILLEGAL_ADDRESS
Description: an illegal memory access was encountered
Function: _pi_event
Source Location: /root/hipSYCL-main/dpc++-hand/llvm/sycl/plugins/cuda/pi_cuda.cpp:331
PI CUDA ERROR:
Value: 700
Name: CUDA_ERROR_ILLEGAL_ADDRESS
Description: an illegal memory access was encountered
Function: wait
Source Location: /root/hipSYCL-main/dpc++-hand/llvm/sycl/plugins/cuda/pi_cuda.cpp:447
Steps to reproduce
Use a simple test program (test.cpp) that submits a level-1 routine and a higher-level routine to the same queue.
Compile:
LD_LIBRARY_PATH=/home/sbalint/hipSYCL-main/dpc++-hand/llvm/build/install/lib/:/opt/hipSYCL/cuda/lib64:$LD_LIBRARY_PATH /home/sbalint/hipSYCL-main/dpc++-hand/llvm/build/install/bin/clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda-sycldevice -I /home/sbalint/hipSYCL-main/oneMKL-install/include/ -L/home/sbalint/hipSYCL-main/oneMKL-install/lib/ -lonemkl_blas_cublas test.cpp
and run:
LD_LIBRARY_PATH=/home/sbalint/hipSYCL-main/dpc++-hand/llvm/build/install/lib/:/opt/hipSYCL/cuda/lib64:/home/sbalint/hipSYCL-main/oneMKL-install/lib/:$LD_LIBRARY_PATH ./a.out
Observed behavior
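The program prints "Hello" and then fails with repeated CUDA_ERROR_ILLEGAL_ADDRESS (value 700) errors, first from cuda_piEnqueueMemBufferRead and then from wait, enqueueEventWait and _pi_event, as shown in the output above.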
Expected behavior
The program executes without errors.