oneapi-src / oneMKL

oneAPI Math Kernel Library (oneMKL) Interfaces
Apache License 2.0
619 stars 158 forks source link

CUDA_ERROR_ILLEGAL_ADDRESS when using level1 and higher level rutines in the same queue #92

Closed sbalint98 closed 3 years ago

sbalint98 commented 3 years ago

Summary

When submitting level 1 and higher-level kernels in the same queue, for the cublas backend CUDA_ERROR_ILLEGAL_ADDRESS runtime error is thrown.

I believe this is due to the fact that for some of the level1 functions the pointer mode is set to CUBLAS_POINTER_MODE_DEVICE but it is never set back to the default value, CUBLAS_POINTER_MODE_HOST, therefore the device setting remains active for all subsequent calls with that cublas handle, which seems to cause problems. Adding the line cublasSetPointerMode(handle, CUBLAS_POINTER_MODE_HOST); to the respective functions resolves the issue.

The tests create a queue for every BLAS function, therefore this issue hasn't surfaced there, but it can be triggered with a simple test program.

Version

The current oneMKL develop head is used eg: 1ed12c7

Environment

Steps to reproduce

Use the following simple test program:

#include "oneapi/mkl.hpp"
#include <iostream>
#include <CL/sycl.hpp>

int main(){
  std::vector<double> M = {1, 1, 1, 1};
  std::vector<double> y = {3, 4};
  std::vector<double> x = {1, 1};

  std::vector<double> x1 = {1,1};
  std::vector<double> x2 = {2,2};

  double result = -1;

  cl::sycl::buffer<double, 1> M_buffer = cl::sycl::buffer(M.data(), cl::sycl::range<1>(M.size()));
  cl::sycl::buffer<double, 1> y_buffer = cl::sycl::buffer(y.data(), cl::sycl::range<1>(y.size()));
  cl::sycl::buffer<double, 1> x_buffer = cl::sycl::buffer(x.data(), cl::sycl::range<1>(x.size())); 

  cl::sycl::buffer<double, 1> x1_buffer = cl::sycl::buffer(x1.data(), cl::sycl::range<1>(x1.size())); 
  cl::sycl::buffer<double, 1> x2_buffer = cl::sycl::buffer(x1.data(), cl::sycl::range<1>(x1.size())); 

  cl::sycl::buffer<double, 1> result_buffer = cl::sycl::buffer(&result, cl::sycl::range<1>(1)); 

 auto gpu_dev = sycl::device(sycl::gpu_selector());
 sycl::queue gpu_queue(gpu_dev);

 oneapi::mkl::backend_selector<oneapi::mkl::backend::cublas> gpu_selector(gpu_queue);

 oneapi::mkl::blas::column_major::dot(gpu_selector, 2, x1_buffer, 1, x2_buffer, 1, result_buffer);
 oneapi::mkl::blas::column_major::gemv(gpu_selector, oneapi::mkl::transpose::nontrans, 2, 2,
                                   1.0, M_buffer, 2, x_buffer, 1, 1.0, y_buffer, 1.0);
}

compile: LD_LIBRARY_PATH=/home/sbalint/hipSYCL-main/dpc++-hand/llvm/build/install/lib/:/opt/hipSYCL/cuda/lib64:$LD_LIBRARY_PATH /home/sbalint/hipSYCL-main/dpc++-hand/llvm/build/install/bin/clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda-sycldevice -I /home/sbalint/hipSYCL-main/oneMKL-install/include/ -L/home/sbalint/hipSYCL-main/oneMKL-install/lib/ -lonemkl_blas_cublas test.cpp and run: LD_LIBRARY_PATH=/home/sbalint/hipSYCL-main/dpc++-hand/llvm/build/install/lib/:/opt/hipSYCL/cuda/lib64:/home/sbalint/hipSYCL-main/oneMKL-install/lib/:$LD_LIBRARY_PATH ./a.out

Observed behavior

The following runtime error is displayed:

Singularity> LD_LIBRARY_PATH=/home/sbalint/hipSYCL-main/dpc++-hand/llvm/build/install/lib/:/opt/hipSYCL/cuda/lib64:/home/sbalint/hipSYCL-main/oneMKL-install/lib/:$LD_LIBRARY_PATH ./a.out 
Hello

PI CUDA ERROR:
        Value:           700
        Name:            CUDA_ERROR_ILLEGAL_ADDRESS
        Description:     an illegal memory access was encountered
        Function:        cuda_piEnqueueMemBufferRead
        Source Location: /root/hipSYCL-main/dpc++-hand/llvm/sycl/plugins/cuda/pi_cuda.cpp:2199

PI CUDA ERROR:
        Value:           700
        Name:            CUDA_ERROR_ILLEGAL_ADDRESS
        Description:     an illegal memory access was encountered
        Function:        wait
        Source Location: /root/hipSYCL-main/dpc++-hand/llvm/sycl/plugins/cuda/pi_cuda.cpp:447

PI CUDA ERROR:
        Value:           700
        Name:            CUDA_ERROR_ILLEGAL_ADDRESS
        Description:     an illegal memory access was encountered
        Function:        wait
        Source Location: /root/hipSYCL-main/dpc++-hand/llvm/sycl/plugins/cuda/pi_cuda.cpp:447

PI CUDA ERROR:
        Value:           700
        Name:            CUDA_ERROR_ILLEGAL_ADDRESS
        Description:     an illegal memory access was encountered
        Function:        wait
        Source Location: /root/hipSYCL-main/dpc++-hand/llvm/sycl/plugins/cuda/pi_cuda.cpp:447

PI CUDA ERROR:
        Value:           700
        Name:            CUDA_ERROR_ILLEGAL_ADDRESS
        Description:     an illegal memory access was encountered
        Function:        enqueueEventWait
        Source Location: /root/hipSYCL-main/dpc++-hand/llvm/sycl/plugins/cuda/pi_cuda.cpp:473

PI CUDA ERROR:
        Value:           700
        Name:            CUDA_ERROR_ILLEGAL_ADDRESS
        Description:     an illegal memory access was encountered
        Function:        _pi_event
        Source Location: /root/hipSYCL-main/dpc++-hand/llvm/sycl/plugins/cuda/pi_cuda.cpp:331

PI CUDA ERROR:
        Value:           700
        Name:            CUDA_ERROR_ILLEGAL_ADDRESS
        Description:     an illegal memory access was encountered
        Function:        wait
        Source Location: /root/hipSYCL-main/dpc++-hand/llvm/sycl/plugins/cuda/pi_cuda.cpp:447

Expected behavior

The program executes without errors

mmeterel commented 3 years ago

@sbalint98 Thanks for creating the issue and providing a reproducer. I will test this on my side and get back to you very soon.