oneapi-src / oneMKL

oneAPI Math Kernel Library (oneMKL) Interfaces
Apache License 2.0

Unit tests on CUDA device deplete the device memory #93

Status: Closed (sbalint98 closed this issue 3 years ago)

sbalint98 commented 3 years ago

Summary

When running the unit tests on a CUDA device, the tests fail because the GPU runs out of memory.

I am trying to run the tests on a GTX 1080 Ti with 11178 MiB of global memory, but after the first few tests a runtime exception is thrown because of insufficient device memory (CUDA_ERROR_OUT_OF_MEMORY); see the log below.

Version

The current head of the oneMKL develop branch is used (commit 1ed12c7).

Environment

Steps to reproduce

Assume the CUDA-enabled DPC++ is installed in <cuda-DPC++-dir>. Configure and build oneMKL, then run the BLAS test binary directly:

LD_LIBRARY_PATH=<cuda-DPC++-dir>/lib/ \
CXX=<cuda-DPC++-dir>/bin/clang++ \
CC=<cuda-DPC++-dir>/bin/clang cmake  \
-DCMAKE_BUILD_TYPE=Debug \
-DTBB_ROOT=/opt/intel/oneapi/tbb/2021.2.0/ \
-DMKL_ROOT=/opt/intel/oneapi/mkl/2021.2.0/ \
-DENABLE_CUBLAS_BACKEND=ON \
-DENABLE_CURAND_BACKEND=OFF \
-DENABLE_MKLGPU_BACKEND=OFF \
-DCMAKE_INSTALL_PREFIX=/home/sbalint/hipSYCL-main/oneMKL-install/ \
..
LD_LIBRARY_PATH=<cuda-DPC++-dir>/lib/ make -j 64
LD_LIBRARY_PATH=<cuda-DPC++-dir>/lib/:$LD_LIBRARY_PATH bin/test_main_blas_ct

Observed behavior

After the first few tests, all GPU tests fail with CUDA_ERROR_OUT_OF_MEMORY. Checking nvidia-smi while the tests run confirms that the allocated device memory keeps increasing over time. Possible memory leak? Log: cuda_test_out.log
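
The nvidia-smi check described above could be scripted roughly as follows. This is only a sketch, not part of oneMKL: the helper names `summarize_samples` and `watch_gpu_mem` are hypothetical, and it assumes the tests run on GPU 0 and that `nvidia-smi` is on the PATH.

```shell
#!/usr/bin/env bash

# Summarize a series of memory.used samples (MiB, one value per line) from stdin.
# Prints: first=<MiB> last=<MiB> growth=<MiB>
summarize_samples() {
    awk 'NF { if (!seen) { first = $1; seen = 1 } last = $1 }
         END { if (seen) printf "first=%d last=%d growth=%d\n", first, last, last - first }'
}

# Sample used device memory once per second while a command runs, then summarize.
# Assumption: the device under test is GPU 0 (-i 0).
watch_gpu_mem() {
    local samples sampler status
    samples=$(mktemp)
    # -l 1 makes nvidia-smi repeat the query every second until it is killed.
    nvidia-smi -i 0 --query-gpu=memory.used --format=csv,noheader,nounits -l 1 > "$samples" &
    sampler=$!
    "$@"                      # run the command being observed
    status=$?
    kill "$sampler" 2>/dev/null
    wait "$sampler" 2>/dev/null
    summarize_samples < "$samples"
    rm -f "$samples"
    return "$status"          # propagate the exit status of the observed command
}
```

For example, `watch_gpu_mem bin/test_main_blas_ct` would print one summary line after the test binary exits. A large positive growth value would be consistent with memory accumulating across tests within the single process.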

Expected behavior

GPU tests shouldn't fail because of a lack of device memory.

mmeterel commented 3 years ago

@sbalint98 Again, thanks for creating the issue. I will test this on my side and get back to you.

mmeterel commented 3 years ago

@sbalint98 Here are my observations:

These observations tell me that the issue is not specific to the CUDA backend.

sbalint98 commented 3 years ago

Thank you for investigating. I find these results quite surprising, since I thought ctest was just a wrapper around gtest in this case. However, I can confirm that running the tests through ctest works fine. Closing now.