sbalint98 commented 3 years ago

Summary

When running the unit tests on a CUDA device, the tests fail because the GPU runs out of memory.

I am trying to run the tests on a GTX 1080 Ti with 11178 MiB of global memory, but after the first few tests a runtime exception is thrown because of insufficient device memory (CUDA_ERROR_OUT_OF_MEMORY); see the log below.

Version

The current head of the oneMKL develop branch is used, e.g. 1ed12c7.

Environment

- HW you use: Intel Gold 6130 CPU with NVIDIA GTX 1080 GPUs
- Backend library version: CUDA 10.0; MKL and TBB obtained via the Intel installer, version 2021.2.0
- OS name and version: Ubuntu 20.04 (fakeroot Singularity container)
- Compiler version: DPC++ compiler cloned from develop at hash 4e26734cb87c451e0562559d5d6f83b7eabcaea3, built with buildbot/configure.py --cuda and buildbot/compile.py
- CMake: cmake.md

Steps to reproduce

Assume the CUDA-enabled DPC++ is installed in <cuda-DPC++-dir>. Configure and build oneMKL, then run the tests:

LD_LIBRARY_PATH=<cuda-DPC++-dir>/lib/ \
CXX=<cuda-DPC++-dir>/bin/clang++ \
CC=<cuda-DPC++-dir>/bin/clang cmake  \
-DCMAKE_BUILD_TYPE=Debug \
-DTBB_ROOT=/opt/intel/oneapi/tbb/2021.2.0/ \
-DMKL_ROOT=/opt/intel/oneapi/mkl/2021.2.0/ \
-DENABLE_CUBLAS_BACKEND=ON \
-DENABLE_CURAND_BACKEND=OFF \
-DENABLE_MKLGPU_BACKEND=OFF \
-DCMAKE_INSTALL_PREFIX=/home/sbalint/hipSYCL-main/oneMKL-install/ \
..

LD_LIBRARY_PATH=<cuda-DPC++-dir>/lib/ make -j 64

LD_LIBRARY_PATH=<cuda-DPC++-dir>/lib/:$LD_LIBRARY_PATH bin/test_main_blas_ct

Observed behavior

After the first few tests, all GPU tests fail with CUDA_ERROR_OUT_OF_MEMORY. Watching nvidia-smi while the tests run confirms that the allocated memory increases continuously over time. Possible memory leak? Log: cuda_test_out.log
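The nvidia-smi check described above can be made repeatable by logging the used memory once per second and comparing the first and last samples. This is a minimal sketch, not part of oneMKL: the helper name `mem_growth` and the log file name `gpu_mem.log` are made up for illustration, and it assumes the device under test is GPU index 0.

```shell
#!/bin/sh
# Sketch only: summarize a log of GPU memory samples (one MiB value per line).
# mem_growth and gpu_mem.log are illustrative names, not oneMKL tooling.

# mem_growth LOGFILE: print the first sample, the last sample, and their difference.
mem_growth() {
  awk 'NR == 1 { first = $1 } { last = $1 } END { if (NR) print first, last, last - first }' "$1"
}

# Intended use (assumes nvidia-smi is on PATH and GPU 0 runs the tests):
#   nvidia-smi -i 0 --query-gpu=memory.used --format=csv,noheader,nounits -l 1 > gpu_mem.log &
#   sampler=$!
#   bin/test_main_blas_ct
#   kill "$sampler"
#   mem_growth gpu_mem.log
```

A large positive difference that keeps growing with the number of tests executed would be consistent with the suspected leak, although other processes using the same GPU would also show up in these samples.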

Expected behavior

GPU tests shouldn't fail because of a lack of device memory.

mmeterel commented 3 years ago

@sbalint98 Again, thanks for creating the issue. I will test this on my side and get back to you.

mmeterel commented 3 years ago

@sbalint98 Here are my observations:

I observe the same behavior when I test the cuBLAS backend on my side. However, it only appears when I run gtest directly. I don't see the issue when ctest is used.
I also tested gtest and ctest on a GEN9 GPU. With gtest, I observe a similar memory leak, but the leak is not large enough to break gtest on my GEN9 system. ctest on GEN9 did not leak.

These observations tell me that the issue is not specific to the CUDA backend.
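One possible explanation for the gtest/ctest difference, not confirmed in this thread, is that ctest launches the tests in separate processes, so device memory is returned when each process exits, while a direct gtest run keeps everything in one process. As a rough way to check this, the sketch below runs every test case of the gtest binary in its own process using the standard GoogleTest flags `--gtest_list_tests` and `--gtest_filter`. The function names `full_test_names` and `run_isolated` are hypothetical helpers.

```shell
#!/bin/sh
# Sketch only: run each GoogleTest case in a fresh process.
# full_test_names and run_isolated are illustrative helpers, not oneMKL tooling.

# full_test_names: convert `--gtest_list_tests` output on stdin to Suite.Test names.
full_test_names() {
  awk '
    /^[^ ]/ { suite = ($1 ~ /\.$/) ? $1 : ""; next }  # unindented line: keep it only if it ends in "."
    /^  / && suite != "" { print suite $1 }           # indented case name; $1 drops trailing notes
  '
}

# run_isolated BINARY: one process per test case; returns non-zero if any case fails.
run_isolated() {
  status=0
  for name in $("$1" --gtest_list_tests | full_test_names); do
    "$1" --gtest_filter="$name" || status=1
  done
  return "$status"
}

# Intended use:
#   run_isolated bin/test_main_blas_ct
```

If the out-of-memory failures disappear when the cases are isolated this way, that would suggest memory accumulating within the single gtest process rather than a problem in any one test.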