wilicc / gpu-burn

Multi-GPU CUDA stress test
BSD 2-Clause "Simplified" License
1.37k stars 295 forks

Trying to run gpu-burn without the nvidia drivers loaded doesn't immediately exit with a non-zero code, and console is flooded with "terminate called after..." error messages #60

Open bladernr opened 2 years ago

bladernr commented 2 years ago

$ ./gpu_burn 30
Burning for 30 seconds.
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

terminate called after throwing an instance of 'std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >'
terminate called after throwing an instance of 'std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >'
terminate called after throwing an instance of 'std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >'
terminate called after throwing an instance of 'std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >'
[the same message repeats many more times, with some copies interleaved mid-line]

I eventually had to CTRL-C out of this. This is on Ubuntu 22.04 with the latest gpu_burn source and the CUDA toolkit installed. I was doing some bug testing of a wrapper I'm using when I hit this.
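Until gpu_burn itself exits cleanly in this situation, a wrapper can protect itself. The following is only a minimal sketch, not part of gpu_burn: it assumes that the NVIDIA kernel module exposes /proc/driver/nvidia/version when loaded, and the function names and the return codes 2 and 124 are hypothetical choices made for illustration.

```python
import os
import subprocess
import sys

# Assumption: this file exists only while the NVIDIA kernel module is loaded.
NVIDIA_PROC_FILE = "/proc/driver/nvidia/version"


def driver_loaded(proc_file=NVIDIA_PROC_FILE):
    """Return True if the NVIDIA kernel driver appears to be loaded."""
    return os.path.exists(proc_file)


def run_guarded(cmd, timeout_s, proc_file=NVIDIA_PROC_FILE):
    """Run cmd only if the driver is present, and never wait past timeout_s.

    Returns the child's exit code, 2 if the driver check failed,
    or 124 if the child was killed because it ran past the timeout.
    """
    if not driver_loaded(proc_file):
        print("NVIDIA driver not loaded; skipping " + cmd[0], file=sys.stderr)
        return 2
    try:
        # subprocess.run kills the direct child when the timeout expires.
        done = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return 124
    return done.returncode


# Hypothetical usage: a 30 second burn with 15 seconds of slack.
#   sys.exit(run_guarded(["./gpu_burn", "30"], timeout_s=45))
```

Note that the timeout only kills the process the wrapper started. If that process has spawned workers of its own, a real wrapper would probably need to start it in its own session and signal the whole process group.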

jiaolovekt commented 1 year ago

I've encountered the same error on Ubuntu 22.04 with CUDA 11.8 and 12.1. After debugging, it seems that the GPU cannot be initialized: cuInit(0) returns 999 (CUDA_ERROR_UNKNOWN). I soon realized that this may be caused by the built-in NVIDIA drivers. I ran apt install -y nvidia-cuda-toolkit nvidia-modprobe and nvidia-modprobe -u, then updated CUDAPATH and the NVCC path in the Makefile, and it works. Hope this helps a little.
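To reproduce the cuInit(0) check without rebuilding gpu_burn, something like the sketch below may help. It assumes the driver library is installed under the name libcuda.so.1; the helper names are hypothetical, and the table lists only a few CUresult values from cuda.h.

```python
import ctypes

# A few values from the CUresult enum in cuda.h (not the full list).
CU_RESULT_NAMES = {
    0: "CUDA_SUCCESS",
    3: "CUDA_ERROR_NOT_INITIALIZED",
    100: "CUDA_ERROR_NO_DEVICE",
    999: "CUDA_ERROR_UNKNOWN",
}


def describe_cu_result(code):
    """Map a numeric CUresult to its enum name where this sketch knows it."""
    return CU_RESULT_NAMES.get(code, "unlisted CUresult {}".format(code))


def probe_cu_init(library="libcuda.so.1"):
    """Call cuInit(0) in the CUDA driver library.

    Returns the numeric CUresult, or None if the library cannot be loaded.
    """
    try:
        libcuda = ctypes.CDLL(library)
    except OSError:
        return None
    libcuda.cuInit.argtypes = [ctypes.c_uint]
    libcuda.cuInit.restype = ctypes.c_int
    return libcuda.cuInit(0)


# Hypothetical usage:
#   code = probe_cu_init()
#   print("libcuda not found" if code is None else describe_cu_result(code))
```

A result of None means the driver library could not be loaded at all, while 999 matches the failure described above.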