Closed Gh0stExp10it closed 3 months ago
Can you try setting a breakpoint and print the value of res (bytes)?
Here is the output of the value "res":
b'\xf8\x95\xa0\x81\x8e\xf8\x91\x80\x81\x89\xf8\x90\x90\x81\x89\xf8\x91\xb0\x80\xa0\xf8\x91\xa0\x81\xa5\xf8\x9c\xa0\x81\xaf\xf8\x99\x90\x81\xa3\xf8\x94\xa0\x80\xa0\xf8\x96\x80\x81\x94\xf8\x8c\xb0\x80\xa0\xf8\x8e\x80\x80\xb00'
That's pretty strange; it looks like random junk data. Not sure why nvmlDeviceGetName returns that. I think this is a bug in the NVIDIA driver. Can you try downgrading the NVIDIA driver version?
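One possible reading of the bytes printed above, offered as an observation and not a confirmed diagnosis: each group starting with 0xf8 has the shape of the obsolete 5-byte UTF-8 form, and the value it carries looks like two 16-bit characters packed into one number. The sketch below unpacks the data under that assumption. The helper name decode_packed_name is made up for illustration and is not part of pynvml or gpustat.

```python
# Bytes copied from the "res" value posted in this thread.
RES = (
    b'\xf8\x95\xa0\x81\x8e\xf8\x91\x80\x81\x89\xf8\x90\x90\x81\x89'
    b'\xf8\x91\xb0\x80\xa0\xf8\x91\xa0\x81\xa5\xf8\x9c\xa0\x81\xaf'
    b'\xf8\x99\x90\x81\xa3\xf8\x94\xa0\x80\xa0\xf8\x96\x80\x81\x94'
    b'\xf8\x8c\xb0\x80\xa0\xf8\x8e\x80\x80\xb00'
)


def decode_packed_name(raw: bytes) -> str:
    """Hypothetical helper: undo the suspected double encoding.

    Assumption: every 0xf8-led group is a legacy 5-byte UTF-8 sequence
    (111110xx followed by four 10xxxxxx bytes) whose value holds two
    16-bit characters, low half first.
    """
    chars = []
    i = 0
    while i < len(raw):
        lead = raw[i]
        if lead & 0xFC == 0xF8 and i + 5 <= len(raw):
            # Collect the payload bits: 2 from the lead byte, 6 from each
            # of the four continuation bytes.
            value = lead & 0x03
            for cont in raw[i + 1:i + 5]:
                value = (value << 6) | (cont & 0x3F)
            chars.append(chr(value & 0xFFFF))  # first character
            chars.append(chr(value >> 16))     # second character
            i += 5
        else:
            # Any other byte is passed through as a single character.
            chars.append(chr(lead))
            i += 1
    return "".join(chars)


print(decode_packed_name(RES))  # NVIDIA GeForce RTX 3080
```

If that reading is right, the device name itself is intact and only its encoding is mangled, which would be consistent with a driver-side bug.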
@wookayin I am experiencing this issue too now on WSL2 Ubuntu 22.04. Everything was fine with my previous CUDA version, but I updated my Windows drivers to 12.5 yesterday, and now when I run gpustat on Ubuntu I get the same error. The output of gpustat --debug is:
Error on querying NVIDIA devices. Use --debug flag to see more details.
'utf-8' codec can't decode byte 0xf8 in position 0: invalid start byte
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/gpustat/cli.py", line 58, in print_gpustat
    gpu_stats = GPUStatCollection.new_query(debug=debug, id=id)
  File "/usr/local/lib/python3.10/dist-packages/gpustat/core.py", line 603, in new_query
    gpu_info = get_gpu_info(handle)
  File "/usr/local/lib/python3.10/dist-packages/gpustat/core.py", line 456, in get_gpu_info
    name = _decode(N.nvmlDeviceGetName(handle))
  File "/usr/local/lib/python3.10/dist-packages/pynvml.py", line 2094, in wrapper
    return res.decode()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf8 in position 0: invalid start byte
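The failure in the traceback can be reproduced without a GPU, since it comes down to res.decode() being strict about byte 0xf8. Below is a minimal sketch of a tolerant decode, assuming a placeholder name is acceptable until the driver is fixed. decode_lossy is a hypothetical helper, not a gpustat or pynvml function.

```python
# First few bytes of the "res" value posted earlier in this thread.
raw = b'\xf8\x95\xa0\x81\x8e'

# Strict decoding raises the same error as in the traceback.
try:
    raw.decode()
except UnicodeDecodeError as exc:
    print(exc)


def decode_lossy(value):
    """Hypothetical helper: decode bytes without raising on invalid input.

    Invalid bytes become U+FFFD (the replacement character); values that
    are already str are returned unchanged.
    """
    if isinstance(value, bytes):
        return value.decode("utf-8", errors="replace")
    return value


name = decode_lossy(raw)
print(repr(name))
```

Because the traceback shows the decode happening inside pynvml's wrapper, a helper like this would have to be applied there and not in user code.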
The output of nvidia-smi (on Ubuntu) is:
Wed May 29 09:41:04 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.03              Driver Version: 555.85         CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX 4000 Ada Gene...    On  |   00000000:01:00.0 Off |                  Off |
| 30%   25C    P8              6W /  130W |       0MiB /  20475MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
Interestingly, the output of nvcc --version (on Ubuntu) is 12.4:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Mar_28_02:18:24_PDT_2024
Cuda compilation tools, release 12.4, V12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0
Here is some more session info:
- Platform: Linux-5.15.146.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
- Python version: 3.10.12
- PyTorch version (GPU?): 2.3.0+cu121 (True)
- gpustat==1.1.1
I think you're right that it has to do with this version of the drivers. I don't really want to downgrade at the moment. I wonder if it might be fixed by upgrading nvcc to 12.5 on Ubuntu. I'll see if I have the stomach for it at some point. After I upgraded, torch.cuda.is_available() returned False. I was able to get it working again. If the only thing that doesn't work is gpustat, it's a bit of a loss, as it's a really useful utility, but it's not as bad as spending hours rebuilding my environment.
Thanks for the additional info! I will try it out in the next few days to see if a downgrade of the driver works.
I am also hoping for a customized version of the driver from NVIDIA.
I'm experiencing the exact same issue with gpustat. Same versions of nvidia-smi (555.42.03), driver (555.85), and CUDA (12.5). Also on WSL2 with Ubuntu LTS 22.04.04 and an RTX 3080. Here's my nvcc --version:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0
PyTorch and TF are both running on GPU just fine for me, as is nvidia-smi. It's just gpustat that's having the utf-8 issue.
I have not yet found the time to test a downgrade of the driver. However, I can note that the next update, to version 555.99, did not bring any improvement either. Possibly a fix will only come with the next update to ~556.xx.
Will change the headline once again.
I will now close this issue, as the NVIDIA driver version 560.70 seems to have fixed the problem. Should someone still find an error, a new issue with reference to this one would be the best option. Please check for this update.
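For anyone checking whether their setup matches the reports here, the versions mentioned in this thread can be summarized in a small sketch. It only encodes what was reported above (555.85 and 555.99 affected, 560.70 apparently fixed) and treats everything else as unknown. The function names and the string labels are illustrative assumptions.

```python
import re

# Versions reported in this thread; nothing else is assumed.
REPORTED_AFFECTED = {(555, 85), (555, 99)}
REPORTED_FIXED_FROM = (560, 70)


def parse_driver_version(smi_header: str):
    """Extract (major, minor) from an nvidia-smi header line, or None."""
    match = re.search(r"Driver Version:\s*(\d+)\.(\d+)", smi_header)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def classify(version):
    """Label a driver version according to the reports in this thread."""
    if version is None:
        return "unknown"
    if version in REPORTED_AFFECTED:
        return "reported affected"
    if version >= REPORTED_FIXED_FROM:
        return "reported fixed"
    return "unknown"


header = "| NVIDIA-SMI 555.42.03 Driver Version: 555.85 CUDA Version: 12.5 |"
print(classify(parse_driver_version(header)))
```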
Describe the bug
I simply executed gpustat and got the following error response:
When executed with the --debug option:
Screenshots or Program Output
gpustat --debug: as above.
nvidia-smi:
Environment information:
Additional context
Thank you in advance!