jsoft88 opened this issue 1 month ago
Same error in WSL2. @rjzamora @XuehaiPan
This repository is the wrong place; it's not where NVIDIA's pynvml lives.
This is weird. I have reproduced this with the latest pynvml, the latest NVIDIA drivers, and WSL2. I get this for the c_name.value returned from the call:
-> return c_name.value
(Pdb) p [x for x in c_name.value]
[248, 149, 160, 129, 142, 248, 145, 128, 129, 137, 248, 144, 144, 129, 137, 248, 145, 176, 128, 160, 248, 145, 160, 129, 165, 248, 156, 160, 129, 175, 248, 153, 144, 129, 163, 248, 145, 176, 128, 160, 248, 150, 128, 129, 148, 248, 140, 144, 128, 160, 248, 141, 160, 128, 182, 248, 136, 128, 128, 176]
(Pdb) p c_name.value
b'\xf8\x95\xa0\x81\x8e\xf8\x91\x80\x81\x89\xf8\x90\x90\x81\x89\xf8\x91\xb0\x80\xa0\xf8\x91\xa0\x81\xa5\xf8\x9c\xa0\x81\xaf\xf8\x99\x90\x81\xa3\xf8\x91\xb0\x80\xa0\xf8\x96\x80\x81\x94\xf8\x8c\x90\x80\xa0\xf8\x8d\xa0\x80\xb6\xf8\x88\x80\x80\xb0'
(Pdb) len(c_name.value)
60
Note the repeating 5-byte pattern; the length of the string is 60. On the Windows host I get:
(Pdb) [x for x in c_name.value]
[78, 86, 73, 68, 73, 65, 32, 71, 101, 70, 111, 114, 99, 101, 32, 71, 84, 88, 32, 49, 54, 54, 48, 32, 83, 85, 80, 69, 82]
(Pdb) c_name.value
b'NVIDIA GeForce GTX 1660 SUPER'
(Pdb) len(c_name.value)
29
I don't see the connection between the two results. Maybe a bug in the NVIDIA driver v555.85?

nvidia-smi on WSL somehow gets the name right:
$ nvidia-smi
Wed May 29 11:59:40 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.03 Driver Version: 555.85 CUDA Version: 12.5 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce GTX 1660 ... On | 00000000:08:00.0 On | N/A |
| 28% 39C P8 16W / 125W | 1945MiB / 6144MiB | 2% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 35 G /Xwayland N/A |
+-----------------------------------------------------------------------------------------+
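For what it's worth, the repeating 5-byte groups can be unpacked. The following is my own analysis, not anything confirmed by NVIDIA: each group parses as an obsolete (pre-RFC-3629) 5-byte UTF-8 sequence, and each decoded value is a 32-bit little-endian word carrying two UTF-16LE code units of the real device name. A minimal sketch under that assumption:

```python
# My own reverse-engineering of the mangled buffer, not NVIDIA's documented
# behavior: treat each 5-byte group as an obsolete 5-byte UTF-8 sequence,
# then read the decoded value as a 32-bit little-endian word holding two
# UTF-16LE code units of the real device name.

mangled = bytes([
    248, 149, 160, 129, 142, 248, 145, 128, 129, 137, 248, 144, 144, 129, 137,
    248, 145, 176, 128, 160, 248, 145, 160, 129, 165, 248, 156, 160, 129, 175,
    248, 153, 144, 129, 163, 248, 145, 176, 128, 160, 248, 150, 128, 129, 148,
    248, 140, 144, 128, 160, 248, 141, 160, 128, 182, 248, 136, 128, 128, 176,
])  # the 60 bytes printed in the pdb session above

def unmangle(data: bytes) -> str:
    out = bytearray()
    for i in range(0, len(data), 5):
        lead, *cont = data[i:i + 5]
        value = lead & 0x03                 # 5-byte UTF-8 lead byte: 111110xx
        for b in cont:                      # four continuation bytes: 10xxxxxx
            value = (value << 6) | (b & 0x3F)
        out += value.to_bytes(4, 'little')  # word -> two UTF-16LE code units
    return out.decode('utf-16-le')

print(repr(unmangle(mangled)))  # → 'NVIDIA GeForce GTX 1660 '
```

That recovers 'NVIDIA GeForce GTX 1660 ', 24 characters, truncated because only 60 mangled bytes came back, which lines up with the name nvidia-smi reports.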
> This repository is the wrong place; it's not where NVIDIA's pynvml lives.

Right. I can confirm this also happens in gpustat with nvidia-ml-py-12.550.52. Is there a place to get NVIDIA's attention?
$ python -m gpustat --debug
Error on querying NVIDIA devices. Use --debug flag to see more details.
'utf-8' codec can't decode byte 0xf8 in position 0: invalid start byte
Traceback (most recent call last):
File "/tmp/venv310/lib/python3.10/site-packages/gpustat/cli.py", line 58, in print_gpustat
gpu_stats = GPUStatCollection.new_query(debug=debug, id=id)
File "/tmp/venv310/lib/python3.10/site-packages/gpustat/core.py", line 603, in new_query
gpu_info = get_gpu_info(handle)
File "/tmp/venv310/lib/python3.10/site-packages/gpustat/core.py", line 456, in get_gpu_info
name = _decode(N.nvmlDeviceGetName(handle))
File "/tmp/venv310/lib/python3.10/site-packages/pynvml.py", line 2094, in wrapper
return res.decode()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf8 in position 0: invalid start byte
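Until a fixed driver ships, a monitoring tool could decode defensively instead of crashing on the bad bytes. This decode_gpu_name helper is my own sketch, not part of pynvml or gpustat:

```python
# Defensive-decoding sketch for monitoring tools. 'decode_gpu_name' is a
# hypothetical helper of mine, not a pynvml API: it keeps a dashboard alive
# when a buggy driver hands back bytes that are not valid UTF-8.
def decode_gpu_name(raw: bytes) -> str:
    try:
        return raw.decode('utf-8')
    except UnicodeDecodeError:
        # substitute U+FFFD replacement characters instead of hard-crashing
        return raw.decode('utf-8', errors='replace')

print(decode_gpu_name(b'NVIDIA GeForce GTX 1660 SUPER'))  # healthy driver
print(decode_gpu_name(b'\xf8\x95\xa0\x81\x8e'))           # mangled r555 bytes
```

The name would render as replacement characters on the affected driver, but gpustat's query would no longer abort.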
I posted to an NVIDIA forum (https://forums.developer.nvidia.com/t/nvmldevicegetname-problem-in-wsl-on-windows/294491) but am not optimistic; the other postings there do not see much traffic.
Thanks all for engaging. I'll do my best to find someone who can help - Sorry for the delay.
Small Update: This issue has been escalated to the NVML team and the fix has been merged into the upcoming r560 driver branch. I do not believe there are plans to re-release the short-lived r555 branch.
Running the following code on WSL2 throws the error mentioned in the title:
Stacktrace:
Whereas the nvidia-smi command returns info without issues. If I try to decode the output of nvmlDeviceGetName using the utf-16 codec, this is the string:

'闸膠\uf88e肑要郸膐\uf889낑ꂀ釸膠\uf8a5ꂜ꾁駸膐\uf8a3ꂔꂀ雸膀\uf894낌ꂀ軸肐グ'

pynvml version: 11.5.0
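That CJK-looking string is not meaningful text; it is just the same mangled byte stream from the pdb session reinterpreted two bytes at a time as UTF-16 code units. A quick check on the first ten bytes:

```python
# First 10 of the 60 mangled bytes from the pdb session; decoding them as
# UTF-16LE pairs the bytes up into arbitrary code units, which is why the
# result looks like CJK text. (The plain 'utf-16' codec behaves the same
# here because there is no BOM.)
mangled = b'\xf8\x95\xa0\x81\x8e\xf8\x91\x80\x81\x89'
print(mangled.decode('utf-16-le'))  # → '闸膠\uf88e肑要', the start of the string above
```

So the utf-16 reading is a red herring; the underlying problem is what the driver wrote into the buffer, not the codec gpustat or pynvml chose.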