Closed colpachi closed 4 days ago
Hello @colpachi,
Can you please provide me with the output of nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader?
nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader
returns:
43
You stated that you set up your iDRAC credentials. For NVIDIA GPU measurements you will need to use SSH. Did you also enter your SSH host/credentials?
I'm not sure how to configure it, could you please give me some guidance on this?
You will need to edit the host and click the OS settings tab. You can enter the credentials there. You can use either username/password or key-based authentication.
Maybe this won't work for my scenario: the GPU is passed through to a VM, and the driver is blacklisted on the hypervisor.
"OS" refers to the hypervisor, doesn't it?
You can set it to whichever host has nvidia-smi and the GPU exposed. It should work for the hypervisor or the VM, since it is just using SSH to execute nvidia-smi.
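Since the check comes down to running nvidia-smi over SSH, the same query can be tried by hand from the machine that runs the container. The following is a minimal sketch, assuming the system ssh client and key-based login; build_ssh_command and read_gpu_temps are hypothetical helper names for illustration, not part of the project.

```python
import subprocess

# The exact query quoted in this thread.
QUERY = "nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader"


def build_ssh_command(host, user, port=22):
    """Build the argv that runs the nvidia-smi query on a remote host."""
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    return ["ssh", "-o", "BatchMode=yes", "-p", str(port), f"{user}@{host}", QUERY]


def read_gpu_temps(host, user, port=22, timeout=10):
    """Run the query over SSH and return one integer per GPU line."""
    result = subprocess.run(
        build_ssh_command(host, user, port),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError if ssh or nvidia-smi fails
    )
    return [int(line) for line in result.stdout.splitlines() if line.strip()]
```

Calling read_gpu_temps with the address and user of whichever machine exposes the GPU (the VM, in a passthrough setup) should return a non-empty list if SSH access and nvidia-smi both work there.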
I will assume you figured out your issue and have things working. If not, feel free to open another issue.
Hello Natan, just wanted to open this issue to let you know
Scenario:
Dell PowerEdge T630, virtualized with Proxmox, with one Nvidia GTX 1660 S installed
Steps to reproduce:
1) Configured volumes for data and logging
2) docker-compose up
3) Added hosts: Host: 192.168.5.24, username, password
4) Configure tab: speed controller, set iDRAC 8 - hit test: success; CPU temperature sensor, set iDRAC 8 - hit test: success
At this point the graph in the Monitor tab is being updated, all OK.
5) GPU Temperature Sensor, set to Nvidia - hit test: Got error below:
2024-08-21 12:50:22 hush-1 | /usr/local/lib/python3.12/site-packages/numpy/core/fromnumeric.py:3504: RuntimeWarning: Mean of empty slice.
2024-08-21 12:50:22 hush-1 |   return _methods._mean(a, axis=axis, dtype=dtype,
2024-08-21 12:50:22 hush-1 | /usr/local/lib/python3.12/site-packages/numpy/core/_methods.py:129: RuntimeWarning: invalid value encountered in scalar divide
2024-08-21 12:50:22 hush-1 |   ret = ret.dtype.type(ret / rcount)
2024-08-21 12:50:22 hush-1 |
2024-08-21 12:50:22 hush-1 | 2024-08-21 15:50:22,050-ERROR-hush.tabs.configure-1::configure|123:: cannot convert float NaN to integer
2024-08-21 12:50:22 hush-1 | Traceback (most recent call last):
2024-08-21 12:50:22 hush-1 |   File "/app/hush/tabs/configure.py", line 115, in _test
2024-08-21 12:50:22 hush-1 |     temperature = await device.get_temp()
2024-08-21 12:50:22 hush-1 |                   ^^^^^^^^^^^^^^^^^^^^^^^
2024-08-21 12:50:22 hush-1 |   File "/app/hush/hardware/nvidia.py", line 24, in get_temp
2024-08-21 12:50:22 hush-1 |     raise e
2024-08-21 12:50:22 hush-1 |   File "/app/hush/hardware/nvidia.py", line 19, in get_temp
2024-08-21 12:50:22 hush-1 |     temperature = int(np.mean(temperatures))
2024-08-21 12:50:22 hush-1 |                   ^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-08-21 12:50:22 hush-1 | ValueError: cannot convert float NaN to integer
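The "Mean of empty slice" warning followed by "cannot convert float NaN to integer" is what one would expect if the list of temperatures was empty, for example because no SSH host/credentials were configured and the nvidia-smi command produced no output. A mean over zero readings is NaN, and int() refuses to convert NaN. The sketch below, using only the standard library, shows one way such a case could be caught with a clearer message; parse_temperatures and mean_temperature are hypothetical names for illustration and are not the code in /app/hush/hardware/nvidia.py.

```python
def parse_temperatures(output: str) -> list[int]:
    """Turn nvidia-smi CSV output (one integer per GPU line) into a list."""
    return [int(line) for line in output.splitlines() if line.strip().isdigit()]


def mean_temperature(output: str) -> int:
    """Average the per-GPU readings, failing loudly when there are none."""
    temps = parse_temperatures(output)
    if not temps:
        # An empty list is the case that would turn the mean into NaN.
        raise RuntimeError(
            "nvidia-smi returned no temperature readings; "
            "check the SSH host and credentials"
        )
    # Truncate toward zero, as int() does on a float mean.
    return int(sum(temps) / len(temps))
```

With the output reported in this thread, mean_temperature("43\n") returns 43, while an empty string raises the RuntimeError instead of a NaN conversion error.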