natankeddem / hush

GUI Enabled Docker Based Fan Controller

GPU Temp - Nvidia conversion error #9

Closed colpachi closed 4 days ago

colpachi commented 1 month ago

Hello Natan, I just wanted to open this issue to let you know about the following error.

Scenario:

Dell PowerEdge T630 virtualized with Proxmox, with one Nvidia GTX 1660 S installed.

Steps to reproduce:

1) Configured volumes for data and logging.
2) Ran docker-compose up.
3) Added a host: Host: 192.168.5.24, username, password.
4) In the Configure tab, set Speed Controller to iDRAC 8 and hit Test: success. Set CPU Temperature Sensor to iDRAC 8 and hit Test: success.

At this point the graph in the Monitor tab is being updated; all OK.

5) Set GPU Temperature Sensor to Nvidia and hit Test. Got the error below:

2024-08-21 12:50:22 hush-1 | /usr/local/lib/python3.12/site-packages/numpy/core/fromnumeric.py:3504: RuntimeWarning: Mean of empty slice.
2024-08-21 12:50:22 hush-1 |   return _methods._mean(a, axis=axis, dtype=dtype,
2024-08-21 12:50:22 hush-1 | /usr/local/lib/python3.12/site-packages/numpy/core/_methods.py:129: RuntimeWarning: invalid value encountered in scalar divide
2024-08-21 12:50:22 hush-1 |   ret = ret.dtype.type(ret / rcount)
2024-08-21 12:50:22 hush-1 | 2024-08-21 15:50:22,050-ERROR-hush.tabs.configure-1::configure|123:: cannot convert float NaN to integer
2024-08-21 12:50:22 hush-1 | Traceback (most recent call last):
2024-08-21 12:50:22 hush-1 |   File "/app/hush/tabs/configure.py", line 115, in _test
2024-08-21 12:50:22 hush-1 |     temperature = await device.get_temp()
2024-08-21 12:50:22 hush-1 |                   ^^^^^^^^^^^^^^^^^^^^^^^
2024-08-21 12:50:22 hush-1 |   File "/app/hush/hardware/nvidia.py", line 24, in get_temp
2024-08-21 12:50:22 hush-1 |     raise e
2024-08-21 12:50:22 hush-1 |   File "/app/hush/hardware/nvidia.py", line 19, in get_temp
2024-08-21 12:50:22 hush-1 |     temperature = int(np.mean(temperatures))
2024-08-21 12:50:22 hush-1 |                   ^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-08-21 12:50:22 hush-1 | ValueError: cannot convert float NaN to integer
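The "Mean of empty slice" warning indicates that the list passed to np.mean was empty, so the mean came back as NaN and int() then failed. Below is a minimal sketch of that failure and one possible guard. The function name mean_temperature and its error message are hypothetical and are not taken from hush's code; plain Python arithmetic stands in for numpy so the sketch has no dependencies.

```python
import math
from typing import Sequence


def mean_temperature(temperatures: Sequence[float]) -> int:
    """Average GPU temperatures, failing clearly when there is nothing to average.

    Hypothetical helper for illustration only; not hush's implementation.
    """
    if len(temperatures) == 0:
        # An empty input is the case that produces NaN in the log above.
        raise ValueError(
            "no GPU temperatures returned; check the SSH host and credentials"
        )
    # int() truncates toward zero, matching int(np.mean(...)) for these values.
    return int(sum(temperatures) / len(temperatures))


# Converting NaN to int reproduces the final error message from the log.
try:
    int(math.nan)
except ValueError as error:
    nan_error = str(error)
```

With a guard like this, an empty reading would surface as a message about missing data rather than as a NaN conversion error.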

natankeddem commented 1 month ago

Hello @colpachi, can you please provide the output of nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader?

colpachi commented 4 weeks ago

nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader

returns:

43

natankeddem commented 4 weeks ago

You state that you set up your iDRAC credentials. For NVIDIA GPU measurements you will need to use SSH. Did you also enter your SSH host/credentials?

colpachi commented 3 weeks ago

I'm not sure how to configure it, could you please give me some guidance on this?

image

natankeddem commented 3 weeks ago

You will need to edit the host and click the OS settings tab, where you can enter the credentials. You can use either username/password or key-based authentication.

image

colpachi commented 3 weeks ago

Maybe this won't work for my scenario: the GPU is passed through to a VM, and the driver is blacklisted on the hypervisor.

The OS settings refer to the hypervisor, don't they?

natankeddem commented 3 weeks ago

You can set it to whichever host has nvidia-smi and the GPU exposed. It should work for either the hypervisor or the VM, since it just uses SSH to execute nvidia-smi.
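As a rough illustration of running nvidia-smi over SSH, here is a sketch that shells out to the system ssh client and parses one temperature per line. It assumes key-based authentication is already set up for the target host. The function names and the use of subprocess are assumptions for illustration; hush's actual SSH handling may differ.

```python
import subprocess
from typing import List

# The same query used earlier in this thread.
NVIDIA_SMI_QUERY = "nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader"


def build_ssh_command(host: str, username: str) -> List[str]:
    """Build the argument list for running the query on a remote host."""
    return ["ssh", f"{username}@{host}", NVIDIA_SMI_QUERY]


def parse_temperatures(output: str) -> List[int]:
    """Parse nvidia-smi output: one integer per GPU, one GPU per line."""
    return [int(line.strip()) for line in output.splitlines() if line.strip()]


def read_gpu_temperatures(host: str, username: str) -> List[int]:
    """Run the query over SSH; raises CalledProcessError if the command fails."""
    result = subprocess.run(
        build_ssh_command(host, username),
        capture_output=True,
        text=True,
        check=True,
        timeout=30,
    )
    return parse_temperatures(result.stdout)
```

Pointing read_gpu_temperatures at the VM that holds the passed-through GPU would return a list such as [43] for the output shown earlier, while empty output parses to an empty list.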

natankeddem commented 4 days ago

I will assume you figured out your issue and have things working. If not, feel free to open another issue.