trexminer / T-Rex

T-Rex NVIDIA GPU miner with web control monitoring page

Fan speed hunting against memory temperature #1262

Status: Open. mineranddiner opened this issue 2 years ago.

mineranddiner commented 2 years ago

I am having trouble getting T-Rex to hold my GPU fan speed at a stable value.

In my config, I am controlling fan speed against memory temperature, and it mostly works: memory temperatures are held near the target. However, while the miner runs, I can clearly see a hunting effect between the fan speed and the GPU memory temperature.

For example, with --fan tm:78:

- fan at 50%, memory at 80°C
- T-Rex sees the temperature above target and sets the fan to 80%
- memory temp drops to 70°C, fan drops to 40%
- memory temp climbs to 85°C, fan ramps to 90%
- memory temp drops to 65°C, fan drops to 35%
- memory temp... you get the idea.
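A toy simulation can reproduce this kind of hunting. Everything below is hypothetical and is not T-Rex's actual control logic: memory temperature relaxes toward an equilibrium that drops as fan speed rises (made-up numbers: 50% fan gives 80°C, 100% fan gives 60°C), and a P-only controller sets the fan in proportion to how far the temperature is from the 78°C target.

```python
def plant_step(temp_c, fan_pct, alpha=0.5):
    """Toy first-order thermal model with hypothetical numbers: each step the
    memory temperature moves halfway toward an equilibrium set by fan speed."""
    equilibrium_c = 100.0 - 0.4 * fan_pct  # 50% fan -> 80 C, 100% fan -> 60 C
    return temp_c + alpha * (equilibrium_c - temp_c)


def proportional_fan(temp_c, target_c=78.0, kp=10.0, base_pct=50.0):
    """P-only controller: fan = base + kp * (temp - target), clamped to 0..100%."""
    raw = base_pct + kp * (temp_c - target_c)
    return min(max(raw, 0.0), 100.0)


def simulate(kp, steps=60, start_c=80.0):
    """Run the closed loop and return the fan speed requested at each step."""
    temp_c = start_c
    fans = []
    for _ in range(steps):
        fan = proportional_fan(temp_c, kp=kp)
        fans.append(fan)
        temp_c = plant_step(temp_c, fan)
    return fans


for kp in (10.0, 1.0):
    tail = simulate(kp)[-10:]  # look only at the last 10 steps
    print(f"kp={kp}: fan swings {min(tail):.1f}%..{max(tail):.1f}%")
```

With the high gain the fan settles into a cycle between roughly 1% and 100%; with the low gain it settles at a single speed, although a P-only controller then holds the temperature slightly above the target rather than exactly on it.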

My problem is that unless I set a prohibitively high minimum fan speed (say 85%), I run into this poorly tuned feedback loop. If there is a way to tune the loop that controls fan speed, I'm missing it; I assume the PID/PI loop that sets the fan speed is not user-tunable. It's also possible that T-Rex's control is conflicting with the GPU's own onboard fan logic (EVGA 3080 Ti FTW3) and causing this.
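For reference, a minimal PI loop of the kind speculated about above might look like the sketch below. The class name, the gains kp and ki, the 50% base speed and the toy thermal model are all assumptions for illustration; T-Rex does not document user-tunable gains like these.

```python
class PIFanController:
    """Hypothetical PI controller mapping memory temperature to fan %."""

    def __init__(self, target_c, kp, ki, base_pct=50.0, min_pct=0.0, max_pct=100.0):
        self.target_c = target_c
        self.kp = kp
        self.ki = ki
        self.base_pct = base_pct
        self.min_pct = min_pct
        self.max_pct = max_pct
        self.integral = 0.0

    def update(self, temp_c):
        error = temp_c - self.target_c  # positive when memory is too hot
        candidate = self.integral + self.ki * error
        raw = self.base_pct + self.kp * error + candidate
        out = min(max(raw, self.min_pct), self.max_pct)
        if out == raw:
            # Simple anti-windup: keep the new integral only when not saturated.
            self.integral = candidate
        return out


def plant_step(temp_c, fan_pct, alpha=0.5):
    """Toy thermal model (hypothetical): 50% fan -> 80 C, 100% fan -> 60 C."""
    equilibrium_c = 100.0 - 0.4 * fan_pct
    return temp_c + alpha * (equilibrium_c - temp_c)


ctrl = PIFanController(target_c=78.0, kp=1.0, ki=0.5)
temp_c = 80.0
for _ in range(200):
    fan = ctrl.update(temp_c)
    temp_c = plant_step(temp_c, fan)
print(f"settled at {temp_c:.2f} C with fan {fan:.1f}%")
```

In this toy model the integral term removes the steady-state offset, so the temperature settles on the 78°C target (fan near 55%) without hunting; lower gains would respond more slowly still.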

I also suspect the feedback could be between individual GPUs, since I am using an open-air mining frame with two vertically stacked rows of GPUs. My suspicion is that when the bottom cards increase their fan speed, they feed the upper cards more hot exhaust air, which ramps up the upper cards' fans and sets off the cycling described above. Either way, the fan speed responds too quickly; there should be a steady-state airflow that produces stable temperatures.

Ideally, a time-dependent damping factor could be applied to fan speed changes (e.g. a maximum change of 10% per minute), or the response of fan speed to memory temperature could otherwise be slowed down. Obviously, there would still need to be an emergency override for loss of airflow or other cooling, so that the fans are never locked at an unacceptably low speed.
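A sketch of the proposed damping, assuming a hypothetical controller that is called every dt_s seconds: the requested fan speed is slew-limited to max_slew_pct_per_min (10%/min, as suggested), and a hypothetical emergency_c threshold bypasses the limit and forces 100% if memory temperature gets too high. The function name and the 100°C threshold are placeholders, not T-Rex options or vendor limits.

```python
def rate_limited_fan(prev_pct: float, requested_pct: float, dt_s: float,
                     mem_temp_c: float, max_slew_pct_per_min: float = 10.0,
                     emergency_c: float = 100.0) -> float:
    """Move from prev_pct toward requested_pct by at most
    max_slew_pct_per_min (scaled to the dt_s interval), unless memory
    temperature has reached the hypothetical emergency threshold."""
    if mem_temp_c >= emergency_c:
        # Emergency override: skip the damping entirely and go to full speed.
        return 100.0
    max_step = max_slew_pct_per_min * dt_s / 60.0
    step = max(-max_step, min(max_step, requested_pct - prev_pct))
    return prev_pct + step


# A controller asking for a jump from 50% to 90% only gets 10% per minute.
fan = 50.0
for minute in range(4):
    fan = rate_limited_fan(fan, 90.0, dt_s=60.0, mem_temp_c=82.0)
    print(f"minute {minute + 1}: fan {fan:.0f}%")
```

This limit is symmetric; one possible variation is to allow fast increases and damp only decreases, which keeps cooling responsive while still slowing the drop-then-overshoot pattern.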

Any advice or other support would be appreciated.

Thanks.

zulu1905 commented 2 years ago

Try this: --fan tm:78[80-100]

That is: target memory temperature = 78°C, minimum fan speed = 80%, maximum fan speed = 100%.
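To illustrate what the three numbers in tm:78[80-100] mean, here is a hypothetical parser and clamp (not T-Rex's own code, and covering only the tm: form quoted in this thread): whatever speed the controller requests is confined to the 80-100% band, so swings like the 35% to 90% ones described above could at worst range from 80% to 100%.

```python
import re
from typing import NamedTuple


class FanSpec(NamedTuple):
    target_c: int  # target memory temperature, degrees C
    min_pct: int   # lowest fan speed the controller may request
    max_pct: int   # highest fan speed the controller may request


# Hypothetical pattern for the "tm:TARGET[MIN-MAX]" form only.
_TM_SPEC = re.compile(r"tm:(\d+)\[(\d+)-(\d+)\]")


def parse_tm_spec(spec: str) -> FanSpec:
    match = _TM_SPEC.fullmatch(spec.strip())
    if match is None:
        raise ValueError(f"unrecognised fan spec: {spec!r}")
    target_c, min_pct, max_pct = (int(g) for g in match.groups())
    if not 0 <= min_pct <= max_pct <= 100:
        raise ValueError("fan bounds must satisfy 0 <= min <= max <= 100")
    return FanSpec(target_c, min_pct, max_pct)


def clamp_fan(requested_pct: float, spec: FanSpec) -> float:
    """Confine whatever the controller asks for to the [min, max] band."""
    return min(max(requested_pct, spec.min_pct), spec.max_pct)


spec = parse_tm_spec("tm:78[80-100]")
print(spec)
print([clamp_fan(p, spec) for p in (35, 40, 90)])
```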