nicehash / excavator

NiceHash's proprietary low-level CUDA miner
https://www.nicehash.com

STOP and RESTART fan issues... Mining begins but fan profile not utilized at all until GPU TARGET is exceeded, GPU overheats before profile can react. #306

Closed DroptheHammer closed 3 years ago

DroptheHammer commented 3 years ago

My fan profile is at the bottom. What I want is for the fans to go to 75% IMMEDIATELY when I select QUICKMINER START from the right-click menu, even if the 94C VRAM temperature target has not been hit yet. GPU temps can rise from 50C to 100C junction so quickly that, between the auto fan response and the physical time my 3090FTW3 fans need to spin up from idle, the card spikes to 110C and crashes before the fans can help. Starting at 75% when mining begins slows the runaway temperature increase, and QuickMiner's auto fans handle the rest very well: my card usually settles between 88C and 96C junction depending on room temperature. But this behavior is not consistent from STOP to RESTART. I stop and restart many times a day because I video conference, etc., and prefer to have the GPU available for my work during that time.

I know my normal fan and temp settings work OK, because if I use this overclock with a fresh boot of QuickMiner, it is stable for days. The bug happens when I STOP QuickMiner/Excavator for some time and later RESTART it manually using right-click START in QuickMiner's tray tool.

Below is my command file. In "params": ["0", "73", "73", "95", "200", "2000", "-3", "0"], the second value should be telling it to start at PWM 73%, the third value is my minimum speed (also 73%), and the fourth is my maximum PWM of 95%, used if the GPU target of 94C VRAM is exceeded.

[
  { "time": 0, "commands": [] },
  { "time": 20, "commands": [
    { "id": 1, "method": "workers.reset.all", "params": [] }
  ] },
  { "time": 30, "loop": 30, "commands": [
    { "id": 1, "method": "worker.print.efficiencies", "params": [] }
  ] },
  { "time": 1, "loop": 4, "commands": [
    { "id": 1, "method": "devices.smartfan.exec", "params": [] }
  ] },
  { "event": "on_quit", "commands": [
    { "id": 1, "method": "devices.smartfan.reset", "params": [] }
  ] },
  { "event": "on_quickminer.start", "commands": [
    { "id": 1, "method": "device.smartfan.set", "params": ["0", "3", "60", "94"] },
    { "id": 1, "method": "device.smartfan.set.advanced", "params": ["0", "73", "73", "95", "200", "2000", "-3", "0"] },
    { "id": 1, "method": "device.set.oc_profile2", "params": ["0", "1110", "10352"] }
  ] },
  { "event": "on_quickminer.stop", "commands": [
    { "id": 1, "method": "device.set.oc_reset", "params": ["0"] },
    { "id": 1, "method": "device.set.fan.reset", "params": ["0"] }
  ] }
]
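For anyone checking their own command file, here is a minimal Python sketch that pulls the fan values out of the on_quickminer.start entry. The positions (second value = starting PWM, third = minimum, fourth = maximum) follow my reading of the params as described above and are not confirmed against the Excavator API; the function name advanced_fan_params is made up for illustration.

```python
import json

# Trimmed copy of the on_quickminer.start entry from the command file above.
COMMANDS = json.loads("""
[{ "event": "on_quickminer.start", "commands": [
  { "id": 1, "method": "device.smartfan.set", "params": ["0", "3", "60", "94"] },
  { "id": 1, "method": "device.smartfan.set.advanced",
    "params": ["0", "73", "73", "95", "200", "2000", "-3", "0"] }
]}]
""")


def advanced_fan_params(entries, event="on_quickminer.start"):
    """Return (device, start_pwm, min_pwm, max_pwm) for the given event.

    ASSUMPTION: the index meanings follow the poster's reading of
    device.smartfan.set.advanced, not official documentation.
    """
    for entry in entries:
        if entry.get("event") != event:
            continue
        for cmd in entry.get("commands", []):
            if cmd.get("method") == "device.smartfan.set.advanced":
                device, start, minimum, maximum = cmd["params"][:4]
                return device, int(start), int(minimum), int(maximum)
    return None  # no advanced smartfan command registered for this event


summary = advanced_fan_params(COMMANDS)
print(summary)
```

If this returns None for your file, the advanced smartfan command is not attached to the start event at all, which would be a different problem from the one reported here.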

nicehashdev commented 3 years ago

I am not sure what exactly the bug is here. Can you outline the actual problem you are reporting? I see you already know about the device.smartfan.set.advanced method and how to tune its parameters, so that is one thing you can adjust yourself. Another is the frequency at which the smartfan algorithm executes. You currently have it at 4 seconds: { "time": 1, "loop": 4, "commands": [{ "id": 1, "method": "devices.smartfan.exec", "params": [] }] } You can make smartfan much more aggressive by changing this to 1 second: { "time": 1, "loop": 1, "commands": [{ "id": 1, "method": "devices.smartfan.exec", "params": [] }] }
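If you would rather script that change than edit the file by hand, something like the following Python sketch would do it. It only touches the scheduled entry that runs devices.smartfan.exec and leaves the other loops alone. The helper name set_smartfan_interval is hypothetical, and the sample entries are a trimmed copy of the command file quoted above, not a complete config.

```python
import json


def set_smartfan_interval(entries, seconds):
    """Return a copy of the command list with the smartfan loop set to `seconds`.

    Only looping entries that call devices.smartfan.exec are changed.
    """
    updated = []
    for entry in entries:
        methods = [cmd.get("method") for cmd in entry.get("commands", [])]
        if "devices.smartfan.exec" in methods and "loop" in entry:
            entry = {**entry, "loop": seconds}  # shallow copy, original untouched
        updated.append(entry)
    return updated


# Trimmed sample: the smartfan entry plus one unrelated looping entry.
entries = json.loads("""
[
  { "time": 1, "loop": 4, "commands": [
    { "id": 1, "method": "devices.smartfan.exec", "params": [] } ] },
  { "time": 30, "loop": 30, "commands": [
    { "id": 1, "method": "worker.print.efficiencies", "params": [] } ] }
]
""")

result = set_smartfan_interval(entries, 1)
print(json.dumps(result[0]))
```

To apply it to a real file, you would load the file with json.load, pass the list through the helper, and write it back with json.dump; keep a backup of the original first.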

DroptheHammer commented 3 years ago

If I boot NiceHash QuickMiner fresh and click START, it properly applies my settings: 73 starting fan speed, 73 minimum speed, 95 max speed, and a 94C target. This results in a nice smooth rise in temperature, and I never have an issue, since the fans start at 73% even if VRAM is at 35C.

Next I stop the client. I can see it puts everything back to my normal card settings/profile.

When I restart, more than 30% of the time I get the following bug: mining begins, but the fans stay at 0 even though I have set 73 as both the minimum and the starting fan speed. The card rapidly gains temperature, shooting towards 110C, and crashes.

Smartfan SHOULD be ensuring 73 minimum is set at the same moment mining begins, and many times it does not.

I have tried changing the smartfan interval from 4 seconds to 1, and the problem remains. The smartfan cycles leave the fans at idle until the VRAM target is exceeded. At that point I can see them barely ramping up for a couple of seconds, but most of the time they cannot catch the runaway temperature and the card crashes.

How can we ensure that smartfan always applies the STARTING and MINIMUM fan speed exactly when the card starts mining, rather than waiting until it hits the VRAM TARGET?

nicehashdev commented 3 years ago

I see what you mean; this will be fixed in the next version. Set bUpdateRCVersion to true in your config to receive it as soon as possible.