nicehash / NiceHashQuickMiner

Super simple & easy Windows 10 cryptocurrency miner made by NiceHash.
https://www.nicehash.com

[BUG] Rig started crashing every hour roughly #549

Closed JJMineTheGap closed 2 years ago

JJMineTheGap commented 3 years ago

Describe the bug

Around the time NiceHash took the code base offline on Git, my rig went offline. When I tried to start it again, it would start the QuickMiner and then, after a few seconds, reboot the PC, even though the config file was set to restart only Excavator, not the PC. The PC was stable as long as I didn't start Excavator. As soon as I started NHQM, it would start working and reboot the computer again in less than a minute.

It took quite a bit of troubleshooting to get the rig operational again. I couldn't reinstall 0.5.1.3 because the installer looks for the code repository on Git, and since that was offline, it couldn't proceed. So I copied the NHQM I had backed up as a clean version and started using that. It was able to start, but it didn't provide any card optimizations. I tried applying MSI through the tool, but that essentially ran the cards at full power with lite hash rates. There was nothing else I could do, so I left them running hot for the night. The next morning, optimization was available again, so I set the cards back to lite or medium, as they were configured before the crash.

Now the whole PC reboots about every hour. I turned on logging at the request of the NH support team and am providing the logs here. I'm looking to understand what is causing the PC to reboot.

To Reproduce Steps to reproduce the behavior:

  1. Start NHQM
  2. Wait about an hour
  3. Whole PC restarts
  4. Rinse and repeat

Expected behavior: It should just run, not reboot the computer. All cards are within operating temperatures (below 62 °C on the GPU and below 102 °C on the VRAM).
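For anyone wanting to check the temperature claim over time rather than by eye, here is a minimal sketch that reads GPU temperature and power draw from `nvidia-smi` and flags cards at or above a limit. It assumes `nvidia-smi` is on the PATH; the 62 °C limit is simply the figure quoted in this report, and the helper names (`parse_readings`, `hot_gpus`) are made up for illustration. VRAM temperature is not queried here, since it may not be reported for every card.

```python
import subprocess

# nvidia-smi query for per-GPU index, core temperature (°C) and power draw (W).
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,temperature.gpu,power.draw",
    "--format=csv,noheader,nounits",
]


def to_float(field):
    """Convert a CSV field to float; return None for values such as '[N/A]'."""
    try:
        return float(field)
    except ValueError:
        return None


def parse_readings(csv_text):
    """Parse 'index, temp, power' rows into a list of dicts."""
    readings = []
    for row in csv_text.strip().splitlines():
        index, temp, power = [field.strip() for field in row.split(",")]
        readings.append(
            {"index": int(index), "temp_c": to_float(temp), "power_w": to_float(power)}
        )
    return readings


def hot_gpus(readings, limit_c=62.0):
    """Return the indices of GPUs whose core temperature is at or above limit_c."""
    return [
        r["index"]
        for r in readings
        if r["temp_c"] is not None and r["temp_c"] >= limit_c
    ]


if __name__ == "__main__":
    output = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    current = parse_readings(output)
    print(current)
    print("at or above limit:", hot_gpus(current))
```

Running this on a schedule and appending the output to a file would show whether temperature or power draw changes in the minutes before a reboot.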

Screenshots If applicable, add screenshots to help explain your problem.

Version affected (please complete the following information): NHQM_v0.5.1.3 Excavator_v1.7.1d_Build_880

NVIDIA driver version Which one do you use? 27.21.14.6647 46647

Attachments: excavator_log_2021-05-31_09-24-02.000.txt excavator_log_2021-05-31_10-07-26.001.txt excavator_log_2021-05-31_10-52-07.002.txt log-redacted.txt Rig screen shot - rebooting every hour
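The attached log file names carry timestamps, so the time between restarts can be estimated from them. The sketch below assumes each timestamp marks when that log file was started (that is, when Excavator came back up after a reboot), which the report does not confirm; the function names are hypothetical.

```python
import re
from datetime import datetime

# File names copied from the attachment list in this report.
LOG_NAMES = [
    "excavator_log_2021-05-31_09-24-02.000.txt",
    "excavator_log_2021-05-31_10-07-26.001.txt",
    "excavator_log_2021-05-31_10-52-07.002.txt",
]

# Matches the date and time portion of an Excavator log file name.
STAMP = re.compile(r"excavator_log_(\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2})\.")


def start_time(name):
    """Parse the timestamp embedded in a log file name."""
    match = STAMP.search(name)
    if match is None:
        raise ValueError(f"no timestamp in {name!r}")
    return datetime.strptime(match.group(1), "%Y-%m-%d_%H-%M-%S")


def gaps_in_minutes(names):
    """Minutes between consecutive log files, in chronological order."""
    times = sorted(start_time(name) for name in names)
    return [
        round((later - earlier).total_seconds() / 60, 1)
        for earlier, later in zip(times, times[1:])
    ]


if __name__ == "__main__":
    print(gaps_in_minutes(LOG_NAMES))
```

For the three attached files the gaps come out at roughly 43 and 45 minutes, so under that assumption the restarts are fairly regular rather than random.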

Hardware: Three NVIDIA 3080 and three NVIDIA 3090 cards, all connected through risers and USB extension cables.

Logs: attached.


Cheemonstonzo commented 3 years ago

This seems to be fixed now? QM now mining stable and not giving mem-oc errors. Gonna see how stable this is overnight.

There's definitely some concerning hidden hand-holding with the online source code for QM to function properly.

JJMineTheGap commented 3 years ago

Thanks for looking at it. No, the rig still reboots about every hour. I can upload more logs if needed, I've been logging for days now.

Cheemonstonzo commented 3 years ago

If it helps, I completely uninstalled QM, then manually downloaded the git source and extracted it into the old, now-empty folder. When I launched it, it asked to update, which I did, and it's been rock solid for ~9 hours.

JJMineTheGap commented 3 years ago

Ok, thanks - I can give that a try. I'd love to know what is causing the instability though, if that is identifiable in the logs.

JJMineTheGap commented 3 years ago

I tried that, no change. Still reboots the entire rig (PC) about every hour. Do the logs indicate any particular reason for this?

Dohtar1337 commented 3 years ago

Is the issue still happening? It sounds like a riser issue to me.

lshaf commented 3 years ago

I have the same problem, but mine ended with:
CUDA error 'unknown error' in func 'cuda_daggerhashimoto::run' line 1220

I'm using manual OC on my 2x Inno3D GTX 1660 Super with CoreClock -250, MemClock 750, and power 70. I have no idea what's going on, but using medium or efficient leads to a crash in less than 10 minutes.

From my research, I found people running higher OC than mine, and even the NiceHash recommendation here is higher.

Maybe it's related to this bug and maybe not, but someone please help me. I'm out of ideas. Even when I use lite, it crashes after roughly 24 hours.
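To see whether the same CUDA error precedes every crash, the log lines can be tallied by message, function, and line number. This is a rough sketch that assumes the errors in the Excavator logs look like the single line quoted above; the rest of the log format is unknown, so other error shapes will simply be ignored. The function name is made up for illustration.

```python
import re
import sys
from collections import Counter

# Pattern modelled on the one error line quoted in this thread.
CUDA_ERROR = re.compile(
    r"CUDA error '(?P<message>[^']*)' in func '(?P<func>[^']*)' line (?P<line>\d+)"
)


def count_cuda_errors(lines):
    """Count CUDA errors, keyed by (message, function, source line)."""
    counts = Counter()
    for text in lines:
        match = CUDA_ERROR.search(text)
        if match:
            key = (match["message"], match["func"], int(match["line"]))
            counts[key] += 1
    return counts


if __name__ == "__main__":
    # Usage: python count_cuda_errors.py excavator_log_*.txt
    totals = Counter()
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as handle:
            totals.update(count_cuda_errors(handle))
    for (message, func, line), n in totals.most_common():
        print(f"{n:5d}  {message!r} in {func} line {line}")
```

If the same function and line show up before every crash, that at least narrows the failure to one kernel and is worth including in the report.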

saeedsahne commented 3 years ago

Having the same issue on a 3090 GPU with the latest drivers and the latest QM as of today. Mine restarts randomly, sometimes after 10 minutes, sometimes after a few hours, at the exact same temps that used to work fine.

Specs: i9 10850K, Asus Strix Z590, EVGA 3090 FTW ULTRA

Dohtar1337 commented 2 years ago

The issue is related to OC settings that are too high and/or problems with the drivers.

Please lower OC settings and reinstall the drivers following this guide: https://www.nicehash.com/blog/post/how-to-correctly-uninstall-and-install-gpu-drivers