Open Blackskyliner opened 6 years ago
@Blackskyliner Thanks for the detailed bug report. I think the GPU is completely off at some point; the 40 Sol/s that you see is coming from your AVX2 CPU, as far as I can tell. Can you run with only -cd 0 to see what happens?
About the remaining 40 Sol/s: I also thought it might be my CPU only, but after some testing the CPU only gets up to about 25 Sol/s.
I will watch it on my next mining run (going to bed now) and check whether it happens again on GPU only. Maybe I will also be able to get a good -d 0 log when it happens.
Thanks for the fast reaction 😃.
Only CPU (-t 3)
[21:49:55][0x00007ffff0eb63c0] Using SSE2: YES
[21:49:55][0x00007ffff0eb63c0] Using AVX: YES
[21:49:55][0x00007ffff0eb63c0] Using AVX2: YES
[21:49:55][0x0000700002d02000] stratum | Starting miner
[21:49:55][0x0000700002d02000] stratum | Connecting to stratum server eu1-zcash.flypool.org:3333
[21:49:55][0x0000700002d85000] miner#0 | Starting thread #0 (CPU-XENONCAT-AVX2)
[21:49:55][0x0000700002e08000] miner#1 | Starting thread #1 (CPU-XENONCAT-AVX2)
[21:49:55][0x0000700002e8b000] miner#2 | Starting thread #2 (CPU-XENONCAT-AVX2)
[21:49:55][0x0000700002d02000] stratum | Connected!
[21:49:55][0x0000700002d02000] stratum | Subscribed to stratum server
...
[21:49:57][0x0000700002d02000] stratum | Received new job #b603a427f3b3c89a857f
[21:50:07][0x00007ffff0eb63c0] Speed [15 sec]: 10.9881 I/s, 18.9949 Sols/s
[21:50:19][0x00007ffff0eb63c0] Speed [15 sec]: 13.6 I/s, 25.2 Sols/s
[21:50:30][0x0000700002d02000] stratum | Received new job #02cd11a1f59dce01c632
[21:50:31][0x00007ffff0eb63c0] Speed [15 sec]: 13.5333 I/s, 24.8667 Sols/s
[21:50:42][0x00007ffff0eb63c0] Speed [15 sec]: 12.4 I/s, 24.4667 Sols/s
[21:50:54][0x00007ffff0eb63c0] Speed [15 sec]: 13.4 I/s, 24.4 Sols/s
[21:51:06][0x00007ffff0eb63c0] Speed [15 sec]: 13.4 I/s, 23.4667 Sols/s
[21:51:17][0x00007ffff0eb63c0] Speed [15 sec]: 13.4 I/s, 24.6667 Sols/s
[21:51:18][0x0000700002d02000] stratum | Received new job #abf3270b8d781e771791
[21:51:29][0x00007ffff0eb63c0] Speed [15 sec]: 12.6 I/s, 23.5333 Sols/s
[21:51:40][0x00007ffff0eb63c0] Speed [15 sec]: 13.4667 I/s, 26.1333 Sols/s
[21:51:52][0x00007ffff0eb63c0] Speed [15 sec]: 13.6 I/s, 24.4667 Sols/s
[21:52:04][0x00007ffff0eb63c0] Speed [15 sec]: 13.2 I/s, 23.2 Sols/s
[21:52:15][0x00007ffff0eb63c0] Speed [15 sec]: 12.4667 I/s, 23.8667 Sols/s
[21:52:26][0x00007ffff0eb63c0] Speed [15 sec]: 12.6 I/s, 21.7333 Sols/s
[21:52:37][0x00007ffff0eb63c0] Speed [15 sec]: 13 I/s, 23.6 Sols/s
[21:52:49][0x00007ffff0eb63c0] Speed [15 sec]: 12.8 I/s, 28.1333 Sols/s
Only GPU (-t 0 -cd 0)
[21:53:43][0x00007ffff0eb63c0] Using SSE2: YES
[21:53:43][0x00007ffff0eb63c0] Using AVX: YES
[21:53:43][0x00007ffff0eb63c0] Using AVX2: YES
[21:53:43][0x000070000b239000] stratum | Starting miner
[21:53:43][0x000070000b239000] stratum | Connecting to stratum server eu1-zcash.flypool.org:3333
[21:53:43][0x000070000b2bc000] miner#0 | Starting thread #0 (CUDA-DJEZO) GeForce GTX 970 (#0) M=1
[21:53:43][0x000070000b239000] stratum | Connected!
[21:53:44][0x000070000b239000] stratum | Subscribed to stratum server
...
[21:53:45][0x000070000b239000] stratum | Received new job #073def2645e4261860b7
[21:53:55][0x00007ffff0eb63c0] Speed [15 sec]: 97.5437 I/s, 176.288 Sols/s
[21:54:06][0x00007ffff0eb63c0] Speed [15 sec]: 119.8 I/s, 226.8 Sols/s
[21:54:17][0x00007ffff0eb63c0] Speed [15 sec]: 119.267 I/s, 222.867 Sols/s
[21:54:29][0x00007ffff0eb63c0] Speed [15 sec]: 118.8 I/s, 221.733 Sols/s
[21:54:30][0x000070000b239000] stratum | Received new job #e32b54494e6f480c4b7d
[21:54:40][0x00007ffff0eb63c0] Speed [15 sec]: 110.933 I/s, 204.733 Sols/s
[21:54:51][0x00007ffff0eb63c0] Speed [15 sec]: 118.6 I/s, 220.4 Sols/s
[21:55:03][0x00007ffff0eb63c0] Speed [15 sec]: 118.733 I/s, 221.933 Sols/s
[21:55:14][0x00007ffff0eb63c0] Speed [15 sec]: 119.4 I/s, 225 Sols/s
[21:55:25][0x00007ffff0eb63c0] Speed [15 sec]: 118.933 I/s, 230.333 Sols/s
[21:55:37][0x00007ffff0eb63c0] Speed [15 sec]: 118.667 I/s, 227.667 Sols/s
[21:55:48][0x00007ffff0eb63c0] Speed [15 sec]: 119.4 I/s, 220.333 Sols/s
[21:56:00][0x00007ffff0eb63c0] Speed [15 sec]: 119.067 I/s, 227.667 Sols/s
[21:56:11][0x00007ffff0eb63c0] Speed [15 sec]: 118.733 I/s, 228.4 Sols/s
[21:56:22][0x00007ffff0eb63c0] Speed [15 sec]: 119.067 I/s, 219.867 Sols/s
[21:56:34][0x00007ffff0eb63c0] Speed [15 sec]: 118.8 I/s, 226.533 Sols/s
[21:56:45][0x00007ffff0eb63c0] Speed [15 sec]: 118.467 I/s, 223.2 Sols/s
[21:56:46][0x000070000b2bc000] stratum | Submitting share #4, nonce 000000000000000000000000000000000000000000000000003ea9
[21:56:47][0x000070000b239000] stratum | Accepted share #4
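To compare runs like the two above without eyeballing them, the speed lines can be averaged with a few lines of Python. This is only a rough sketch: the regex is written against the log format shown in this thread, and the names `sols_per_second`, `mean_sols`, and `skip_warmup` are made up for illustration, not part of nheqminer.

```python
import re
from statistics import mean

# Matches the periodic speed lines seen in the logs above, e.g.
# "[21:54:06][0x...] Speed [15 sec]: 119.8 I/s, 226.8 Sols/s"
SPEED_RE = re.compile(r"Speed \[\d+ sec\]: ([\d.]+) I/s, ([\d.]+) Sols/s")


def sols_per_second(log_text):
    """Return the Sols/s value of every speed line found in the log text."""
    return [float(m.group(2)) for m in SPEED_RE.finditer(log_text)]


def mean_sols(log_text, skip_warmup=1):
    """Average Sols/s, ignoring the first `skip_warmup` samples.

    Skipping the first sample is an assumption: in the logs above the
    first reading is noticeably lower than the following ones.
    """
    samples = sols_per_second(log_text)[skip_warmup:]
    return mean(samples) if samples else 0.0


# First three speed lines of the GPU-only run, copied from the log above.
gpu_log = """\
[21:53:55][0x00007ffff0eb63c0] Speed [15 sec]: 97.5437 I/s, 176.288 Sols/s
[21:54:06][0x00007ffff0eb63c0] Speed [15 sec]: 119.8 I/s, 226.8 Sols/s
[21:54:17][0x00007ffff0eb63c0] Speed [15 sec]: 119.267 I/s, 222.867 Sols/s
"""

print(round(mean_sols(gpu_log), 1))
```

Feeding it the full CPU-only and GPU-only logs gives one number per run, which makes it easier to see whether a "slow" session is close to the CPU-only rate or somewhere else entirely.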
So it happened again. I guess I was just lucky last time: this time a benchmark did not reset the phenomenon. It is indeed slow all the time now, so my guess remains that I somehow hit the slow memory. But why does it not reset? The moment the program gets disconnected, the CUDA stack should clean up after itself, or at least free its resources.
EDIT: Added CUDA information.
Running with debug enabled on slow job:
[20:01:45][0x000070000eeb9000] miner#0 | Running Equihash solver with nNonce = 0000000000000000000000000000000000000000000000000000015cfec6ef06
[20:01:45][0x000070000eeb9000] miner#0 | Running Equihash solver with nNonce = 0000000000000000000000000000000000000000000000000000025cfec6ef06
[20:01:45][0x000070000eeb9000] miner#0 | Running Equihash solver with nNonce = 0000000000000000000000000000000000000000000000000000035cfec6ef06
[20:01:45][0x000070000eeb9000] miner#0 | Checking solution against target...
[20:01:45][0x000070000eeb9000] miner#0 | Too large: a6660bad9cbe8c979a06d4b22d209653d45103035253e4d7eebba7035a892a6c
[20:01:45][0x000070000eeb9000] miner#0 | Checking solution against target...
[20:01:45][0x000070000eeb9000] miner#0 | Too large: 8fbd61d30e8f07a58b69e163b2cd2d382cdbec6815c02f2aee215c0a64b0c54a
[20:01:45][0x000070000eeb9000] miner#0 | Checking solution against target...
[20:01:45][0x000070000eeb9000] miner#0 | Too large: 79a296fe365f490fbb937dea594c3aa9d21c366cdc8fc1201b7c284e0ab33011
[20:01:45][0x000070000eeb9000] miner#0 | Checking solution against target...
[20:01:45][0x000070000eeb9000] miner#0 | Too large: 0891bc09e7cdeafe59d8f0753ad3733943766d6131df035fd1d19781b328c60d
[20:01:45][0x000070000eeb9000] miner#0 | Running Equihash solver with nNonce = 0000000000000000000000000000000000000000000000000000045cfec6ef06
[20:01:45][0x000070000eeb9000] miner#0 | Checking solution against target...
[20:01:45][0x000070000eeb9000] miner#0 | Too large: 43c3c788afeefebb0d62b8b476adcd7da4e14edea4ef1e9dbc50e00d79b66446
[20:01:45][0x000070000eeb9000] miner#0 | Checking solution against target...
[20:01:45][0x000070000eeb9000] miner#0 | Too large: b8f4498679d339dace8c9a69235b103a6b8adcd03e1636d79abdec152e684f99
[20:01:45][0x000070000eeb9000] miner#0 | Checking solution against target...
[20:01:45][0x000070000eeb9000] miner#0 | Too large: c77fae45da1ed2cf074fa130191113b2d6c4afe8f01cc4cb5f317ec763c8c9a1
[20:01:45][0x000070000eeb9000] miner#0 | Checking solution against target...
[20:01:45][0x000070000eeb9000] miner#0 | Too large: 7b67f7358c6e28f26184be51428309f127d52f5bda85396387e9a661220d0c73
[20:01:45][0x000070000eeb9000] miner#0 | Checking solution against target...
[20:01:45][0x000070000eeb9000] miner#0 | Too large: 6f23d8379470cc41af4e3a094c58786ef3693031983bc3c3e2f2f234b6a13974
[20:01:45][0x000070000eeb9000] miner#0 | Running Equihash solver with nNonce = 0000000000000000000000000000000000000000000000000000055cfec6ef06
[20:01:45][0x000070000eeb9000] miner#0 | Checking solution against target...
[20:01:45][0x000070000eeb9000] miner#0 | Too large: 08dd3682bd6b45f343a35227e9a491f7fe99bc9783f388b95c722e4ca5dc9f21
[20:01:45][0x000070000eeb9000] miner#0 | Running Equihash solver with nNonce = 0000000000000000000000000000000000000000000000000000065cfec6ef06
[20:01:45][0x000070000eeb9000] miner#0 | Checking solution against target...
[20:01:45][0x000070000eeb9000] miner#0 | Too large: 16c1675f735d6bcb7d1ac0f243b7e07bae87f07fe701f0b25a17d81e170ba31e
[20:01:45][0x000070000eeb9000] miner#0 | Checking solution against target...
[20:01:45][0x000070000eeb9000] miner#0 | Too large: 0f06a6aa3cb511b03eb405aac522561dbdbd153b0170ce1be6681b5472ad9d22
[20:01:45][0x000070000eeb9000] miner#0 | Checking solution against target...
Benchmark (./nheqminer-gpu -b -cd 0 -d 0):
==================== www.nicehash.com ====================
Equihash CPU&GPU Miner for NiceHash v0.5c
Thanks to Zcash developers for providing base of the code.
Special thanks to tromp, xenoncat and djeZo for providing
optimized CPU and CUDA equihash solvers.
==================== www.nicehash.com ====================
Setting log level to 0
[20:09:52][0x00007fffc14713c0] Using SSE2: YES
[20:09:52][0x00007fffc14713c0] Using AVX: YES
[20:09:52][0x00007fffc14713c0] Using AVX2: YES
[20:09:52][0x00007fffc14713c0] Benchmarking CUDA worker (CUDA-DJEZO) GeForce GTX 970 (#0) M=1
[20:09:52][0x00007000018d7000] Thread #0 started (CUDA-DJEZO)
[20:09:53][0x00007fffc14713c0] Benchmark starting... this may take several minutes, please wait...
[20:09:53][0x00007000018d7000] Testing, nonce = 0100000000000000000000000000000000000000000000000000000000000000
[20:09:54][0x00007000018d7000] Testing, nonce = f3b38e1ae436818f59ab596b43f29908e3393e352ccc61ce2d30f118495b0759
[20:09:54][0x00007000018d7000] Solution found, header = 8bd0fa5fed497679ce7f0a1e8848cb64886200c967b44d7232f1ec5732092901
[20:09:54][0x00007000018d7000] Solution found, header = 5460d845d2d3a97e2974a58e7974c1247992f1c1d4fb1abf5567d74e1a6e7ee8
[20:09:54][0x00007000018d7000] Testing, nonce = f23b4f3015b26fca3f72520601e00d99921590e109f321a62af9d844781afbe2
[20:09:54][0x00007000018d7000] Testing, nonce = f670ef62e626155af6cf9b2f4896a4478b5b7bbd976df8448aaf32efdd70a07c
[20:09:54][0x00007000018d7000] Solution found, header = 9bf95949b69714bcc886a2b6667ec9503443a68be2de1d9eb7f2af28a258c52b
[20:09:54][0x00007000018d7000] Testing, nonce = 6e15c7a9f7c8e7881172bd4c3e32a066ed5eaf1ef53b296cc861666498c376bc
[20:09:54][0x00007000018d7000] Solution found, header = 4aee7c87dec782077a3cf45b3cd2378ed6be97100ab913ea7af2ba62e6fdbc2a
[20:09:54][0x00007000018d7000] Solution found, header = acc78484392ea47d45c9656161208e262ff07fda76abec40d09e5ae598fc1ef0
[20:09:54][0x00007000018d7000] Solution found, header = 6771ad38a46e6825d004de624c9689cd317e1b13ec23017420a0ef19d27634d4
[20:09:54][0x00007000018d7000] Solution found, header = e1e08ef9186c9210adac2a3283c96c25f4cdcdc48161e77fc9ee4f8cbe94f546
[20:09:54][0x00007000018d7000] Solution found, header = 470087063682c77f5bd571e6fb0ebd96e121d7311b54795d6d9b3539440b9302
[20:09:54][0x00007000018d7000] Solution found, header = 148d490ca5e96ac42c8d39ce6d3f75838b9288e690480c9ac7436f29f98040f7
[20:09:54][0x00007000018d7000] Solution found, header = 823c69515143719f6c64c27aeeea5db7780d5055e5bc6f5d674aff99b58e8dbf
[20:09:54][0x00007000018d7000] Testing, nonce = 0e2a6fcf538571938f0cc6e4c7dc826825e1099c107ccd4b8e0e8da69f886fac
[20:09:54][0x00007000018d7000] Solution found, header = e0319433fdc031d34bb1007b50ad9bb4f0f0ba426fd9340499c5e3eec84d6795
[20:09:54][0x00007000018d7000] Solution found, header = 3ccb57e5516d63ed750ff007cbfd4ee058671ff7f642656ae2ab8434a71958e8
[20:09:54][0x00007000018d7000] Solution found, header = c45ea2f867baf1932e3e8c49fc1a22b4b84b365d5f031b6346825e79f6145a13
[20:09:54][0x00007000018d7000] Testing, nonce = f788cb509c9a27deb2dc6180a93a4a5f4994c872f89889a6cd9ae6a91f3b2291
[20:09:54][0x00007000018d7000] Solution found, header = 2e69486322ed448d9dceba3607770c1cb96445c882a73e1ca3ec705b2c461cc0
[20:09:54][0x00007000018d7000] Solution found, header = 103d727a839bb4207b7c391eb65e44678f48fe66924a0bcdd4df42b141cb4de1
...
[20:10:37][0x00007fffc14713c0] Speed: 7.60832 I/s
[20:10:37][0x00007fffc14713c0] Speed: 13.8852 Sols/s
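To put a number on how slow this benchmark is, the final speed line can be compared against a healthy rate. The sketch below assumes roughly 220 Sols/s as the healthy figure (taken from the GPU-only pool run earlier in this thread, which may not be strictly comparable to a benchmark run) and uses a 50% cutoff that is an arbitrary choice; `benchmark_sols` and `looks_degraded` are hypothetical helper names.

```python
import re

# Matches the benchmark summary line, e.g.
# "[20:10:37][0x...] Speed: 13.8852 Sols/s"
BENCH_RE = re.compile(r"Speed: ([\d.]+) Sols/s")


def benchmark_sols(bench_output):
    """Return the Sols/s figure from benchmark output, or None if absent."""
    m = BENCH_RE.search(bench_output)
    return float(m.group(1)) if m else None


def looks_degraded(sols, healthy_sols, min_fraction=0.5):
    """True if the measured rate is below `min_fraction` of the healthy rate.

    Both `healthy_sols` and `min_fraction` are assumptions chosen by the
    caller; nothing here is derived from the miner itself.
    """
    return sols < healthy_sols * min_fraction


bench_tail = """\
[20:10:37][0x00007fffc14713c0] Speed: 7.60832 I/s
[20:10:37][0x00007fffc14713c0] Speed: 13.8852 Sols/s
"""

measured = benchmark_sols(bench_tail)
print(measured, looks_degraded(measured, healthy_sols=220.0))
```

With the values from this thread the benchmark lands at well under a tenth of the healthy rate, so it would be flagged by any reasonable cutoff.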
CUDA Information (./nheqminer-gpu -ci)
==================== www.nicehash.com ====================
Equihash CPU&GPU Miner for NiceHash v0.5c
Thanks to Zcash developers for providing base of the code.
Special thanks to tromp, xenoncat and djeZo for providing
optimized CPU and CUDA equihash solvers.
==================== www.nicehash.com ====================
Number of CUDA devices found: 1
#0 GeForce GTX 970 | SM version: 5.2 | SM count: 13
Closed all applications (e.g. Mail, Firefox) aaaand it's working again. What I don't get: I can even do graphics work on this machine while mining. Okay, the speed will only be at 50% most of the time, but it works. So why does it seem that some combination of applications can lock the GPU only for the mining-relevant tasks? But I got a clue where to debug next time this happens. Maybe I can identify the application or combination that is causing the problem.
Also compiled https://github.com/phvu/cuda-smi so next time I can get a view of the memory usage. (Or maybe that program reset the CUDA stack; too many unknown variables at the moment. I have to reduce those next time and check after each interaction with the system.)
Fun fact: about 42% of the GPU memory is always reserved. It's even reserved now, after a decent cold reboot. This seems to be normal macOS behavior on my end.
./cuda-smi
Device 0 [PCIe 0:1:0.0]: GeForce GTX 970 (CC 5.2): 2378.2 of 4095.7 MB (i.e. 58.1%) Free
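For logging this over time (for example after opening or closing each application), the cuda-smi line can be turned into numbers. A minimal sketch, assuming the output always looks like the single line shown above; the regex and the `parse_cuda_smi` helper are illustrative only and not part of cuda-smi.

```python
import re

# Written against the one output line shown in this thread, e.g.
# "Device 0 [PCIe 0:1:0.0]: GeForce GTX 970 (CC 5.2): 2378.2 of 4095.7 MB (i.e. 58.1%) Free"
LINE_RE = re.compile(
    r"Device (\d+) \[PCIe [^\]]+\]: (.+?) \(CC ([\d.]+)\): "
    r"([\d.]+) of ([\d.]+) MB \(i\.e\. ([\d.]+)%\) Free"
)


def parse_cuda_smi(line):
    """Parse one cuda-smi device line into a dict, or return None."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    free_mb = float(m.group(4))
    total_mb = float(m.group(5))
    return {
        "device": int(m.group(1)),
        "name": m.group(2),
        "free_mb": free_mb,
        "total_mb": total_mb,
        # Share of memory that is NOT free, i.e. reserved or in use.
        "used_pct": 100.0 * (total_mb - free_mb) / total_mb,
    }


line = ("Device 0 [PCIe 0:1:0.0]: GeForce GTX 970 (CC 5.2): "
        "2378.2 of 4095.7 MB (i.e. 58.1%) Free")
info = parse_cuda_smi(line)
print(info["name"], round(info["used_pct"], 1))
```

Recording `used_pct` before and after each change to the system would show whether the slowdown lines up with a jump in reserved GPU memory.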
If you find a certain combination, I'll try it with my setup too (MP 4.1 with GTX 960). Memory stats are more or less the same as yours:
$ ./cuda-smi
Device 0 [PCIe 0:6:0.0]: GeForce GTX 960 (CC 5.2): 2690.2 of 4095.8 MB (i.e. 65.7%) Free
Operating System: macOS 10.12.6 Sierra
Used GPU: GeForce GTX 970
Used Driver: NVIDIA Web Driver: 378.05.05.25f04
Version of Miner: 280fc55
Command used:
./nheqminer-gpu -cd 0 -t 3 -u MyZCashAddress.MinerName -p x -l eu1-zcash.flypool.org:3333
Observed Behavior
Problem Guessing
-d 0 before the bench reset, so we have no advanced debugging in the following logs
-d 0 session just to look if it's a SW or HW related bug, as if it is the latter one we should limit as stated above.
Log of miner while incident and before bench-reset:
Log after restarting before bench-reset:
Log of bench-reset:
Log of starting miner after bench-reset: