trexminer / T-Rex

T-Rex NVIDIA GPU miner with web control monitoring page

Verify overclock stability after DAG #133

Closed voyagerft closed 3 years ago

voyagerft commented 3 years ago

(ethash) Verify overclock stability after DAG rebuild (Instability detected message is printed in case there are issues)

The overclock stability check that runs after the DAG is rebuilt is blocking. In version 0.19.5, cards at extreme overclocks work correctly without interruption. After the update to 0.19.7, the miner reports OC instability and freezes without starting to mine. I think that, in everyone's interest, the check should not be blocking and should only produce a report. I had to roll back to 0.19.5; at the moment 0.19.7 is not reliable.

voyagerft commented 3 years ago

Correction: it crashes, it does not freeze. It crashes and restarts continuously.

trexminer commented 3 years ago

It doesn't stop mining; it only reports the instability. If your machine freezes, there must be another reason. What cards are you using? What's the error message (printed in red) when the miner restarts? If possible, please upload a screenshot or a log file. Thanks.

voyagerft commented 3 years ago

The message is the same one that appears when a graphics card crashes during mining: in red, it says to lower the overclock. The problem is that the DAG creation process does not finish, and the watchdog restarts T-Rex, which again reports that the Asus P104-100 card has crashed due to an illegal memory access, so it restarts indefinitely. If you lower the overclock, T-Rex starts, and then you can restore the overclock to its original value. In summary, the overclock check during DAG creation is not much use and blocks mining from starting. The previous version, 0.19.5, works perfectly and runs for days without any problem; the new version fails to start because it hits an illegal memory access when creating the DAG, and the watchdog restarts T-Rex indefinitely.

Taking a screenshot causes my mining to stop. More details are available if you want them. A suggestion: remove the overclock check when creating the DAG.

voyagerft commented 3 years ago

20210110 21:02:19 WARN: GPU #0(000300): P104-100, intensity set to 22
20210110 21:02:19 WARN: GPU #1(000500): P104-100, intensity set to 22
20210110 21:02:20 WARN: GPU #2(000700): GeForce GTX 1070 Ti, intensity set to 22
20210110 21:02:20 WARN: GPU #3(000800): P104-100, intensity set to 22
20210110 21:02:20 WARN: GPU #4(000900): Zotac GeForce GTX 1070 Ti, intensity set to 22
20210110 21:02:21 WARN: GPU #5(000a00): P104-100, intensity set to 22
20210110 21:02:21 WARN: GPU #6(000b00): P104-100, intensity set to 22
20210110 21:02:21 WARN: GPU #7(000c00): GeForce GTX 1070 Ti, intensity set to 22
20210110 21:02:21 WARN: GPU #8(000d00): ASUS GeForce GTX 1070 Ti, intensity set to 22
20210110 21:02:22 WARN: GPU #9(000e00): Zotac GeForce GTX 1070 Ti, intensity set to 22
20210110 21:02:22 WARN: GPU #10(000f00): ASUS GeForce GTX 1070 Ti, intensity set to 22
20210110 21:02:24 GPU #10: generating DAG 4.02 GB for epoch 387 ...
20210110 21:02:24 GPU #0: generating DAG 4.02 GB for epoch 387 ...
20210110 21:02:24 GPU #1: generating DAG 4.02 GB for epoch 387 ...
20210110 21:02:24 GPU #2: generating DAG 4.02 GB for epoch 387 ...
20210110 21:02:24 GPU #3: generating DAG 4.02 GB for epoch 387 ...
20210110 21:02:24 GPU #4: generating DAG 4.02 GB for epoch 387 ...
20210110 21:02:24 GPU #5: generating DAG 4.02 GB for epoch 387 ...
20210110 21:02:24 GPU #6: generating DAG 4.02 GB for epoch 387 ...
20210110 21:02:24 GPU #7: generating DAG 4.02 GB for epoch 387 ...
20210110 21:02:24 GPU #8: generating DAG 4.02 GB for epoch 387 ...
20210110 21:02:24 GPU #9: generating DAG 4.02 GB for epoch 387 ...
20210110 21:02:26 TREX: Can't find nonce with device [ID=3, GPU #3], cuda exception in [encode_light_cache_to_gpu, 391], an illegal memory access was encountered, try to reduce OC to stabilize GPU state
20210110 21:02:26 WARN: Miner is going to shutdown...
20210110 21:02:26 Main loop finished. Cleaning up resources...
20210110 21:02:26 ApiServer: stopped listening on 0.0.0.0:4068
20210110 21:02:27 T-Rex finished.
20210110 21:02:28 WARN: WATCHDOG: T-Rex does not exist anymore, restarting...
20210110 21:02:30 T-Rex NVIDIA GPU miner v0.19.7 - [CUDA v10.0 | Linux]
20210110 21:02:30 r.618737ff569e
20210110 21:02:30
20210110 21:02:30 NVIDIA Driver v410.57
20210110 21:02:30 CUDA devices available: 11
20210110 21:02:30
20210110 21:02:30 WARN: DevFee 1% (ethash)
20210110 21:02:30
20210110 21:02:30 URL : stratum+tcp://eu1.ethermine.org:4444
20210110 21:02:30 USER: 0x067D47D035245891a9D3FB999C744b6d661b73e7.Arca
20210110 21:02:30 PASS: x
20210110 21:02:30
20210110 21:02:30 Starting on: eu1.ethermine.org:4444
20210110 21:02:30 Using protocol: stratum1.
20210110 21:02:30 Authorizing...
20210110 21:02:30 Authorized successfully.
20210110 21:02:30 ethash epoch: 387, block: 11629180, diff: 4.00 Gh
20210110 21:02:30 ApiServer: Telnet server started on 0.0.0.0:4068
20210110 21:02:31 WARN: GPU #0(000300): P104-100, intensity set to 22
20210110 21:02:31 WARN: GPU #1(000500): P104-100, intensity set to 22
20210110 21:02:31 WARN: GPU #2(000700): GeForce GTX 1070 Ti, intensity set to 22
20210110 21:02:31 WARN: GPU #4(000900): Zotac GeForce GTX 1070 Ti, intensity set to 22
20210110 21:02:32 WARN: GPU #5(000a00): P104-100, intensity set to 22
20210110 21:02:32 WARN: GPU #6(000b00): P104-100, intensity set to 22
20210110 21:02:32 WARN: GPU #7(000c00): GeForce GTX 1070 Ti, intensity set to 22
20210110 21:02:33 WARN: GPU #8(000d00): ASUS GeForce GTX 1070 Ti, intensity set to 22
20210110 21:02:33 WARN: GPU #3(000800): P104-100, intensity set to 22
20210110 21:02:33 WARN: GPU #9(000e00): Zotac GeForce GTX 1070 Ti, intensity set to 22
20210110 21:02:33 WARN: GPU #10(000f00): ASUS GeForce GTX 1070 Ti, intensity set to 22
20210110 21:02:36 GPU #10: generating DAG 4.02 GB for epoch 387 ...
20210110 21:02:36 GPU #0: generating DAG 4.02 GB for epoch 387 ...
20210110 21:02:36 GPU #1: generating DAG 4.02 GB for epoch 387 ...
20210110 21:02:36 GPU #2: generating DAG 4.02 GB for epoch 387 ...
20210110 21:02:36 GPU #4: generating DAG 4.02 GB for epoch 387 ...
20210110 21:02:36 GPU #5: generating DAG 4.02 GB for epoch 387 ...
20210110 21:02:36 GPU #6: generating DAG 4.02 GB for epoch 387 ...
20210110 21:02:36 GPU #7: generating DAG 4.02 GB for epoch 387 ...
20210110 21:02:36 GPU #8: generating DAG 4.02 GB for epoch 387 ...
20210110 21:02:36 GPU #3: generating DAG 4.02 GB for epoch 387 ...
20210110 21:02:36 GPU #9: generating DAG 4.02 GB for epoch 387 ...
20210110 21:02:37 TREX: Can't find nonce with device [ID=6, GPU #6], cuda exception in [ethash_generate_dag, 441], an illegal memory access was encountered, try to reduce OC to stabilize GPU state
20210110 21:02:37 WARN: Miner is going to shutdown...
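When the watchdog restarts the miner in a loop like this, it can help to pull out just the crash lines and see which device and which function failed on each attempt. The following is a minimal sketch, not part of T-Rex: the regular expression is modelled only on the error lines quoted in this thread, and the names `CRASH_RE`, `find_crashes` and `SAMPLE_LOG` are made up for illustration.

```python
import re

# Assumption: the pattern mirrors the error lines quoted in this thread
# ("TREX:" in 0.19.7, "ERROR:" in 0.19.5). Other versions may differ.
CRASH_RE = re.compile(
    r"(?P<ts>\d{8} \d{2}:\d{2}:\d{2}) (?:TREX|ERROR): "
    r"Can't (?:find nonce with|stop) device \[ID=(?P<dev>\d+), GPU #\d+\], "
    r"cuda exception in \[(?P<func>\w+), (?P<line>\d+)\], "
    r"(?P<reason>.+?)(?:, try to reduce OC|$)"
)


def find_crashes(log_text):
    """Return one dict per CUDA exception line, in log order.

    Expects one log entry per line.
    """
    crashes = []
    for raw in log_text.splitlines():
        m = CRASH_RE.search(raw)
        if m:
            crashes.append({
                "time": m["ts"],
                "gpu": int(m["dev"]),
                "function": m["func"],
                "reason": m["reason"].strip(),
            })
    return crashes


# Two crash lines copied from the log in this comment.
SAMPLE_LOG = """\
20210110 21:02:26 TREX: Can't find nonce with device [ID=3, GPU #3], cuda exception in [encode_light_cache_to_gpu, 391], an illegal memory access was encountered, try to reduce OC to stabilize GPU state
20210110 21:02:26 WARN: Miner is going to shutdown...
20210110 21:02:37 TREX: Can't find nonce with device [ID=6, GPU #6], cuda exception in [ethash_generate_dag, 441], an illegal memory access was encountered, try to reduce OC to stabilize GPU state
"""

for crash in find_crashes(SAMPLE_LOG):
    print(crash["time"], "GPU", crash["gpu"], crash["function"], "-", crash["reason"])
```

Here the failing device differs between the two attempts (GPU #3, then GPU #6), and both are P104-100 cards according to the startup lines.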

voyagerft commented 3 years ago
Screenshot 2021-01-10 at 21 06 44
voyagerft commented 3 years ago

The Asus P104-100 cards are upgraded to 8 GB with new firmware, but with 0.19.5 and the same overclock they work fine!

20210110 21:03:12 r.618737ff569e
20210110 21:03:12
20210110 21:03:12 NVIDIA Driver v410.57
20210110 21:03:12 CUDA devices available: 11
20210110 21:03:12
20210110 21:03:12 WARN: DevFee 1% (ethash)
20210110 21:03:12
20210110 21:03:12 URL : stratum+tcp://eu1.ethermine.org:4444
20210110 21:03:12 USER: 0x067D47D035245891a9D3FB999C744b6d661b73e7.Arca
20210110 21:03:12 PASS: x
20210110 21:03:12
20210110 21:03:12 Starting on: eu1.ethermine.org:4444
20210110 21:03:26 T-Rex NVIDIA GPU miner v0.19.5 - [CUDA v10.0 | Linux]
20210110 21:03:26 r.5f0b2f67355c
20210110 21:03:26
20210110 21:03:26 NVIDIA Driver v410.57
20210110 21:03:26 CUDA devices available: 11
20210110 21:03:26
20210110 21:03:26 WARN: DevFee 1% (ethash)
20210110 21:03:26
20210110 21:03:26 URL : stratum+tcp://eu1.ethermine.org:4444
20210110 21:03:26 USER: 0x067D47D035245891a9D3FB999C744b6d661b73e7.Arca
20210110 21:03:26 PASS: x
20210110 21:03:26
20210110 21:03:26 Starting on: eu1.ethermine.org:4444
20210110 21:03:26 Using protocol: stratum1.
20210110 21:03:26 Authorizing...
20210110 21:03:26 Authorized successfully.
20210110 21:03:26 ethash epoch: 387, block: 11629183, diff: 4.00 Gh
20210110 21:03:26 ApiServer: Telnet server started on 0.0.0.0:4068
20210110 21:03:26 WARN: GPU #0(000300): P104-100, intensity set to 22
20210110 21:03:27 WARN: GPU #1(000500): P104-100, intensity set to 22
20210110 21:03:27 WARN: GPU #3(000800): P104-100, intensity set to 22
20210110 21:03:27 WARN: GPU #2(000700): GeForce GTX 1070 Ti, intensity set to 22
20210110 21:03:27 WARN: GPU #4(000900): Zotac GeForce GTX 1070 Ti, intensity set to 22
20210110 21:03:28 WARN: GPU #6(000b00): P104-100, intensity set to 22
20210110 21:03:28 WARN: GPU #5(000a00): P104-100, intensity set to 22
20210110 21:03:28 WARN: GPU #8(000d00): ASUS GeForce GTX 1070 Ti, intensity set to 22
20210110 21:03:28 WARN: GPU #9(000e00): Zotac GeForce GTX 1070 Ti, intensity set to 22
20210110 21:03:29 ethash epoch: 387, block: 11629184, diff: 4.00 Gh
20210110 21:03:29 WARN: GPU #10(000f00): ASUS GeForce GTX 1070 Ti, intensity set to 22
20210110 21:03:29 WARN: GPU #7(000c00): GeForce GTX 1070 Ti, intensity set to 22
20210110 21:03:31 GPU #7: generating DAG 4.02 GB for epoch 387 ...
20210110 21:03:31 GPU #0: generating DAG 4.02 GB for epoch 387 ...
20210110 21:03:31 GPU #1: generating DAG 4.02 GB for epoch 387 ...
20210110 21:03:31 GPU #3: generating DAG 4.02 GB for epoch 387 ...
20210110 21:03:31 ethash epoch: 387, block: 11629185, diff: 4.00 Gh
20210110 21:03:31 GPU #8: generating DAG 4.02 GB for epoch 387 ...
20210110 21:03:31 GPU #6: generating DAG 4.02 GB for epoch 387 ...
20210110 21:03:31 GPU #9: generating DAG 4.02 GB for epoch 387 ...
20210110 21:03:31 GPU #5: generating DAG 4.02 GB for epoch 387 ...
20210110 21:03:31 GPU #2: generating DAG 4.02 GB for epoch 387 ...
20210110 21:03:31 GPU #10: generating DAG 4.02 GB for epoch 387 ...
20210110 21:03:31 GPU #4: generating DAG 4.02 GB for epoch 387 ...
20210110 21:03:37 ethash epoch: 387, block: 11629186, diff: 4.00 Gh
20210110 21:03:39 GPU #5: DAG generated [time: 7740 ms], memory left: 3.80 GB
20210110 21:03:39 GPU #6: DAG generated [time: 7744 ms], memory left: 3.80 GB
20210110 21:03:39 GPU #3: DAG generated [time: 7763 ms], memory left: 3.80 GB
20210110 21:03:40 GPU #7: DAG generated [time: 8958 ms], memory left: 3.78 GB
20210110 21:03:40 GPU #9: DAG generated [time: 9017 ms], memory left: 3.78 GB
20210110 21:03:41 GPU #8: DAG generated [time: 9617 ms], memory left: 3.78 GB
20210110 21:03:41 GPU #10: DAG generated [time: 9645 ms], memory left: 3.78 GB
20210110 21:03:41 GPU #0: DAG generated [time: 9932 ms], memory left: 3.80 GB
20210110 21:03:41 GPU #1: DAG generated [time: 9976 ms], memory left: 3.80 GB
20210110 21:03:43 GPU #2: DAG generated [time: 11787 ms], memory left: 3.78 GB
20210110 21:03:43 ethash epoch: 387, block: 11629187, diff: 4.00 Gh
20210110 21:03:43 GPU #4: DAG generated [time: 12005 ms], memory left: 3.78 GB
20210110 21:03:49 ethash epoch: 387, block: 11629188, diff: 4.00 Gh
20210110 21:03:55 GPU #6: using kernel #4
20210110 21:03:55 GPU #5: using kernel #4
20210110 21:03:55 GPU #3: using kernel #4
20210110 21:03:57 GPU #9: using kernel #5
20210110 21:03:57 GPU #7: using kernel #4
20210110 21:03:57 GPU #0: using kernel #5
20210110 21:03:57 GPU #1: using kernel #4
20210110 21:03:57 GPU #8: using kernel #2
20210110 21:03:57 GPU #10: using kernel #2
20210110 21:03:59 GPU #2: using kernel #5
20210110 21:04:00 GPU #4: using kernel #5
20210110 21:04:09 ethash epoch: 387, block: 11629189, diff: 4.00 Gh
20210110 21:04:26 [ OK ] 1/1 - 388.11 MH/s, 58ms ... GPU #7

trexminer commented 3 years ago

Thanks. We'll fix it in the next version.

trexminer commented 3 years ago

Hi @voyagerft, could you please try this beta build https://www.dropbox.com/s/zdbmav72ivh80i5/t-rex-0.19.8-linux-cuda10.0.tar.gz?dl=1 and let me know if the issue is fixed?

voyagerft commented 3 years ago

Hi @trexminer, I have no way to try it. The rig runs ethosdistro, and I install the package through https://github.com/cynixx3/third-party-miner-installer-for-ethos, updating with "sudo miner-manager t-rex update" when a new version is available. I would not know where to install this Dropbox build to test it, sorry.

trexminer commented 3 years ago

No problem, we'll find another way to test it. Thanks.

MasterG33 commented 3 years ago

=== GPU 0, 01:00.0 GeForce RTX 3080 10015 MB, PL: 100 W, 340 W, 375 W === 00:42:07
SET POWER LIMIT: 235.0 W [Unknown Error] (exicode=123)
Max Perf mode: 4 (auto)
Attribute 'GPUGraphicsClockOffset' was already set to -200
Attribute 'GPUMemoryTransferRateOffset' was already set to 2600
ERROR: Error assigning value 80 to attribute 'GPUTargetFanSpeed' (Goldroom:0[fan:0]) as specified in assignment '[fan:0]/GPUTargetFanSpeed=80' (Unknown Error).
ERROR: Error assigning value 80 to attribute 'GPUTargetFanSpeed' (Goldroom:0[fan:1]) as specified in assignment '[fan:1]/GPUTargetFanSpeed=80' (Unknown Error).
Attribute 'GPUFanControlState' (Goldroom:0[gpu:0]) assigned value 1.

20210114 00:43:12 WARN: GPU #0(000100): ASUS GeForce RTX 3080, intensity set to 22
20210114 00:43:12 WARN: GPU #2(000300): ASUS GeForce RTX 3080, intensity set to 22
20210114 00:43:13 WARN: GPU #1(000200): ASUS GeForce RTX 3080, intensity set to 22
20210114 00:43:13 WARN: NVML: can't get fan speed for GPU #0, error code 999
20210114 00:43:13 WARN: GPU #3(000500): ASUS GeForce RTX 3080, intensity set to 22
20210114 00:43:13 WARN: GPU #4(000600): ASUS GeForce RTX 3080, intensity set to 22
20210114 00:43:13 WARN: GPU #5(000700): ASUS GeForce RTX 3080, intensity set to 22
20210114 00:43:14 WARN: GPU #6(000800): ASUS GeForce RTX 3080, intensity set to 22
20210114 00:43:23 GPU #2: generating DAG 4.03 GB for epoch 388 ...
20210114 00:43:24 GPU #4: generating DAG 4.03 GB for epoch 388 ...
20210114 00:43:24 GPU #0: generating DAG 4.03 GB for epoch 388 ...
20210114 00:43:24 GPU #1: generating DAG 4.03 GB for epoch 388 ...
20210114 00:43:24 GPU #6: generating DAG 4.03 GB for epoch 388 ...
20210114 00:43:24 GPU #5: generating DAG 4.03 GB for epoch 388 ...
20210114 00:43:24 GPU #3: generating DAG 4.03 GB for epoch 388 ...
20210114 00:43:27 TREX: Can't find nonce with device [ID=0, GPU #0], cuda exception in [ethash_generate_dag, 482], an illegal memory access was encountered, try to reduce OC to stabilize GPU state
20210114 00:43:27 WARN: Miner is going to shutdown...

Same error for me. It has been happening every 12 hours or so since the last update.

MasterG33 commented 3 years ago

@trexminer let me know if there are other logs you might need or whatnot.

trexminer commented 3 years ago

@MasterG33 there seems to be a hardware-related problem with GPU#0. Try running the miner with GPU#0 excluded (-d 1,2,3,4,5,6) and see if it works.

MasterG33 commented 3 years ago

@trexminer I have been running version 0.19.5 for the last 4-5 hours without an issue. The last error I got was for GPU#1:

20210114 16:32:13 WARN: NVML: can't get fan speed for GPU #0, error code 999
20210114 16:32:13 WARN: GPU #5(000700): ASUS GeForce RTX 3080, intensity set to 22
20210114 16:32:13 WARN: GPU #3(000500): ASUS GeForce RTX 3080, intensity set to 22
20210114 16:32:14 WARN: GPU #4(000600): ASUS GeForce RTX 3080, intensity set to 22
20210114 16:32:14 WARN: GPU #2(000300): ASUS GeForce RTX 3080, intensity set to 22
20210114 16:32:24 GPU #1: generating DAG 4.03 GB for epoch 388 ...
20210114 16:32:24 GPU #6: generating DAG 4.03 GB for epoch 388 ...
20210114 16:32:24 GPU #3: generating DAG 4.03 GB for epoch 388 ...
20210114 16:32:24 GPU #0: generating DAG 4.03 GB for epoch 388 ...
20210114 16:32:24 GPU #2: generating DAG 4.03 GB for epoch 388 ...
20210114 16:32:24 GPU #5: generating DAG 4.03 GB for epoch 388 ...
20210114 16:32:24 GPU #4: generating DAG 4.03 GB for epoch 388 ...
20210114 16:32:30 GPU #1: DAG generated [crc: 49759add, time: 6614 ms], memory left: 5.53 GB
20210114 16:32:30 GPU #2: DAG generated [crc: 49759add, time: 6624 ms], memory left: 5.53 GB
20210114 16:32:30 GPU #3: DAG generated [crc: 49759add, time: 6665 ms], memory left: 5.53 GB
20210114 16:32:30 GPU #6: DAG generated [crc: 49759add, time: 6701 ms], memory left: 5.53 GB
20210114 16:32:30 GPU #4: DAG generated [crc: 49759add, time: 6704 ms], memory left: 5.53 GB
20210114 16:32:30 GPU #5: DAG generated [crc: 49759add, time: 6713 ms], memory left: 5.53 GB
20210114 16:32:36 TREX: Can't stop device [ID=1, GPU #1], cuda exception in [initiate_next_loop, 183], the launch timed out and was terminated
20210114 16:32:36 WARN: Miner is going to shutdown...
20210114 16:32:36 WARN: ApiServer: request took long, 4704.7ms for /summary
20210114 16:32:36 Main loop finished. Cleaning up resources...
20210114 16:32:36 ApiServer: stopped listening on 127.0.0.1:4058
20210114 16:32:39 T-Rex finished.

t-rex exited (exitcode=0), waiting to cooldown a bit

Trying to release TIME_WAIT sockets:
tcp 0 0 127.0.0.1:56504 127.0.0.1:4058 TIME_WAIT
tcp 0 0 127.0.0.1:56498 127.0.0.1:4058 TIME_WAIT

20210114 16:32:55 T-Rex NVIDIA GPU miner v0.19.7 - [CUDA v11.10 | Linux]
20210114 16:32:55 r.618737ff569e
20210114 16:32:55
20210114 16:32:55 NVIDIA Driver v460.32.03
20210114 16:32:55 CUDA devices available: 7
20210114 16:32:55
20210114 16:32:55 WARN: DevFee 1% (ethash)
20210114 16:32:55
20210114 16:32:55 === MAIN POOL === | back to main in 10 mins |
20210114 16:32:55 URL : stratum+tcp://eth-us.sparkpool.com:3333
20210114 16:32:55 USER: 0xaA7B55818CcD33b62fE645EA4f1362E33cC78F78
20210114 16:32:55 PASS: x
20210114 16:32:55 WRK : Goldroom
20210114 16:32:55
20210114 16:32:55 URL : stratum+tcp://eth-us.sparkpool.com:13333
20210114 16:32:55 USER: 0xaA7B55818CcD33b62fE645EA4f1362E33cC78F78
20210114 16:32:55 PASS: x
20210114 16:32:55 WRK : Goldroom
20210114 16:32:55
20210114 16:32:55 WARN: Built-in watchdog has been disabled!
20210114 16:32:55 Starting on: eth-us.sparkpool.com:3333
20210114 16:32:56 Using protocol: stratum1.
20210114 16:32:56 Authorizing...
20210114 16:32:56 Authorized successfully.
20210114 16:32:56 ethash epoch: 388, diff: 4.00 Gh
20210114 16:32:56 ApiServer: Telnet server started on 127.0.0.1:4058
20210114 16:32:56 WARN: GPU #0(000100): ASUS GeForce RTX 3080, intensity set to 22
20210114 16:32:56 WARN: GPU #1(000200): ASUS GeForce RTX 3080, intensity set to 22
20210114 16:32:57 WARN: GPU #3(000500): ASUS GeForce RTX 3080, intensity set to 22
20210114 16:32:57 WARN: NVML: can't get fan speed for GPU #0, error code 999
20210114 16:32:57 WARN: GPU #6(000800): ASUS GeForce RTX 3080, intensity set to 22
20210114 16:32:57 WARN: GPU #4(000600): ASUS GeForce RTX 3080, intensity set to 22
20210114 16:32:57 WARN: GPU #5(000700): ASUS GeForce RTX 3080, intensity set to 22
20210114 16:32:58 WARN: GPU #2(000300): ASUS GeForce RTX 3080, intensity set to 22
20210114 16:33:07 GPU #0: generating DAG 4.03 GB for epoch 388 ...
20210114 16:33:07 GPU #3: generating DAG 4.03 GB for epoch 388 ...
20210114 16:33:07 GPU #4: generating DAG 4.03 GB for epoch 388 ...
20210114 16:33:08 GPU #5: generating DAG 4.03 GB for epoch 388 ...
20210114 16:33:08 GPU #1: generating DAG 4.03 GB for epoch 388 ...
20210114 16:33:08 GPU #2: generating DAG 4.03 GB for epoch 388 ...
20210114 16:33:08 GPU #6: generating DAG 4.03 GB for epoch 388 ...
20210114 16:33:14 GPU #1: DAG generated [crc: 49759add, time: 7070 ms], memory left: 5.53 GB
20210114 16:33:15 GPU #6: DAG generated [crc: 49759add, time: 7136 ms], memory left: 5.53 GB
20210114 16:33:15 GPU #5: DAG generated [crc: 49759add, time: 7151 ms], memory left: 5.53 GB
20210114 16:33:15 GPU #2: DAG generated [crc: 49759add, time: 7192 ms], memory left: 5.53 GB
20210114 16:33:15 GPU #4: DAG generated [crc: 49759add, time: 7197 ms], memory left: 5.53 GB
20210114 16:33:15 GPU #3: DAG generated [crc: 49759add, time: 7201 ms], memory left: 5.53 GB

MasterG33 commented 3 years ago

@trexminer this is the latest crash, generated while running 0.19.5. Do you still think this is a hardware issue? I will try running with the card removed if you think it's worth a try.

Thanks

20210115 00:58:16 WARN: GPU #5(000700): ASUS GeForce RTX 3080, intensity set to 22
20210115 00:58:16 WARN: GPU #6(000800): ASUS GeForce RTX 3080, intensity set to 22
20210115 00:58:17 WARN: GPU #2(000300): ASUS GeForce RTX 3080, intensity set to 22
20210115 00:58:17 WARN: GPU #3(000500): ASUS GeForce RTX 3080, intensity set to 22
20210115 00:58:27 GPU #6: generating DAG 4.03 GB for epoch 388 ...
20210115 00:58:27 GPU #5: generating DAG 4.03 GB for epoch 388 ...
20210115 00:58:27 GPU #0: generating DAG 4.03 GB for epoch 388 ...
20210115 00:58:27 GPU #2: generating DAG 4.03 GB for epoch 388 ...
20210115 00:58:27 GPU #1: generating DAG 4.03 GB for epoch 388 ...
20210115 00:58:27 GPU #4: generating DAG 4.03 GB for epoch 388 ...
20210115 00:58:27 GPU #3: generating DAG 4.03 GB for epoch 388 ...
20210115 00:58:29 ERROR: Can't find nonce with device [ID=0, GPU #0], cuda exception in [ethash_generate_dag, 347], an illegal memory access was encountered
20210115 00:58:29 WARN: Miner is going to shutdown...
20210115 00:58:29 ERROR: Can't find nonce with device [ID=2, GPU #2], cuda exception in [ethash_generate_dag, 347], an illegal memory access was encountered
20210115 00:58:29 ERROR: Can't find nonce with device [ID=1, GPU #1], cuda exception in [ethash_generate_dag, 347], an illegal memory access was encountered
20210115 00:58:29 ERROR: Can't find nonce with device [ID=3, GPU #3], cuda exception in [ethash_generate_dag, 347], an illegal memory access was encountered
20210115 00:58:29 ERROR: Can't find nonce with device [ID=6, GPU #6], cuda exception in [ethash_generate_dag, 347], an illegal memory access was encountered
20210115 00:58:29 ERROR: Can't find nonce with device [ID=5, GPU #5], cuda exception in [ethash_generate_dag, 347], an illegal memory access was encountered
20210115 00:58:29 ERROR: Can't find nonce with device [ID=4, GPU #4], cuda exception in [ethash_generate_dag, 347], an illegal memory access was encountered
20210115 00:58:29 Main loop finished. Cleaning up resources...
20210115 00:58:29 ApiServer: stopped listening on 127.0.0.1:4058
terminate called after throwing an instance of 'CudaException'
what(): cuda exception in [initiate_next_loop, 183], an illegal memory access was encountered
/hive/miners/t-rex/h-run.sh: line 18: 9131 Aborted (core dumped) t-rex -c config.json 2>&1

t-rex exited (exitcode=134), waiting to cooldown a bit

Trying to release TIME_WAIT sockets:
tcp 0 0 127.0.0.1:39726 127.0.0.1:4058 TIME_WAIT
tcp 0 0 127.0.0.1:39722 127.0.0.1:4058 TIME_WAIT

20210115 00:58:48 T-Rex NVIDIA GPU miner v0.19.5 - [CUDA v11.10 | Linux]
20210115 00:58:48 r.5f0b2f67355c
20210115 00:58:48
20210115 00:58:48 NVIDIA Driver v460.32.03
20210115 00:58:48 CUDA devices available: 7
20210115 00:58:48
20210115 00:58:48 WARN: DevFee 1% (ethash)
20210115 00:58:48
20210115 00:58:48 === MAIN POOL === | back to main in 10 mins |
20210115 00:58:48 URL : stratum+tcp://eth-us.sparkpool.com:3333
20210115 00:58:48 USER: 0xaA7B55818CcD33b62fE645EA4f1362E33cC78F78
20210115 00:58:48 PASS: x
20210115 00:58:48 WRK : Goldroom
20210115 00:58:48
20210115 00:58:48 URL : stratum+tcp://eth-us.sparkpool.com:13333
20210115 00:58:48 USER: 0xaA7B55818CcD33b62fE645EA4f1362E33cC78F78
20210115 00:58:48 PASS: x
20210115 00:58:48 WRK : Goldroom
20210115 00:58:48
20210115 00:58:48 WARN: Built-in watchdog has been disabled!
20210115 00:58:48 Starting on: eth-us.sparkpool.com:3333
20210115 00:58:48 Using protocol: stratum1.
20210115 00:58:48 Authorizing...
20210115 00:58:48 Authorized successfully.
20210115 00:58:49 ethash epoch: 388, diff: 4.00 Gh
20210115 00:58:49 ApiServer: Telnet server started on 127.0.0.1:4058
20210115 00:58:49 WARN: GPU #0(000100): ASUS GeForce RTX 3080, intensity set to 22
20210115 00:58:49 WARN: GPU #2(000300): ASUS GeForce RTX 3080, intensity set to 22
20210115 00:58:49 WARN: GPU #5(000700): ASUS GeForce RTX 3080, intensity set to 22
20210115 00:58:50 WARN: NVML: can't get fan speed for GPU #0, error code 999
20210115 00:58:50 WARN: GPU #4(000600): ASUS GeForce RTX 3080, intensity set to 22
20210115 00:58:50 WARN: GPU #3(000500): ASUS GeForce RTX 3080, intensity set to 22
20210115 00:58:50 WARN: GPU #1(000200): ASUS GeForce RTX 3080, intensity set to 22
20210115 00:58:51 WARN: GPU #6(000800): ASUS GeForce RTX 3080, intensity set to 22
20210115 00:59:00 GPU #6: generating DAG 4.03 GB for epoch 388 ...
20210115 00:59:00 GPU #2: generating DAG 4.03 GB for epoch 388 ...
20210115 00:59:00 GPU #4: generating DAG 4.03 GB for epoch 388 ...
20210115 00:59:00 GPU #1: generating DAG 4.03 GB for epoch 388 ...
20210115 00:59:00 GPU #3: generating DAG 4.03 GB for epoch 388 ...
20210115 00:59:00 GPU #5: generating DAG 4.03 GB for epoch 388 ...
20210115 00:59:00 GPU #0: generating DAG 4.03 GB for epoch 388 ...
20210115 00:59:03 GPU #6: DAG generated [time: 2802 ms], memory left: 5.53 GB
20210115 00:59:03 GPU #5: DAG generated [time: 2808 ms], memory left: 5.53 GB
20210115 00:59:03 GPU #2: DAG generated [time: 2810 ms], memory left: 5.53 GB
20210115 00:59:03 GPU #3: DAG generated [time: 2811 ms], memory left: 5.53 GB
20210115 00:59:03 GPU #1: DAG generated [time: 2831 ms], memory left: 5.53 GB
20210115 00:59:03 GPU #4: DAG generated [time: 2839 ms], memory left: 5.53 GB

trexminer commented 3 years ago

20210114 16:32:24 GPU #1: generating DAG 4.03 GB for epoch 388 ...
20210114 16:32:24 GPU #6: generating DAG 4.03 GB for epoch 388 ...
20210114 16:32:24 GPU #3: generating DAG 4.03 GB for epoch 388 ...
20210114 16:32:24 GPU #0: generating DAG 4.03 GB for epoch 388 ...
20210114 16:32:24 GPU #2: generating DAG 4.03 GB for epoch 388 ...
20210114 16:32:24 GPU #5: generating DAG 4.03 GB for epoch 388 ...
20210114 16:32:24 GPU #4: generating DAG 4.03 GB for epoch 388 ...
20210114 16:32:30 GPU #1: DAG generated [crc: 49759add, time: 6614 ms], memory left: 5.53 GB
20210114 16:32:30 GPU #2: DAG generated [crc: 49759add, time: 6624 ms], memory left: 5.53 GB
20210114 16:32:30 GPU #3: DAG generated [crc: 49759add, time: 6665 ms], memory left: 5.53 GB
20210114 16:32:30 GPU #6: DAG generated [crc: 49759add, time: 6701 ms], memory left: 5.53 GB
20210114 16:32:30 GPU #4: DAG generated [crc: 49759add, time: 6704 ms], memory left: 5.53 GB
20210114 16:32:30 GPU #5: DAG generated [crc: 49759add, time: 6713 ms], memory left: 5.53 GB
20210114 16:32:36 TREX: Can't stop device [ID=1, GPU #1], cuda exception in [initiate_next_loop, 183], the launch timed out and was terminated

Even though the error occurred for GPU#1 this time, GPU#0 is the one that didn't print the "DAG generated" message. I'm still convinced the miner will start fine if you launch it with -d 1,2,3,4,5,6. Let me know how it goes if you decide to give it a try.
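The check described above (which GPU started building the DAG but never reported "DAG generated") can be automated over a log excerpt. This is a minimal sketch rather than anything shipped with T-Rex: the two patterns are taken from the log wording quoted in this thread, the function name `unfinished_dag_gpus` is made up, and the excerpt is assumed to cover a single miner run.

```python
import re

# Assumption: wording copied from the log lines quoted in this thread.
START_RE = re.compile(r"GPU #(\d+): generating DAG")
DONE_RE = re.compile(r"GPU #(\d+): DAG generated")


def unfinished_dag_gpus(log_text):
    """Return GPU indices that began DAG generation but never finished.

    Assumes log_text covers one miner run; a log spanning several
    restarts should be split per run first.
    """
    started = {int(n) for n in START_RE.findall(log_text)}
    finished = {int(n) for n in DONE_RE.findall(log_text)}
    return sorted(started - finished)


# Excerpt of the 16:32 run quoted above, one entry per line.
SAMPLE_RUN = """\
20210114 16:32:24 GPU #1: generating DAG 4.03 GB for epoch 388 ...
20210114 16:32:24 GPU #6: generating DAG 4.03 GB for epoch 388 ...
20210114 16:32:24 GPU #3: generating DAG 4.03 GB for epoch 388 ...
20210114 16:32:24 GPU #0: generating DAG 4.03 GB for epoch 388 ...
20210114 16:32:24 GPU #2: generating DAG 4.03 GB for epoch 388 ...
20210114 16:32:24 GPU #5: generating DAG 4.03 GB for epoch 388 ...
20210114 16:32:24 GPU #4: generating DAG 4.03 GB for epoch 388 ...
20210114 16:32:30 GPU #1: DAG generated [crc: 49759add, time: 6614 ms], memory left: 5.53 GB
20210114 16:32:30 GPU #2: DAG generated [crc: 49759add, time: 6624 ms], memory left: 5.53 GB
20210114 16:32:30 GPU #3: DAG generated [crc: 49759add, time: 6665 ms], memory left: 5.53 GB
20210114 16:32:30 GPU #6: DAG generated [crc: 49759add, time: 6701 ms], memory left: 5.53 GB
20210114 16:32:30 GPU #4: DAG generated [crc: 49759add, time: 6704 ms], memory left: 5.53 GB
20210114 16:32:30 GPU #5: DAG generated [crc: 49759add, time: 6713 ms], memory left: 5.53 GB
"""

print(unfinished_dag_gpus(SAMPLE_RUN))
```

Because the patterns are matched against the whole text rather than line by line, the same function also works on logs that were pasted as a single run-on line.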

MasterG33 commented 3 years ago

@trexminer just to be clear, the rig starts and runs fine for about 12 hours at a time. It has been failing at around 1 am for the last 3-4 days, so there seems to be a pattern here. I'm just trying to wrap my head around this: might the overclock be causing the issues? I am running a high memory clock on all of the cards (2700 MHz in HiveOS). I have been reducing the memory overclock on GPU#0 to see if it stabilizes; it is now at 2550. Is that even something I should be doing? Is there any safe setting I could try besides bringing the card offline?

If I bring it offline and the rig runs well, any advice? Is it a connection or power issue, or just a bad GPU? How do I go from there?

trexminer commented 3 years ago

Yes, reducing OC usually fixes this kind of problem. Once you've confirmed GPU#0 is causing the issue, you can run two instances of the miner: one with -d 0, and another one with -d 1,2,3,4,5,6. This way if the first one crashes, the second one has a better chance of keeping going. Dial OC down a bit on your GPU#0 and find a spot where it stops crashing. Then you can go back to running one instance of t-rex with all the cards.
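The two-instance split suggested above comes down to building two device lists, one holding only the suspect card and one holding the rest. The sketch below only assembles the argument lists and does not launch anything. The helper name `split_instances` is hypothetical, the base command `t-rex -c config.json` is copied from the HiveOS log earlier in the thread, and combining `-c` with `-d` on one command line is an assumption that should be checked against the miner's documentation. Two instances on one machine would probably also need separate API ports, which is not shown here.

```python
def split_instances(base_cmd, gpu_count, suspect):
    """Build two argument lists: one for the suspect GPU, one for the others.

    base_cmd  -- list of arguments shared by both instances
    gpu_count -- total number of GPUs in the rig
    suspect   -- index of the GPU to isolate
    """
    if not 0 <= suspect < gpu_count:
        raise ValueError("suspect index is outside the range of installed GPUs")
    others = [str(i) for i in range(gpu_count) if i != suspect]
    solo_cmd = base_cmd + ["-d", str(suspect)]
    rest_cmd = base_cmd + ["-d", ",".join(others)]
    return solo_cmd, rest_cmd


# Assumption: base command as seen in the HiveOS log; 7 GPUs, GPU#0 suspect.
solo, rest = split_instances(["t-rex", "-c", "config.json"], gpu_count=7, suspect=0)
print(" ".join(solo))
print(" ".join(rest))
```

Keeping the suspect card in its own process means a crash there only takes down that one instance while the overclock on it is being tuned.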

trexminer commented 3 years ago

Closing due to inactivity