trexminer / T-Rex

T-Rex NVIDIA GPU miner with web control monitoring page

Page File keeps going down till miner crashes #735

Open pumbogongles opened 3 years ago

pumbogongles commented 3 years ago

[Image: memorypage]

Having crashes every 10-12 hours of miner uptime. Afterburner doesn't detect my GPU and a quick reboot fixes the problem. Finally found out why.

I tried a custom page file size and it yielded even lower uptimes (3-5 hours).

Is there a way to stop the available page file from decreasing so quickly, or how do I configure it to run stably? Or is my desktop simply not suited for mining?

I only have 1 GPU and that's a 3060Ti LHR.

trexminer commented 3 years ago

What's your video driver version?

pumbogongles commented 3 years ago

What's your video driver version?

471.68

trexminer commented 3 years ago

I'm not sure, but in the past we had a similar problem where there was a memory leak in the NVML library, so I'm thinking it could be the case for you too. Could you run the miner with the --no-nvml parameter and see if the issue persists? Thanks.

pumbogongles commented 3 years ago

I'm not sure, but in the past we had a similar problem where there was a memory leak in the NVML library, so I'm thinking it could be the case for you too. Could you run the miner with the --no-nvml parameter and see if the issue persists? Thanks.

Thank you for the suggestion. I have been running it with the --no-nvml parameter and it is now in its 11th hour. Unfortunately, the available page file has gradually dropped to 14000MB+ as before, albeit a little more slowly, so I am expecting it to crash within the next few hours. (As of writing this post, it has drained 759MB in the last 10 minutes.)
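The "within the next few hours" estimate can be checked with simple arithmetic. The sketch below assumes the drain continues at the rate just observed (759MB per 10 minutes), which may not hold in practice; the function name is illustrative only.

```python
def minutes_until_empty(available_mb: float, drained_mb: float, window_minutes: float) -> float:
    """Estimate minutes until the available page file reaches zero.

    Assumes the drain rate observed over the last window stays constant.
    """
    rate_mb_per_minute = drained_mb / window_minutes
    if rate_mb_per_minute <= 0:
        raise ValueError("no drain observed in the window")
    return available_mb / rate_mb_per_minute


# Figures from the comment above: about 14000MB left, 759MB lost in 10 minutes.
remaining = minutes_until_empty(available_mb=14000, drained_mb=759, window_minutes=10)
print(f"about {remaining / 60:.1f} hours left at the current rate")
```

With those figures the estimate comes out at roughly three hours, which is in line with the expectation stated above.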

Can I check whether having Afterburner running in the background could cause a memory leak, since it calls NVML functions to display values in its UI? (Correct me if I am connecting the dots wrongly here.) I intend to downgrade my drivers to an older version that seems to have no memory leak, based on the info I can find on GitHub. Will report back then.

trexminer commented 3 years ago

If you suspect something other than the miner might be leaking, I'd suggest keeping an eye on the amount of memory used by t-rex first, to eliminate a memory leak in the miner. You can use Process Explorer https://docs.microsoft.com/en-us/sysinternals/downloads/process-explorer for that. Make a note of what the t-rex process memory usage is, say, 10 minutes after you started it, and then a few hours later.
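Process Explorer shows this interactively. For an unattended comparison of two readings, a minimal sketch along these lines might work. It assumes Windows with PowerShell on the PATH and that the miner's process name is `t-rex`; both are assumptions, and the helper names are illustrative. It reads the .NET `PrivateMemorySize64` property through `Get-Process`, since private (committed) bytes are what count against the page file.

```python
import subprocess


def parse_private_bytes(powershell_output: str) -> int:
    """Sum the integers printed by PowerShell (one per matching process)."""
    return sum(int(token) for token in powershell_output.split())


def read_private_bytes(process_name: str = "t-rex") -> int:
    """Return the private bytes of the named process. Windows only.

    The process name is an assumption; adjust it to match the running miner.
    """
    command = f"(Get-Process -Name '{process_name}').PrivateMemorySize64"
    result = subprocess.run(
        ["powershell", "-NoProfile", "-Command", command],
        capture_output=True, text=True, check=True,
    )
    return parse_private_bytes(result.stdout)


def growth_mb_per_hour(first_bytes: int, later_bytes: int, hours_apart: float) -> float:
    """Average growth between two readings, in MB per hour."""
    return (later_bytes - first_bytes) / (1024 * 1024) / hours_apart
```

Calling `read_private_bytes()` about 10 minutes after start and again a few hours later, then passing both values to `growth_mb_per_hour`, gives a single number to compare against the page file drain. A value near zero would point away from the miner process itself.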

sebek72 commented 3 years ago

Also seeing random miner crashes with segfault: [ 9510.455043] miner4[3772]: segfault at 21 ip 0000000000467ce4 sp 00007faeef7fd750 error 4 in t-rex[400000+19fa000]

The last time it happened after: 20211016 15:08:13 Dev fee mined (1 min 12 secs)

pumbogongles commented 3 years ago

If you suspect something other than the miner might be leaking, I'd suggest keeping an eye on the amount of memory used by t-rex first, to eliminate a memory leak in the miner. You can use Process Explorer https://docs.microsoft.com/en-us/sysinternals/downloads/process-explorer for that. Make a note of what the t-rex process memory usage is, say, 10 minutes after you started it, and then a few hours later.

Hey there, reporting back in. Unexpectedly, the miner did not crash this time, but my available page file was around 1000+MB and seemed stuck. The miner appears to run properly and produce results, but according to the T-Rex web UI it stopped getting shares about 3 hours after my last comment.

I ran DDU and tried to install an older driver (465.89) that was known to be stable and has no reported memory leak, but apparently it wasn't compatible, so I installed the latest driver instead. Will report back again in 10-12 hours.

trexminer commented 3 years ago

Also seeing random miner crashes with segfault: [ 9510.455043] miner4[3772]: segfault at 21 ip 0000000000467ce4 sp 00007faeef7fd750 error 4 in t-rex[400000+19fa000]

The last time it happened after: 20211016 15:08:13 Dev fee mined (1 min 12 secs)

What arguments are you launching t-rex with? I'll try to reproduce. Also, what version of t-rex are you using?

sebek72 commented 3 years ago

Also seeing random miner crashes with segfault: [ 9510.455043] miner4[3772]: segfault at 21 ip 0000000000467ce4 sp 00007faeef7fd750 error 4 in t-rex[400000+19fa000]

The last time it happened after: 20211016 15:08:13 Dev fee mined (1 min 12 secs)

What arguments are you launching t-rex with? I'll try to reproduce. Also, what version of t-rex are you using?

Running 0.24.2 on HiveOS. Standard settings (auto-tune) with ETH + ERG dual mining on a 3060 v2 and a 3060 Ti LHR.

pumbogongles commented 3 years ago

@trexminer hey there. I can confirm there is a memory leak. May I know whether the size of the page file/virtual memory affects performance? I have noticed that when the remaining page file size is under 8000MB, the miner still runs, but on the web UI of the pool I'm using the profitability is significantly lower (about a 50-60% reduction, based on the 60-minute average profitability).

At about 12 hours of miner uptime, my desktop crashes: everything lags or stops running properly, and I am unable to launch new applications when that happens.

Do you have any recommendation on what I can do other than restarting my desktop every 8-10 hours (my current solution)?

sebek72 commented 3 years ago

Looks to be repeating:

[ 9510.455043] miner4[3772]: segfault at 21 ip 0000000000467ce4 sp 00007faeef7fd750 error 4 in t-rex[400000+19fa000]
[ 9510.455049] Code: df e8 30 7b 4f 00 48 83 7b 58 00 0f 84 8d 00 00 00 48 8b 43 48 0f b6 50 21 31 c0 84 d2 74 54 48 83 c4 18 5b 5d c3 48 8b 43 48 <0f> b6 40 21 84 c0 0f 94 c0 48 83 c4 18 5b 5d c3 0f 1f 40 00 48 8b
[ 9513.699449] DTS: killing sk:000000009a0c9807 (127.0.0.1:56940 -> 127.0.0.1:4059) state 6
[ 9956.137851] miner4[4985]: segfault at 21 ip 0000000000467ce4 sp 00007f4c44ff8750 error 4 in t-rex[400000+19fa000]
[ 9956.137860] Code: df e8 30 7b 4f 00 48 83 7b 58 00 0f 84 8d 00 00 00 48 8b 43 48 0f b6 50 21 31 c0 84 d2 74 54 48 83 c4 18 5b 5d c3 48 8b 43 48 <0f> b6 40 21 84 c0 0f 94 c0 48 83 c4 18 5b 5d c3 0f 1f 40 00 48 8b
[ 9959.324766] DTS: killing sk:000000007a8389f2 (127.0.0.1:57240 -> 127.0.0.1:4059) state 6
[10162.756467] miner0[14112]: segfault at 21 ip 0000000000467ce4 sp 00007eff257f9750 error 4 in t-rex[400000+19fa000]
[10162.756473] Code: df e8 30 7b 4f 00 48 83 7b 58 00 0f 84 8d 00 00 00 48 8b 43 48 0f b6 50 21 31 c0 84 d2 74 54 48 83 c4 18 5b 5d c3 48 8b 43 48 <0f> b6 40 21 84 c0 0f 94 c0 48 83 c4 18 5b 5d c3 0f 1f 40 00 48 8b
[10165.144140] DTS: killing sk:000000007e015a70 (127.0.0.1:57356 -> 127.0.0.1:4059) state 6
[52897.068535] miner4[376]: segfault at 21 ip 0000000000467ce4 sp 00007f33237fd750 error 4 in t-rex[400000+19fa000]
[52897.068542] Code: df e8 30 7b 4f 00 48 83 7b 58 00 0f 84 8d 00 00 00 48 8b 43 48 0f b6 50 21 31 c0 84 d2 74 54 48 83 c4 18 5b 5d c3 48 8b 43 48 <0f> b6 40 21 84 c0 0f 94 c0 48 83 c4 18 5b 5d c3 0f 1f 40 00 48 8b
[52900.329472] DTS: killing sk:0000000067906db6 (127.0.0.1:55314 -> 127.0.0.1:4059) state 6
[61925.689876] miner0[29782]: segfault at 21 ip 0000000000467ce4 sp 00007ff6f97fd750 error 4 in t-rex[400000+19fa000]
[61925.689891] Code: df e8 30 7b 4f 00 48 83 7b 58 00 0f 84 8d 00 00 00 48 8b 43 48 0f b6 50 21 31 c0 84 d2 74 54 48 83 c4 18 5b 5d c3 48 8b 43 48 <0f> b6 40 21 84 c0 0f 94 c0 48 83 c4 18 5b 5d c3 0f 1f 40 00 48 8b
[61928.884174] DTS: killing sk:00000000e1c99fd4 (127.0.0.1:60854 -> 127.0.0.1:4059) state 6
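Every segfault in the log above reports the same instruction pointer and the same fault address, which suggests a single repeatable crash site and not random corruption. A small sketch for pulling those fields out of a kernel log line is shown here. The regular expression and field names are my own, and the error-code decoding follows the standard x86 page-fault bits (bit 0: page present, bit 1: write, bit 2: user mode).

```python
import re

# Matches the Linux x86 "segfault at ..." kernel log line format seen above.
SEGFAULT_RE = re.compile(
    r"(?P<proc>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<module>[^\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line: str):
    """Return the decoded fields of a segfault log line, or None if no match."""
    match = SEGFAULT_RE.search(line)
    if match is None:
        return None
    error = int(match["err"], 16)
    return {
        "process": match["proc"],
        "pid": int(match["pid"]),
        "module": match["module"],
        "fault_address": int(match["addr"], 16),
        # Offset of the faulting instruction from the module's load address.
        "offset_in_module": int(match["ip"], 16) - int(match["base"], 16),
        "page_present": bool(error & 1),
        "write": bool(error & 2),
        "user_mode": bool(error & 4),
    }


line = ("[ 9510.455043] miner4[3772]: segfault at 21 ip 0000000000467ce4 "
        "sp 00007faeef7fd750 error 4 in t-rex[400000+19fa000]")
info = decode_segfault(line)
print(hex(info["offset_in_module"]), hex(info["fault_address"]), info)
```

For these lines, error 4 decodes to a user-mode read of an unmapped page at address 0x21. Together with the marked bytes `<0f> b6 40 21` (a byte load from offset 0x21 of a register), that looks like a read through a null pointer, though only the developers can confirm this against the binary.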

pumbogongles commented 3 years ago

@trexminer

Hello. Sorry to bother you again. I was monitoring the page file drain I am currently facing and got some interesting findings that baffle me even further.

I was working from home in another room, so I left the miner on idle and periodically came to check on the page file via dxdiag:

07:30am - Page File: 49100MB used, 6121MB available
10:00am - Page File: 23796MB used, 31425MB available
12:21pm - Page File: 35317MB used, 19904MB available
13:13pm - Page File: 39757MB used, 15464MB available
15:27pm - Page File: 50787MB used, 6516MB available

At 16:30pm I realised I hadn't got any shares on the pool for over an hour. I decided to restart and let the miner run.

20:02pm - Page File: 30329MB used, 24892MB available
20:27pm - Page File: 32792MB used, 22429MB available

I turned on a game and played while mining and when I stopped playing this happened

20:55pm - Page File: 10954MB used, 44267MB available

left it on idle.

21:20pm - Page File: 15240MB used, 39981MB available

started gaming and stopped.

21:36pm - Page File: 10272MB used, 44950MB available

left it on idle.

21:51pm - Page File: 14291MB used, 40930MB available

TL;DR - Available page file depletes while my PC is idle but increases when I run games. I'm not sure what to make of this and would appreciate any feedback.
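To put numbers on the pattern in the readings above, here is a rough sketch that parses the dxdiag notes and computes how fast the available page file changes between consecutive readings. It treats the hour as 24-hour time and ignores the am/pm suffix, which matches how the times are written here but is an assumption; the function names are illustrative.

```python
import re

READING_RE = re.compile(
    r"(\d{1,2}):(\d{2})(?:am|pm)? - Page File: (\d+)MB used, (\d+)MB available"
)


def parse_readings(text: str):
    """Return (minutes_since_midnight, used_mb, available_mb) per reading."""
    return [
        (int(hour) * 60 + int(minute), int(used), int(available))
        for hour, minute, used, available in READING_RE.findall(text)
    ]


def drain_rates(readings):
    """MB of available page file lost per minute between consecutive readings.

    Positive values mean the available amount shrank; negative values mean
    it recovered.
    """
    return [
        (a0 - a1) / (t1 - t0)
        for (t0, _, a0), (t1, _, a1) in zip(readings, readings[1:])
    ]


evening_log = """
20:02pm - Page File: 30329MB used, 24892MB available
20:27pm - Page File: 32792MB used, 22429MB available
20:55pm - Page File: 10954MB used, 44267MB available
21:20pm - Page File: 15240MB used, 39981MB available
21:36pm - Page File: 10272MB used, 44950MB available
21:51pm - Page File: 14291MB used, 40930MB available
"""
rates = drain_rates(parse_readings(evening_log))
for rate in rates:
    print(f"{rate:+.1f} MB/min")
```

On the evening readings, the three idle intervals lose roughly 100 to 270 MB per minute, while both intervals that include gaming show the available amount recovering.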