nanopool / Claymore-Dual-Miner

Claymore's Dual Ethereum+Decred_Siacoin_Lbry AMD+NVIDIA GPU Miner

Memory leak when restarting thread #106

Open agrajag9 opened 7 years ago

agrajag9 commented 7 years ago

First reported here: https://www.reddit.com/r/EtherMining/comments/6yfqxg/not_enough_gpu_memory_to_place_dag_you_cannot/

When a pool connection fails, STDOUT shows an error saying the miner thread is hanging and needs to be restarted. The thread is then restarted successfully, but the VRAM is not reinitialized, as shown below (abridged for brevity):

DevFee: ETH: Stratum - connecting to 'us1.ethpool.org' <149.56.26.222> port 3333
...
ETH: Stratum - Cannot connect to us1.ethpool.org:3333
DevFee: ETH: Stratum - Failed to connect, retry in 20 sec...
...
Miner thread hangs, need to restart miner!


ETH: 2 pools are specified
Main Ethereum pool is us1.ethermine.org:4444
At least 16 GB of Virtual Memory is required for multi-GPU systems
Make sure you defined GPU_MAX_ALLOC_PERCENT 100
Be careful with overclocking, use default clocks for first tests
Press "s" for current statistics, "0".."9" to turn on/off cards, "r" to reload pools, "e" or "d" to select current pool
OpenCL initializing...

AMD Cards available: 1
GPU #0: Ellesmere, 1461 MB available, 36 compute units
GPU #0 recognized as Radeon RX 480/580

This same card at first initialization is detected with 8169 MB available.
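The drop from 8169 MB to 1461 MB can be spotted automatically by scanning the miner output for the "MB available" lines quoted above. The following is only a sketch: the function names `available_mb` and `check_vram` and the threshold are illustrative assumptions, and the line format is taken solely from the log excerpt in this report.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and thresholds are assumptions, not part of Claymore.

# Print "<gpu index> <MB>" for every line on stdin that looks like
# "GPU #0: Ellesmere, 1461 MB available, 36 compute units".
available_mb() {
    sed -n 's/^GPU #\([0-9][0-9]*\):.* \([0-9][0-9]*\) MB available.*/\1 \2/p'
}

# check_vram MIN_MB
# Reads miner output on stdin; returns 1 if any GPU reports less than MIN_MB.
check_vram() {
    local min_mb=$1
    available_mb | (
        low=0
        while read -r gpu mb; do
            if [ "$mb" -lt "$min_mb" ]; then
                echo "GPU #${gpu}: only ${mb} MB available (expected at least ${min_mb})" >&2
                low=1
            fi
        done
        exit "$low"
    )
}
```

For example, `check_vram 8000 < miner.log` (where `miner.log` is a hypothetical saved copy of the miner output) would return a non-zero status for the restarted run shown above.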

Hardware: https://pcpartpicker.com/list/f6Cyhq

Version:
0ebb105bd3a6cdd35d94663eabf245e9 Claymore.s.Dual.Ethereum.Decred_Siacoin_Lbry_Pascal.AMD.NVIDIA.GPU.Miner.v9.8.-.LINUX.tar.gz
a919e303d2250f2719c7a28bfebd9a79 ethdcrminer64

Drivers: AMDGPU-PRO Driver Version 17.30 for Ubuntu 16.04.3

OS: Ubuntu 16.04.3 LTS

[ 2017-09-07T12:57:26 agrajag9@eth1.srv.a9development.com:/home/agrajag9 ]
$ uname -a
Linux eth1.srv.a9development.com 4.4.0-93-generic #116-Ubuntu SMP Fri Aug 11 21:17:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
koenvandenberge commented 7 years ago

Same issue here on Windows 10

LionRelaxe commented 7 years ago

Same here. Available memory goes systematically lower and lower at each restart-inducing error, on two separate miners. Setup: Xubuntu 16.04 LTS, one miner with 1x RX580 4 GB, one miner with 3x RX580 4 GB. Noticed on Claymore 10.0; pretty sure it happened on 9.7 too. AMD drivers 17.10, and also 17.30 with the ROCm kernel.

Note that pressing CTRL-C and manually restarting the miner restores the memory to maximum.

GPU0 t=57C fan=92%
DCR: 09/21/17-09:52:52 - New job from dcr-us.coinmine.pl:2222
DCR: 09/21/17-09:52:56 - New job from dcr-us.coinmine.pl:2222
ETH: Stratum - Cannot connect to us1.ethermine.org:4444
DevFee: ETH: Stratum - Failed to connect, retry in 20 sec...
ETH: 09/21/17-09:53:20 - New job from us2.ethermine.org:14444
ETH - Total Speed: 24.548 Mh/s, Total Shares: 6, Rejected: 0, Time: 00:21
ETH: GPU0 24.548 Mh/s
DCR - Total Speed: 1153.749 Mh/s, Total Shares: 35, Rejected: 1
DCR: GPU0 1153.749 Mh/s
GPU0 t=57C fan=92%
Miner thread hangs, need to restart miner!

Claymore's Dual ETH + DCR/SC/LBC/PASC GPU Miner v10.0

ETH: 9 pools are specified
Main Ethereum pool is us2.ethermine.org:14444
DCR: 1 pool is specified
Main Decred pool is dcr-us.coinmine.pl:2222
At least 16 GB of Virtual Memory is required for multi-GPU systems
Make sure you defined GPU_MAX_ALLOC_PERCENT 100
Be careful with overclocking, use default clocks for first tests
Press "s" for current statistics, "0".."9" to turn on/off cards, "r" to reload pools, "e" or "d" to select current pool
OpenCL initializing...

AMD Cards available: 1
GPU #0: Ellesmere, 1714 MB available, 36 compute units
GPU #0 recognized as Radeon RX 480/580
POOL/SOLO version
GPU #0: algorithm ASM
No NVIDIA CUDA GPUs detected.
Total cards: 1
AMD ADL library not found.
ETH: Stratum - connecting to 'us2.ethermine.org' <45.79.103.105> port 14444
DUAL MINING MODE ENABLED: ETHEREUM+DECRED
ETH: eth-proxy stratum mode
"-allpools" option is set, default pools can be used for devfee, check "Readme" file for details.
Watchdog enabled
Remote management (READ-ONLY MODE) is enabled on port 3333

costinh commented 6 years ago

I'm starting to have this issue on all of my miners, did you guys find a workaround?

JusCallMeRico commented 6 years ago

I'm having same problem. Manually relaunching miner does restore memory to max.. but only runs for a few hours at best...

Running 3 ASUS RX 570 ROG Strix OC with 1112 cclock and 2000 mclock; in total rig only pulling 420W so don't think this is thermal. Running Xubuntu and Claymore V10

Glad I'm not alone.. sad there doesn't seem to be a fix at the moment.

LionRelaxe commented 6 years ago

I've found a workaround. Usually, Claymore enters this state and then loops: retry, fail, "take more memory", retry, fail again, repeat. Closing Claymore (CTRL-C) before the computer is totally jammed works and frees the VRAM. Restarting Claymore then works.

My workaround is to use the -r 1 option, forcing Claymore to close. You can reboot if you wish. I invoke claymore in a bash script with a forever loop, forcing it to restart on closure. So when the snag hits, Claymore kills itself and the script restarts it. Hope this helps.
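A minimal sketch of such a wrapper, assuming the miner binary is `./ethdcrminer64` (the file name from the original report) and that it exits on its own when `-r 1` is set. The function name `restart_loop`, the run limit, and the delay are illustrative assumptions, not the script actually used here.

```shell
#!/usr/bin/env bash
# Sketch only: restart_loop and its parameters are illustrative assumptions.

# restart_loop MAX_RUNS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND again every time it exits. MAX_RUNS=0 means loop forever.
# Returns the exit status of the last run.
restart_loop() {
    local max_runs=$1 delay=$2
    shift 2
    local runs=0 status=0
    while :; do
        status=0
        "$@" || status=$?
        runs=$((runs + 1))
        echo "run ${runs} exited with status ${status}" >&2
        if [ "$max_runs" -gt 0 ] && [ "$runs" -ge "$max_runs" ]; then
            break
        fi
        # Short pause so a miner that dies instantly does not spin the CPU.
        sleep "$delay"
    done
    return "$status"
}

# Example (not executed here): loop forever with a 10 second pause.
# Add your usual pool and wallet options after -r 1.
# restart_loop 0 10 ./ethdcrminer64 -r 1
```

Because each run is a fresh process, the VRAM held by the previous one is released before the next start, which is the effect described above.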

agrajag9 commented 6 years ago

Yes, killing and restarting the process resolves the issue temporarily as the kernel frees the memory once the PID no longer needs it. However this is not a viable long-term solution as it doesn't effectively mitigate the memory leak when creating the new process.

In order to resolve the memory leak problem, when the code enters a failed state, it should exit with a non-0 return value. This is standard procedure for applications that enter unrecoverable failed states. Although in my particular case the memory leak is recoverable, in other situations where the GPU is unresponsive it may not be recoverable. As such, if the situations are not capable of being handled independently, then they should be handled as the worst-case scenario (unresponsive hardware).

The non-0 return value also allows the parent process (e.g. a script) to effectively handle the failure itself.
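To illustrate what the parent process could do with such a return value, here is a sketch of a supervisor that restarts only on failure. It assumes the miner exits with a non-zero status when it hits this state, which is the behaviour being requested here and is not confirmed anywhere in this thread; the function name `supervise` and the failure limit are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: assumes the child signals failure with a non-zero exit status.

# supervise MAX_FAILURES COMMAND [ARGS...]
# Restarts COMMAND only when it exits non-zero. Stops on a clean exit (0)
# or after MAX_FAILURES failed runs, so the caller can escalate.
supervise() {
    local max_failures=$1
    shift
    local failures=0 status
    while [ "$failures" -lt "$max_failures" ]; do
        status=0
        "$@" || status=$?
        if [ "$status" -eq 0 ]; then
            echo "clean exit, not restarting" >&2
            return 0
        fi
        failures=$((failures + 1))
        echo "failure ${failures}/${max_failures} (exit status ${status})" >&2
    done
    echo "giving up; escalate here (for example, reboot the rig)" >&2
    return 1
}

# Example (not executed here):
# supervise 5 ./ethdcrminer64 -r 1
```

The failure limit is what lets the script treat a recoverable leak and unresponsive hardware differently: a few restarts cover the first case, and running out of attempts is the cue for the worst-case handling.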

imperialgames commented 6 years ago

Do you guys mine only ETH by chance? We are having the same problem (available memory decreasing over time until I can't allocate the DAG file).

JusCallMeRico commented 6 years ago

Yes ETH only... tbh switched to ethOS with exact same settings and it ran faster and completely stable... this was about two weeks ago and it's still stable on ethOS getting ready to add a couple more cards

agrajag9 commented 6 years ago

I am mining ETH only, but I suspect the leak persists across dual mining as well since it appears to be related to how ETH-mining threads are killed.

It sure would be nice if @nanopool would show up in this thread. This bug should be fixable with a simple destructor update for the thread class.

Mr10001 commented 6 years ago

I have this problem on EthOS as well.

mrsags commented 6 years ago

Increase virtual memory to match the memory of ALL cards. I have 8x 4 GB and 1x 8 GB. Virtual memory = 40 GB (set at 45 GB for Windows program cache). This worked :)
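The 40 GB figure is simply the sum of the VRAM of every card in that rig. A small sketch of the arithmetic, with `total_vram_gb` as a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch only: total_vram_gb is an illustrative helper.

# total_vram_gb SIZE_GB...
# Prints the sum of the per-card VRAM sizes given as arguments.
total_vram_gb() {
    local total=0 size
    for size in "$@"; do
        total=$((total + size))
    done
    echo "$total"
}

# Eight 4 GB cards plus one 8 GB card, as in the rig described above.
total_vram_gb 4 4 4 4 4 4 4 4 8   # prints 40
```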

ghost commented 6 years ago

This is not a virtual memory related issue. The best guess is a driver problem or memory errors accumulating over time. Sometimes this occurs immediately after restarting a miner, and other times it takes one or two days for it to shit the bed.

mrsags commented 6 years ago

Has anyone tried the fix and reproduced the error? Also, make sure you DON'T allow "lock pages in memory" under group policy. This causes tons of problems when mining all kinds of coins...

mrsags commented 6 years ago

My fix is to prevent the hanging and restarting in the first place..

YasserGomaa commented 6 years ago

Same problem here with ethOS. Any help?

imperialgames commented 6 years ago

The only way we found on ethOS is to remove the dev fee with claymore=flags -r 1 -nofee

YasserGomaa commented 6 years ago

Where to type this line ?

imperialgames commented 6 years ago

in either your local or remote config file.

YasserGomaa commented 6 years ago

so after adding the line what will happen ?

JusCallMeRico commented 6 years ago

Going to attempt this now with fresh ethOS images; are we sure it's not supposed to be in claymore.stub.conf?

YasserGomaa commented 6 years ago

This is my old config: flags --cl-global-work 8192 --farm-recheck 200

imperialgames commented 6 years ago

The line means: Claymore-only flags = (-r 1: restart on crash) and (-nofee: no fee to the dev). The problem is that the dev fee pool keeps disconnecting; that's why the memory does not get flushed.

imperialgames commented 6 years ago

keep the regular flags line

YasserGomaa commented 6 years ago

so it should be like this "flags -r 1 -nofee" or "claymore=flags -r 1 -nofee"

imperialgames commented 6 years ago

claymore=flags -r 1 -nofee, because it will apply only when you use the globalminer claymore

YasserGomaa commented 6 years ago

globaldriver amdgpu
maxgputemp 85
globalminer claymore
stratumproxy enabled
globalfan 85
proxywallet Mywallet
proxypool1 eu1.ethermine.org:4444
globalcore 1400
globalmem 2000
globalfan 90
globalpowertune 4
flags --cl-global-work 8192 --farm-recheck 200
claymore=flags -r 1 -nofee

YasserGomaa commented 6 years ago

Is this a good configuration?

imperialgames commented 6 years ago

maxgputemp 90
stratumproxy enabled

ETH POOL

globalminer claymore
proxypool1 us1.ethermine.org:14444
proxywallet WALLET
dualminer enabled
dualminer-coin lbry
dualminer-pool lbry.suprnova.cc:6256
dualminer-wallet WALLET
claymore=flags -r 1 -nofee 1 -mport 3333 -allcoins 1
flags --cl-global-work 16384 --farm-recheck 200

that's mine

JusCallMeRico commented 6 years ago

imperialgames: "the line says, claymore only flags = (reboot 1 if crash) and (no fee to dev). the problem is that the dev fee pool keep disconnecting. that's why the memory does not get flushed."

I had been running with -allpools 1 but am going to try taking that out... also had been leaving -allcoins 1 out as well... not sure if it'll help but we'll see

YasserGomaa commented 6 years ago

So is there any way to mine Ethereum without Claymore?

JusCallMeRico commented 6 years ago

Yup, Etherminer.... slower but honestly slow and steady may just win the race

YasserGomaa commented 6 years ago

Dear all, I found something strange. Even though I changed the username and password of ethOS, I found strange commands typed in the shell. The following link was added to my PC and scripts from it ran on my system: https://github.com/pooler/cpuminer

Does this mean that I am hacked? hhhhh

YasserGomaa commented 6 years ago

How do I dual mine ETH+LBRY?

What should I type in claymore.stub.conf?

imperialgames commented 6 years ago

Check the settings I posted above. You do it in local.conf or remote.conf.

YasserGomaa commented 6 years ago

So what about claymore.stub.conf?

imperialgames commented 6 years ago

You don't have to touch it.

YasserGomaa commented 6 years ago

imperialgames, what is the default for claymore.stub.conf?