As an update on this, I've added a few debugging tools to the latest version and have been running it on our lowest-use port. I tried using the tools while not running live, but it was difficult to tell what was "leaking" without bigger numbers from longer runtimes.
The tool I've installed and used thus far dumps a count of how many instances of each type of object are held in memory. Running it again shows how many more there are now versus the last run.
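For reference, a minimal sketch of this kind of type-count diffing, built on the stdlib gc module (the actual tool wired into PowerPool may differ; objgraph's show_growth() does something similar):

```python
import gc
from collections import Counter

_last_counts = Counter()

def dump_type_growth(limit=10):
    """Print the types whose live-instance count grew most since last call."""
    global _last_counts
    counts = Counter(type(o).__name__ for o in gc.get_objects())
    deltas = [(name, total, total - _last_counts[name])
              for name, total in counts.items()]
    deltas.sort(key=lambda item: -item[2])
    for name, total, delta in deltas[:limit]:
        if delta > 0:
            print("%-30s %10d (+%d)" % (name, total, delta))
    _last_counts = counts
```

Calling it once establishes a baseline; each subsequent call prints only what grew.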
Results roughly show growing counts of tuples and Transaction objects. I doubt the Transaction objects are the cause of the big leak. My guess is that something in the reporting engine holds a reference to the BlockTemplate long term, although I'm not sure. I've added some code to dump more information about the tuples, and will wait and see the results.
It appears that it may actually just be the job mapper, which would make me feel silly. A new entry goes into the job mapper (as a 2-tuple) on every job push or flush. We're doing about 370 push/flush actions per hour on the litecoin servers, which with 383 workers (on the biggest litecoin port) works out to 3.4 million tuples a day, assuming no one disconnects.
I ran a quick test and it showed that 10 million of these tuples (including the weakref, etc.) take ~1 GB of RAM. So if this is the culprit, we should see something like a 200-300 MB per day increase in usage on the largest server. I propose the next step is to fix this and then re-evaluate.
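For a back-of-envelope check of that estimate, something like this works (a sketch only; sys.getsizeof reports shallow sizes and ignores the dict slots holding the entries, so real usage runs higher):

```python
import sys
import weakref

class Job(object):
    """Hypothetical stand-in for whatever object the job mapper references."""
    pass

job = Job()
entry = ("job_id", weakref.ref(job))  # rough shape of one mapper 2-tuple

# Shallow sizes of the tuple, its string key, and the weakref object.
per_entry = (sys.getsizeof(entry)
             + sys.getsizeof(entry[0])
             + sys.getsizeof(entry[1]))
print("~%d bytes per entry" % per_entry)
print("~%.2f GB for 10 million entries" % (per_entry * 10 ** 7 / 1e9))
```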
Yes, we should remove old jobs from the job mapper.
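A minimal sketch of that kind of fix, using an OrderedDict with a hypothetical retention cap (whatever #98 actually does may differ):

```python
from collections import OrderedDict

class JobMapper(object):
    """Maps job ids to job entries, evicting the oldest so the mapping
    stays bounded instead of growing with every push/flush."""

    def __init__(self, max_jobs=50):  # hypothetical cap
        self.jobs = OrderedDict()
        self.max_jobs = max_jobs

    def add_job(self, job_id, entry):
        self.jobs[job_id] = entry
        while len(self.jobs) > self.max_jobs:
            self.jobs.popitem(last=False)  # drop the oldest job

    def get(self, job_id):
        return self.jobs.get(job_id)
```

The trade-off with a cap like this is that shares submitted against an evicted job get rejected as stale rather than matched.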
confirmed fixed by #98?
Are we running this in prod anywhere?
Pretty sure most of the vanilla coin stratums are running 0.6, so yes
Oh I just realized you might have meant the patched PP. The PS ports are running 0.6.1 - not sure if it includes this patch or not
Right, sorry I wasn't very clear on my question. I don't think we're running this anywhere, and until we are we won't be able to easily confirm it.
still on master... it grows and I think it's the scheduler...
@Fcases Since master changes frequently, could you provide the version number and more specific details?
No idea, I git pulled today and restarted PP, as seen on #irc.
PID USER PRI NI VIRT RES SHR S CPU% MEM% TIME+ Command
2428 root 20 0 473M 190M 4644 S 1.3 4.8 0:18.33 powerpool_0
2428 root 20 0 554M 271M 4644 S 0.7 6.9 0:26.85 powerpool_0
2428 root 20 0 1545M 1262M 4644 S 1.3 31.9 2:50.15 powerpool_0
and it keeps going; I'm on master as of right now...
One thing is I think you added the gevent module in the latest, which I didn't have... I redid requirements and am testing...
All I know is git log shows:
commit dd5f139626098c830f399db224948376aac286be
Author: Isaac Cook <isaac@simpload.com>
Date: Tue Dec 2 14:39:54 2014 -0600
as the latest entry.
> One thing is I think you added the gevent module in the latest, which I didn't have
Gevent has been a requirement since the first commit of powerpool.
> No idea, I git pulled today and restarted PP, as seen on #irc.
Odd, what kind of traffic is that server seeing? Most of our instances don't bloat nearly that quickly, regardless of being on v0.6.3 or v0.6.2.
Nothing much, 2 miners at 50 MH/s. Anything you can think of that I can do to figure it out?
I tried valgrind but I don't know much about it.
@ericecook Is this still occurring? I don't believe we've had issues with this anymore.
@icook I'm unsure. We haven't had memory issues since moving the scheduler out to cron jobs
Ah, right. That should probably be documented in simplecoin_multi now that I think about it...
I'll go ahead and close this since it seems to be resolved by #98.
Long-running, high-load instances eat more and more RAM as they run, as much as 4 GB after several weeks at a few gigahash. This is possibly caused by: