Fortunately, the sharp spikes just aren't happening. The smalloc logging seems to show the allocations are being freed. TLSWrap instances seem to be freed regularly but lately have a floor of 600. WriteWrap is more interesting, seemingly alternating between 1 and 590.
RSS is trending slightly upward, gaining ~10mb over 4 hours during a window when 0.12 was flat. At this point (a couple days of uptime), io.js is ~100mb higher, at ~450mb compared to 0.12's ~350mb.
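(For reference, a minimal sketch of one way to sample RSS for trend graphs like these; this is not the actual instrumentation behind the numbers above, and the interval and console destination are placeholders.)

```js
// Minimal RSS sampling sketch; in a real setup the numbers would be shipped to
// a metrics backend instead of the console.
setInterval(function () {
  var mem = process.memoryUsage(); // { rss, heapTotal, heapUsed } in bytes
  console.log('rss=%dmb heapUsed=%dmb',
    Math.round(mem.rss / 1048576),
    Math.round(mem.heapUsed / 1048576));
}, 60 * 1000);
```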
Thank you, @jasisk. Unfortunately, it doesn't seem like anything bad is happening with TLSWrap, WriteWrap, or smalloc. I can only suggest trying io.js 1.0.0 at this point, to isolate the change that introduced the leak.
You got it. Will deploy first thing in the morning, report back after a few days.
Thank you, @jasisk!
Any news, @jasisk ?
I ended up letting the last deploy run over the weekend just to make sure the initial hunch wasn't wrong since it had been a while since I'd let it run for a solid chunk of time.
Deployed v1.0.0 earlier tonight. We'll see how this goes.
@jasisk I have a fix for you :) Might help a bit, or might fix it... (crossing my fingers).
Going to push it to a PR in a bit.
Here are the other changes that need to be re-checked:
removeListener
C++
cc @bnoordhuis: if you have time, let's iterate over this list together and verify that there are no leaks. (I've diffed all changes between node.js v0.12 and io.js v1.1.0 to find these.)
@jasisk https://github.com/iojs/io.js/pull/1244 here you go, please give it a try. It is definitely fixing some leak, but it is really hard to tell how much yet.
@jasisk I have very high hopes that https://github.com/iojs/io.js/pull/1244 fixes the problem that you are experiencing. (No need to redeploy; the landed commits are just slightly reworked versions of the patches you applied in your testing.)
Only 7 hours on it but haven't seen this shape before:
Huge :+1:
@jasisk what are you using to generate those graphs?
@balupton I'm using Grafana. We stuff all of our metrics into InfluxDB. Looking forward to the upcoming releases of both for various reasons. We also have Heka in the pipeline for metrics-triggered actions. We get graph annotations in Heka; I just need to push those back into Influx since Grafana supports them. I'm @jcse on twitter if you want to chat about it.
@jasisk btw, how does the graph look after 2 days?
Hard to say at this point as we've only got 24 hours on it—I restarted three machines to get a proper comparison. I'm a little concerned by the trend over the last 6 hours but not enough yet to really conclude anything. Though even at this stage I can tell there's been substantial progress.
The build is #1244 on 2db758c562b4e480acc2d1a654e2c471a653a039.
purple: io.js, blue: 0.12.1, yellow: 0.12.0
@jasisk looks like there is some leak still :) I'll continue looking into it then. Thank you!
@jasisk may I ask you to take a heapsnapshot of the leaking process? I don't need the full file; I just want to take a sneak peek at the "Global Handles" contents. (See the attached picture.)
UPDATE: it would be cool to have a few more items than what's present on the attached screenshot :)
You got it. Instrumented, taking snapshots periodically. Will update when it's grown a bit.
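(For reference, a minimal sketch of that kind of periodic-snapshot instrumentation, assuming the node-heapdump module; the interval and filename scheme are placeholders, not the actual setup used here.)

```js
// Periodic heap snapshots via node-heapdump
// (https://github.com/bnoordhuis/node-heapdump).
var heapdump = require('heapdump');

setInterval(function () {
  // Writes a V8 .heapsnapshot file that Chrome DevTools can open; the
  // "Global Handles" entry is then visible among the snapshot's GC roots.
  heapdump.writeSnapshot(Date.now() + '.heapsnapshot', function (err, filename) {
    if (err) console.error('heapdump failed:', err);
    else console.log('wrote', filename);
  });
}, 30 * 60 * 1000);
```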
@jasisk how is it going? ;)
I don't have quite as much data as I'd like (pulled a couple of machines out of the pool for a bit) but, from what I can tell, it looks like @rvagg and others' speculation was right: lots of timer refs from stats collection (unref'd timeouts).
In any case, I put the machines back, kicked them over and am snapshotting every 10mb.
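(An illustrative sketch of the pattern described above, not the actual stats-collection code: an unref'd timeout scheduled for every sample, which matches the timer refs showing up in the snapshots.)

```js
// Illustrative only: schedule an unref'd timeout per sample so the timer never
// keeps the process alive. On the affected io.js builds these timer handles
// reportedly accumulated (visible under "Global Handles") even after firing.
function collectStats() {
  // ... gather process.memoryUsage(), handle counts, etc. and report them ...

  // unref() is the interesting part: it was unref'd timers that leaked.
  setTimeout(collectStats, 10 * 1000).unref();
}

collectStats();
```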
@jasisk You mean like https://github.com/joyent/node/issues/8364 / https://github.com/iojs/io.js/pull/1152#issuecomment-88870832?
@jasisk no way! haha :) Well, I'd be more than happy if these are just timers. Do they appear in the Global Handles list?
See also https://github.com/iojs/io.js/commit/33fea6ed5f506e7748cb9132f63dc03215a9eb39#commitcomment-10440550 for more info on how / why the original #1152 is broken
@Fishrock123 I'm going to address this problem now. Could you please assign it to me if we have an issue for it in the tracker?
Yup. That's exactly it.
0.12 on the left, io.js (custom build mentioned above) on the right: https://cloud.githubusercontent.com/assets/62923/6972101/6c7f6220-d94f-11e4-8ee8-ef1eb2e85489.png
Yeah, it leaks timers now. I'll figure it out soon using @Fishrock123's patch.
@jasisk could you please give https://github.com/iojs/io.js/pull/1330 a try? I hope it'll help.
:+1: deploying now.
@jasisk How's it looking? :D
Hasn't calmed down yet but it's looking really good! I've learned to hold off for a few days—for some reason we've historically seen explosive memory growth on the order of 120mb / day, starting a couple of days in—but can't ask for a better graph so far.
purple: 0.12, blue: 1.x
@jasisk Mind giving an update? :D
Was subscribing to "no news is good news." :grinning:
These two machines are still instrumented to take heap snapshots periodically, which is why it's not super clean; however, they're acting similarly, which makes me think you guys nailed it!
blue: iojs, purple: 0.12
Hooray! I suggest closing the issue then, and reopening it in case of any trouble ;)
:+1:
Indeed. I'd gone ahead and said this was fixed, as far as we knew, in the 1.6.4 release. Closing, hopefully for good! :)
Congratulations to everyone involved. I was following this issue, and it was one of the best efforts to exterminate memory leaks that I've seen in my short history with open source projects. :clap:
Wow, they must love @indutny on the Google V8 team. Pushing features just so io.js can fix a memory leak. Great stuff :ok_hand:
Congrats, your commitment to this problem and your enthusiasm are outstanding and infectious. :+1:
Thanks guys!
This thread was exciting to watch as a node/io newbie, inspired by @indutny's work. :+1:
@indutny What did you use for this screenshot? https://cloud.githubusercontent.com/assets/238531/6881510/064cb80e-d530-11e4-8a94-26ed428de317.png
@devinus probably node-inspector or the blink developer tools
@gm112 My first guess was node-inspector, but it doesn't support io.js. I'm not sure how it could be Blink developer tools either.
@devinus You can tell Blink developer tools to connect to a remote debugging session, though I'm not 100% sure whether that's just for V8 or a remote Chrome instance. However, Node-Inspector has limited support on io.js; some things work and some things don't. But I could be wrong :)
@devinus I think this was a heap snapshot file generated by https://github.com/bnoordhuis/node-heapdump and opened in Chrome Developer Tools.
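(A small sketch of that workflow, assuming node-heapdump; snapshots can also be triggered from code via heapdump.writeSnapshot() as in the earlier example.)

```js
// node-heapdump + Chrome DevTools workflow sketch
// (https://github.com/bnoordhuis/node-heapdump). Requiring the module installs
// a SIGUSR2 handler on non-Windows platforms, so a snapshot can be taken from
// the shell without touching the running process:
//
//   kill -USR2 <pid>
//
// That writes heapdump-<timestamp>.heapsnapshot to the working directory; open
// it from the Profiles/Memory panel in Chrome Developer Tools to inspect what
// is retaining memory, including the "Global Handles" seen in the screenshot.
require('heapdump');
```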
I'm diagnosing a problem with paypal's npm proxy https://github.com/krakenjs/kappa when run on io.js. After 300 requests for a 50kb tarball, allocated memory grows from a 250mb base to 450mb.
I've run it under valgrind, and there are some things that look a bit suspicious around TLSWrap, smalloc, and some other crypto-related bits. The interesting bits of the valgrind report are:
Any help or suggestions for what to look at to track this down would be wonderful.
Node 0.12 and 0.10 don't show this behavior with the same code (we're stress-testing io.js in parallel to shake stuff like this loose).
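(For anyone trying to reproduce, a hypothetical sketch of the kind of stress loop described above; the URL, port, and request count are placeholders, not the actual test harness.)

```js
// Hypothetical repro sketch: request the same tarball repeatedly from a local
// kappa proxy and watch RSS climb.
var http = require('http');

var url = 'http://localhost:8000/example-pkg/-/example-pkg-1.0.0.tgz';
var remaining = 300;

function next() {
  if (remaining-- === 0) {
    console.log('done, rss=%dmb', Math.round(process.memoryUsage().rss / 1048576));
    return;
  }
  http.get(url, function (res) {
    res.resume();          // drain the ~50kb tarball body
    res.on('end', next);   // issue the next request when this one finishes
  }).on('error', function (err) {
    console.error(err);
  });
}

next();
```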