vibe-d / vibe.d

Official vibe.d development
MIT License

Possible Memory Leak #924

Open dariusc93 opened 9 years ago

dariusc93 commented 9 years ago

With vibe.d using the web interface, there seems to be some form of memory leak over time. I've been using ApacheBench as well as loader.io for testing, and I noticed it most when I used loader.io to simulate a number of clients connected to the server all at once. I've dug through my code to see if there is a cause, but so far there is none, so I made a small example that just shows hello world. Over several minutes of testing, memory usage climbs to 30% of 2GB RAM. I left it overnight, but the memory was never released. Any possible reason why this would happen? Isn't there a GC within D that's supposed to kick in to clean up memory?

extrawurst commented 9 years ago

What platform, OS, compiler, compiler version and vibe.d version are we talking about? I had a similar experience with my vibe.d based system after switching to the new dmd 2.066.1 compiler version. Reverting to 2.065 seemed to help in my case, for whatever reason.

dariusc93 commented 9 years ago

Ubuntu 14.04 x64, DMD 2.066.1, and vibe.d v0.7.21. I could always try downgrading, or possibly switch to LDC or GDC to see if that helps.

extrawurst commented 9 years ago

Well, I must take that back: after the server ran for a couple of hours without the massive bloat when built with the other release, it suddenly grew back to 2 GB again. So this does not seem to solve anything. This is a huge showstopper for vibe.d in its current release. @s-ludwig any ideas what might be causing this? The last release did not suffer from that issue.

dariusc93 commented 9 years ago

I agree, @Extrawurst: the last version never had this issue. It would normally stay around 2% max on the type of test I'm doing right now. If you want to test, you can use loader.io against it.

extrawurst commented 9 years ago

What's even worse is that I cannot downgrade to the old release :(

extrawurst commented 9 years ago

Before anyone asks, I also tried the VibeManualMemoryManagement version: same problem.

extrawurst commented 9 years ago

@dariusc93 In the IRC channel you said the amount of memory leaked is equal to the data the server sends? Please elaborate on that here in the issue tracker to keep it archived.

etcimon commented 9 years ago

Out of curiosity, do you get the same behavior with the libasync driver?

dariusc93 commented 9 years ago

@Extrawurst According to the test, the amount of data received from the server is between 400MB and 600MB, and the memory the application uses on the server climbs to roughly that amount. From what I can see, it won't go beyond that unless you make the test fetch more from the server.

@etcimon Yes. I had been using the libasync driver, but when I caught on to the leak I switched to vibe.d completely before reporting the issue. It didn't increase as fast, but once I found out about the memory spike, I compiled my code against vibe.d without libasync and retried.

Update: So far, I can't afford to downgrade vibe.d to its previous version, so I will build against vibe.d master to see if any changes help. I've tested with a direct connection and behind nginx (which is what I've been using, though I plan on using haproxy in the near future). I've tested with and without SSL, but so far no luck. I haven't tried LDC, but I doubt it would change anything.

etcimon commented 9 years ago

So the server never really starves? If that's the case, I found this same issue back in March. You can see it here:

https://github.com/rejectedsoftware/vibe.d/issues/596

I ended up investigating why the GC would keep the memory around. After working through everything, I concluded that collection frequency was the only real issue and proposed to increase the speed of expansion of the GC memory pools:

https://github.com/D-Programming-Language/druntime/pull/826

etcimon commented 9 years ago

I think the right solution to keep the active memory low would be to tweak the GC here:

https://github.com/D-Programming-Language/druntime/commit/4d793a3d1c36db33720df53bf4b59cf877880365#diff-6f1ab0423fff9dcd084ecf9a677dc426L2193

Keeping the pools sized in small chunks helps reduce the amount of pools, which is necessary if you need to see a small # for the RAM column in the task manager :)

dariusc93 commented 9 years ago

@etcimon In the many applications I've made in D, I've never had much of an issue with memory until now. Also, are you using LDC or DMD for your applications, and have you noticed a difference in memory?

etcimon commented 9 years ago

> @etcimon In the many applications I've made in D, I've never had much of an issue with memory until now. Also, are you using LDC or DMD for your applications, and have you noticed a difference in memory?

I'm using DMD 2.066.1 / vibe.d master / libasync master, in a production e-commerce, and while it crashes on some random asserts every few hours because of some unrelated complex API calls, it doesn't exceed 500MB in memory usage.

On the other hand, I'd have to look into this more anyway, because I need to make a lightweight desktop app. I think tweaking the pool size in the GC code would be a good start (maybe make it a setting).

Another useful tip for debugging a leak (if there is one) is to check whether it comes from vibe.d containers, which helps narrow it down. You can get the actual memory usage with this:

https://github.com/rejectedsoftware/vibe.d/blob/master/source/vibe/utils/memory.d#L149

So, you first uncomment the DebugAllocator here and here, you create a new page handler which calls defaultAllocator() or manualAllocator() and cast the returned Allocator to a DebugAllocator, then you can poll the actual byte usage using the function bytesAllocated(), send it to the HTTP bodyWriter and see if it's really leaking from vibe.d objects.

dariusc93 commented 9 years ago

@etcimon May I ask you to run a quick test using loader.io and see if memory starts to rise and isn't released? I will be uncommenting the DebugAllocator and running the test soon to see where it's leaking from.

etcimon commented 9 years ago

> @etcimon May I ask you to run a quick test using loader.io and see if memory starts to rise and isn't released?

I'm pretty sure this does occur as you say. However, in my experience it doesn't leak without bound; it's simply a bounded pool size.

I've been brainstorming a Buffer type that works like an Array, but with a custom allocator and an allocation predicate (to have a max length and avoid memory attacks). This not only allows more control over the lifetime of objects in Json/Redis/Stream.Operations/etc., it also makes it possible to start developing some @nogc containers and memory. So I'll be attributing everything I wrote with @nogc, and I will propose a few pulls to vibe.d that use buffers optionally, so that we can achieve a smaller pool size in the GC as a build option and allow vibe.d applications to stay low in memory (~10MB).

I've been looking into the GC a lot, and decided to resort to RAII instead with many types of buffers. I'm immediately getting involved with this new memory container library before a first release of Botan.

dariusc93 commented 9 years ago

@etcimon So far it looks like this has yet to be resolved. If you run the example/web example and use "ab -k -n 1000000 -c 1000 http://localhost:8080/" several times (as in 4 or 5 times, if not more), you will see that the memory is not freed as it should be after the benchmark has ended. I waited several hours and the memory wasn't freed. I would use LDC, but it would require major changes in my code, and LDC breaks one library I use. GDC could be an option. I may try the latest dmd, but I don't think it's stable enough for production. At this rate I might have to split my project in half, migrate one half to Go, leave the other half alone, and hope that reduces the overhead. I have yet to try the normal way with URLRouter or a REST interface, but the web interface meets more of my needs.
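The reproduction described above can be scripted so that the memory figures are recorded rather than read off a process monitor. The following is a minimal sketch, assuming a Linux host (it reads `/proc`), that `ab` is installed, and that the server is already running on port 8080. The helper names `rss_kb` and `bench_rounds` are made up for this illustration.

```shell
#!/bin/sh
# Resident set size of a process in kB, read from /proc (Linux only).
rss_kb() {
    awk '/^VmRSS:/ { print $2 }' "/proc/$1/status"
}

# Run the ab command quoted in the comment above several times against a
# running server, printing the server's RSS after each round and once more
# after an idle pause.
# $1: server PID, $2: number of rounds (default 5), $3: idle seconds (default 60)
bench_rounds() {
    pid=$1
    rounds=${2:-5}
    idle=${3:-60}
    i=1
    while [ "$i" -le "$rounds" ]; do
        ab -k -n 1000000 -c 1000 http://localhost:8080/ > /dev/null 2>&1
        echo "round $i: $(rss_kb "$pid") kB"
        i=$((i + 1))
    done
    sleep "$idle"
    echo "after ${idle}s idle: $(rss_kb "$pid") kB"
}
```

Calling `bench_rounds <pid>` with the server's process ID gives one line per round. A figure that levels off across rounds would point to a bounded pool, while one that keeps growing would point to a real leak.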

extrawurst commented 9 years ago

I have the same issues with URLRouter too, so don't bother. I have no clue why this memory problem suddenly occurred (at some point last year), but not a lot of people seem to care.

On Sat, Feb 21, 2015 at 1:21 PM, Darius Clark wrote:

> @etcimon So far it looks like this has yet to be resolved. If you run the example/web example and use "ab -k -n 1000000 -c 1000 http://localhost:8080/" several times (as in 4 or 5 times, if not more), you will see that the memory is not freed as it should be after the benchmark has ended. I waited several hours and the memory wasn't freed. I would use LDC, but it would require major changes in my code, and LDC breaks one library I use. GDC could be an option. I may try the latest dmd, but I don't think it's stable enough for production. At this rate I might have to split my project in half, migrate one half to Go, leave the other half alone, and hope that reduces the overhead. I have yet to try the normal way with URLRouter or a REST interface, but the web interface meets more of my needs.


dariusc93 commented 9 years ago

@Extrawurst What version of vibe.d and what version of dmd?

extrawurst commented 9 years ago

@dariusc93 Using dmd 2.065 and 2.066.1 and vibe.d 0.7.22

dariusc93 commented 9 years ago

@Extrawurst Have you tried using the latest code within the master repo?

extrawurst commented 9 years ago

@dariusc93 The last time I checked was the beginning of Dec '14. I wonder which of the few commits since then would change any of the above?

dariusc93 commented 9 years ago

@etcimon What would be the best way to debug this issue to see where the leak is coming from?

etcimon commented 9 years ago

> @etcimon What would be the best way to debug this issue to see where the leak is coming from?

You'd have to enable GC printf debugging and write everything to a log. All pointers allocated or freed should appear there. Isolate the pointers so that nothing else appears on each line, then pipe the log through sort and uniq to find dormant allocations.
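The log-filtering step can be sketched as follows. This assumes the log has already been reduced to one pointer per line, and that each pointer is written once when it is allocated and once when it is freed. The actual format of the GC debug output is not shown in this thread, so that reduction is left out, and the function name `dormant_pointers` is made up for this illustration. Since `uniq` only compares adjacent lines, the input is sorted first. Counting occurrences and keeping the odd counts also handles addresses that were freed and later reused.

```shell
#!/bin/sh
# Print every pointer that appears an odd number of times in the log file $1,
# i.e. pointers with one more allocation than frees.
dormant_pointers() {
    sort "$1" | uniq -c | awk '$1 % 2 == 1 { print $2 }'
}
```

For a hypothetical log file `gc-pointers.log`, running `dormant_pointers gc-pointers.log` lists the candidate addresses to watch in GDB.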

Once you find the dormant allocation, you restart the application in GDB with a watch *(long*) 0xffffffff where 0xffffffff is the pointer to the object. This should insert a data breakpoint that monitors activity in this memory location. You run the application and it should stop the program when this location is touched, so at that point you print the call stack using the GDB command bt (backtrace), and if it's nothing important just enter c (continue) to keep going. With the most recent GDB it should give you the entire demangled stack which ends with GC.qalloc or something like that.

Once you know where this dead object was created, you'll know what it was, and you can try to figure out where you're leaving references around and clean those up. If you're a power user, you can try to create a function that does this automatically through thread_scanAllType when given a pointer, and call that from GDB using an extern(C). You then watch the matching data locations that hold your pointer, in GDB, to find the call stack and see what objects hold those references. Edit: this will not find pointers in the heap, and you can't know what the GC is actually watching in the heap unless you write lots of debugging code in the GC itself to log the marking process for specific memory ranges.

If the references were slices, you'll probably have a hard time finding them because the pointer can be to anything inside the allocated range.

It would be great to have a utility that does all of this automatically inside an IDE. Maybe a Mono-D extension.

Unfortunately, the only other ways I know of are simply not using the GC, or proof-reading every allocation. Maybe upgrading druntime to the most recent version could also fix this, if it was a maximum pool size issue.

I refactored the memory handling parts of vibe.d into a library called memutils that was integration tested with Botan. This would have the potential of eliminating the GC allocations, using a ScopedPool idea described here:

https://github.com/rejectedsoftware/vibe.d/issues/978#issuecomment-73819358

I'm wondering if Sönke would agree to changing everything memory-related in a pull request.

etcimon commented 9 years ago

Another simple way I've just thought of would be to add a function in the GC that takes a memory range void[] and returns the number of marks in the application, and the rtinfo of the memory that holds those marks.

This way, you can write Tuple!(void[], typeinfo)[] getReferenceHolders(void[] myObj) and a bunch of asserts through the code to make sure there are no reference holders, and print some proper debug info if there are any.

extrawurst commented 9 years ago

@etcimon This is all good, but why did this behaviour suddenly appear out of nowhere last year, without my upgrading dmd or changing my code that uses vibe.d?

etcimon commented 9 years ago

> @etcimon This is all good, but why did this behaviour suddenly appear out of nowhere last year, without my upgrading dmd or changing my code that uses vibe.d?

Until we know exactly what it is, we can only vaguely guess where it came from, so I can't say anything with certainty. Currently, I'm working on an HTTP/2 implementation, and once I'm done, I'll upgrade vibe.d with http2/botan/memutils and tackle this issue.

dariusc93 commented 9 years ago

Well, I hope it gets fixed soon, because it now leaves me concerned that if the memory isn't cleaned up, there is a chance of someone exploiting vibe.d and dumping data out of memory (e.g. passwords, account info, etc.). I have no clue where to start debugging, to be honest, since, as @Extrawurst said, it didn't pop up until last year. I could try stepping back through the versions to see where it started, then work from there.

etcimon commented 9 years ago

I started looking into it a little https://github.com/rejectedsoftware/vibe.d/pull/987

dariusc93 commented 9 years ago

@etcimon Thanks. I pulled master this morning when I saw there was a merge and tested with the http server example, but the memory is still not being released. Once it reaches 9.4% on my machine, the growth slows down, but not once did it free anything.

dariusc93 commented 9 years ago

I ran a network test and can confirm after several runs that the memory leak still exists. The memory is not freed by the GC, and over time, once ~14% to 20% of memory is in use (on a 16GB server), connections start to drop slightly or performance decreases. @s-ludwig any ideas? I do not want to have to restart the server every several weeks just to clear up the memory.

etcimon commented 9 years ago

> The memory is not freed by the GC, and over time, once ~14% to 20% of memory is in use (on a 16GB server), connections start to drop slightly or performance decreases.

Are you using lots of JSON? Which other modules are involved?

dariusc93 commented 9 years ago

@etcimon I am not even doing much in the example I'm using. It uses the web interface, but it also happens with the URLRouter. You can have it write out "Hello, World", use Apache Bench or loader.io (which is what I used) with 1500 to 5000 connections per second to the server, and watch the memory spike. I tried it all sorts of ways, including JSON (though JSON isn't normally sent from the server).

etcimon commented 9 years ago

Well, that's odd, I'm on Windows 10 using DMD 2.067.0 with vibe.d master and apache bench on the "http_server" example with libevent.

It's been about 5 minutes running 10,000,000 requests at 1000 clients and the server is spiked at 13% CPU (one saturated core) and 11.8MB memory usage.

What's your setup?

dariusc93 commented 9 years ago

@etcimon Ubuntu 14.04 with vibe.d master. Also using DMD 2.066, nginx and haproxy, but my issue can be reproduced directly.

etcimon commented 9 years ago

> Also using DMD 2.066, nginx and haproxy, but my issue can be reproduced directly.

You should definitely try out the new DMD 2.067. It might have been a GC bug

etcimon commented 9 years ago

Also, make sure you delete the dub.selections.json when re-compiling, there's always a chance you're using an old version because of that file (it's really inconvenient I know)

dariusc93 commented 9 years ago

@etcimon Is there a build available that I could install via deb? I have also deleted dub.selections.json (I do not see why it even exists), but like I said, it can even be reproduced with the http examples.

etcimon commented 9 years ago

Are you on libasync or libevent?

dariusc93 commented 9 years ago

libevent. Whatever is default in vibe.d

etcimon commented 9 years ago

> @etcimon Is there a build available that I could install via deb?

Not sure, I know you can find it on http://dlang.org/download though

dariusc93 commented 9 years ago

Oh wow, I was never aware of 2.067 becoming stable. I will run a quick test and come back with results shortly.

etcimon commented 9 years ago

Here's my valgrind massif analysis on Fedora 20:

    KB
301.4^#                                                                       
     |#::::::::::::@::::@:::@:::::::::::::@::::@@:::::::::@:::::@::::@:::::@::
     |#:::: :::::: @::::@:::@:::::::::::::@::::@ ::: :::::@:::::@::::@:::::@::
     |#:::: :::::: @::::@:::@:::::::::::::@::::@ ::: :::::@:::::@::::@:::::@::
     |#:::: :::::: @::::@:::@:::::::::::::@::::@ ::: :::::@:::::@::::@:::::@::
     |#:::: :::::: @::::@:::@:::::::::::::@::::@ ::: :::::@:::::@::::@:::::@::
     |#:::: :::::: @::::@:::@:::::::::::::@::::@ ::: :::::@:::::@::::@:::::@::
     |#:::: :::::: @::::@:::@:::::::::::::@::::@ ::: :::::@:::::@::::@:::::@::
     |#:::: :::::: @::::@:::@:::::::::::::@::::@ ::: :::::@:::::@::::@:::::@::
     |#:::: :::::: @::::@:::@:::::::::::::@::::@ ::: :::::@:::::@::::@:::::@::
     |#:::: :::::: @::::@:::@:::::::::::::@::::@ ::: :::::@:::::@::::@:::::@::
     |#:::: :::::: @::::@:::@:::::::::::::@::::@ ::: :::::@:::::@::::@:::::@::
     |#:::: :::::: @::::@:::@:::::::::::::@::::@ ::: :::::@:::::@::::@:::::@::
     |#:::: :::::: @::::@:::@:::::::::::::@::::@ ::: :::::@:::::@::::@:::::@::
     |#:::: :::::: @::::@:::@:::::::::::::@::::@ ::: :::::@:::::@::::@:::::@::
     |#:::: :::::: @::::@:::@:::::::::::::@::::@ ::: :::::@:::::@::::@:::::@::
     |#:::: :::::: @::::@:::@:::::::::::::@::::@ ::: :::::@:::::@::::@:::::@::
     |#:::: :::::: @::::@:::@:::::::::::::@::::@ ::: :::::@:::::@::::@:::::@::
     |#:::: :::::: @::::@:::@:::::::::::::@::::@ ::: :::::@:::::@::::@:::::@::
     |#:::: :::::: @::::@:::@:::::::::::::@::::@ ::: :::::@:::::@::::@:::::@::
   0 +----------------------------------------------------------------------->Gi
     0                                                                   4.651

It shows a 301.4KB memory usage peak for 4.65 billion instructions (running apache bench), no signs of leaks
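For anyone wanting to repeat this kind of measurement, here is a hedged sketch of the Valgrind invocation. By default Massif only tracks blocks obtained through malloc and related functions. Memory that a runtime maps directly with mmap is counted only when `--pages-as-heap=yes` is passed. Whether that affects the figure above depends on how the D runtime obtains its GC pools, which this thread does not establish. The helper name `massif_cmd` and the output file names are made up for this illustration. The function prints the command instead of running it, so it can be reviewed first.

```shell
#!/bin/sh
# Print a Massif command line for the binary $1.
# $2: "pages" to profile at page level (includes mmap'd memory);
#     anything else, or nothing, for the default malloc-level profile.
massif_cmd() {
    bin=$1
    mode=${2:-heap}
    out="massif.out.$mode"
    if [ "$mode" = "pages" ]; then
        echo "valgrind --tool=massif --pages-as-heap=yes --massif-out-file=$out $bin"
    else
        echo "valgrind --tool=massif --massif-out-file=$out $bin"
    fi
}
```

After running the printed command and stopping the server, `ms_print massif.out.pages` (or `massif.out.heap`) renders a graph like the one above. Comparing the two modes shows how much memory is held outside malloc.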

dariusc93 commented 9 years ago

Is this with 2.066 or 2.067?

etcimon commented 9 years ago

I don't use anything but 2.067

etcimon commented 9 years ago

I did find a leak in libasync though, I'm going to fix that one

dariusc93 commented 9 years ago

Okay, I am running a test with loader.io with my app compiled with 2.067. So far, memory doesn't spike up fast, but it does slowly creep up and doesn't release anything. Is the GC being called at any point to clean up the memory, or is it all allocated and won't be reallocated?

etcimon commented 9 years ago

> So far, memory doesn't spike up fast, but it does slowly creep up and doesn't release anything. Is the GC being called at any point to clean up the memory, or is it all allocated and won't be reallocated?

The GC runs collections throughout execution and frees everything it can. What are your numbers?

dariusc93 commented 9 years ago

After 15000 connections per second it shows 4.9%, at 787M. The GC isn't kicking in even while idle.

etcimon commented 9 years ago

What about apache bench?

dariusc93 commented 9 years ago

@etcimon Sorry, I thought I replied to this. What information do you want from Apache Bench?