Closed adinapoli closed 9 years ago
Hmm, since Text is such a commonly used structure, it's hard to say what might be causing this problem. It's entirely possible that this leak could be a problem with snap's reloading code, but it's hard to know for sure. I think there are two ways you could approach this.
It's hard to say which would be easier for you.
On Mon, Feb 9, 2015 at 10:36 AM, Alfredo Di Napoli <notifications@github.com
wrote:
Hello folks,
Sorry if it is not relevant anymore in the upcoming 1.0 release, but I thought would have been quite useful to bring it up nevertheless.
I have a web server that essentially restart itself if certain conditions are met; in particular, I have a main snaplet that holds the state of a connection to a RabbitMQ broker. If such connection is lost, for example due to a network failure, I use /admin/reload to reload the state of my snaplet back into memory.
As part of a particular catastrophic simulation, the reload function was executed by the server quite frantically, until the broker went back online. Monitoring the memory allocation of my server though I did notice a huge memory allocation. Profiling reveals the following:
[image: screen shot 2015-02-09 at 15 47 06] https://cloud.githubusercontent.com/assets/442035/6108771/0fa0d9be-b076-11e4-94b9-15ce0b03cf39.png
[image: screen shot 2015-02-09 at 15 52 56] https://cloud.githubusercontent.com/assets/442035/6108778/17d5f31c-b076-11e4-80a8-0e0119b9b00e.png
[image: screen shot 2015-02-09 at 15 50 56] https://cloud.githubusercontent.com/assets/442035/6108782/1fcc5bba-b076-11e4-8247-c0e87a921f9e.png
I couldn't actually profile with -hr as I get this GHC error:
hermes-server: internal error: Invalid object in isRetainer(): 39 (GHC version 7.8.3 for x86_64_apple_darwin) Please report this as a GHC bug: http://www.haskell.org/ghc/reportabug [1] 19712 abort hermes-server -p 9939 +RTS -hr -L120
As far as I can see one possible suspect is this Chunk data structure, which may be coming from text:
http://hackage.haskell.org/package/text-1.0.0.0/docs/Data-Text-Lazy-Internal.html
Browsing through the snap code, I can see that the reload function does nothing but calling an IO action which contains an Either Text Text:
In particular, I can see that you are forcing the result of these IO action in the reload handler:
But despite that, as Either is a lazy data structure, the underlying text within contained will not be forced, and this might explain the leak. I'm not fully convinced of this track though, as also ByteString has a Chunk type, and that ARR_WORDS might be telling us exactly that.
Do you guys think:
- This is a bug in my application code
- It's a potential bug in Snap
- Is not a bug but a "feature": There is no way we can protect ourselves from this to happen.
Thank you in advance!
Alfredo
— Reply to this email directly or view it on GitHub https://github.com/snapframework/snap/issues/137.
Hi Doug, thanks for the prompt reply.
I think that considering the delicate transition the project is facing (0.x -> 1.0), you and Greg would be probably in the best position to troubleshoot what's happening, so I would be inclined to go with the number two.
I will try to whip up a minimal example reproducing the bug later this week. Thanks :)
Alfredo
I'm betting this has to do with some snaplet (either heist or something in your user code) doing a lazy modifyMVar instead of a strict one; that would mean that until the contents of the MVar are demanded, the MVar would contain a thunk chain. Similar caution applies to e.g. IORef.
Are you using the heist cache tag anywhere?
Hello Greg,
I'm not using the heist cache anywhere, but my snaplet does indeed use a lot of IORefs, having to deal with the mutant nature of the RabbiMQ brokering. I was always very cautious though, and I am using atomicModifyIORef'
everywhere I need to modify the content of a ref.
However, I can see that the underlying RabbitMQ library is using modifyMVar in a couple of places:
https://github.com/hreinhardt/amqp/search?utf8=%E2%9C%93&q=modifyMVar
So indeed, that may be the source of the leak. During this week, I will do two things:
That should give us a better picture of what's going on.
Thanks guys! Alfredo
I think we can close the case.
After having managed to run the profiling with the -hr
option enabled ( the crash seems to be an already reported GHC bug, which can be partially worked around by removing the -N
threading option when running the program. After that, and a bit of cost centre annotations, the culprit was evident:
(It is renown that http-client's Manager
is a pretty heavy object to create, and I'm pretty sure that my code was not strict enough, causing more and more managers to be created every time).
I am inclined to close this issue now: I still have a marginal leak in my profiling, but I do not think it is any way related to the reloading portion of the framework.
Sorry for the noise guys!
Hello folks,
Sorry if it is not relevant anymore in the upcoming 1.0 release, but I thought would have been quite useful to bring it up nevertheless.
I have a web server that essentially restart itself if certain conditions are met; in particular, I have a main snaplet that holds the state of a connection to a RabbitMQ broker. If such connection is lost, for example due to a network failure, I use
/admin/reload
to reload the state of my snaplet back into memory.As part of a particular catastrophic simulation, the reload function was executed by the server quite frantically, until the broker went back online. Monitoring the memory allocation of my server though I did notice a huge memory allocation. Profiling reveals the following:
I couldn't actually profile with
-hr
as I get this GHC error:As far as I can see one possible suspect is this
Chunk
data structure, which may be coming fromtext
:http://hackage.haskell.org/package/text-1.0.0.0/docs/Data-Text-Lazy-Internal.html
Browsing through the snap code, I can see that the reload function does nothing but calling an IO action which contains an
Either Text Text
:https://github.com/snapframework/snap/blob/fadf49ac8bea51bc9c3d7ab3d794a3cd7b55a8da/src/Snap/Snaplet/Internal/Types.hs#L54
In particular, I can see that you are forcing the result of these IO action in the reload handler:
https://github.com/snapframework/snap/blob/fadf49ac8bea51bc9c3d7ab3d794a3cd7b55a8da/src/Snap/Snaplet/Internal/Types.hs#L398
But despite that, as
Either
is a lazy data structure, the underlying text within contained will not be forced, and this might explain the leak. I'm not fully convinced of this track though, as also ByteString has aChunk
type, and thatARR_WORDS
might be telling us exactly that.Do you guys think:
Thank you in advance!
Alfredo