Closed glassfishrobot closed 7 years ago
Reported by fae
fae said: Work around code here: https://gist.github.com/haku/4c5447c33850947c4862
oleksiys said: Could you pls. explain how we can reproduce the problem? I mean more step-by-step details.
Thank you.
@rlubke said: Could you please give this JAR [1] a shot in your environment?
[1] https://www.dropbox.com/s/pmw8vy3ovnbfyrx/grizzly-framework-2.3.29-SNAPSHOT.jar?dl=0
@rlubke said: Changes applied:
2.3.x: 3c30dc05944f88184123f57d4956a14dfc5815eb master: 8f783a3c8bd7357c77cdd45e7f07d836ada927eb
This issue was imported from java.net JIRA GRIZZLY-1824
Marked as fixed on Friday, January 13th 2017, 2:03:22 pm
I have been making a HTTP service using HttpServer and loading testing it to check how it behaves under very high load. This service will sit behind a reverse proxy, so I expect all the TCP connections to come from the same client host. My setup code looks like this (I have omitted lots of stuff I do not think is relevant to this issue):
Under high load from Gatling many of the requests do not finish gracefully, e.g. Gatling state is:
This is totally expected, but I do not expect it to break the service such that the JVM crashes or it is unusable after the load is removed. However I found that under this load the JVM would run out of memory and start throwing OutOfMemoryError and killing random threads until everything ground to a halt. In a heap dump from the dead (but JVM still running) service I found that org.glassfish.grizzly.utils.DelayedExecutor.DelayQueue.queue was responsible for 80+% of the heap usage.
After a lot of investigation, I have a couple of ideas on what is happening:
1) It might be possible for a connection to die in some way that does not cause events added using org.glassfish.grizzly.filterchain.FilterChainContext.addCompletionListener() to be called. IdleTimeoutFilter depends on that callback being reliable so that org.glassfish.grizzly.utils.IdleTimeoutFilter.ContextCompletionListener.onComplete() is called. If it is not called any connection that does not close cleanly is left in the DelayQueue.queue Map with timeout set to FOREVER} (Long.MAX_VALUE). Thus DelayedExecutor will never remove it from DelayQueue.queue and process it.
2) I noticed that many of the leaked connection objects have timeout set to 9223372036854775806, Long.MAX_VALUE - 1, same as IdleTimeoutFilter.FOREVER_SPECIAL, and counter is still set to 1. But the connection is closed. So I am wondering if it is possible that org.glassfish.grizzly.utils.IdleTimeoutFilter.ContextCompletionListener.onComplete() does not always to the right thing. For example, IdleRecord.timeoutMillisUpdater.compareAndSet(idleRecord, FOREVER_SPECIAL, timeoutToSet) does not check the return value.
I am afraid I have not delved deep enough to figure it out exactly. What I have managed though is to create a temporary work around that cleans up closed connections that still have a timeout set. It resets the timeout to be in the past for connections that are closed so that DelayedExecutor will then process them as expected. Code will be attached.
Environment
Java 1.8.0_71 on CentOS 6.7.
Affected Versions
[2.3.24]