mpenet / jet

[not maintained, use at your own risk] Jetty9 ring server adapter with WebSocket support via core.async and Jetty9-based HTTP & WebSocket clients (JVM-based, no cljs)
http://mpenet.github.io/jet/

EofException #12

Closed: hlship closed this issue 9 years ago

hlship commented 9 years ago

My handler returns a core.async channel containing a Ring response map.
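For reference, a stripped-down sketch of that pattern (the names here are illustrative, not my actual code):

(require '[clojure.core.async :refer [go]]
         '[qbits.jet.server :refer [run-jetty]])

;; Illustrative handler: returns a core.async channel that will
;; eventually contain a Ring response map.
(defn handler [request]
  (go
    {:status  200
     :headers {"Content-Type" "text/plain"}
     :body    "hello"}))

(run-jetty {:ring-handler handler :port 8080})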

I've found that if my client terminates early, I start seeing exceptions in my console:

15:06:40.851 INFO  [async-thread-macro-9] e0010450-01ba-4f08-834d-e33349598d2f f.endpoint-tracking - Invoking fan.auth.resources.users/index.
Exception in thread "async-dispatch-17" java.lang.Error: org.eclipse.jetty.io.EofException
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1151)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:724)
Caused by: org.eclipse.jetty.io.EofException
  at org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:192)
  at org.eclipse.jetty.io.WriteFlusher.flush(WriteFlusher.java:408)
  at org.eclipse.jetty.io.WriteFlusher.write(WriteFlusher.java:302)
  at org.eclipse.jetty.io.AbstractEndPoint.write(AbstractEndPoint.java:128)
  at org.eclipse.jetty.server.HttpConnection$SendCallback.process(HttpConnection.java:678)
  at org.eclipse.jetty.util.IteratingCallback.processing(IteratingCallback.java:246)
  at org.eclipse.jetty.util.IteratingCallback.iterate(IteratingCallback.java:208)
  at org.eclipse.jetty.server.HttpConnection.send(HttpConnection.java:474)
  at org.eclipse.jetty.server.HttpChannel.sendResponse(HttpChannel.java:771)
  at org.eclipse.jetty.server.HttpChannel.write(HttpChannel.java:804)
  at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:142)
  at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:135)
  at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:378)
  at clojure.java.io$fn__6903.invoke(io.clj:297)
  at clojure.lang.MultiFn.invoke(MultiFn.java:238)
  at clojure.java.io$copy.doInvoke(io.clj:396)
  at clojure.lang.RestFn.invoke(RestFn.java:425)
  at qbits.jet.servlet$eval18046$fn__18047.invoke(servlet.clj:149)
  at qbits.jet.servlet$eval17926$fn__17927$G__17917__17934.invoke(servlet.clj:100)
  at qbits.jet.servlet$set_response_body_BANG_.invoke(servlet.clj:105)
  at qbits.jet.servlet$set_body_BANG_.invoke(servlet.clj:207)
  at qbits.jet.servlet$eval18117$fn__18119.invoke(servlet.clj:240)
  at qbits.jet.servlet$eval18087$fn__18088$G__18078__18097.invoke(servlet.clj:220)
  at qbits.jet.servlet$eval18123$fn__18124$fn__18125.invoke(servlet.clj:231)
  at clojure.core.async.impl.channels.ManyToManyChannel$fn__3907.invoke(channels.clj:102)
  at clojure.lang.AFn.run(AFn.java:22)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  ... 2 more
Caused by: java.io.IOException: Broken pipe
  at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
  at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
  at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:94)
  at sun.nio.ch.IOUtil.write(IOUtil.java:65)
  at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:466)
  at org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:170)
  ... 28 more

I discovered this using Apache Bench (the ab command) with the -t option. I would guess that once it hits its time limit, it simply closes the open HTTP connection(s).

A UUID (e.g. f80d055a-7dc1-4823-899e-b2eee7ae8e73) is assigned to each new request.

In fact, using ab -c 20, I see 19 of these exceptions in my log: a burst of 19 exceptions, a pause of about 20 seconds, then another burst. The UUIDs (assigned by my code as each request is received by my Ring handler) do not repeat, which means that something inside Jet and/or Jetty is retrying them.

I'm still tracking down exactly how this happens, but it would be nice if this exception could be caught and either ignored, or perhaps written to a channel associated with the original request (i.e., so that I could write asynchronous code that knows how to recover from the response being non-deliverable).

hlship commented 9 years ago

As a note, when running ab, I don't see any signs of failures:

Server Software:        Jetty(9.2.6.v20141205)
Server Hostname:        localhost
Server Port:            8080

Document Path:          /auth/users
Document Length:        1271 bytes

Concurrency Level:      20
Time taken for tests:   10.003 seconds
Complete requests:      157
Failed requests:        0
Keep-Alive requests:    157
Total transferred:      239268 bytes
HTML transferred:       199547 bytes
Requests per second:    15.70 [#/sec] (mean)
Time per request:       1274.274 [ms] (mean)
Time per request:       63.714 [ms] (mean, across all concurrent requests)
Transfer rate:          23.36 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.2      0       1
Processing:   891 1208 213.2   1150    2089
Waiting:      891 1208 213.2   1150    2089
Total:        891 1208 213.3   1150    2089

Percentage of the requests served within a certain time (ms)
  50%   1149
  66%   1239
  75%   1278
  80%   1316
  90%   1428
  95%   1666
  98%   1952
  99%   2039
 100%   2089 (longest request)

The output would be different if the server dropped any connections, or returned any non-2xx responses.

hlship commented 9 years ago

To clarify, the "burst of exceptions" keeps occurring, seemingly forever.

mpenet commented 9 years ago

Hi,

Thanks for the report, I'll have a look as soon as I can (probably later today).

mpenet commented 9 years ago

Could you provide me with a minimal test case? I tried a simple handler with and without a timeout in the go block, using ab, curl, and Chrome, and I couldn't reproduce the issue. Also, please try with master.

hlship commented 9 years ago

Thanks for looking at this. I'm at a loss. Yesterday this happened consistently, every time I tested my server using ab. Today everything works smooth as silk. If I can find a way to reproduce it, I'll create a tiny test app.

hlship commented 9 years ago

The error has resurfaced in my application.

I'm working on an example.

https://github.com/hlship/jet-issue-12-example

So far, I can get it to time out under heavy load.

hlship commented 9 years ago

So far, I haven't seen the EofException using my test code, but I do see a point where a heavily loaded server simply stops operating after about 16000 requests. See the project in the link above.

mpenet commented 9 years ago

Thanks! I'll have a look.

mpenet commented 9 years ago

What kind of -c -n options are you using to trigger the timeouts?

mpenet commented 9 years ago

I just tried with various -c and -n values, up to 100k requests, and another test with concurrency 100, and I can't see the error :(

hlship commented 9 years ago

I'm on Mac OS X, JDK 1.7. Perhaps that makes a difference?

Maybe I should reboot my machine :-)

hlship commented 9 years ago

Seriously, I am going to reboot my Mac and see if that affects my results.

mpenet commented 9 years ago

I am on Linux, so I guess there could be a difference, yes.

$ uname -a
Linux sputbox 3.12.6-031206-generic #201312201218 SMP Fri Dec 20 17:20:06 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

$ java -version
java version "1.7.0_65"
Java(TM) SE Runtime Environment (build 1.7.0_65-b17)
Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)

This does sound like an OS-level issue; I'll try to investigate more on my side as well.

niwinz commented 9 years ago

On a heavily loaded machine with a lot of open connections, maybe the connection limit is the bottleneck rather than jet or jetty...

mpenet commented 9 years ago

Yeah, ulimit shows unlimited on my machine; I was thinking about this as well (and some other settings of the same kind).

hlship commented 9 years ago

ulimit shows unlimited on my machine as well

niwinz commented 9 years ago

ulimit by itself is not a good indicator here; ulimit -n (the number of file descriptors allowed) is a better one, but not the only thing to check. I have an article about it: http://www.niwi.be/2013/03/13/optimizing-server-for-heavy-load-django-application/ (it is written for Django, but almost all of it is independent of the application/framework).

mpenet commented 9 years ago

Thanks Andrey, it's very informative. I also had in mind revisiting this post: http://www.http-kit.org/600k-concurrent-connection-http-kit.html, but yours seems more to the point.

hlship commented 9 years ago

Ok, I was able to reduce it to a very small and easily reproducible case. Please see the example project and follow the notes.

hlship commented 9 years ago

It looks similar to this: https://jira.codehaus.org/browse/JETTY-747

mpenet commented 9 years ago

OK, from a quick inspection: the OutputStream is closed when we try writing to it, which throws. I am still undecided on how to report this error in async mode (the same can happen with the body channel in chunked transfers). I will probably add a :control channel in the request map that would receive such errors; it would later be replaced by a deferred channel, probably (if that ever comes out of the c.c.async JIRA). Something similar to what we have in the websocket client/server.
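To illustrate the chunked case I mean, a rough sketch (names illustrative; the response :body is itself a core.async channel, each value put on it is written as a chunk, and closing it ends the response):

(require '[clojure.core.async :refer [chan go >! close!]])

;; Sketch: streaming a chunked response from a body channel. If the
;; client disconnects mid-stream, the underlying write fails the same
;; way as in the report above.
(defn chunked-handler [request]
  (let [body (chan)]
    (go
      (dotimes [i 5]
        (>! body (str "chunk " i "\n")))
      (close! body))
    {:status  200
     :headers {"Content-Type" "text/plain"}
     :body    body}))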

I have a couple of branches to merge/complete first though (backpressure, flow-control improvements); this will probably come next.

As for the "retrying", we don't do anything like this in Jet at least, I will see if I can spot when/where this happens.

hlship commented 9 years ago

Thanks! We can probably continue with the current state of the code, but we may have to bounce our servers occasionally to clear out these "repeating requests".


hlship commented 9 years ago

Any progress or a workaround on this? This is creeping into our production app (fortunately, still in evaluation mode), forcing us to do occasional restarts. If I can't get a resolution to this soon, we will have to scrap the async support provided by Jet.

mpenet commented 9 years ago

Not at all, I'm afraid; I am currently traveling and haven't had time to look into it. I had to do a release today (because of this: https://github.com/eclipse/jetty.project/blob/cfca172dd68846b2c7f30a9a6b855f08da7e7946/advisories/2015-02-24-httpparser-error-buffer-bleed.md) and bumped the Jetty dependency in the process; it was lagging a few versions behind. It might be worth trying 0.5.5 and seeing if the problem is still there. If so, I'll get into it ASAP.
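For reference, the upgrade is just the usual coordinate bump in project.clj (assuming the cc.qbits/jet coordinates):

;; Leiningen dependency, bumped to the release mentioned above
[cc.qbits/jet "0.5.5"]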

mpenet commented 9 years ago

This is fixed in 0.5.7.

You can now read the :ctrl key in the request map; it's a core.async channel that receives such errors so you can handle them. I tested against your small project, and this seems to fix the problem.
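Roughly, consuming it looks like this (just a sketch; the exact value that arrives on the channel may differ):

(require '[clojure.core.async :refer [go <!]])

;; Sketch: take the :ctrl channel from the request map and react when
;; an error (e.g. the EofException from a client disconnect) arrives.
(defn handler [request]
  (let [ctrl (:ctrl request)]
    (go
      (when-let [err (<! ctrl)]
        ;; the response could not be delivered; clean up / log here
        (println "request aborted:" err)))
    (go
      {:status  200
       :headers {"Content-Type" "text/plain"}
       :body    "hello"})))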

mpenet commented 9 years ago

Actually, I'll do another release shortly; I introduced some reflection here and there.

mpenet commented 9 years ago

done