rossabaker opened this issue 5 years ago
In our case, it happens when we run our integ suites (probably hard to reproduce)
I have a pretty easy reproduction on this one. Not one I'm sure I want to run on Travis, but it shows the bug.
That's great
I have confirmed that this is not the timeout stages: I stripped them all away and it still happens. I now suspect it is related to rendering the request simultaneously with receiving the response. I think we need to figure out how to formally cancel the request writer and await its cancellation before recycling the connection.
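For illustration, here is the general shape of that cancel-then-await step, sketched with cats-effect 2 primitives. This is not http4s internals: CancelThenAwait, writeThenRecycle, writeBody, and recycle are all invented for the example, and the Deferred stands in for whatever signal the writer would expose.

```scala
import cats.effect.{ContextShift, IO}
import cats.effect.concurrent.Deferred
import cats.implicits._

// Sketch (not http4s code): cancel the request writer and await its
// termination before recycling the connection. The Deferred is
// completed from the writer's guarantee block, so it fires whether
// the writer finished normally or was canceled.
object CancelThenAwait {
  def writeThenRecycle(writeBody: IO[Unit], recycle: IO[Unit])(
      implicit cs: ContextShift[IO]): IO[Unit] =
    for {
      done  <- Deferred[IO, Unit]
      fiber <- writeBody.guarantee(done.complete(()).attempt.void).start
      _     <- fiber.cancel // formally cancel the request writer...
      _     <- done.get     // ...and await its termination
      _     <- recycle      // only now is the channel safe to reuse
    } yield ()
}
```

Nothing recycles the channel until the Deferred completes, which is the gap the WritePendingException suggests is currently being left open.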
I observe this under a bit of high traffic with websockets on 0.21.0-RC1. Was there ever a fix?
Not yet.
Hi,
I am facing the same issue. Here is my stack trace (sorry I cannot do more on it right now :/ ):
http4s version: 0.21.0
20/04/16 06:10:50 ERROR blaze.Http1Connection: Error rendering request
java.nio.channels.WritePendingException
at sun.nio.ch.AsynchronousSocketChannelImpl.write(AsynchronousSocketChannelImpl.java:352)
at sun.nio.ch.AsynchronousSocketChannelImpl.write(AsynchronousSocketChannelImpl.java:416)
at org.http4s.blaze.channel.nio2.ByteBufferHead.org$http4s$blaze$channel$nio2$ByteBufferHead$$go$1(ByteBufferHead.scala:42)
at org.http4s.blaze.channel.nio2.ByteBufferHead.writeRequest(ByteBufferHead.scala:58)
at org.http4s.blaze.pipeline.Tail.channelWrite(Stages.scala:109)
at org.http4s.blaze.pipeline.Tail.channelWrite$(Stages.scala:107)
at org.http4s.client.blaze.ReadBufferStage.channelWrite(ReadBufferStage.scala:13)
at org.http4s.client.blaze.ReadBufferStage.writeRequest(ReadBufferStage.scala:21)
at org.http4s.blaze.pipeline.Tail.channelWrite(Stages.scala:109)
at org.http4s.blaze.pipeline.Tail.channelWrite$(Stages.scala:107)
at org.http4s.blazecore.IdleTimeoutStage.channelWrite(IdleTimeoutStage.scala:12)
at org.http4s.blazecore.IdleTimeoutStage.writeRequest(IdleTimeoutStage.scala:46)
at org.http4s.blaze.pipeline.Tail.channelWrite(Stages.scala:109)
at org.http4s.blaze.pipeline.Tail.channelWrite$(Stages.scala:107)
at org.http4s.client.blaze.Http1Connection.channelWrite(Http1Connection.scala:25)
at org.http4s.blazecore.util.IdentityWriter.writeBodyChunk(IdentityWriter.scala:56)
at org.http4s.blazecore.util.EntityBodyWriter.$anonfun$writePipe$3(EntityBodyWriter.scala:64)
at cats.effect.internals.IORunLoop$.cats$effect$internals$IORunLoop$$loop(IORunLoop.scala:87)
at cats.effect.internals.IORunLoop$.startCancelable(IORunLoop.scala:41)
at cats.effect.internals.IOBracket$BracketStart.run(IOBracket.scala:86)
at cats.effect.internals.Trampoline.cats$effect$internals$Trampoline$$immediateLoop(Trampoline.scala:70)
at cats.effect.internals.Trampoline.startLoop(Trampoline.scala:36)
at cats.effect.internals.TrampolineEC$JVMTrampoline.super$startLoop(TrampolineEC.scala:93)
at cats.effect.internals.TrampolineEC$JVMTrampoline.$anonfun$startLoop$1(TrampolineEC.scala:93)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:85)
at cats.effect.internals.TrampolineEC$JVMTrampoline.startLoop(TrampolineEC.scala:93)
at cats.effect.internals.Trampoline.execute(Trampoline.scala:43)
at cats.effect.internals.TrampolineEC.execute(TrampolineEC.scala:44)
at cats.effect.internals.IOBracket$BracketStart.apply(IOBracket.scala:72)
at cats.effect.internals.IOBracket$BracketStart.apply(IOBracket.scala:52)
at cats.effect.internals.IORunLoop$.cats$effect$internals$IORunLoop$$loop(IORunLoop.scala:136)
at cats.effect.internals.IORunLoop$.start(IORunLoop.scala:34)
at cats.effect.internals.IOBracket$.$anonfun$apply$1(IOBracket.scala:44)
at cats.effect.internals.IOBracket$.$anonfun$apply$1$adapted(IOBracket.scala:34)
at cats.effect.internals.IORunLoop$RestartCallback.start(IORunLoop.scala:341)
at cats.effect.internals.IORunLoop$.cats$effect$internals$IORunLoop$$loop(IORunLoop.scala:119)
at cats.effect.internals.IORunLoop$RestartCallback.signal(IORunLoop.scala:355)
at cats.effect.internals.IORunLoop$RestartCallback.apply(IORunLoop.scala:376)
at cats.effect.internals.IORunLoop$RestartCallback.apply(IORunLoop.scala:316)
at cats.effect.internals.IOShift$Tick.run(IOShift.scala:36)
at cats.effect.internals.PoolUtils$$anon$2$$anon$3.run(PoolUtils.scala:51)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Can anyone explain the consequences of this exception getting logged? Does it cancel one or more running requests? The exception doesn't seem to bubble up, so does that mean an internal retry mechanism is catching it?
A fiber is started to write the request, and this message is logged when that fiber fails.
It's an interesting question: my inclination is that we should instead cancel the request and raise an error. But the response to a partially rendered request might be part of the recovery: it's natural for a server to begin responding to a partially written request, and bad things can always happen while writing a request, even after this bug is fixed. How do we best represent "this is the response we managed to get from an incorrectly rendered request"? I think the status quo is wrong, but I'm not sure what's right.
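To make that concrete, a sketch of the shape being described; this is not the actual Http1Connection code, and dispatch with its parameters is invented for illustration.

```scala
import cats.effect.{ContextShift, IO}
import org.log4s.getLogger

// Sketch (not Http1Connection): the body write runs on its own
// fiber whose failure is only logged, while the response read
// proceeds and can still hand a response back to the caller.
object FireAndForgetWrite {
  private val logger = getLogger

  def dispatch[A](writeRequest: IO[Unit], readResponse: IO[A])(
      implicit cs: ContextShift[IO]): IO[A] =
    for {
      _ <- writeRequest
             .handleErrorWith(t => IO(logger.error(t)("Error rendering request")))
             .start          // the failure never reaches the caller
      resp <- readResponse   // may succeed against a partial request
    } yield resp
}
```

That is why the exception shows up only in the log: the read side can complete "successfully" even when the write side blew up.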
Hello. I ran into the same issue. For some reason I wasn't able to reproduce the error locally, while in the production setup it appears constantly. Perhaps networking or the platform (Linux) is part of the problem.
http4s version: 0.21.4
pool size: 1
idle timeout: Duration.Inf
[2020-06-22 19:39:36,806] DEBUG [PoolManager]: Requesting connection: curAllocated=1 idleQueues.size=1 waitQueue.size=0 maxWaitQueueLimit=256 closed=false
[2020-06-22 19:39:36,806] DEBUG [PoolManager]: Recycling connection: curAllocated=1 idleQueues.size=0 waitQueue.size=0 maxWaitQueueLimit=256 closed=false
[2020-06-22 19:39:36,806] DEBUG [Http1Connection]: Connection was idle. Running.
[2020-06-22 19:39:36,806] DEBUG [Http1Connection]: Beginning request: POST http://dns:8080/api/v1/datapoints
[2020-06-22 19:39:36,809] DEBUG [Http1Connection]: Resetting org.http4s.client.blaze.Http1Connection after completing request.
[2020-06-22 19:39:36,809] DEBUG [PoolManager]: Recycling connection: curAllocated=1 idleQueues.size=0 waitQueue.size=0 maxWaitQueueLimit=256 closed=false
[2020-06-22 19:39:36,809] DEBUG [PoolManager]: Returning idle connection to pool: curAllocated=1 idleQueues.size=0 waitQueue.size=0 maxWaitQueueLimit=256 closed=false
[2020-06-22 19:39:36,811] DEBUG [PoolManager]: Requesting connection: curAllocated=1 idleQueues.size=1 waitQueue.size=0 maxWaitQueueLimit=256 closed=false
[2020-06-22 19:39:36,811] DEBUG [PoolManager]: Recycling connection: curAllocated=1 idleQueues.size=0 waitQueue.size=0 maxWaitQueueLimit=256 closed=false
[2020-06-22 19:39:36,811] DEBUG [Http1Connection]: Connection was idle. Running.
[2020-06-22 19:39:36,811] DEBUG [Http1Connection]: Beginning request: POST http://dns:8080/api/v1/datapoints
[2020-06-22 19:39:36,814] DEBUG [Http1Connection]: Resetting org.http4s.client.blaze.Http1Connection after completing request.
[2020-06-22 19:39:36,814] ERROR [Http1Connection]: Error rendering request
java.nio.channels.WritePendingException: null
at java.base/sun.nio.ch.AsynchronousSocketChannelImpl.write(AsynchronousSocketChannelImpl.java:353)
at java.base/sun.nio.ch.AsynchronousSocketChannelImpl.write(AsynchronousSocketChannelImpl.java:417)
at org.http4s.blaze.channel.nio2.ByteBufferHead.org$http4s$blaze$channel$nio2$ByteBufferHead$$go$1(ByteBufferHead.scala:42)
at org.http4s.blaze.channel.nio2.ByteBufferHead.writeRequest(ByteBufferHead.scala:58)
at org.http4s.blaze.channel.nio2.ByteBufferHead.writeRequest(ByteBufferHead.scala:25)
at org.http4s.blaze.channel.nio2.ByteBufferHead.writeRequest(ByteBufferHead.scala:16)
at org.http4s.blaze.pipeline.Tail.channelWrite(Stages.scala:96)
at org.http4s.blaze.pipeline.Tail.channelWrite$(Stages.scala:94)
at org.http4s.client.blaze.ReadBufferStage.channelWrite(ReadBufferStage.scala:14)
at org.http4s.client.blaze.ReadBufferStage.writeRequest(ReadBufferStage.scala:20)
at org.http4s.blaze.pipeline.Tail.channelWrite(Stages.scala:96)
at org.http4s.blaze.pipeline.Tail.channelWrite$(Stages.scala:94)
at org.http4s.client.blaze.Http1Connection.channelWrite(Http1Connection.scala:25)
at org.http4s.blazecore.util.IdentityWriter.writeBodyChunk(IdentityWriter.scala:57)
at org.http4s.blazecore.util.IdentityWriter.writeEnd(IdentityWriter.scala:64)
at org.http4s.blazecore.util.EntityBodyWriter.$anonfun$writeEntityBody$1(EntityBodyWriter.scala:53)
[2020-06-22 19:39:36,814] DEBUG [Http1Connection]: Resetting org.http4s.client.blaze.Http1Connection after completing request.
[2020-06-22 19:39:36,814] ERROR [Http1Connection]: Error rendering request
At this point, I cannot understand the logic behind these messages. "Resetting ... after completing request." is printed when the reading stage (the response) completes, while WritePendingException is thrown in the writing stage (the request). Can the writing stage still be active while the reading stage runs?
@iRevive the problem went away completely once I rate-limited our use of the blaze HTTP client. We would see it when the node was doing more than 250 requests/second; once I rate-limited the service to 200 requests/second, we stopped seeing the problem entirely. (I used https://github.com/SystemFw/upperbound to do the rate limiting.)
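For anyone who would rather avoid the extra dependency, the same kind of cap can be sketched with plain fs2 via metered; RateLimited, issueRequest, and the numbers here are placeholders, not a recommendation.

```scala
import scala.concurrent.duration._
import cats.effect.{ExitCode, IO, IOApp}
import fs2.Stream

// Sketch: bound the request rate by metering the stream that
// drives the calls. `issueRequest` is a stand-in for whatever
// IO performs a single blaze-client request.
object RateLimited extends IOApp {
  def issueRequest(i: Int): IO[Unit] = IO.unit // placeholder

  def run(args: List[String]): IO[ExitCode] =
    Stream
      .emits(1 to 1000)
      .metered[IO](5.millis) // one element every 5 ms, i.e. at most ~200/second
      .evalMap(issueRequest)
      .compile
      .drain
      .as(ExitCode.Success)
}
```

Because evalMap runs the effects sequentially, the metered stream gives an upper bound on the request rate.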
Not sure how useful all of this will be... In one service we have a periodic process that queries a few other services once an hour. In particular, it sends two consecutive GET requests (over plain HTTP) to a service that doesn't see much action otherwise. On one of the instances (with both services running on Windows) we're seeing something like this around 80% of the time:
java.nio.channels.WritePendingException: null
in the log. I've failed to reproduce this locally or to get any extra data (I don't control this instance), so I'm at a loss. I can't understand why the timeout happens (as opposed to a retry), or why this only happens between these two services, even though there are more places with pretty much the same flow.
Update: I've since seen this issue with other services. Also, the timeout behavior might be due to http4s/blaze#642. As a workaround, we've switched to another client (Jetty).
Any news on this? I'm also seeing a relation between the frequency of this error and the rate at which I'm making requests: the higher the throughput, the more often it throws.
I can sometimes hit this evalMap-ping a basic GET request 20,000 times.

/cc @ngbinh, who also reported it
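For reference, a minimal sketch of that kind of reproduction loop against the 0.21.x blaze client; the localhost URL is just a placeholder for a local test server.

```scala
import scala.concurrent.ExecutionContext.global
import cats.effect.{ExitCode, IO, IOApp}
import fs2.Stream
import org.http4s.client.blaze.BlazeClientBuilder

// Sketch: issue 20,000 sequential GETs against one endpoint, the
// kind of loop described above. The URL is a made-up local server.
object Repro extends IOApp {
  def run(args: List[String]): IO[ExitCode] =
    BlazeClientBuilder[IO](global).resource
      .use { client =>
        Stream
          .emits(1 to 20000)
          .evalMap(_ => client.expect[String]("http://localhost:8080/ping"))
          .compile
          .drain
      }
      .as(ExitCode.Success)
}
```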