http4s / blaze

Blazing fast NIO microframework and Http Parser
Apache License 2.0
349 stars 63 forks source link

Disconnect error not detected #292

Open nigredo-tori opened 5 years ago

nigredo-tori commented 5 years ago

Current disconnect handling implementation in ChannelHead relies on matching IOException.getMessage. This is problematic - for example, on Windows these messages are localized, which leads to junk log messages:

2019-05-31 10:35:21,048  WARN: Channel IOException not known to be a disconnect error //[Thread-12] o.h.blaze.channel.nio2.ByteBufferHead.checkError() L:18
java.io.IOException: Удаленный хост принудительно разорвал существующее подключение

Shouldn't there be a better way to detect such situations? What do other libraries do?

rossabaker commented 5 years ago

That's a great question. I imagine this is a solved problem in netty. It's hard when the implementation doesn't throw anything more specific than IOException. I'd love to find a better solution.

We could add the Russian version to solve your problem, but we'll never keep up in general until a more robust alternative is identified.

rossabaker commented 5 years ago

This is way better than we do, but still not great: https://github.com/netty/netty/blob/f17bfd0f64189d91302fbdd15103788bf9eabaa2/handler/src/main/java/io/netty/handler/ssl/SslHandler.java#L1100

nigredo-tori commented 4 years ago

I've seen a similar situation recently, only this time the error ("An established connection was aborted by the software in your host machine") wasn't even in the broken pipe messages... I'm not even sure we need this log message anymore. In the past it might have made sense since there was something different happening in this branch, but right now it's EOF for all IOExceptions, so why bother?

rossabaker commented 4 years ago

I think that's fair. If it's not surfacing any that we'd like to treat differently, there really isn't a point.

DenisNovac commented 3 years ago

@rossabaker So is this error depends on error? I got this error:

2021-03-16 17:51:23,653 [WARN] - Channel IOException not known to be a disconnect error
java.io.IOException: Обрыв канала
    at java.base/sun.nio.ch.FileDispatcherImpl.writev0(Native Method)
    at java.base/sun.nio.ch.SocketDispatcher.writev(SocketDispatcher.java:51)
    at java.base/sun.nio.ch.IOUtil.write(IOUtil.java:182)
    at java.base/sun.nio.ch.IOUtil.write(IOUtil.java:123)
    at java.base/sun.nio.ch.UnixAsynchronousSocketChannelImpl.implWrite(UnixAsynchronousSocketChannelImpl.java:699)
    at java.base/sun.nio.ch.AsynchronousSocketChannelImpl.write(AsynchronousSocketChannelImpl.java:383)
    at java.base/sun.nio.ch.AsynchronousSocketChannelImpl.write(AsynchronousSocketChannelImpl.java:417)
    at org.http4s.blaze.channel.nio2.ByteBufferHead.org$http4s$blaze$channel$nio2$ByteBufferHead$$go$1(ByteBufferHead.scala:58)
    at org.http4s.blaze.channel.nio2.ByteBufferHead.writeRequest(ByteBufferHead.scala:74)
    at org.http4s.blaze.pipeline.Tail.channelWrite(Stages.scala:124)
    at org.http4s.blaze.pipeline.Tail.channelWrite$(Stages.scala:122)
    at org.http4s.client.blaze.ReadBufferStage.channelWrite(ReadBufferStage.scala:30)
    at org.http4s.client.blaze.ReadBufferStage.writeRequest(ReadBufferStage.scala:38)
    at org.http4s.blaze.pipeline.Tail.channelWrite(Stages.scala:124)
    at org.http4s.blaze.pipeline.Tail.channelWrite$(Stages.scala:122)
    at org.http4s.blazecore.IdleTimeoutStage.channelWrite(IdleTimeoutStage.scala:28)
    at org.http4s.blazecore.IdleTimeoutStage.writeRequest(IdleTimeoutStage.scala:62)
    at org.http4s.blaze.pipeline.Tail.channelWrite(Stages.scala:124)
    at org.http4s.blaze.pipeline.Tail.channelWrite$(Stages.scala:122)
    at org.http4s.client.blaze.Http1Connection.channelWrite(Http1Connection.scala:41)
    at org.http4s.blazecore.util.IdentityWriter.writeBodyChunk(IdentityWriter.scala:72)
    at org.http4s.blazecore.util.EntityBodyWriter.$anonfun$writePipe$3(EntityBodyWriter.scala:74)
    at cats.effect.internals.IORunLoop$.cats$effect$internals$IORunLoop$$loop(IORunLoop.scala:104)
    at cats.effect.internals.IORunLoop$.restartCancelable(IORunLoop.scala:51)
    at cats.effect.internals.IOBracket$BracketStart.run(IOBracket.scala:100)
    at cats.effect.internals.Trampoline.cats$effect$internals$Trampoline$$immediateLoop(Trampoline.scala:67)
    at cats.effect.internals.Trampoline.startLoop(Trampoline.scala:35)
    at cats.effect.internals.TrampolineEC$JVMTrampoline.super$startLoop(TrampolineEC.scala:90)
    at cats.effect.internals.TrampolineEC$JVMTrampoline.$anonfun$startLoop$1(TrampolineEC.scala:90)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
    at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:94)
    at cats.effect.internals.TrampolineEC$JVMTrampoline.startLoop(TrampolineEC.scala:90)
    at cats.effect.internals.Trampoline.execute(Trampoline.scala:43)
    at cats.effect.internals.TrampolineEC.execute(TrampolineEC.scala:42)
    at cats.effect.internals.IOBracket$.$anonfun$guaranteeCase$1(IOBracket.scala:116)
    at cats.effect.internals.IOBracket$.$anonfun$guaranteeCase$1$adapted(IOBracket.scala:114)
    at cats.effect.internals.IOAsync$.$anonfun$apply$1(IOAsync.scala:37)
    at cats.effect.internals.IOAsync$.$anonfun$apply$1$adapted(IOAsync.scala:37)
    at cats.effect.internals.IORunLoop$RestartCallback.start(IORunLoop.scala:447)
    at cats.effect.internals.IORunLoop$.cats$effect$internals$IORunLoop$$loop(IORunLoop.scala:156)
    at cats.effect.internals.IORunLoop$RestartCallback.signal(IORunLoop.scala:463)
    at cats.effect.internals.IORunLoop$RestartCallback.apply(IORunLoop.scala:484)
    at cats.effect.internals.IORunLoop$RestartCallback.apply(IORunLoop.scala:422)
    at cats.effect.internals.IOShift$Tick.run(IOShift.scala:36)
    at cats.effect.internals.PoolUtils$$anon$2$$anon$3.run(PoolUtils.scala:52)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)

I guess it happens somewhere here:

val http = new Http(config, (_: util.Map[String, util.List[String]], _: util.Map[String, String]) => ())

retryingOnAllErrors(policy, logOnError){
  http
    .post(url)
    .authentication()
    .client()
    .form()
    .param("client_id", cfg.clientName)
    .response()
    .execute()
}

Also retries seems to stop working when this error happens.

Would it be good if message was in English: java.io.IOException: Channel is broken?

nigredo-tori commented 3 years ago

Обрыв канала

I'd say that this translates to "Broken pipe", which would be caught by the condition here. So the message you've posted is, indeed, another instance of this issue.

DenisNovac commented 3 years ago

I thought that my troubles might be because of that but now i see it ends in EOF anyway. Thanks

rossabaker commented 3 years ago

It sounds like we should just go ahead and get rid of that log message, and do so on the series/0.14 branch?

kell18 commented 3 years ago

Hi, Not sure if that's related but I'm having another problem - EOF is hard to debug without any stacktrace/additional message attached (blaze-core v0.14.17). For example, we're having a big app that makes calls to multiple external services with the client. And common .handleErrorWith in the main that logs errors. What I see at the end when something goes wrong:

Integration failed: EOF

And nothing else. I don't even know at which step it failed... Is there a way to make it better (for example by attaching info about the request)? And/or attach stacktrace to it?

rossabaker commented 3 years ago

The EOF is a regrettable design, because it's both a command an an exception. It suppresses the stack trace because the exception is used for control flow and is expected to happen a lot.

We might be able to enhance the request loop in BlazeClient to wrap the EOF in an exception that provides request context. :thinking: