Open leon-becker-secforge opened 3 months ago
Thanks for a nice repro, working on it :+1:
I researched the internals and IMHO your repro is fundamentally invalid and therefore it should not be considered a bug. The reason being : There is more ChannelEvent
types and one should send a request to client only in case of HandshakeComplete
and Read
types, otherwise you are sending message at wrong time and using channel at wrong state ...
If we implement it correctly, all works fine and as expected :
Method.GET / "websocketWorking" -> handler(
Handler.webSocket { ch =>
ch.receiveAll {
case Read(WebSocketFrame.Text(text)) =>
ZIO.logInfo(s"Received $text") *> ch.send(Read(WebSocketFrame.text("How are you?")))
case UserEventTriggered(HandshakeComplete) =>
ch.send(Read(WebSocketFrame.text("Hello there !!!"))).repeat(Schedule.fixed(Duration.fromSeconds(3)))
case _ => ZIO.unit
}
}.toResponse
)
I could see a possible improvement by validating whether user sends request at the right time at ZIO side, but imho it is a redundant complexity.
This seems to me like a valid conclusion. Since the examples were expanded since the last time I looked at them and now also contain this handshake event, I think a problem like this should occur less in the future. It would still be nice if this problem was not possible, but one cannot make everything fool proof.
Initially, I didn't encounter this problem when using websockets in ZIO HTTP directly, but when using tapir. When investigating I came up with this minimal reproduction and reported it here as it seemed like a ZIO HTTP problem.
So thank you for the investigation, I will open a new issue in tapir's repo.
/bounty $150 to validate whether user sends the request at the right time.
/attempt #2737
with your implementation plan/claim #2737
in the PR body to claim the bountyThank you for contributing to zio/zio-http!
Describe the bug
When using a websocket, sending a message to early as the server can cause problems in netty. The initial connection is established with no signs of problems in the browser, yet the socket won't react until the client closes the socket after a timeout of roughly 60s. The bug happens randomly, yet it seems to happen more often on worse hardware.
An exception is logged, but by default hidden.
To solve the bug, one can introduce a short delay for the first message, which for me always worked.
Logs
``` timestamp=2024-03-21T11:22:58.47839274+01: level=DEBUG thread=zio-fiber-64 message="An error was silently ignored because it is not anticipated to be useful" cause=Exception in thread "zio-fiber-64" java.lang.UnsupportedOperationException: unsupported message type: TextWebSocketFrame (expected: ByteBuf, DefaultFileRegion) at io.netty.channel.epoll.AbstractEpollStreamChannel.filterOutboundMessage(AbstractEpollStreamChannel.java:529) at io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:868) at io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1367) at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:877) at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:863) at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:968) at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:856) at io.netty.handler.codec.http.HttpObjectEncoder.write(HttpObjectEncoder.java:103) at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:881) at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:863) at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:968) at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:856) at io.netty.channel.ChannelDuplexHandler.write(ChannelDuplexHandler.java:115) at io.netty.handler.codec.http.HttpServerKeepAliveHandler.write(HttpServerKeepAliveHandler.java:87) at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:879) at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:863) at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:968) at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:856) at io.netty.handler.codec.http.websocketx.WebSocketProtocolHandler.write(WebSocketProtocolHandler.java:118) at io.netty.handler.codec.http.websocketx.WebSocketServerProtocolHandler.write(WebSocketServerProtocolHandler.java:54) at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:881) at io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:940) at io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1247) at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:173) at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:166) at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470) at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:413) at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) at java.base/java.lang.Thread.run(Thread.java:833) atTo Reproduce
scala-cli run websocket_bug_reproduction.scala
.The provided code creates a server with two websockets and an HTML endpoint at "/" that will contact both websockets. Both sockets should send a message every 5 seconds, yet the one named broken will fail with the described bug some times.
Expected behaviour
Sending a message directly as the server should work without any need to introduce a delay.
Desktop (please complete the following information):
Additional context I found the bug when using tapir, where it was usually caused by an early PingFrame.