zio / zio-http

A next-generation Scala framework for building scalable, correct, and efficient HTTP clients and servers
https://zio.dev/zio-http
Apache License 2.0
748 stars 379 forks source link

Websockets fail on too early messages #2737

Open leon-becker-secforge opened 3 months ago

leon-becker-secforge commented 3 months ago

Describe the bug

When using a websocket, sending a message to early as the server can cause problems in netty. The initial connection is established with no signs of problems in the browser, yet the socket won't react until the client closes the socket after a timeout of roughly 60s. The bug happens randomly, yet it seems to happen more often on worse hardware.

An exception is logged, but by default hidden.

To solve the bug, one can introduce a short delay for the first message, which for me always worked.

Logs ``` timestamp=2024-03-21T11:22:58.47839274+01: level=DEBUG thread=zio-fiber-64 message="An error was silently ignored because it is not anticipated to be useful" cause=Exception in thread "zio-fiber-64" java.lang.UnsupportedOperationException: unsupported message type: TextWebSocketFrame (expected: ByteBuf, DefaultFileRegion) at io.netty.channel.epoll.AbstractEpollStreamChannel.filterOutboundMessage(AbstractEpollStreamChannel.java:529) at io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:868) at io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1367) at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:877) at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:863) at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:968) at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:856) at io.netty.handler.codec.http.HttpObjectEncoder.write(HttpObjectEncoder.java:103) at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:881) at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:863) at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:968) at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:856) at io.netty.channel.ChannelDuplexHandler.write(ChannelDuplexHandler.java:115) at io.netty.handler.codec.http.HttpServerKeepAliveHandler.write(HttpServerKeepAliveHandler.java:87) at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:879) at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:863) at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:968) at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:856) at io.netty.handler.codec.http.websocketx.WebSocketProtocolHandler.write(WebSocketProtocolHandler.java:118) at io.netty.handler.codec.http.websocketx.WebSocketServerProtocolHandler.write(WebSocketServerProtocolHandler.java:54) at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:881) at io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:940) at io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1247) at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:173) at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:166) at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470) at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:413) at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) at java.base/java.lang.Thread.run(Thread.java:833) at .MainServer.socketApp(websocket_bug_reproduction.scala:37) at .MainServer.socketApp(websocket_bug_reproduction.scala:38) ```

To Reproduce

  1. Use the provided code with scala-cli run websocket_bug_reproduction.scala.
  2. Visit http://localhost:6896/
  3. Watch the logs, the problem usually occur directly or after a few reloads.

The provided code creates a server with two websockets and an HTML endpoint at "/" that will contact both websockets. Both sockets should send a message every 5 seconds, yet the one named broken will fail with the described bug some times.

//> using dep "dev.zio::zio-http::3.0.0-RC4"
//> using dep "dev.zio::zio-logging::2.2.2"
//> using scala "3.3.3"

import zio.*
import zio.http.*
import zio.http.ChannelEvent.*
import zio.http.template.*
import zio.logging.*
import zio.logging.LogFormat.*

object MainServer extends ZIOAppDefault {

  def app = Routes(
    Method.GET / "" -> handler(
      // Use example from https://developer.mozilla.org/en-US/docs/Web/API/WebSocket#examples
      Response.html(Dom.raw("""<script>
// Create WebSocket connection.
const s1 = new WebSocket("ws://localhost:6896/websocketBroken");
const s2 = new WebSocket("ws://localhost:6896/websocketWorking");

// Listen for messages
s1.addEventListener("message", (event) => {
  console.log("Message from server ", event.data);
});
s2.addEventListener("message", (event) => {
  console.log("Message from server ", event.data);
});

      </script>"""))
    )
  ).toHttpApp

  val socketApp = Routes(
    Method.GET / "websocketBroken" -> handler(
      Handler.webSocket { ch =>
        ch.send(Read(WebSocketFrame.text("Hello there"))).repeat(Schedule.fixed(Duration.fromSeconds(5)))
      }.toResponse
    ),
    Method.GET / "websocketWorking" -> handler(
      Handler.webSocket { ch =>
        ch.send(Read(WebSocketFrame.text("Hello there")))
          .repeat(Schedule.fixed(Duration.fromSeconds(5)))
          .delay(Duration.fromMillis(100))
      }.toResponse
    )
  ).toHttpApp

  def logger = consoleLogger(
    ConsoleLoggerConfig.default.copy(filter = LogFilter.LogLevelByNameConfig(LogLevel.All, Map.empty[String, LogLevel]))
  )

  override val bootstrap = Runtime.removeDefaultLoggers >>> logger

  def run = Console.printLine("http://localhost:6896/") *>
    Server.serve(app ++ socketApp).provide(Server.defaultWith(_.port(6896)))

}

Expected behaviour

Sending a message directly as the server should work without any need to introduce a delay.

Desktop (please complete the following information):

Additional context I found the bug when using tapir, where it was usually caused by an early PingFrame.

pragmaxim commented 2 months ago

Thanks for a nice repro, working on it :+1:

pragmaxim commented 2 months ago

I researched the internals and IMHO your repro is fundamentally invalid and therefore it should not be considered a bug. The reason being : There is more ChannelEvent types and one should send a request to client only in case of HandshakeComplete and Read types, otherwise you are sending message at wrong time and using channel at wrong state ...

If we implement it correctly, all works fine and as expected :

    Method.GET / "websocketWorking" -> handler(
      Handler.webSocket { ch =>
        ch.receiveAll {
          case Read(WebSocketFrame.Text(text)) =>
            ZIO.logInfo(s"Received $text") *> ch.send(Read(WebSocketFrame.text("How are you?")))
          case UserEventTriggered(HandshakeComplete) =>
            ch.send(Read(WebSocketFrame.text("Hello there !!!"))).repeat(Schedule.fixed(Duration.fromSeconds(3)))
          case _ => ZIO.unit
        }
      }.toResponse
    )

I could see a possible improvement by validating whether user sends request at the right time at ZIO side, but imho it is a redundant complexity.

leon-becker-secforge commented 2 months ago

This seems to me like a valid conclusion. Since the examples were expanded since the last time I looked at them and now also contain this handshake event, I think a problem like this should occur less in the future. It would still be nice if this problem was not possible, but one cannot make everything fool proof.

Initially, I didn't encounter this problem when using websockets in ZIO HTTP directly, but when using tapir. When investigating I came up with this minimal reproduction and reported it here as it seemed like a ZIO HTTP problem.

So thank you for the investigation, I will open a new issue in tapir's repo.

jdegoes commented 3 weeks ago

/bounty $150 to validate whether user sends the request at the right time.

algora-pbc[bot] commented 3 weeks ago

💎 $150 bounty • ZIO

Steps to solve:

  1. Start working: Comment /attempt #2737 with your implementation plan
  2. Submit work: Create a pull request including /claim #2737 in the PR body to claim the bounty
  3. Receive payment: 100% of the bounty is received 2-5 days post-reward. Make sure you are eligible for payouts

Thank you for contributing to zio/zio-http!

Add a bountyShare on socials