ktorio / ktor

Framework for quickly creating connected applications in Kotlin with minimal effort
https://ktor.io
Apache License 2.0

Websocket close on client side (JavaFX webkit) breaks future websocket connections #1275

Closed altavir closed 4 years ago

altavir commented 5 years ago

Ktor Version and Engine Used (client or server and name): CIO server on Windows, version 1.2.3 + websockets.

Describe the bug: The page opens a websocket connection when it connects, and the server starts sending data at short intervals. When the page is reloaded or switched, the socket should be closed and the server should receive a close event. This works as expected in Chrome, but in the JavaFX browser the following error is thrown:

java.io.IOException: An established connection was aborted by the software in your host machine
    at java.base/sun.nio.ch.SocketDispatcher.write0(Native Method)
    at java.base/sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:51)
    at java.base/sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:113)
    at java.base/sun.nio.ch.IOUtil.write(IOUtil.java:58)
    at java.base/sun.nio.ch.IOUtil.write(IOUtil.java:50)
    at java.base/sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:466)
    at io.ktor.network.sockets.CIOWriterKt$attachForWritingDirectImpl$1$1.invokeSuspend(CIOWriter.kt:75)
    at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
    at kotlinx.coroutines.DispatchedContinuation.resumeWith(Dispatched.kt:108)
    at kotlinx.coroutines.io.internal.CancellableReusableContinuation.resumeWith(CancellableReusableContinuation.kt:93)
    at kotlinx.coroutines.io.ByteBufferChannel.resumeReadOp(ByteBufferChannel.kt:2211)
    at kotlinx.coroutines.io.ByteBufferChannel.flushImpl(ByteBufferChannel.kt:156)
    at kotlinx.coroutines.io.ByteBufferChannel.flush(ByteBufferChannel.kt:162)
    at kotlinx.coroutines.io.ByteBufferChannel.flushImpl(ByteBufferChannel.kt:140)
    at kotlinx.coroutines.io.ByteBufferChannel.flush(ByteBufferChannel.kt:162)
    at io.ktor.http.cio.websocket.WebSocketWriter.drainQueueAndSerialize(WebSocketWriter.kt:120)
    at io.ktor.http.cio.websocket.WebSocketWriter.writeLoop(WebSocketWriter.kt:47)
    at io.ktor.http.cio.websocket.WebSocketWriter$writeLoop$1.invokeSuspend(WebSocketWriter.kt)

After that, pages still load as expected, but no new websocket connections can be established (the server returns 404 on the websocket address, even for requests from Chrome).

To Reproduce: The example project lives here: https://github.com/mipt-npm/plotly.kt/tree/dev/example-fx

The server websocket configuration lives here: https://github.com/mipt-npm/plotly.kt/blob/d29cb745c6db371fdfdfeee0b82f1da39b046d66/plotlykt-server/src/main/kotlin/scientifik/plotly/server/PlotlyServer.kt#L94

Client-side websocket connection is here: https://github.com/mipt-npm/plotly.kt/blob/d29cb745c6db371fdfdfeee0b82f1da39b046d66/plotlykt-server/src/main/resources/js/plots.js#L50

Comment: The behavior seems to reproduce only in the FX browser, so it is probably caused by a bug there. Even so, a misbehaving client should not render the server unusable for other users, so I think this could have severe consequences.

altavir commented 5 years ago

The same behavior with Netty:

io.ktor.util.cio.ChannelWriteException: Cannot write to a channel
    at io.ktor.server.netty.cio.NettyResponsePipeline.processCallFailed(NettyResponsePipeline.kt:144)
    at io.ktor.server.netty.cio.NettyResponsePipeline.access$processCallFailed(NettyResponsePipeline.kt:26)
    at io.ktor.server.netty.cio.NettyResponsePipeline.processJobs(NettyResponsePipeline.kt:447)
    at io.ktor.server.netty.cio.NettyResponsePipeline$processJobs$1.invokeSuspend(NettyResponsePipeline.kt)
    at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
    at kotlinx.coroutines.DispatchedTask.run(Dispatched.kt:238)
    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:416)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:515)
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:918)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.nio.channels.ClosedChannelException
    at io.netty.channel.AbstractChannel$AbstractUnsafe.newClosedChannelException(AbstractChannel.java:955)
    at io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:863)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1365)
    at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:716)
    at io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:763)
    at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:789)
    at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:757)
    at io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:766)
    at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:789)
    at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:757)
    at io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:766)
    at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:789)
    at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:757)
    at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:812)
    at io.ktor.server.netty.cio.NettyResponsePipeline$processBodyFlusher$2.invokeSuspend(NettyResponsePipeline.kt:312)

dmitrievanthony commented 4 years ago

Hi @altavir,

I downloaded and tested your project and it works fine; I can't reproduce this issue. Could you please tell me whether this issue is still valid?

altavir commented 4 years ago

@dmitrievanthony It still fails in push mode (pull mode, which the example uses by default, does not use websockets). To check it, you can pull the dev branch and replace this line with pushUpdates(300).

I did not check it on newer versions of ktor. It is possible that it works fine there, since there were some significant changes in IO. I need to find time to check that.

altavir commented 4 years ago

Just checked and the behavior is still the same for the latest version of ktor.

dmitrievanthony commented 4 years ago

Thanks, @altavir. I've reproduced the problem. I'm still trying to figure out what's wrong.

altavir commented 4 years ago

I tried to debug, but the problem lies quite deep inside the IO. It is quite possible that FX webkit is doing some bad things with websockets, but that should not drop other connections. Currently, no new websocket connections can be established on the server (even from a normal browser) after this error.

dmitrievanthony commented 4 years ago

Well, it seems the difference between Chrome and the FX browser is that when a page is refreshed, Chrome closes the WebSocket correctly (by sending a Close frame), while the FX browser just breaks the connection. And it looks like we don't handle that properly, because closeReason is not completed after such a break.

So, closeReason is not completed and the subscription is not canceled. As a result, you have a broadcast channel with several subscriptions, one of which no longer processes events, and so the whole processing stops.

Anyway, I'm continuing the investigation.
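To illustrate the stall described above, here is a minimal single-threaded model of a bounded broadcast buffer. It is only a sketch, not the kotlinx.coroutines or Ktor implementation: the class `BoundedBroadcast`, its methods, and the subscriber names `chrome` and `fx` are hypothetical, and `trySend` returning `false` stands in for the point where a real sender would suspend.

```kotlin
// Simplified model (an assumption, not Ktor's code): a sender may run at most
// `capacity` elements ahead of the slowest registered subscriber.
class BoundedBroadcast(private val capacity: Int) {
    private var published = 0L                           // elements accepted so far
    private val consumed = mutableMapOf<String, Long>()  // read position per subscriber

    fun subscribe(name: String) { consumed[name] = published }

    // What cancelling the subscription would do once the close is noticed.
    fun unsubscribe(name: String) { consumed.remove(name) }

    // Returns false where a real bounded channel would suspend the sender.
    fun trySend(): Boolean {
        val slowest = consumed.values.minOrNull() ?: published
        if (published - slowest >= capacity) return false
        published++
        return true
    }

    // A live subscriber reads everything currently available to it.
    fun drain(name: String) {
        if (name in consumed) consumed[name] = published
    }
}

// "fx" models a connection that was aborted without a Close frame:
// it never reads again and is never unsubscribed.
fun stalledDemo(): Pair<Int, Boolean> {
    val channel = BoundedBroadcast(capacity = 2)
    channel.subscribe("chrome")
    channel.subscribe("fx")

    var accepted = 0
    repeat(5) {
        if (channel.trySend()) accepted++
        channel.drain("chrome")   // the healthy client keeps up
    }

    channel.unsubscribe("fx")     // the missing cleanup step
    val resumed = channel.trySend()
    return accepted to resumed
}
```

In this model, only the first two of five sends are accepted while the dead `fx` subscription is still registered, even though `chrome` keeps reading; once `fx` is removed, sending resumes. That mirrors why a subscription that is never canceled can stop delivery to every other client.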

e5l commented 4 years ago

Fixed in 1.3.2