the-sun-will-rise-tomorrow opened this issue 4 months ago (status: open).

What OS do you use? Is it reproducible on Linux?

Is wss:// (i.e. TLS) necessary for the hang? Or does it also hang with plain ws://?

> when the target is a remote host and not localhost

Is that the only difference, and not, e.g., ws:// on localhost vs wss:// on a remote host?

> What OS do you use? Is it reproducible on Linux?

Yes, it happens on Linux 6.6.7 for me.

> It is the only difference, not e.g. ws:// on localhost vs wss:// on remote host?

Yes. If I point websocat at a locally running TCP proxy (like socat) that always accepts and buffers input and then forwards it to the remote host, the problem doesn't occur.

> Is wss:// (i.e. TLS) necessary for the hang? Or does it also hang with plain ws://?

I can test this, but it will take some time.

If you are open to testing, you can try an early Websocat4 build. Does it flush properly?

websocat4 --binary tcp-l:127.0.0.1:1234 wss://1.2.3.4:1234/url

--exit-on-eof is not yet supported, though.

websocat4early.zip: Linux x86_64 executable. You can also build it yourself from the websocat4 branch.

> Or does it also hang with plain ws://?

It hangs with plain ws:// as well.

> If you are open for testing, you can try early Websocat4 build - does it flush properly?

I am, but sorry, I can't use that binary. Commit e6a57f3f97547d78a3929ba42e3bc5001fea71b0 does not hang.

Is a significantly older pre-built Websocat version (e.g. https://github.com/vi/websocat/releases/tag/v1.8.0) also buggy?

Is the problem also reproducible locally if one uses network namespaces, veth and netem to emulate an imperfect network?

What does "send a large packet to 127.0.0.1:1234" mean from a user's perspective? Is it something like cat /dev/zero | nc 127.0.0.1 1234, or does one need something more specific?

For me, running websocat --binary tcp-l:127.0.0.1:1234 wss://ws.vi-server.org/mirror and testing performance with cat /dev/zero | nc 127.0.0.1 1234 | pv > /dev/null does not show hangs.

Fixing it for Websocat1 may be nontrivial (especially without a good repro), and Websocat1 may be nearing its sunset. Is your use case already covered by a workaround, so that a proper fix can wait, i.e. consist of abandoning the legacy version and finishing and releasing (an alpha version of) Websocat4?

> Is a significantly older pre-built Websocat version (e.g. https://github.com/vi/websocat/releases/tag/v1.8.0) also buggy?

Yes. I tested 1.8.0 and 1.3.0, and both hang.

> Is the problem also reproducible locally if one uses network namespaces, veth and netem to emulate a non-perfect network?

I would need to set that up :eyes:
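
For the namespace/veth/netem route, a helper along these lines prints the setup commands to run as root. It is only a sketch, and everything in it is an assumption chosen for illustration: the function names, the namespace name wsnet, the interface names veth-host/veth-ns, the 10.200.0.0/24 addresses and the netem parameters. netem is attached to the host-side interface, so it shapes traffic leaving the host, i.e. from websocat towards a server started inside the namespace; the rate limit is meant to make data queue up locally, which the original report suggests is needed for the hang.

```python
def netem_setup_commands(ns="wsnet", delay="50ms", jitter="10ms", loss="1%", rate="1mbit"):
    """Build (but do not run) root-only commands for an impaired veth link.

    Hypothetical layout: the host keeps 10.200.0.1, the namespace gets
    10.200.0.2. netem sits on the host-side interface, so it delays, drops
    and rate-limits traffic flowing from the host into the namespace.
    """
    return [
        f"ip netns add {ns}",
        "ip link add veth-host type veth peer name veth-ns",
        f"ip link set veth-ns netns {ns}",
        "ip addr add 10.200.0.1/24 dev veth-host",
        "ip link set veth-host up",
        f"ip netns exec {ns} ip addr add 10.200.0.2/24 dev veth-ns",
        f"ip netns exec {ns} ip link set veth-ns up",
        f"ip netns exec {ns} ip link set lo up",
        # delay with jitter, random loss, and a bandwidth cap on host egress
        f"tc qdisc add dev veth-host root netem delay {delay} {jitter} loss {loss} rate {rate}",
    ]


def netem_teardown_commands(ns="wsnet"):
    # Once no process is left inside it, the namespace goes away and the
    # veth pair is destroyed together with it.
    return [f"ip netns del {ns}"]


if __name__ == "__main__":
    for cmd in netem_setup_commands():
        print(cmd)
```

The WebSocket server under test would then be started with ip netns exec wsnet ..., and websocat pointed at ws://10.200.0.2:PORT/ (port hypothetical), so that the websocat-to-server path crosses the impaired link instead of loopback.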

> What does "send a large packet to 127.0.0.1:1234" mean from a user perspective? Is something like cat /dev/zero | nc 127.0.0.1 1234 that, or one needs something more specific?

On the other side of the WebSocket is an SQL server. The hang happens when, after a handshake and authentication, I send a large query (120 KiB). I have not tried piping /dev/zero.
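
A finite payload makes the reporter's scenario (one large query, then silence) testable without the SQL server. The sketch below is a hypothetical helper, echo_roundtrip, not part of websocat: it sends one payload to an echoing TCP endpoint without closing its write side, then waits up to a timeout for every byte to come back. It assumes the far end echoes binary data unchanged; the 120 KiB default only mirrors the query size mentioned above.

```python
import os
import selectors
import socket
import time


def echo_roundtrip(host, port, size=120 * 1024, timeout=10.0):
    """Send one finite payload to an echoing endpoint and wait for it to return.

    Returns (bytes_received, payload_matches). Reads and writes are interleaved
    so a large echo cannot deadlock against our own send. The write side is
    deliberately left open: the reported hang happens on an idle, open connection.
    """
    payload = memoryview(os.urandom(size))
    sent = 0
    received = bytearray()
    deadline = time.monotonic() + timeout
    with socket.create_connection((host, port), timeout=timeout) as sock, \
            selectors.DefaultSelector() as sel:
        sock.setblocking(False)
        sel.register(sock, selectors.EVENT_READ | selectors.EVENT_WRITE)
        eof = False
        while len(received) < size and not eof:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break  # timed out: part of the payload never came back
            for _key, mask in sel.select(timeout=remaining):
                if mask & selectors.EVENT_WRITE and sent < size:
                    try:
                        sent += sock.send(payload[sent:sent + 65536])
                    except BlockingIOError:
                        pass
                    if sent == size:
                        # Whole payload handed to the kernel; only wait for reads now.
                        sel.modify(sock, selectors.EVENT_READ)
                if mask & selectors.EVENT_READ:
                    try:
                        chunk = sock.recv(65536)
                    except BlockingIOError:
                        continue
                    if not chunk:
                        eof = True
                        break
                    received.extend(chunk)
    return len(received), bytes(received) == payload.tobytes()
```

With websocat --binary tcp-l:127.0.0.1:1234 wss://ws.vi-server.org/mirror running, echo_roundtrip("127.0.0.1", 1234) coming back short would match the reported symptom. A complete echo proves little on its own, since the hang is intermittent, so the call would need to be repeated.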

> For me running websocat --binary tcp-l:127.0.0.1:1234 wss://ws.vi-server.org/mirror and testing performance with cat /dev/zero | nc 127.0.0.1 1234 | pv > /dev/null does not show hangs.

I think the difference is that /dev/zero never runs out: there is always more input to push out any previously stuck data, whereas my query is a finite amount of data. A better chance to reproduce this would be to connect two echo servers and then send one initial large packet; it should bounce back and forth indefinitely.
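
One way to read the two-echo-server suggestion is a small relay that keeps a single payload circulating. This is an assumption-laden sketch: bounce is a hypothetical helper, and both addresses are expected to be echoing TCP endpoints, for example two websocat tcp-l listeners on different ports, each bridged to an echo WebSocket. The relay seeds endpoint A with one payload and forwards whatever each side returns to the other; if bytes get stuck in a buffer along the way, traffic stops and the relay reports a stall.

```python
import os
import selectors
import socket
import time


def bounce(addr_a, addr_b, size=120 * 1024, duration=30.0, stall_after=5.0):
    """Ping-pong one payload between two echoing TCP endpoints.

    The payload is first written to A; whatever A echoes is forwarded to B and
    whatever B echoes is forwarded back to A. Returns (bytes_relayed, stalled):
    bytes_relayed counts every byte read from either side, and stalled is True
    if nothing arrived for `stall_after` seconds before `duration` elapsed.
    """
    a = socket.create_connection(addr_a)
    b = socket.create_connection(addr_b)
    peer = {a: b, b: a}
    # Bytes still to be written to each socket; A starts with the seed payload.
    pending = {a: bytearray(os.urandom(size)), b: bytearray()}
    relayed = 0
    stalled = False
    start = last_progress = time.monotonic()
    with selectors.DefaultSelector() as sel:

        def watch(s):
            # Ask for writability only while there is data queued for s,
            # otherwise select() would return immediately in a busy loop.
            events = selectors.EVENT_READ
            if pending[s]:
                events |= selectors.EVENT_WRITE
            sel.modify(s, events)

        for s in (a, b):
            s.setblocking(False)
            sel.register(s, selectors.EVENT_READ)
        watch(a)
        try:
            while time.monotonic() - start < duration:
                for key, mask in sel.select(timeout=0.2):
                    s = key.fileobj
                    if mask & selectors.EVENT_READ:
                        try:
                            chunk = s.recv(65536)
                        except BlockingIOError:
                            chunk = None
                        if chunk == b"":
                            raise ConnectionError("endpoint closed the connection")
                        if chunk:
                            relayed += len(chunk)
                            last_progress = time.monotonic()
                            pending[peer[s]].extend(chunk)
                            watch(peer[s])
                    if mask & selectors.EVENT_WRITE and pending[s]:
                        try:
                            n = s.send(pending[s])
                        except BlockingIOError:
                            n = 0
                        del pending[s][:n]
                        watch(s)
                if time.monotonic() - last_progress > stall_after:
                    stalled = True
                    break
        finally:
            a.close()
            b.close()
    return relayed, stalled
```

The design keeps exactly one payload's worth of data in flight, which matches the "finite input" hypothesis: unlike cat /dev/zero, nothing new ever arrives to push stuck bytes out.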

> Is your use case already covered by a workaround, so that proper fix can wait, i.e. consist of abandoning legacy version and finishing and releasing (an alpha version of) Websocat4?

Both websocat4 and enabling --ping-interval seem to work...
I'm running:
When I send a large packet to 127.0.0.1:1234, websocat sometimes doesn't send it to the destination immediately; it just sits there until something causes it to send the pending data. If the destination sends a PING, websocat does flush the pending data.

If I add --ping-interval to websocat's command line, that also causes websocat to send the pending data whenever it sends a PING. I can reproduce it with v1.12.0 and current master (34ddb4d1416887059fb249cc97de1538598fed9c).
I can only reproduce it when the target is a remote host and not localhost (probably because the buffer needs to actually pile up locally for the problem to happen).