Synchronous read timeout

MeanSquaredError commented 2 months ago

Hi,

Currently there is no way to call the synchronous read_message() method with a timeout. I think it is important to have support for a read timeout because the websocket can stay connected without receiving any data for a while and we cannot always rely on ping/pong messages to interrupt the blocking read.

rbeeli commented 2 months ago

Hi,

That's indeed a valid point.

With the async (i.e. ASIO) client, that's already possible, see https://github.com/rbeeli/websocketclient-cpp/blob/main/examples/asio/ex_reconnect_asio.cpp

For bare (unencrypted) TCP sockets, I have extended the POSIX socket wrapper to support read/write timeouts: https://github.com/rbeeli/websocketclient-cpp/blob/main/examples/builtin/ex_hello_ws_builtin.cpp
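For reference, the underlying POSIX mechanism for this is usually the SO_RCVTIMEO socket option (SO_SNDTIMEO for writes). The sketch below applies it to a raw file descriptor. It is not the library's wrapper API, and the helper names are made up for illustration:

```cpp
#include <cerrno>
#include <chrono>
#include <sys/socket.h>
#include <sys/time.h>

// Hypothetical helper: sets a receive timeout on a connected socket via
// SO_RCVTIMEO. Afterwards, a blocking recv() that waits longer than `timeout`
// returns -1 with errno set to EAGAIN or EWOULDBLOCK instead of blocking
// forever. A zero timeout means "no timeout" (block indefinitely).
// Returns false if setsockopt() fails (errno then describes the error).
inline bool set_recv_timeout(int fd, std::chrono::milliseconds timeout) {
    timeval tv{};
    tv.tv_sec = static_cast<decltype(tv.tv_sec)>(timeout.count() / 1000);
    tv.tv_usec = static_cast<decltype(tv.tv_usec)>((timeout.count() % 1000) * 1000);
    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) == 0;
}

// Hypothetical helper: true if the errno left by a failed recv() means the
// timeout expired, as opposed to a real socket error.
inline bool is_timeout_errno(int err) {
    return err == EAGAIN || err == EWOULDBLOCK;
}
```

A recv() that times out this way only reports the timeout; it does not close the socket by itself.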

Only the built-in synchronous OpenSslSocket (WSS) case currently lacks a timeout example; this is something to look into, e.g. via poll/select.

MeanSquaredError commented 2 months ago

It would make sense to handle timeouts better. For example, if a timeout happens before a full frame has been read, the client could buffer the data received so far and return an empty reply without flagging an error. On the next read request, it would check whether the buffered data plus any newly read data form a complete frame and, if so, return that frame.
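A minimal sketch of that buffering idea, assuming the caller collects raw frame bytes itself. `complete_frame_size` and `FrameBuffer` are hypothetical names, not part of websocketclient-cpp. The sketch only finds frame boundaries from the RFC 6455 header (2-byte base header, optional 16- or 64-bit extended length, optional 4-byte masking key); unmasking and fragmented messages are left out:

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Returns the total size (header + payload) of the first WebSocket frame in
// `buf` if all of its bytes are present, or std::nullopt if more data is needed.
inline std::optional<std::size_t> complete_frame_size(const std::vector<std::uint8_t>& buf) {
    if (buf.size() < 2) return std::nullopt;            // need the 2 base header bytes
    const bool masked = (buf[1] & 0x80) != 0;
    std::uint64_t payload_len = buf[1] & 0x7F;
    std::size_t header = 2;
    if (payload_len == 126) {                            // 16-bit extended length
        if (buf.size() < 4) return std::nullopt;
        payload_len = (std::uint64_t{buf[2]} << 8) | buf[3];
        header = 4;
    } else if (payload_len == 127) {                     // 64-bit extended length
        if (buf.size() < 10) return std::nullopt;
        payload_len = 0;
        for (std::size_t i = 2; i < 10; ++i) payload_len = (payload_len << 8) | buf[i];
        header = 10;
    }
    if (masked) header += 4;                             // masking key follows the length
    if (buf.size() < header || buf.size() - header < payload_len) return std::nullopt;
    return header + static_cast<std::size_t>(payload_len);
}

// Accumulates bytes across reads. After a timeout, partial data simply stays
// here, and the next read continues where the previous one stopped.
class FrameBuffer {
public:
    void append(const std::uint8_t* data, std::size_t n) { buf_.insert(buf_.end(), data, data + n); }

    // Removes and returns the first complete raw frame, or std::nullopt
    // (the "empty reply" case) if it has not fully arrived yet.
    std::optional<std::vector<std::uint8_t>> pop_frame() {
        const auto size = complete_frame_size(buf_);
        if (!size) return std::nullopt;
        const auto end = buf_.begin() + static_cast<std::ptrdiff_t>(*size);
        std::vector<std::uint8_t> frame(buf_.begin(), end);
        buf_.erase(buf_.begin(), end);
        return frame;
    }

private:
    std::vector<std::uint8_t> buf_;
};
```

For example, an unmasked text frame `81 02 'H' 'i'` that arrives as three bytes plus one byte yields nothing after the first chunk and the full 4-byte frame after the second.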

rbeeli commented 2 months ago

I'll look into a solution for the built-in OpenSslSocket implementation. Whether it will be possible to recover from a timeout remains to be seen. Commonly, the client is closed and reconnected after a timeout.

rbeeli commented 1 month ago

Hi @MeanSquaredError,

I have extended the codebase to allow for timeouts for the client functions handshake (previously called init), read_message, send_message, send_pong_frame, and close, incl. the socket connect functions.

This allows the application to avoid hanging indefinitely in any of those functions. Please note that the client must be closed and recreated after a timeout error; it cannot recover from a timeout.

As always, the examples are a good reference: examples/builtin/

MeanSquaredError commented 1 month ago

@rbeeli

Thank you for the update. I think that I need a version that would be able to recover from timeouts. Basically I want to do the following in a loop:

1. Read the websocket with a timeout of 1 second. If data was read successfully (no timeout), process the data. If there was a timeout, just continue with step 2.
2. Check whether there has been no data for 30 seconds; if so, send a PING frame.
3. Go to 1.
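A rough sketch of the bookkeeping for step 2, assuming some read call that returns after at most 1 second. The `KeepAlive` class and the names in the commented loop are hypothetical placeholders:

```cpp
#include <chrono>

// Hypothetical helper for step 2: remembers when data was last seen and
// reports when the idle limit has passed. It does no I/O itself.
class KeepAlive {
public:
    using Clock = std::chrono::steady_clock;

    KeepAlive(Clock::duration idle_limit, Clock::time_point now)
        : idle_limit_(idle_limit), last_activity_(now) {}

    // Call when data was read, and also right after sending a PING,
    // so that the next PING is scheduled a full idle_limit later.
    void on_activity(Clock::time_point now) { last_activity_ = now; }

    // True once idle_limit has elapsed since the last activity.
    bool should_ping(Clock::time_point now) const {
        return now - last_activity_ >= idle_limit_;
    }

private:
    Clock::duration idle_limit_;
    Clock::time_point last_activity_;
};

// Intended loop (read_with_timeout, process and send_ping are placeholders):
//
//   KeepAlive ka(std::chrono::seconds(30), KeepAlive::Clock::now());
//   for (;;) {
//       bool got_data = read_with_timeout(std::chrono::seconds(1));    // step 1
//       auto now = KeepAlive::Clock::now();
//       if (got_data) { process(); ka.on_activity(now); }
//       if (ka.should_ping(now)) { send_ping(); ka.on_activity(now); } // step 2
//   }                                                                  // step 3
```

Keeping the timing logic separate from the I/O makes it easy to test with fixed time points instead of real waits.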

So I am trying to use timeouts as a poor-man's version of ASIO/coroutines without actually using either of these.

Here is some background information to explain why I need that.

I am trying to read data from a websocket+STOMP connection provided by a crypto exchange. Currently I am using a custom WS implementation + custom STOMP implementation, but I am looking for a good 3rd party websocket library to switch to. Earlier I tried using boost::beast::websocket, but their implementation is buggy and limited, so I had to switch temporarily to my own implementation.

My program connects to the crypto exchange's WS+STOMP stream and then subscribes to the trades channel for a crypto pair. Once my code subscribes to the stream, the exchange sends a WS/STOMP frame with the last 100 trades for the crypto pair, after which new trades (if any) are streamed.

However, after the initial block of 100 trades, the exchange may have no new data to send for 60 seconds. In that case, once the 60-second timeout elapses, the exchange simply closes the TCP connection without sending any websocket close frame. I would like to avoid reconnecting, because the reconnect takes time, and upon reconnecting the exchange will once again send me the last 100 trades, which takes additional time to receive and process.

So I would like to try sending PING frames (either WS PING or STOMP heart-beat, whichever works), every 30 seconds or so, in the hope that the exchange would not close the connection if it sees the WS PING/STOMP heartbeat.

However in order to do that, I need blocking reads with timeouts that do not require the websocket to be destroyed and created again, because that would imply a reconnect to the exchange's data stream.

So I wonder what would be the easiest way to achieve that. One option might be to use SSL_poll on the underlying SSL connection and only continue with the blocking read if there is some data; in 99% of the cases that will probably mean there is a websocket frame to be read. The other option is to switch to ASIO/coroutines, which I would really like to avoid. Do you have any suggestions/advice?

rbeeli commented 1 month ago

Hi @MeanSquaredError,

I understand your requirement; in fact, I have had the same one, although in ASIO it's easier to implement by running your own timer alongside the read_message call. May I ask what the reason is to not use ASIO in your case? It would solve your problem.

The only way I see to recover from a read_message timeout is if the very first read call inside read_message times out. In that case no data has been read yet, so it is possible to read again without ending up in a corrupt/undefined state.

Once data has been read and the timeout hits inside the same call to read_message, recovering is not easily possible without rewriting the whole read path and doing considerable bookkeeping, which I do not plan to implement.

If we want to recover from a read_message timeout, we might need two timeout parameters for the read_message function. The first would be a message timeout, i.e. how long do we want to wait for a message before we time out, and the second is a read timeout, i.e. how long should it max. take to read a message once we start reading bytes. It would be possible to recover from the first timeout, but not from the second. This way you could periodically send ping etc. on message timeouts. Also, a message timeout could be a dedicated return type, while a read timeout is a normal error.

Another reason to have two timeouts is that we don't necessarily want to time out while reading a message. For example, assume there is only one timeout parameter, set to 30 s. If a message arrives after 29.5 s and takes 1 s to read from the network, the call times out at 30 s in the middle of reading the message, even though reading a message within 1 s would normally be considered fine. To avoid this, the message timeout could be set to 30 s and the read timeout to 5 s, so the method could take at most 35 s.
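To make the difference concrete, here is a small model that classifies the outcome of a single read_message call under both schemes. It does no real I/O and is not the library's API; `ReadOutcome`, `classify` and `classify_single` are hypothetical names:

```cpp
#include <chrono>

using std::chrono::milliseconds;

// Hypothetical outcome of a read_message() call that supports two timeouts.
enum class ReadOutcome {
    Message,         // frame fully read in time
    MessageTimeout,  // no byte arrived in time: nothing consumed, recoverable
    ReadTimeout      // frame started but did not finish in time: client is unusable
};

// Two-timeout scheme. first_byte: when the first byte of the next frame
// arrives, measured from the start of the call. read_duration: how long the
// rest of that frame takes to arrive.
inline ReadOutcome classify(milliseconds first_byte, milliseconds read_duration,
                            milliseconds message_timeout, milliseconds read_timeout) {
    if (first_byte > message_timeout) return ReadOutcome::MessageTimeout;
    if (read_duration > read_timeout) return ReadOutcome::ReadTimeout;
    return ReadOutcome::Message;
}

// Single-timeout scheme for comparison: one deadline covers the whole call,
// so it fails if part of the frame is still missing when the deadline hits.
inline ReadOutcome classify_single(milliseconds first_byte, milliseconds read_duration,
                                   milliseconds timeout) {
    if (first_byte > timeout) return ReadOutcome::MessageTimeout;
    if (first_byte + read_duration > timeout) return ReadOutcome::ReadTimeout;
    return ReadOutcome::Message;
}
```

With the numbers from the example, `classify_single(milliseconds(29500), milliseconds(1000), milliseconds(30000))` yields `ReadTimeout`, while `classify(milliseconds(29500), milliseconds(1000), milliseconds(30000), milliseconds(5000))` yields `Message`. In the two-timeout scheme, the worst case is 30 s + 5 s = 35 s.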

MeanSquaredError commented 1 month ago

Hi @rbeeli

> May I ask what the reason is to not use ASIO in your case? It would solve your problem.

The reason I try to avoid ASIO is that its programming model is pretty intrusive and the whole program would have to be written around it. Essentially, all of the processing would happen inside io_context::run().

> If we want to recover from a read_message timeout, we might need two timeout parameters for the read_message function. The first would be a message timeout, i.e. how long do we want to wait for a message before we time out, and the second is a read timeout, i.e. how long should it max. take to read a message once we start reading bytes. It would be possible to recover from the first timeout, but not from the second.

I understand that. I really only need the first timeout, i.e. the timeout until data becomes available. I don't need the second one because, as I mentioned in my previous message, in 99% of the cases the whole frame should arrive very quickly once data starts flowing in.

I think I will just create a simple wrapper around WebSocketClient::read_message() that first waits with SSL_poll() until data becomes available or a timeout occurs and, if data is available, forwards the call to WebSocketClient::read_message().
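One way such a wrapper could look, assuming access to the socket's file descriptor (e.g. via OpenSSL's `SSL_get_fd()`) and to the `SSL*`. Instead of SSL_poll(), this sketch uses plain POSIX poll() on the descriptor, which is a common approach for TLS over TCP. One caveat: OpenSSL may already hold decrypted bytes that poll() cannot see, so a hypothetical `has_buffered` callback (e.g. wrapping `SSL_pending()`) is checked first. `WaitResult` and `wait_readable` are illustrative names:

```cpp
#include <cerrno>
#include <chrono>
#include <poll.h>

enum class WaitResult { Ready, Timeout, Error };

// Waits until `fd` is readable or `timeout` expires, without consuming data.
// `has_buffered` must report bytes the TLS layer has already decrypted but the
// application has not read yet; those never show up in poll().
template <typename HasBuffered>
WaitResult wait_readable(int fd, std::chrono::milliseconds timeout, HasBuffered has_buffered) {
    if (has_buffered()) return WaitResult::Ready;
    pollfd pfd{};
    pfd.fd = fd;
    pfd.events = POLLIN;
    for (;;) {
        const int rc = poll(&pfd, 1, static_cast<int>(timeout.count()));
        if (rc > 0) return WaitResult::Ready;   // readable, or HUP/ERR: let the read report it
        if (rc == 0) return WaitResult::Timeout;
        if (errno != EINTR) return WaitResult::Error;
        // Interrupted by a signal: retry (simplification: restarts the full timeout).
    }
}
```

Typical use would be `if (wait_readable(fd, std::chrono::seconds(1), [&] { return SSL_pending(ssl) > 0; }) == WaitResult::Ready) client.read_message();`. A readable socket may also carry TLS records without application data, so read_message() may still block in rare cases, which fits the "99% of the cases" expectation above.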

By the way, I was able to work around my problem with the exchange that times out after 60 seconds. If the STOMP connection to the server is configured so that

- the server sends a heartbeat to the client every 30 seconds, and
- the client does not send any heartbeats,

then the server does not close the WS+STOMP stream, even when there is no real data to send. So the only thing my code has to do is ignore the received server-side heartbeats. This means that, at this point, I don't have to think about workarounds for sending PING frames or other keepalive messages.
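For reference, in STOMP 1.2 this setup corresponds to a `heart-beat:0,30000` header in the CONNECT frame. The first value is what the client promises to send (0 means nothing), and the second is how often it wants to hear from the server. Server heart-beats arrive as bare end-of-line characters. A minimal sketch, assuming the exchange follows STOMP 1.2; the host value and the helper names are placeholders:

```cpp
#include <string>
#include <string_view>

// Builds a STOMP 1.2 CONNECT frame that asks the server for a heart-beat every
// `server_interval_ms` while promising none from the client ("0,<n>").
inline std::string make_connect_frame(const std::string& host, unsigned server_interval_ms) {
    std::string frame = "CONNECT\n";
    frame += "accept-version:1.2\n";
    frame += "host:" + host + "\n";
    frame += "heart-beat:0," + std::to_string(server_interval_ms) + "\n";
    frame += "\n";   // blank line ends the headers; no body
    frame += '\0';   // NUL terminates the frame
    return frame;
}

// STOMP heart-beats are bare end-of-line sequences ("\n" or "\r\n").
// Returns true if a received message contains nothing else, so it can be ignored.
inline bool is_stomp_heartbeat(std::string_view msg) {
    if (msg.empty()) return false;
    for (char c : msg)
        if (c != '\n' && c != '\r') return false;
    return true;
}

// Effective server-to-client heart-beat interval per STOMP 1.2: 0 if either
// side disables it, otherwise the larger of the server's offer (sx) and the
// client's request (cy).
inline unsigned negotiated_server_heartbeat(unsigned sx, unsigned cy) {
    return (sx == 0 || cy == 0) ? 0 : (sx > cy ? sx : cy);
}
```

Because the client sends 0 as its own interval, no client-to-server heart-beats are negotiated, which matches the configuration described above.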

If a similar problem arises further down the road that cannot be solved with blocking reads/writes, I guess I will have to switch to ASIO, but for now I will stay with blocking reads/writes.

Anyway, thanks for the help. I think that the problem is resolved, so we could probably close the issue?

rbeeli commented 1 month ago

Hi @MeanSquaredError, I have added the following function to the blocking WebSocketClient:

    /**
     * Waits until a message is available to read from the WebSocket connection.
     * Note that this includes control frames like PING, PONG, and CLOSE frames.
     *
     * Note that the timeout does not cause an error, but returns `false` if it expires.
     *
     * @returns `true` if a message is available to read, `false` if timeout expires.
     */
    [[nodiscard]] inline expected<bool, WSError> wait_message(std::chrono::milliseconds timeout_ms) noexcept
    {
        // ... (implementation omitted)
    }

I believe this function should help to resolve your issue.

See the corresponding example here: examples/builtin/ex_wait_message_wss_builtin.cpp.

This way, you can periodically perform actions while waiting for a message/frame to arrive, without having to deal with errors or recreating the whole client; it simply returns false when the timeout elapses.
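A sketch of such a loop. Only the `wait_message()` signature comes from the declaration above; `wait_loop`, its callbacks and the poll interval are hypothetical. Any client type whose `wait_message()` returns an expected-like value (contextually convertible to `bool`, dereferenceable to `bool`) fits:

```cpp
#include <chrono>

// Hypothetical polling loop around wait_message().
// on_message(client): a frame is available; would typically call read_message().
// on_idle():          the timeout elapsed with nothing to read; periodic work goes here.
// keep_running():     checked before every wait; return false to stop the loop.
// Returns false on the first error from wait_message() (the caller then closes
// and recreates the client), or true once keep_running() returns false.
template <typename Client, typename OnMessage, typename OnIdle, typename KeepRunning>
bool wait_loop(Client& client, std::chrono::milliseconds poll_interval,
               OnMessage on_message, OnIdle on_idle, KeepRunning keep_running) {
    while (keep_running()) {
        auto ready = client.wait_message(poll_interval);  // expected<bool, WSError>
        if (!ready) return false;        // real error, not a timeout
        if (*ready) on_message(client);  // something is available to read
        else on_idle();                  // timeout elapsed: not an error
    }
    return true;
}
```

Per the doc comment, `on_message` may also be woken by a PING, PONG or CLOSE frame, so the subsequent read_message() call has to handle those; `on_idle` is where something like a PING after 30 s of silence would go.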

MeanSquaredError commented 1 month ago

@rbeeli Great, wait_message() is exactly what I need. Thanks!

rbeeli / websocketclient-cpp

Synchronous read timeout #4