snapview / tokio-tungstenite

Future-based Tungstenite for Tokio. Lightweight stream-based WebSocket implementation
MIT License
1.88k stars 236 forks source link

`poll_ready` flush after `WouldBlock` errors #287

Closed alexheretic closed 1 year ago

alexheretic commented 1 year ago

With this PR poll_ready continues to generally return Ready. However, after start_send encounters WouldBlock poll_ready will instead try to flush.

When poll_ready->flush returns Ready poll_ready will revert to just returning Ready ... until the next WouldBlock.

This mechanism prevents the case tokio-tungstenite users filling up the tungstenite write-buffer without actually handling any errors (since WouldBlock errors are not returned to the user).

We also maintains feed generally not flushing as it will only do so after encountering WouldBlock errors. So should perform well.

See https://github.com/snapview/tokio-tungstenite/issues/286#issuecomment-1595678753.

Resolves #286

Questions/alternatives

alexheretic commented 1 year ago

By writing the question _"Should pollready try to flush if it is !ready?" I've mostly convinced myself that it should.

~Someone stop me or I'll add that to this PR in a bit...~

Update: I did that and updated the PR description.

daniel-abramov commented 1 year ago

It looks like a quick fix, but it feels a bit risky as we need to make many assumptions about how tungstenite should buffer the messages. I'm also not sure if there are no edge cases that we need to consider, e.g. the latest changes in tungstenite allow specifying how much data would be buffered and not written to the underlying stream before hitting a limit, which theoretically means that if a size of that limit/threshold being close to the size of our buffer, we can get to the point where N feed()s (i.e. poll_ready() + start_send()) called in a row would simply buffer N messages filling up the internal out buffer in tungstenite without writing anything to the stream causing the N+1 attempt to call feed() to return an error telling that the buffer is full.

I don't think there is an elegant way to fix this issue (surely we could add some ifs comparing the threshold value ad the out buffer with the frame size, but I'm not sure if it's really worth the effort and if it introduces too many complexities).

Personally, I think that if we really want to have the maximum flexibility for the buffering/flushing, it would be better to introduce explicit methods for that and keep the former implementation of the Sink (that seem to have always behaved in a correct [albeit inefficient] way).

alexheretic commented 1 year ago

I think it is important that feed in general does not flush. This is the reason feed and send exist. Having both of them always flush seems just incorrect and a performance bug. We don't need to do that anymore. It seems a bit silly to have feed behave incorrectly in order to ensure it behaves ... correctly.

To be clear the reason for feed to exist is:

Unlike send, the returned future does not flush the sink.

I don't see the configurable write_buffer_size & max_write_buffer_size as an edge case, it is just a buffering behaviour within the general contract of a write interface. If the user sets the max_write_buffer_size low enough they may see an error. A simpler example, if a user sets the max_write_buffer_size smaller than the size of a message all feed attempts will fail no matter how you flush. Can you explain why this breaks the Sink contract?

As I see it now this behaviour looks correct. feed generally won't flush, but now will try to flush when writes block. This matches the Sink docs suggestions for behaviour.

Implementations of poll_ready and start_send will usually involve flushing behind the scenes in order to make room for new messages. It is only necessary to call poll_flush if you need to guarantee that all of the items placed into the Sink have been sent.

All other tungstenite errors are simply returned to the user to deal with. That isn't a change from existing behaviour. So the only interesting error is WouldBlock since it is not returned to the user. And that was the issue with https://github.com/snapview/tokio-tungstenite/pull/284 that it didn't handle this hidden error any differently from the flush happy past.

This PR addresses that by flushing after WouldBlock errors. That means in general feed will not flush (which is what it exists to not do) but if the underlying stream starts blocking it will try to flush meaning we'll either flush or return Pending and reflect the blocking properly. I still think this PR seems a good fix for the issue.

jgallagher commented 1 year ago

I don't see the configurable write_buffer_size & max_write_buffer_size as an edge case, it is just a buffering behaviour within the general contract of a write interface. If the user sets the max_write_buffer_size low enough they may see an error. A simpler example, if a user sets the max_write_buffer_size smaller than the size of a message all feed attempts will fail no matter how you flush. Can you explain why this breaks the Sink contract?

If I set max_write_buffer_size to a value that is in between the size of my messages and the size at which we start seeing WouldBlock errors, what happens if I call feed() repeatedly without flushing? I think (just from reading the changes, not trying it, so could certainly be wrong!) that:

  1. We would never set self.ready = false (because we haven't sent enough data to start seeing WouldBlock), so poll_ready() always returns Ready
  2. Once we hit max_write_buffer_size, we'd get a failure in start_send

This seems to break the contract of feed, right? I would expect to be able to feed() continuously and the Sink should flush as needed, which wouldn't happen in this case. Maybe my configuration is questionable, but it seems like it should be valid? I've based my max_write_buffer_size on the size of my messages (and ensure it's large enough to hold some number of them), but because it's smaller than some underlying value I don't know I hit this edge case.

alexheretic commented 1 year ago

Firstly thinking about hypotheticals, max_write_buffer_size should never be set at or below write_buffer_size, its should always at least write_buffer_size + 1 message. We should probably add some basic errors around this upstream. But in the following I'll assume these two are set to reasonable things or default.

We would never set self.ready (because we haven't sent enough data to start seeing WouldBlock), so poll_ready() always returns Ready

If you run your example you'll see this isn't the case. That's because the write-buffer isn't filling up. If the underlying stream is not returning WouldBlock (or any other error that you would see) that means the writes are going into the underlying stream and not building up in tungstenite's write-buffer.

So that is the aim of this PR that you can call feed continuously and flush will happen after WouldBlock as needed, no build up of messages in tungstenite's write-buffer over the write_buffer_size + 1 message.

I've based my max_write_buffer_size on the size of my messages (and ensure it's large enough to hold some number of them)

As mentioned before it if bounded the max should always be at least write_buffer_size + 1 message. I actually don't see a huge benefit to bounding at all it unless you aim to ignore (or just log) most errors and only handle WriteBufferFull.

The important point is the write-buffer only fills up past write_buffer_size when the underlying stream writes are failing.

I think I have clearly not documented it well enough upstream :( Here's what I have currently, let me know if you have suggestions!

write_buffer_size: usize

The target minimum size of the write buffer to reach before writing the data to the underlying stream. The default value is 128 KiB.

Note: flush will always fully write the buffer regardless.

max_write_buffer_size: usize

The max size of the write buffer in bytes. Setting this can provide backpressure in the case the write buffer is filling up due to write errors. The default value is unlimited.

Note: Should always be set higher than write_buffer_size.

Update: I've tried to clarify the docs further based on this discussion: https://github.com/snapview/tungstenite-rs/pull/361

alexheretic commented 1 year ago

imo the approach in this pr seems good, I've tried to explain why as clearly i can. Are there remaining concerns @daniel-abramov?

daniel-abramov commented 1 year ago

imo the approach in this pr seems good, I've tried to explain why as clearly i can. Are there remaining concerns @daniel-abramov?

Apologies for the long delay.

Let's give it a try, thanks for explaining your point!