quicwg / base-drafts

Internet-Drafts that make up the base QUIC specification
https://quicwg.org

Handling of a lost push signal #4930

Closed kazuho closed 2 years ago

kazuho commented 2 years ago

Current design of server push relies on the client being capable of associating two streams: a stream that triggers push (i.e. the stream on which PUSH_PROMISE was sent) and a stream that conveys the pushed response (i.e. the push stream carrying Push Stream Header). Push ID is used for associating the two streams.

However, there is no mechanism that guarantees delivery of the Push ID on either of the two streams. Both the PUSH_PROMISE and the Push Stream Header may get lost when the server resets the stream or when the client requests that the server stop sending.

If either signal is lost, the client cannot associate the information that was received on the stream that was not lost, and ends up retaining that state forever.

Originally reported by @tatsuhiro-t on the mailing list thread. Examples that lead to this problem can be found in this email and this one.
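
To make the failure concrete, here is a minimal sketch (Python, hypothetical names, not taken from any implementation) of the client-side bookkeeping this design implies; if either carrying signal is lost, the surviving half is never joined and never released:

```python
class PushState:
    def __init__(self, deliver):
        self.deliver = deliver   # callback(request_headers, push_stream)
        self.promises = {}       # push_id -> request headers from PUSH_PROMISE
        self.push_streams = {}   # push_id -> stream whose Push Stream Header arrived

    def on_push_promise(self, push_id, headers):
        self.promises[push_id] = headers
        self._try_join(push_id)

    def on_push_stream_header(self, push_id, stream):
        self.push_streams[push_id] = stream
        self._try_join(push_id)

    def _try_join(self, push_id):
        # Only when both signals have been seen can promise and response be joined.
        if push_id in self.promises and push_id in self.push_streams:
            self.deliver(self.promises.pop(push_id), self.push_streams.pop(push_id))
        # If the counterpart signal was lost (stream reset or STOP_SENDING sent
        # before the Push ID arrived), the surviving entry is never joined and
        # never released.
```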

kazuho commented 2 years ago

To fix the issue, we have to meet the following two requirements:

  1. PUSH_PROMISE never gets lost, and
  2. Require either of the following:
     a. servers make sure that Push Stream Headers are delivered to the client, and clients never send STOP_SENDING frames for push streams until they have received the Push Stream Header; or
     b. change the push identifier to a value whose delivery can be guaranteed even when the push stream is reset at any moment.

We can meet requirement 1 by sending PUSH_PROMISE on the control stream.

For requirement 2, the difficulty with a is that there is no way for an HTTP/3 stack to tell whether something has been delivered to the client at the transport level. We could introduce an HTTP/3-level acknowledgement frame, much like we do in QPACK, but that is going to be complicated.

For option b, we might consider getting rid of the Push ID and sending the stream ID of the push stream in the PUSH_PROMISE frame. We would require QUIC stacks to guarantee the delivery of either FIN or RESET_STREAM for streams that have been used by the sender. The problem here is that, at the moment, we do not require QUIC stacks to have an API that notifies HTTP/3 stacks that a stream has been reset when not a single byte of data has been received (we can assume that HTTP/3 stacks would be notified of resets once some bytes have been received and handed over to them). But maybe fixing this would be easier than option a.
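
As a rough sketch of what option b might look like on the client side, assuming a hypothetical QUIC API that reports stream resets even when no stream data was ever delivered (class and callback names are illustrative only):

```python
class PushByStreamId:
    def __init__(self):
        self.promised = {}  # stream_id -> request headers from PUSH_PROMISE

    def on_push_promise(self, stream_id, headers):
        # PUSH_PROMISE names the push stream directly by its stream ID.
        self.promised[stream_id] = headers

    def on_push_stream_data(self, stream_id, data, fin):
        # Data arriving on the named stream can be joined with its promise.
        if stream_id in self.promised and fin:
            headers = self.promised.pop(stream_id)
            print("pushed response complete for", headers)

    def on_stream_reset(self, stream_id, error_code):
        # This callback must fire even if zero bytes were ever received on the
        # stream; current QUIC APIs are not required to provide that, which is
        # the gap noted above.
        self.promised.pop(stream_id, None)
```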

tatsuhiro-t commented 2 years ago

By removing the Push ID and using the stream ID, we are no longer able to share a push stream with multiple streams, and there is no way to limit the number of PUSH_PROMISEs. If that is acceptable, then there is another option: create a push stream first and send the PUSH_PROMISE on it. This way, all the information is in one place. To cancel a push, the receiver just sends STOP_SENDING, or the sender sends RESET_STREAM, on the push stream. CANCEL_PUSH is not needed anymore.

Yet another option is to drop server push from the draft entirely and leave it to a future extension draft with a concrete mitigation.

kazuho commented 2 years ago

By removing the Push ID and using the stream ID, we are no longer able to share a push stream with multiple streams, and there is no way to limit the number of PUSH_PROMISEs. If that is acceptable ...

I'm not sure I agree with your observations, for the following two reasons:

Yet another option is drop server push from draft entirely.

I did not want to bring that option to the table, but it works for me. Considering that we did not notice this design issue until this late, it is questionable whether we are really the right people to be designing this feature now.

RyanTheOptimist commented 2 years ago

On Thu, Jul 29, 2021 at 1:31 AM Tatsuhiro Tsujikawa wrote:

Yet another option is drop server push from draft entirely. Leave it to a future extension draft with a concrete mitigation.

I strongly support this proposal. We are quite late in the process to be rethinking how push works, and as a consequence are unlikely to have significant deployment experience with whatever new design we come up with. As a result, I think we are unlikely to have a deep understanding of the performance implications in time. I think it would be wise to punt this to an extension.

Cheers,

Ryan

tatsuhiro-t commented 2 years ago

@kazuho

I'm not sure I agree with your observations, for the following two reasons:

I misunderstood your description of option b; I wrongly thought that just the associated stream ID would be set in the PUSH_PROMISE, which led me to the wrong observation. Sorry for the noise.

kazuho commented 2 years ago

@tatsuhiro-t No need to say sorry. Rather, I think we have to thank you for noticing the issue because otherwise we might have ended up shipping something broken.

Besides, option b is indeed complex. I hope we find a simpler fix that we can apply, but for me, option b was as far as I could go.

ianswett commented 2 years ago

I also support moving this to a future extension, given the discussion above.

I'll note that we made the decision as a WG to ship 'h3' when the transport drafts shipped. I think that greatly limits the scope of any changes we can make to the existing push design using the existing SETTING.

afrind commented 2 years ago

Is the issue mitigated if the client has a timer for any half-pushes (promise with no stream or stream with no promise) and cancels them if they are not "joined" before the timeout?

Note that a client probably should have such a timer anyways, since a broken or malicious server could otherwise just send one of the two push components but not the other.
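
A minimal sketch of such a timer, with hypothetical names and an arbitrary timeout; it is only meant to illustrate the "cancel un-joined half-pushes" idea:

```python
import threading

# Arbitrary timeout for illustration; a real client would pick its own value.
HALF_PUSH_TIMEOUT = 30.0  # seconds

class HalfPushReaper:
    def __init__(self, cancel):
        self.cancel = cancel  # callback(push_id): e.g. send CANCEL_PUSH / STOP_SENDING
        self.timers = {}      # push_id -> pending threading.Timer

    def on_half_push(self, push_id):
        # Called when only one of PUSH_PROMISE / push stream has been seen so far.
        if push_id not in self.timers:
            timer = threading.Timer(HALF_PUSH_TIMEOUT, self._expire, args=(push_id,))
            self.timers[push_id] = timer
            timer.start()

    def on_joined(self, push_id):
        # Called once both signals have been seen; the half-push is now whole.
        timer = self.timers.pop(push_id, None)
        if timer is not None:
            timer.cancel()

    def _expire(self, push_id):
        # The counterpart signal never arrived; give up on this push.
        self.timers.pop(push_id, None)
        self.cancel(push_id)
```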

MikeBishop commented 2 years ago

I second @afrind's proposal -- a malicious server can always cause this problem, and it warrants text in Security Considerations. The fact that this situation can occur accidentally as well doesn't change what's required.

kazuho commented 2 years ago

@afrind

Is the issue mitigated if the client has a timer for any half-pushes (promise with no stream or stream with no promise) and cancels them if they are not "joined" before the timeout?

Note that a client probably should have such a timer anyways, since a broken or malicious server could otherwise just send one of the two push components but not the other.

That's a good point. The other mitigation would be to reopen the HTTP/3 connection once in a while. By doing so, clients can clean up the garbage.

But I think there might be a different failure mode that these mitigations do not cover.

Let's say that, for a given Push ID, both the PUSH_PROMISE and the push stream were lost. Then the client will never send a CANCEL_PUSH for that Push ID, even though the server has already used that ID.

If the client is designed to support a constant number of concurrent pushes and this happens, then the server loses room for initiating new pushes without the client noticing.

Perhaps the most probable case of us hitting this failure mode would be clients fetching resources speculatively and then cancelling them frequently. If a server associates a push with such a request, and then cancels the pushed request when the client cancels the client-initiated request that triggered the push, then there is a chance the client receives neither of the two signals.

This silent loss of room to push might be an issue for HTTP applications that are designed on the presumption that a server would always have room to push, assuming that the client updates MAX_PUSH_ID as it receives new pushes.
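
To illustrate the accounting, here is a small hypothetical sketch of a client that keeps a fixed push concurrency by raising MAX_PUSH_ID only when it observes a Push ID in use; a Push ID whose signals were both lost never replenishes the window:

```python
class PushIdWindow:
    def __init__(self, concurrency, send_max_push_id):
        self.send_max_push_id = send_max_push_id  # callback(new_limit) -> MAX_PUSH_ID frame
        self.max_push_id = concurrency - 1        # Push IDs 0..concurrency-1 are allowed
        self.seen = set()

    def on_push_observed(self, push_id):
        # Called when either PUSH_PROMISE or the push stream for push_id arrives.
        if push_id not in self.seen:
            self.seen.add(push_id)
            self.max_push_id += 1                 # hand the consumed slot back
            self.send_max_push_id(self.max_push_id)

# A Push ID that the server consumed, but whose PUSH_PROMISE and push stream
# were both lost, never reaches on_push_observed, so the server's effective
# room to push silently shrinks by one.
```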

All that said, I do agree that none of the failure modes are severe. Endpoints do not end up in a state where they are forced to close the HTTP/3 connection while client-issued requests are in flight. Therefore, I think I would not argue against writing down the problems and shipping HTTP/3 with server push, if that's the preference of the working group.

LPardue commented 2 years ago

What @kazuho says reminds me of some of the text we added in https://datatracker.ietf.org/doc/html/draft-ietf-quic-http-34#section-6.2 to speak to the fact that connections could be degraded based on certain interactions of the protocol.

MikeBishop commented 2 years ago

Can you help me understand how that would happen?

It's that middle part that I don't get. It's not clear why the client would discard a stream before it has seen the type and the Push ID needed to determine that it's not useful. If the server is resetting the stream independently, we already encourage servers to send CANCEL_PUSH in this circumstance for exactly this reason:

A server sends CANCEL_PUSH to indicate that it will not be fulfilling a promise which was previously sent. The client cannot expect the corresponding promise to be fulfilled, unless it has already received and processed the promised response. Regardless of whether a push stream has been opened, a server SHOULD send a CANCEL_PUSH frame when it determines that promise will not be fulfilled.

What I would expect to happen instead is:

Do we simply need a corresponding SHOULD NOT abort reading until you've seen the stream type, and SHOULD NOT abort reading a push stream until you've seen the Push ID?
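
As a rough sketch of what that would mean for a client (hypothetical code; the varint encoding follows RFC 9000, Section 16, and the push stream type value follows HTTP/3):

```python
import io

PUSH_STREAM_TYPE = 0x01  # HTTP/3 unidirectional stream type for push streams

def read_varint(buf: io.BytesIO) -> int:
    """Decode a QUIC variable-length integer (RFC 9000, Section 16)."""
    first = buf.read(1)[0]
    length = 1 << (first >> 6)        # two most significant bits encode the length
    value = first & 0x3F
    for _ in range(length - 1):
        value = (value << 8) | buf.read(1)[0]
    return value

def abort_push_stream_safely(stream: io.BytesIO, cancel_push) -> None:
    # Read the stream type before deciding anything.
    if read_varint(stream) != PUSH_STREAM_TYPE:
        return  # not a push stream; handled elsewhere
    # Read the Push ID before aborting, so the half-push can be accounted for.
    push_id = read_varint(stream)
    cancel_push(push_id)  # e.g. remember the ID, then abort reading the rest

if __name__ == "__main__":
    # Stream type 0x01 (push), Push ID 5, then the (discarded) response bytes.
    stream = io.BytesIO(bytes([0x01, 0x05]) + b"pushed response...")
    abort_push_stream_safely(stream, cancel_push=lambda pid: print("drop push", pid))
```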

tatsuhiro-t commented 2 years ago

I think an endpoint has the right to cancel or abort a new stream to protect itself under memory pressure or high load. RFC 9000 allows an endpoint that has seen the FIN to discard the stream without sending STOP_SENDING and without passing the data to the application (we could require the QUIC stack to pass such data in this case, as discussed on the mailing list), and the server then does not notice that the pushed stream was silently rejected. Even if the client sends STOP_SENDING, by the time the STOP_SENDING is received by the server, it has already finished processing the pushed stream and forgotten about it. I imagine this happens if the QUIC stack provides a BSD socket-like interface and the HTTP/3 server writes all the data and clears its memory without waiting for an acknowledgement.

That said, if we ensure that the client gets either the PUSH_PROMISE or the Push ID on the pushed stream, the timer works. One concern with this approach is the impact on connection-level flow-control credit. If the client holds a pushed stream until the corresponding PUSH_PROMISE is received, it might not give back flow-control credit to the server, which means a decreased flow-control window until the timer fires. How much it costs depends on the settings. The current text says the client should give at least 1024 bytes to a remote endpoint, so if 100 such pushed streams are held, 100 KB is spent on these streams until the timer fires. This is a bit of an extreme case, however. Given the likelihood that this occurs, it is probably not a severe problem.
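
The arithmetic behind that estimate, spelled out (the numbers are the ones from the paragraph above):

```python
held_streams = 100
min_credit_per_stream = 1024   # bytes of flow-control credit withheld per held stream
withheld = held_streams * min_credit_per_stream
print(withheld)                # 102400 bytes, roughly 100 KB, until the timer fires
```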

As for using timers as a countermeasure against broken or malicious servers, I'm not sure the timer is needed. If the client cancels pushes when the timer fires, those servers can just open other pushes and quickly create the same situation again.

MikeBishop commented 2 years ago

The protection is against keeping the memory around indefinitely, even after the stream has been closed. It's definitely true that a server can consume as much memory with spurious pushes as the client allows it to open streams.

kazuho commented 2 years ago

@MikeBishop Thank you! I think your explanation and the PR are correct.

Push Streams carry the response part of a server push. The lack of the response is indicated by CANCEL_PUSH. We would have the guarantee that either one would be noticed by the client, if we require clients to always read at least up to the Push ID field of a Push Stream. I see that requirement being added by the PR, as a recommendation (SHOULD). That strength seems to be consistent with other requirements that we have.

PUSH_PROMISE carries the request headers of a server push. But unlike the response part, there is no way of signalling its potential absence. Hence, a timer is needed. I also see that in the PR.

LPardue commented 2 years ago

The proposed resolution is #4931. At our AD direction, the chairs sent a consensus call to the WG list. We can confirm there is consensus to incorporate the new text. We'll do that during the AUTH48 changes, which the editor will run.

MikeBishop commented 2 years ago

Closed in #4931.