moq-wg / moq-transport

draft-ietf-moq-transport

SUBSCRIBE state machine #237

Open kixelated opened 10 months ago

kixelated commented 10 months ago

We should have a section that documents the subscription state machine, like QUIC does with stream states: https://www.rfc-editor.org/rfc/rfc9000.html#section-3

It makes it significantly easier to discuss which messages are necessary and to reason about what is available in each state. Implementations are certainly allowed to work outside of this framework, but personally I found it a super valuable reference when I was implementing QUIC streams.

There's not much point documenting the current subscription states because they dead-end quickly. Here's my proposal that supports publisher and subscriber resets, via RESET and STOP respectively:

Publisher: [state diagram image not preserved]

Subscriber: [state diagram image not preserved]

kixelated commented 10 months ago

So just to elaborate the proposal a bit more.

There's an ID in each message that identifies the subscription state. It's up for debate if we allow reusing IDs ("track_id") or if each new subscription auto-increments the ID ("subscription_id").

The messages contain other information but they don't impact the subscription state. For example, SUBSCRIBE_OK can contain expires, but it will not automatically move the subscription into the closed state without an explicit SUBSCRIBE_RESET.
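As a rough illustration of the point above, here is a minimal Python sketch in which each message carries an ID and only the message type changes the subscription state. The class names, the `subscription_id` and `expires` field names, and the `"active"`/`"closed"` labels are all assumptions made for this example, not wire formats from the draft.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class SubscribeOk:
    subscription_id: int           # hypothetical field name
    expires: Optional[int] = None  # carried along, but never changes state


@dataclass
class SubscribeReset:
    subscription_id: int           # hypothetical field name


def apply(states: Dict[int, str], msg) -> None:
    """Update the per-ID state table from one received message."""
    if states.get(msg.subscription_id) == "closed":
        return  # terminal state: further messages are ignored
    if isinstance(msg, SubscribeOk):
        states[msg.subscription_id] = "active"
    elif isinstance(msg, SubscribeReset):
        states[msg.subscription_id] = "closed"
```

Note that `expires` is stored on the message but never consulted by `apply`; only an explicit `SubscribeReset` moves the entry to `"closed"`.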

kixelated commented 10 months ago

I would also like to see each message contain an (optional?) group/object sequence. This is useful for providing a range of valid OBJECTs and knowing what might be in transit, although there is no guarantee.

The only questionable one is SUBSCRIBE_STOP, since it could refer to a group that doesn't exist yet. It does seem kind of useful but it would mean queuing up unsubscriptions.

wilaw commented 10 months ago

By the same logic that suggests a SUBSCRIBE_OK response to the SUBSCRIBE command, should there also be a SUBSCRIBE_STOP_OK response to the SUBSCRIBE_STOP command?

kixelated commented 10 months ago

By the same logic that suggests a SUBSCRIBE_OK response to the SUBSCRIBE command, should there also be a SUBSCRIBE_STOP_OK response to the SUBSCRIBE_STOP command?

My proposal is that the publisher sends SUBSCRIBE_RESET in response to SUBSCRIBE_STOP (subscriber close), unless a SUBSCRIBE_RESET has already been sent (publisher close).

The main race condition that needs to be addressed is when both the publisher and subscriber try to close the subscription at the same time.

     SUBSCRIBE -> |
                  | <- SUBSCRIBE_OK
SUBSCRIBE_STOP -> | <- SUBSCRIBE_RESET
         (what happens now?)

My proposal is pretty simple: the subscription is closed when a RESET is sent/received. It's the only way to advance to the terminal state, and any further messages can be ignored.
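A minimal Python sketch of the publisher side of this rule, to make the race concrete. The state names (`PENDING`, `ACTIVE`, `CLOSED`) and the class and method names are hypothetical, since the diagrams from the original post are not reproduced in this text.

```python
from enum import Enum, auto


class PubState(Enum):
    # Illustrative state names, not taken from the draft.
    PENDING = auto()  # SUBSCRIBE received, no reply sent yet
    ACTIVE = auto()   # SUBSCRIBE_OK sent
    CLOSED = auto()   # SUBSCRIBE_RESET sent (terminal)


class PublisherSubscription:
    def __init__(self):
        self.state = PubState.PENDING
        self.sent = []  # messages emitted so far, in order

    def accept(self):
        """Reply to SUBSCRIBE with SUBSCRIBE_OK."""
        if self.state is PubState.PENDING:
            self.state = PubState.ACTIVE
            self.sent.append("SUBSCRIBE_OK")

    def reset(self):
        """Publisher close: send SUBSCRIBE_RESET at most once."""
        if self.state is not PubState.CLOSED:
            self.state = PubState.CLOSED
            self.sent.append("SUBSCRIBE_RESET")

    def on_subscribe_stop(self):
        """Subscriber close: answer with SUBSCRIBE_RESET unless one was already sent."""
        self.reset()
```

In the race above, the publisher calls `reset()` and then receives the crossing SUBSCRIBE_STOP; `on_subscribe_stop()` is then a no-op, so exactly one SUBSCRIBE_RESET goes out and no extra acknowledgement is needed.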

If you explicitly acknowledge each message like Suhas is proposing, then you get into an ambiguous state. It's reasonable behavior for endpoints to reply with a corresponding OK, ERROR, or even nothing. These are bugs just waiting to happen, especially because this is triggered by a rare and likely untested race condition.

I suggest drawing up the state machine if explicit OK/ERROR messages seem like a good idea. You'll find that some of the messages are redundant (SUBSCRIBE_RESET == SUBSCRIBE_STOP_OK), some are useless (SUBSCRIBE_RESET_OK, SUBSCRIBE_STOP_ERROR), and that it's difficult to enter a terminal state when both endpoints close at the same time. I could try to draw it up too when I get home.

hardie commented 10 months ago

Speaking as an individual, I agree that having the state machine laid out will improve the discussion.

An additional point on the SUBSCRIBE_STOP semantics in-line.

On Tue, Aug 22, 2023 at 7:30 PM kixelated wrote:

By the same logic that suggests a SUBSCRIBE_OK response to the SUBSCRIBE command, should there also be a SUBSCRIBE_STOP_OK response to the SUBSCRIBE_STOP command?

My proposal is that the publisher sends SUBSCRIBE_RESET in response to SUBSCRIBE_STOP (subscriber close), unless a SUBSCRIBE_RESET has already been sent (publisher close).

The main race condition that needs to be addressed is when both the publisher and subscriber try to close the subscription at the same time.

     SUBSCRIBE -> |
                  | <- SUBSCRIBE_OK
SUBSCRIBE_STOP -> | <- SUBSCRIBE_RESET
         (what happens now?)

My proposal is pretty simple, the subscription is closed when a RESET is sent/received. It's the only way to

If both are trying to close this at the same time, there is a condition where the group_id pair:

SUBSCRIBE_STOP: request_end_group and the racing SUBSCRIBE_RESET: end_group

doesn't match. If the client requested a later group than the end_group in the racing SUBSCRIBE_RESET, this requires client-side action to update its expectations (as it may cause an error condition for the application using the transport). The state machine will also have to be clear on whether a SUBSCRIBE_RESET used as a SUBSCRIBE_STOP_OK can ever alter the end_group proposed in SUBSCRIBE_STOP (e.g. to a group id that's already in flight).

advance to the terminal state and any further messages ignored.


kixelated commented 10 months ago

If both are trying to close this at the same time, there is a condition where the group_id pair: SUBSCRIBE_STOP: request_end_group and the racing SUBSCRIBE_RESET: end_group doesn't match. If the client requested a later group than the end_group in the racing SUBSCRIBE_RESET, this requires client-side action to update its expectations (as it may cause an error condition for the application using the transport). The state machine will also have to be clear on whether a SUBSCRIBE_RESET used as a SUBSCRIBE_STOP_OK can ever alter the end_group proposed in SUBSCRIBE_STOP (e.g. to a group id that's already in flight).

Exactly. That's why I added a separate "stop" state for the subscriber instead of jumping straight to "closed".

Ultimately, the publisher is the authority on the start/end group. It knows the minimum sequence in the cache and the maximum sequence that has been transmitted. The subscriber can hint a start/end but it has imperfect information until it gets an explicit reply.

Suppose a subscriber sends a STOP with group X, an RTT passes, and then receives a RESET with group Y.

A subscriber may choose to process objects while in the "stop" state (valid in QUIC), during which time it MAY receive objects >X. Once it receives the RESET and transitions to the "closed" state, then it MUST NOT receive objects >Y.
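The subscriber side of this can be sketched in Python as follows. This is only one reading of the rule: the state names, the class, and the `accepts` helper are hypothetical, and the sketch assumes groups are compared as plain integers, with X being the group hinted in STOP and Y the group the publisher reports in RESET.

```python
from enum import Enum, auto


class SubState(Enum):
    # Illustrative state names; only "stop" and "closed" are named in the thread.
    ACTIVE = auto()
    STOP = auto()    # SUBSCRIBE_STOP sent, waiting for SUBSCRIBE_RESET
    CLOSED = auto()  # SUBSCRIBE_RESET received (terminal)


class SubscriberSubscription:
    def __init__(self):
        self.state = SubState.ACTIVE
        self.requested_end = None  # X: hint sent in SUBSCRIBE_STOP
        self.final_end = None      # Y: authoritative value from SUBSCRIBE_RESET

    def stop(self, end_group):
        """Subscriber close: send a hint and wait for the publisher's answer."""
        if self.state is SubState.ACTIVE:
            self.state = SubState.STOP
            self.requested_end = end_group

    def on_reset(self, end_group):
        """The publisher's end group overrides whatever was hinted."""
        if self.state is not SubState.CLOSED:
            self.state = SubState.CLOSED
            self.final_end = end_group

    def accepts(self, group):
        """Is an object from this group still valid in the current state?"""
        if self.state is SubState.CLOSED:
            return group <= self.final_end
        # In ACTIVE or STOP, objects past X may still arrive.
        return True
```

For example, after `stop(5)` an object from group 7 is still acceptable; once `on_reset(6)` arrives, group 6 remains valid but group 7 is not.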