SUBSCRIBE_DONE logic is gruesome

martinduke commented 2 weeks ago

How does a relay know when to send SUBSCRIBE_DONE?

The purpose of a relay is to aggregate subscribes, so if overlapping subscribes come in with different endpoints, I can't rely on a SUBSCRIBE_DONE from upstream.

If a SUBSCRIBE is entirely for future objects, the easiest thing to do is to send it when I've sent the last object ID, or a higher ID is published. One can add a timer per subscribe to allow for reordering, or just accept that reordered objects arriving after the last won't be delivered. I can't simply keep state for each object in the subscribe because I have no idea where there are object or group gaps. (maybe this is the use case for explicitly communicating these gaps after all?)
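A minimal Python sketch of the rule described above, not taken from the draft: the relay arms a grace timer once the last object of the range, or anything beyond it, has been published, and sends SUBSCRIBE_DONE when the timer expires. The names `FutureSubscription`, `grace_seconds`, `on_object` and `is_done` are hypothetical, and the sketch assumes locations are `(group_id, object_id)` pairs that compare lexicographically.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumption: an object location is (group_id, object_id), ordered lexicographically.
Location = Tuple[int, int]


@dataclass
class FutureSubscription:
    """Hypothetical per-subscribe state for a range that lies entirely in the future."""
    end: Location                     # last location in the range (inclusive)
    grace_seconds: float = 0.0        # reordering allowance; 0 means no timer
    deadline: Optional[float] = None  # set once the end is reached or passed

    def on_object(self, loc: Location, now: float) -> bool:
        """Record a published object; return True if it should be forwarded."""
        if self.is_done(now):
            return False  # late arrival after SUBSCRIBE_DONE: not delivered
        if loc >= self.end and self.deadline is None:
            # The last object, or a higher one, was published: start the timer.
            self.deadline = now + self.grace_seconds
        return loc <= self.end

    def is_done(self, now: float) -> bool:
        """True once SUBSCRIBE_DONE may be sent."""
        return self.deadline is not None and now >= self.deadline
```

With `grace_seconds=0` this degenerates to "done as soon as the end is reached or passed", which is the variant that accepts losing reordered objects.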

If a SUBSCRIBE is entirely for past objects, either I am serving them from cache (in which case I can provide an API to the cache to call when it is fully served), or I have to send a SUBSCRIBE upstream. If the SUBSCRIBE is unique, then I'll get a SUBSCRIBE_DONE from upstream. If it's aggregated somehow, once again I have to either watch for the last sequence number and run a timer, or keep score of all the objects I expect to see. IIUC, as it stands there is no guarantee that a subscribe for past objects will give me the complete set of objects.
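The "keep score" option could look roughly like the Python sketch below. It assumes the relay somehow knows the full set of expected locations, which is exactly what the thread says it cannot know today, so the timer remains as a fallback. `PastScoreboard` and its methods are hypothetical names.

```python
from typing import Iterable, Set, Tuple

# Assumption: an object location is (group_id, object_id).
Location = Tuple[int, int]


class PastScoreboard:
    """Hypothetical scoreboard for an aggregated subscribe to past objects."""

    def __init__(self, expected: Iterable[Location], deadline: float) -> None:
        # Assumption: the expected set is known up front (not guaranteed today).
        self.missing: Set[Location] = set(expected)
        self.deadline = deadline  # absolute time at which the relay gives up

    def on_object(self, loc: Location) -> None:
        """Tick off an object that arrived from upstream (or from cache)."""
        self.missing.discard(loc)

    def is_complete(self) -> bool:
        """True only if every expected object actually arrived."""
        return not self.missing

    def is_done(self, now: float) -> bool:
        """True once SUBSCRIBE_DONE may be sent: all seen, or timer expired."""
        return self.is_complete() or now >= self.deadline
```

Keeping `is_complete` separate from `is_done` lets the relay tell a full delivery apart from a timeout, for example to pick a different status code.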

If a SUBSCRIBE is for both past and future objects, then both algorithms must return true before the relay can send SUBSCRIBE_DONE. So the sender has to keep track of two separate object ranges independently. Because the SUBSCRIBE may arrive just as the publisher is incrementing the group ID, I need to track the max object ID in each group so I can tell when the past portion of the subscription is done.
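As a rough Python illustration of tracking the two ranges independently: the sketch below snapshots the max object ID per group at subscribe time for the past portion, and watches for the end location for the future portion. It assumes objects within a group are sent in order, which the spec does not currently promise, and ignores reordering timers. `MixedSubscription` and its fields are hypothetical.

```python
from typing import Dict, Tuple

# Assumption: an object location is (group_id, object_id), ordered lexicographically.
Location = Tuple[int, int]


class MixedSubscription:
    """Hypothetical state for a subscribe covering both past and future objects."""

    def __init__(self, max_object_in_group: Dict[int, int], end: Location) -> None:
        # Snapshot taken when the SUBSCRIBE arrives: group_id -> highest object ID
        # known at that moment. Each entry is "owed" to the subscriber.
        self.pending_past = dict(max_object_in_group)
        self.end = end
        self.future_done = False

    def on_sent(self, loc: Location) -> None:
        """Call after an object has been sent to the subscriber."""
        group, obj = loc
        # Past portion: a group is settled once its snapshot maximum was sent
        # (assumes in-order sending within the group).
        if group in self.pending_past and obj >= self.pending_past[group]:
            del self.pending_past[group]
        # Future portion: the end location was reached or passed.
        if loc >= self.end:
            self.future_done = True

    def is_done(self) -> bool:
        """Both conditions must hold before SUBSCRIBE_DONE is sent."""
        return not self.pending_past and self.future_done
```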

At the origin, this is somewhat simpler in that published objects can be assumed to arrive in order and without gaps, and past objects can arrive in their own order and without gaps.

kixelated commented 2 weeks ago

+1 I can't figure out when a publisher is supposed to send this message. Is it supposed to wait until all data streams have been acknowledged or reset? Unfortunately that's not possible with many QUIC libraries (including W3C WebTransport) so the best you can do is a timer.

For transfork, I put the subscriber in control of terminating a subscription when all data (within the requested range) has been received. It does require drop notifications though (or reliable reset).

afrind commented 2 weeks ago

Individual Comment:

> maybe this is the use case for explicitly communicating these gaps after all?

> as it stands there is no guarantee that a subscribe for past objects will give me the complete set of objects.

What if we added these requirements/guarantees? A publisher is required to send every sequential group/object id in the range with no gaps (either the object or a placeholder object with the status explaining why no group/object is there), or terminate the subscription with error. End-of-group and end-of-track markers are also required.
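To make the proposal concrete, here is a Python sketch of what a receiver could do if every object ID in a group were guaranteed to show up, either as a real object, a placeholder, or an end-of-group marker. The `Status` values and `GroupTracker` are hypothetical and do not correspond to status codes in the draft.

```python
from enum import Enum, auto
from typing import Optional, Set


class Status(Enum):
    """Hypothetical object statuses; not the draft's actual code points."""
    NORMAL = auto()        # a real object with payload
    PLACEHOLDER = auto()   # "nothing here", with a reason
    END_OF_GROUP = auto()  # marks the final object ID of the group


class GroupTracker:
    """Tracks one group under the assumed no-gaps guarantee."""

    def __init__(self) -> None:
        self.seen: Set[int] = set()
        self.last_id: Optional[int] = None  # known once END_OF_GROUP arrives

    def on_object(self, object_id: int, status: Status) -> None:
        self.seen.add(object_id)
        if status is Status.END_OF_GROUP:
            self.last_id = object_id

    def is_complete(self) -> bool:
        """Complete once every ID from 0 through the end marker has arrived."""
        if self.last_id is None:
            return False
        return all(i in self.seen for i in range(self.last_id + 1))
```

Arrival order does not matter here, so this check tolerates reordering, but it still cannot tell a lost object from a late one without a timer or a reset signal.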

It would make things much easier to reason about - I kind of like the idea. For sure it adds some overhead - is that a dealbreaker? It's a handful of bytes to skip an object or group, or to end a group.

I'm not sure even those guarantees would solve the SUBSCRIBE_DONE problem though. Reordering of stream-per-object streams, datagrams in general, and stream resets (undefined) still seem problematic. Maybe timers are inevitable.

vasilvv commented 2 weeks ago

I wonder if we need to separate two kinds of SUBSCRIBE_DONE.

One is "EOF" generated by the original publisher. Those can be forwarded as-is, and don't necessarily indicate that there won't be further objects due to reordering. SUBSCRIBE_DONE isn't really needed for finite range subscribes, since the subscriber knows where that one ends.

Another is "error, and I closed the subscribe". That one can be generated both by the relay and the original publisher.
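One possible way to model this split, sketched in Python. `DoneKind`, `SubscribeDone`, `may_originate` and `expect_more_objects` are hypothetical names, and the rules encoded are one reading of the two cases above rather than anything in the spec.

```python
from dataclasses import dataclass
from enum import Enum, auto


class DoneKind(Enum):
    """Hypothetical split of SUBSCRIBE_DONE into two kinds."""
    EOF = auto()    # track ended at the original publisher
    ERROR = auto()  # sender gave up and closed the subscribe


@dataclass(frozen=True)
class SubscribeDone:
    kind: DoneKind
    reason: str = ""


def may_originate(kind: DoneKind, is_original_publisher: bool) -> bool:
    """EOF comes only from the original publisher (relays forward it as-is);
    ERROR may be generated by a relay or by the original publisher."""
    return kind is DoneKind.ERROR or is_original_publisher


def expect_more_objects(done: SubscribeDone) -> bool:
    """After EOF, reordered objects may still arrive; after ERROR the sender
    has closed the subscribe, so the receiver should not wait for more."""
    return done.kind is DoneKind.EOF
```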

martinduke commented 2 weeks ago

> Individual Comment:
>
> > maybe this is the use case for explicitly communicating these gaps after all?
>
> > as it stands there is no guarantee that a subscribe for past objects will give me the complete set of objects.
>
> What if we added these requirements/guarantees? A publisher is required to send every sequential group/object id in the range with no gaps (either the object or a placeholder object with the status explaining why no group/object is there), or terminate the subscription with error. End-of-group and end-of-track markers are also required.
>
> It would make things much easier to reason about - I kind of like the idea. For sure it adds some overhead - is that a dealbreaker? It's a handful of bytes to skip an object or group, or to end a group.
>
> I'm not sure even those guarantees would solve the SUBSCRIBE_DONE problem though. Reordering of stream-per-object streams, datagrams in general, and stream resets (undefined) still seem problematic. Maybe timers are inevitable.

Yes, as it stands, the spec provides essentially no expectations for what actually arrives for a subscribe. The sender might choose to omit objects in the middle of a group even though it has those objects. There are no requirements for the order in which things are sent.

And of course, there are the different stream mappings which have their own implications; in some cases FIN or RST of a stream might provide a hint, though I know Ian is resistant to exposing these explicit QUIC signals to MoQT.

It makes coding the receiver quite challenging.

moq-wg / moq-transport

SUBSCRIBE_DONE logic is gruesome #465