moq-wg / moq-transport

draft-ietf-moq-transport

Group IDs may not indicate a temporal ordering - Then what is a Group ID? #492

Open ianswett opened 1 month ago

ianswett commented 1 month ago

@fluffy and @wilaw said that group IDs may be sparsely used, particularly in cases where they were chosen in a way to avoid collisions.

This sounds like an interesting set of use cases, but previously my understanding was that groups indicated some sort of temporal ordering, even if it's not strict. If Group IDs don't imply temporal ordering and join points, and instead they're a way to publish Objects so that they don't collide with one another, it seems like a different thing?

Top of head strawman: Possibly different users could have the ability to publish Objects as themself for a single track?

Having a few specific use cases would be helpful here, because it feels like there are real use cases, and the existing mechanisms might be awkward at best? It would also be useful to understand the use cases so we understand if they're in scope for MoQ.

ianswett commented 1 month ago

Possibly relevant to #484

wilaw commented 1 month ago

I have advocated strongly for having Group IDs begin at 0 and increment by 1, with no gaps. This restriction (which is identical to what we have for Objects) brings us some valuable benefits:

  1. Arbitrary relays, without any other information, can know if Groups are missing
  2. Clients can tell if groups are missing, without having a catalog to define the numbering scheme.

I talked this over with Cullen and Suhas. They mentioned that they have a use case in which clients will publish data under the same namespace and name, but using different ranges of group IDs (correct me if I'm wrong there, guys), and that therefore Group IDs cannot increment by 1. My concern is that this one use case is burdening the transport with a whole ton of complexity around knowing where there are gaps. Witness #423. I'd like to simplify moq-transport by setting restrictions on Group IDs and then help brainstorm alternate solutions to their multi-publisher use case that are compliant with a simpler version of moqt.
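The gap-detection benefit described in this comment can be sketched as follows. This is a minimal illustration, not text from the draft: the function name `missing_groups` and its parameters are hypothetical, and it assumes Group IDs start at 0 and increment by 1 with no gaps.

```python
def missing_groups(seen_group_ids, latest_group_id=None):
    """Return the Group IDs a relay or client has not seen.

    Assumes (hypothetically) that Group IDs start at 0 and increment
    by 1, so every ID up to the latest known one must exist.
    """
    seen = set(seen_group_ids)
    if not seen and latest_group_id is None:
        return []
    # Without outside information, the highest ID seen is the upper bound.
    upper = max(seen) if latest_group_id is None else latest_group_id
    return [g for g in range(upper + 1) if g not in seen]


print(missing_groups([0, 1, 3, 4, 7]))  # [2, 5, 6]
```

With sparsely assigned Group IDs the same check would report every unused ID as missing, which is why it only works under the sequential-numbering restriction.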

kixelated commented 1 month ago

I completely disagree with the use-case @fluffy and @suhasHere have suggested. A track must have a single publisher and I just can't envision how MoQ would work otherwise.

If you have multiple publishers, then make multiple tracks/namespaces. A subscriber can request them all. Or a muxer can request them all and produce a new merged track at incrementing group boundaries.

gwendalsimon commented 1 month ago

I am also in favor of Group ID beginning at 0 and incrementing by 1.

suhasHere commented 1 month ago

There was this issue discussion on groupId and gaps, and why relays need not do the bookkeeping on gaps between groupIds: https://github.com/moq-wg/moq-transport/issues/427

fluffy commented 1 month ago

I'm totally lost on what problem people think they are trying to solve with this issue.

I really object to how we constantly reopen stuff for no reason. The protocol works fine with multiple publishers and out-of-order groups - we have running code that does that. We have discussed this in the past, along with the use cases. There is no new data showing why this causes performance problems. We have multiple use cases that take advantage of this (some of the variations of group chat, logging, metrics).

Group IDs are left to the applications using them to choose. Due to timing and priority issues, if a relay receives group 2, object 1, it cannot assume it will never receive another object from group 2. Some applications use them to indicate ordering and dependency. The data model makes it clear they are a join point, and because of that, things that have dependencies can use them for the dependent things.

The proposal we had for make-before-break can result in out-of-order groups.

afrind commented 1 month ago

Speaking as Chair:

@fluffy

> I'm totally lost on what problem people think they are trying to solve with this issue.

The issue as filed is soliciting use cases. Will stated you have some, it would help if you could document them here in as much detail as possible. Other folks seem to be advocating for a particular design without seeing the full picture. It would probably help if we understood the use cases first.

> The protocol works fine with multiple publishers and out of order groups - we have running code that does that.

I'm surprised to hear about use cases where multiple publishers are publishing to the same track simultaneously in different groups. I don't ever recall talking about this pattern, apologies if I missed it.

> There is no new data that is like here is why this causes performance problems

Victor pointed out here (https://github.com/moq-wg/moq-transport/issues/427#issuecomment-2048417109) that arbitrary group or object numbering interacts poorly with subscribe ranges/fetch. Because a relay can never know how the ID spaces are populated for a track, the relay/cache has to assume all IDs exist until told otherwise by the original publisher. Suhas indicated that if a publisher drops (skips?) a group it will signal that. There are scalability challenges if the 62-bit space is sparse - perhaps people were talking past each other in the issue?

> We have discussed this in the past and the use cases.

I think the group is struggling because use cases and the protocol design elements they depend on are not well documented, in writing, in a central place. I've gone back and forth about resurrecting the use cases and requirements draft and adding new detailed sections that have this scope (eg: explaining make-before-break, and what moqt protocol elements it requires). Ideally, we would get consensus on the use cases we are trying to support.

Would folks find it helpful? I'm wary of having folks invest in writing something if we're not going to read it.

suhasHere commented 1 month ago

> Because a relay can never know how the id spaces are populated for a track, the relay/cache has to assume all ids exist until told otherwise by the original publisher.

As discussed at length in the other issue, "a relay should not assume any structure between groups". Groups are independent join points, and we should not add more application semantics at the MOQT layer. It leads to more bookkeeping, which is not really needed.

suhasHere commented 1 month ago

Let me take a stab at going through the range of solutions that can exist when fetch requests are made for large absolute ranges. For this discussion, I am ignoring caching, since it is easy to think through caches independently.

When a relay receives a fetch request for a large absolute range, one of the following can be the response it gets from the upstream:

  1. Relay receives a marker object for each group that is not included.
  2. Relay receives some kind of marker object with a range of groups that don't exist. [this is an optimization of 1]
  3. Relay receives no indication of the groups not included.

These options apply similarly regardless of whether group IDs are sequential or non-sequential, and whether groups are missing or were never produced.

Does this make sense? Happy to add a PR to clarify some of these.

afrind commented 1 month ago

@suhasHere @fluffy and others,

We're not looking for solutions to support non-sequential groups yet - we're still looking to get documentation of the application use case that drives the non-sequential group requirement.

fluffy commented 1 month ago

So I know this sounds a bit snarky and I really don't mean it that way, but have you read https://datatracker.ietf.org/doc/draft-jennings-moq-metrics/ ? There have also been a bunch of similar things. I sort of work on the assumption that chairs are reading the drafts in the WG.

afrind commented 1 month ago

> but have you read ...

:D Thanks for the bump.

Well now I've skimmed it at least far enough to see this:

   The MOQT group ID identifies point in time when a given set of
   metrics were captured by the resource.  Group ID, thus represents
   capture time as number of milliseconds since "1 Jan 1972" using NTP
   Era zero conventions and truncated to 62 bit integer.  The first
   object, with MOQT object ID of 0 captures 2 pieces of information:

   1.  The capture timestamp as UNIX Epoch time in nanoseconds since
       00:00:00 UTC on 1 January 1970.

   2.  One or more attributes scoped to a given resource specified in
       the track name.  This field is optional and if omitted, the
       attribute values correspond to the most recent object 0 that had
       any attribute values.

I'll take a closer look. Let's use the metrics repo for any discussion about its design rather than having it here.

Are there other use cases you would like to surface here? Even if the chairs are reading every draft, not everyone else in the wg is.

wilaw commented 1 month ago

> When a relay receives a fetch request for a large absolute range, one of the following can be the response it gets from the upstream
>
>   1. Relay receives a marker object for each group that is not included.
>   2. Relay receives some kind of marker object with a range of groups that don't exist. [this is an optimization of 1]
>   3. Relay receives no indication of the groups not included.
>
> These options apply similarly regardless of group IDs being sequential or non-sequential and when things are missing or not produced or so.

So let's imagine the client requests the range 0-10,000, but the publisher only produces group IDs at odd numbers. Is the proposal that each relay in the distribution chain send along roughly 5,000 marker objects to indicate the gaps? That means one marker for every group actually delivered in response to each subscription. Compare that to a scheme in which group IDs are sequential and zero markers need to be sent. Surely a marker-less scheme is more efficient in caches and across the wire?
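The arithmetic behind this example can be checked with a short sketch. It models only option 1 from the quoted list (one marker object per absent group); the function name `marker_overhead`, the inclusive range, and the `produced` predicate are assumptions made for illustration.

```python
def marker_overhead(start, end, produced):
    """Count produced groups and per-group gap markers in [start, end].

    `produced` is a hypothetical predicate saying whether the publisher
    created a given Group ID. The range is treated as inclusive, and one
    marker is assumed per absent group (option 1 in the quoted list).
    """
    requested = range(start, end + 1)
    produced_in_range = sum(1 for g in requested if produced(g))
    markers = len(requested) - produced_in_range
    return produced_in_range, markers


# Publisher uses only odd Group IDs in the requested range 0-10,000.
print(marker_overhead(0, 10_000, lambda g: g % 2 == 1))  # (5000, 5001)

# Same number of groups, numbered sequentially: no markers needed.
print(marker_overhead(0, 4_999, lambda g: True))  # (5000, 0)
```

A range-style marker (option 2) would not help in this odd-only pattern, since every gap is a single group wide.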

fluffy commented 1 month ago

So the use case for fetch has generally been VOD and getting the most recent 15 seconds of live streaming video. Both of these cases would make sense with sequential groups, so I don't see this problem. The use cases people have brought up for sparse groups are not really ones where getting the old data makes much sense, and in some of them the applications already know what the groups are if they wanted them.

I'm of course not suggesting that a classic VOD use case would use non-sequential groups (though it might with scalable codecs), but I don't see why the relays would do anything different depending on whether groups have to be sequential or not.

Consider a live-streaming-like use case where groups are sequential and it is stream per group. A relay has 3 streams coming in to it: on stream A it gets an object in group 1; on stream B it is just waiting and has not received anything yet; and on stream C it gets an object from group 3. I probably don't want the relay to hold back objects from group 3 because it is waiting for something that might arrive from group 2. I'm not understanding what people think they want to do in the relays that requires the groups to be sequential.

Overall, we need a design for fetch (lacking a better name for subscribing to old data). If fetch ends up, for priority reasons, being delivered over a single stream, then the upstream just puts the objects on that stream in order. Of course other designs are possible, but I am not getting what people want and why.

suhasHere commented 1 month ago

> @suhasHere @fluffy and others,
>
> We're not looking for solutions to support non-sequential groups yet - we're still looking to get documentation of the application use case that drives the non-sequential group requirement.

My intent was not to provide a solution. I was instead proposing non-normative clarification text on the solution space which applies similarly regardless of the groupId structure.

However, I feel that once we resolve the fetch API, flow control, and limits around handling large absolute ranges, we can revisit this to see whether group IDs are still the issue or not.

afrind commented 1 month ago

> They mentioned that they have a use-case in which clients will publish data under the same namespace and name, but using different ranges of group IDs (correct me if I'm wrong there guys) and that therefore Group IDs cannot increment by 1.

Are logging and metrics the motivating use cases for multiple publishers of the same track? Reading the drafts, it doesn't appear like that is the intent, or you could get collisions. If you have any use cases like this, can you share them?

afrind commented 1 month ago

Individual Comment:

> previously my understanding was that groups indicated some sort of temporal ordering, even if it's not strict

Based on how SUBSCRIBE is defined now, I think both group IDs and object IDs have to indicate an ordering. If I SUBSCRIBE at a given start group and object, only published objects with groups numerically greater than or equal to that will be delivered. This is similar with SUBSCRIBE_UPDATE specifying the end of a subscription. Object IDs are also numerically comparable for prioritization within their group and priority level. There's text all over the draft implying these comparisons are valid, for example in stream-per-track, group IDs cannot go backwards.

If an application wants to pick IDs at random, use a hash scheme, or dole out ranges to multiple publishers where newer things have numerically smaller IDs, then some funny interactions and protocol violations may ensue. No one has publicly specified a use case yet for doing this.
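The ordering dependency described in this comment can be sketched as follows. This is an illustration of the comparison only, not the draft's wire behavior: `ObjectId` and `in_subscription` are hypothetical names, the comparison is assumed to be lexicographic on (group, object), and the end bound is assumed to be inclusive.

```python
from typing import NamedTuple, Optional


class ObjectId(NamedTuple):
    """Hypothetical (group, object) pair; tuples compare lexicographically."""
    group: int
    obj: int


def in_subscription(o: ObjectId, start: ObjectId,
                    end: Optional[ObjectId] = None) -> bool:
    """Return True if `o` is at or after `start` and, if set, at or before `end`."""
    if o < start:
        return False
    return end is None or o <= end


start = ObjectId(group=5, obj=0)
print(in_subscription(ObjectId(6, 0), start))  # True
# A newer object published under a numerically smaller Group ID is filtered out.
print(in_subscription(ObjectId(4, 9), start))  # False
```

Under these assumptions, any scheme that assigns numerically smaller Group IDs to newer content would have that content silently excluded from a subscription that started at a larger ID.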