moq-wg / moq-transport

draft-ietf-moq-transport

1 producer received per broadcast? multiple producers received per broadcast #87

Open vr000m opened 1 year ago

vr000m commented 1 year ago

There seems to be a strong assumption that there is 1 publisher in a broadcast.

For use-cases in which there are multiple broadcasters, is the viewer watching one broadcast or several? There is a bandwidth/fairness consequence of a viewer watching multiple broadcasts.

For example: NFL Red zone.

ESPN is ingesting several broadcasts, and each game could be its own broadcast. Someone could be watching just one of the games; however, a viewer watching Red Zone may get simultaneous games, and the priority of these individual feeds may be equal. In this case the relay and the viewer can coordinate a way to get all or some of the streams depending on the user experience.

Another similar example is video conferencing, such as town halls, fitness classes, newscasts, etc., where there may be a few speakers and many more viewers.

suhasHere commented 1 year ago

In general we need to acknowledge multiple publishers within a MoQ MediaSession and how they get identified and delivered.

afrind commented 1 year ago

Speaking as an individual: this ties to what Ted called a composition -- which represents what the viewer is actually watching. I think we agree that a viewer can be watching something with multiple simultaneous publishers. Where there might be disagreement is to what degree that needs to be captured in the transport protocol. The NFL Red Zone case could also be accomplished by having each game as a separate broadcast (received over one or many QUIC connections), with the application handling the composition.

What are the tradeoffs for having multiple broadcasts in a single session vs one broadcast per session with multiple sessions?

vr000m commented 1 year ago

The biggest difference between a single session and multiple sessions is how the congestion controller(s) act and interact across the QUIC connections.

When bandwidth is limited, not all of the video streams that make up the composite can be received. In that case, some degradation choices need to be made: lower the resolution or the frame rate (skipping some frames) for some or all of the videos that make up the broadcast.

In one world, the relay that is broadcasting the composited videos understands the underlying priority of the frames from one broadcast relative to the other and can manage the priority queue of packets/frames accordingly.

In another world, the client receives several broadcasts and composites them on the client side; it can unilaterally decide which video streams to receive and which frames not to subscribe to (by cancelling the stream IDs).
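
To make that latter, client-side option concrete, here is a rough sketch of the kind of degradation logic I have in mind. None of this is moq-transport API; the stream IDs, priorities, and bitrates are made up.

```rust
// Client-side compositing sketch: the viewer ranks the tracks it is
// compositing and unsubscribes the lowest-ranked ones when the bandwidth
// estimate shrinks. Illustrative only; no moq-transport API is used here.

struct SubscribedTrack {
    stream_id: u64,    // the subscription/stream the client could cancel
    priority: u8,      // lower = more important, chosen by the viewer
    bitrate_kbps: u32, // rough cost of keeping this track
}

/// Returns the stream IDs the client should cancel to fit within the budget.
fn tracks_to_cancel(tracks: &mut Vec<SubscribedTrack>, budget_kbps: u32) -> Vec<u64> {
    tracks.sort_by_key(|t| t.priority);
    let mut used = 0;
    let mut cancel = Vec::new();
    for t in tracks.iter() {
        if used + t.bitrate_kbps <= budget_kbps {
            used += t.bitrate_kbps;
        } else {
            cancel.push(t.stream_id);
        }
    }
    cancel
}

fn main() {
    let mut tracks = vec![
        SubscribedTrack { stream_id: 4, priority: 0, bitrate_kbps: 3000 },  // game A
        SubscribedTrack { stream_id: 8, priority: 1, bitrate_kbps: 3000 },  // game B
        SubscribedTrack { stream_id: 12, priority: 2, bitrate_kbps: 3000 }, // game C
    ];
    // With only ~5 Mbps available, game C is unsubscribed first.
    println!("cancel streams: {:?}", tracks_to_cancel(&mut tracks, 5000));
}
```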

Personally, in the latter case, I like the client being able to decide (independently from the relays) which videos in the composite it wants to degrade. However, I also like the simplicity of a composite broadcast, because relays may benefit from knowing which set of video streams are related and can cache them appropriately.

wilaw commented 1 year ago

> I also like the simplicity of a composite broadcast, because relays may benefit from knowing which set of video streams are related and can cache them appropriately.

@vr000m - can you clarify what "cache them appropriately" means? If a relay knows streams are related, I can imagine how it might prioritize resources between those streams in the face of congestion; however, why would it cache them differently compared to unrelated streams? Caching is for the benefit of other (parallel or subsequent) sessions, not the session that triggers the initial request.

kixelated commented 1 year ago

Yeah, "broadcast" is certainly not the best terminology.

My intention is that a "broadcast" is a collection of tracks that share a common presentation timestamp and prioritization number space. This implies that a single encoder (source) produces a broadcast, although it doesn't rule out transcoding if it uses the same number space (ie. timestamp passthrough).

A "composition" is a collection of multiple broadcasts presented to the viewer. These can be synchronized based on program time (ie. wall clock) but absolutely not based on the presentation timestamp (PTS). This is completely out of scope of the transport.


Now let's go to your NFL example. First off, we have multiple camera feeds. These are separate encodes, and in this terminology, start out as separate "broadcasts".

The studio decodes each feed, synchronizes based on program time (wall clock), and does "server-side compositing". In the current terminology, this means creating a new "broadcast" for each game that could have any number of tracks (ex. translations, captions, camera angles). These tracks could be prioritized against each other, for example audio > video, or even main feed > sideline feed. The idea is that there's an authority who can indicate what content is more important.
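
A made-up instance of one such per-game broadcast, just to show the kind of intra-broadcast ordering I mean (the track names and values are illustrative):

```rust
// One game's broadcast, with the studio acting as the authority that ranks
// its own tracks. Values are made up for illustration.

struct GameTrack {
    name: &'static str,
    priority: u8, // lower = more important within this broadcast
}

fn red_zone_game() -> Vec<GameTrack> {
    vec![
        GameTrack { name: "audio", priority: 0 },          // audio > video
        GameTrack { name: "main video", priority: 1 },
        GameTrack { name: "sideline video", priority: 2 }, // main feed > sideline feed
        GameTrack { name: "captions", priority: 3 },
    ]
}
```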

If a viewer is watching a single game, then they receive a single broadcast. But what if a viewer wants to watch multiple games? The server doesn't know how to prioritize or synchronize (wall clock skew) between these two broadcasts as they were encoded at separate sources. There's two options:

  1. Client-side compositing. The viewer subscribes to two separate broadcasts (possibly over different connections) and chooses how to render them side by side.
  2. Server-side compositing. The server transcodes, or at the very least, rewrites timestamps/priorities to interleave both broadcasts. The viewer can then subscribe to a single broadcast.
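
A rough sketch of what the rewrite in option 2 could look like, assuming a made-up scheme where each source gets a wall-clock offset and its own priority band (none of this is in the draft):

```rust
// Server-side compositing sketch: map each source's frames into the combined
// broadcast's clock and into one shared priority space. The banding scheme
// below is just one possible choice, not anything specified by the draft.

struct Frame {
    pts_ms: u64,     // presentation timestamp in the source's own clock
    send_order: u64, // priority in the source's own number space
}

/// Shift one source's frames onto the combined clock and reserve a priority
/// band per source (source 0 -> 0..1000, source 1 -> 1000..2000, ...).
fn remap(frames: &[Frame], source_index: u64, clock_offset_ms: u64) -> Vec<Frame> {
    frames
        .iter()
        .map(|f| Frame {
            pts_ms: f.pts_ms + clock_offset_ms,
            send_order: source_index * 1000 + f.send_order,
        })
        .collect()
}
```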

In the real-time latency world, 99% of the time multiple feeds are combined using client-side compositing. There's too much cost/latency introduced by server-side compositing. But in the higher latency world, 99% of the time it's done by server-side compositing. It's easier and more network efficient to transcode all feeds into a single feed.

A problem with the current draft is that there's no way to prioritize unrelated broadcasts when using client-side compositing. The prioritization is decided by the sender/source so it can be propagated through relays. The viewer could influence last mile delivery but that's not really done today. For example, a viewer watching two HLS broadcasts. They'll both equally compete for bandwidth; there's no way to tell broadcast A to back off first during congestion.
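
To make the gap concrete, here is a purely hypothetical receiver-side hint that nothing in the draft (or in HLS) provides today; it only exists to show what "tell broadcast A to back off first" would require:

```rust
// Hypothetical and NOT in the draft: the viewer tells its last-mile relay how
// to weight otherwise unrelated broadcasts during congestion.

use std::collections::HashMap;

struct ViewerPriorities {
    weights: HashMap<String, u32>, // broadcast name -> weight; higher backs off last
}

impl ViewerPriorities {
    /// Split an estimated last-mile budget proportionally to the weights.
    fn allocate(&self, budget_kbps: u32) -> HashMap<String, u32> {
        let total: u32 = self.weights.values().sum();
        self.weights
            .iter()
            .map(|(name, w)| (name.clone(), budget_kbps * w / total.max(1)))
            .collect()
    }
}
```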

fluffy commented 1 year ago

What do we need to update in the warp draft to capture the above discussion?

kixelated commented 11 months ago

This is a generic problem with multiple connections, TCP, QUIC or otherwise.

There are things that could be done here, such as communicating a max sending rate to the peer, asking the peer to use a less than best effort congestion controller (ie: LEDBAT++), but I would only tackle that if we think this is a critical problem for MoQ to solve.
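
For illustration only, the knobs mentioned above might look like a control message along these lines; neither exists in moq-transport today:

```rust
// Hypothetical control message; nothing like this is in the draft.

enum CongestionHint {
    /// Ask the peer to cap its sending rate for this session.
    MaxSendRate { kbps: u32 },
    /// Ask the peer to yield to competing traffic, e.g. by using a
    /// LEDBAT++-style less-than-best-effort congestion controller.
    LessThanBestEffort,
}
```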

I think this issue concerns muxing. If I have N input broadcasts/connections, can I combine them into 1 output broadcast/connection?

A previous version of the draft said "yes" if they're part of the same broadcast ID (bundle). That way you could use stream prioritization, since QUIC would prioritize streams relative to the broadcast. However, this didn't really work with send_order since it's local to the encoder, and I think this issue was filed back then in response.
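
A small sketch of why that broke down: each encoder numbers its own objects from zero, so a relay muxing two broadcasts onto one connection can't meaningfully compare the values (made-up types, not draft API):

```rust
// send_order is local to each encoder, so sorting a mixed queue by it
// silently assumes a shared number space that doesn't exist across broadcasts.

struct QueuedObject {
    broadcast: &'static str,
    send_order: u64, // only meaningful within `broadcast`
}

fn main() {
    let queue = vec![
        QueuedObject { broadcast: "game-a", send_order: 0 }, // game A audio
        QueuedObject { broadcast: "game-b", send_order: 0 }, // game B audio
        QueuedObject { broadcast: "game-a", send_order: 1 }, // game A video
    ];
    let mut sorted: Vec<_> = queue.iter().collect();
    sorted.sort_by_key(|o| o.send_order); // ties across broadcasts are arbitrary
    for o in &sorted {
        println!("{} / send_order {}", o.broadcast, o.send_order);
    }
}
```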

The current version of the draft keeps things vague. You can have any number of tracks for any number of broadcasts with any number of prioritization schemes... and results will vary.

I don't know if this issue has any concrete action items though. Maybe just close it?

ianswett commented 11 months ago

Thanks, I misunderstood this issue.

I'm happy to close it, but it sounds like it's definitely NotTransport, so I marked it as such for now.