moq-wg / moq-transport

draft-ietf-moq-transport

Auto Bitrate Selection #44

Open kixelated opened 1 year ago

kixelated commented 1 year ago

We're unable to modify the encoder bitrate for 1:N distribution. The common approach to deal with this is to encode multiple renditions, each at a different bitrate/resolution, and choose one to serve to each viewer.

My plan is to treat each rendition as a separate track and use group_id to group renditions (#43). However, there are some particulars about who chooses the next track, and when that choice takes effect. Traditionally in HLS/DASH, the receiver chooses both.

WHO

One of the things that I've learned at Twitch is that client-side ABR just does not work for per frame delivery, which is necessary to minimize latency. The receiver lacks access to the sender's congestion controller and cannot measure the network bitrate when application-limited. We could have the sender frequently push the estimated bitrate to the receiver, but to be the most responsive the sender should choose the rendition to use.
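As a rough illustration of the sender choosing the rendition, here is a minimal Python sketch. It assumes one track per rendition and that the sender can read a bitrate estimate from its congestion controller. The `Rendition` type, the track names, the ladder values, and the `headroom` factor are all hypothetical and not part of the draft.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    track: str        # hypothetical track name, one track per rendition
    bitrate_bps: int  # nominal encoded bitrate in bits per second


def choose_rendition(renditions, estimated_bps, headroom=0.8):
    """Pick the highest-bitrate rendition that fits within the estimate.

    `estimated_bps` stands in for the sender's congestion-controller
    estimate. `headroom` is an assumed tuning value that leaves room for
    bitrate spikes. Falls back to the lowest rendition when nothing fits.
    """
    if not renditions:
        raise ValueError("no renditions available")
    budget = estimated_bps * headroom
    fitting = [r for r in renditions if r.bitrate_bps <= budget]
    if fitting:
        return max(fitting, key=lambda r: r.bitrate_bps)
    return min(renditions, key=lambda r: r.bitrate_bps)


# Hypothetical bitrate ladder used only for illustration.
LADDER = [
    Rendition("video_360p", 800_000),
    Rendition("video_720p", 3_000_000),
    Rendition("video_1080p", 6_000_000),
]
```

With this ladder, an estimate of 5 Mbps leaves a 4 Mbps budget, so `choose_rendition(LADDER, 5_000_000)` selects the 720p track. The function only decides which track to serve; when the switch actually takes effect is a separate question.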

We still need the ability for the receiver to choose any number of tracks manually. The user might manually select a rendition, the receiver might not support a rendition (based on resolution or profile), or the sender might simply not know what language the user wants.

WHEN

The other thing I've learned from LL-HLS is that constantly updating a playlist is a pain and can introduce latency. I would like something push-based, where the sender just knows the next segment to push, instead of needing to constantly inform the receiver so it can select one.

Another one of my requirements is that renditions do not need to be aligned. This means renditions could have different GoP sizes, including an extremely frequent one like HESP. Just because the current track has an independently decodable segment does not mean that the requested track will also have one at the same timestamp.

It's not clear when to switch between tracks. Should it happen immediately, at the next I-frame, a previous I-frame, a specific timestamp, etc? How does the sender or receiver even know where these boundaries exist?
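To make the options concrete, here is a small Python sketch of the "next I-frame" and "previous I-frame" choices when renditions are not aligned. It assumes whoever makes the decision already has a sorted list of keyframe timestamps for the target track. How that list would be obtained is exactly the open question above, and the function names are hypothetical.

```python
from bisect import bisect_left, bisect_right


def next_switch_point(keyframe_times, now):
    """Return the first keyframe timestamp at or after `now`, or None.

    `keyframe_times` is a sorted list of timestamps (seconds) at which the
    target track has an independently decodable frame.
    """
    i = bisect_left(keyframe_times, now)
    return keyframe_times[i] if i < len(keyframe_times) else None


def previous_switch_point(keyframe_times, now):
    """Return the last keyframe timestamp at or before `now`, or None."""
    i = bisect_right(keyframe_times, now)
    return keyframe_times[i - 1] if i > 0 else None


# Hypothetical unaligned renditions: keyframes every 2 s and every 3 s.
TRACK_A = [0.0, 2.0, 4.0, 6.0]
TRACK_B = [0.0, 3.0, 6.0]
```

At `now = 2.0` the current track A is at a keyframe, but track B is not: switching forward means waiting until 3.0, while switching backward means re-sending from 0.0. The first costs a second of delay at the old bitrate and the second costs bandwidth for media that may already have been delivered.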

kixelated commented 1 year ago

Additionally, we probably need a way of specifying when the track switch should take place. This seems required for client-side ABR, otherwise you could waste bandwidth or introduce gaps. It also seems required when higher latencies are desired, as the receiver needs a way of telling the sender to start playback x seconds in the past.

kixelated commented 1 year ago

Also worth mentioning that we sponsored an academic challenge. Throwing machine learning at the problem can help, but it's nowhere near as good as sender-side ABR.

acbegen commented 1 year ago

> One of the things that I've learned at Twitch is that client-side ABR just does not work for per frame delivery, which is necessary to minimize latency. The receiver lacks access to the sender's congestion controller and cannot measure the network bitrate when application-limited. We could have the sender frequently push the estimated bitrate to the receiver, but to be the most responsive the sender should choose the rendition to use.

Clients can switch tracks only at the switching points, which are fragment boundaries in CMAF language. You cannot expect the client to get 4 frames from one track and the subsequent 3 frames from another track to smoothly decode those seven frames. So, per-frame delivery is not the right term here IMO. Even if you do sender-side ABR, it is not actually doing per-frame delivery.

Second, close to the switching points, the sender can convey its own estimate to the client and the client can still make a proper choice. There are plenty of reasons why the client should pick what it wants to receive.

Third, obviously the sender has better knowledge of the currently available bandwidth, but I don't agree with the statement that a client cannot make a measurement when the data is application-limited. Yes, it won't be perfect, but we do have good methods to deal with this (not only as part of the challenge you mentioned but also through other methods that become available because we can now use QUIC rather than TCP).
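For readers unfamiliar with the problem, the following Python sketch shows one naive way a receiver could measure throughput while the flow is application-limited: ignore idle gaps and compute the rate only within bursts of back-to-back arrivals. This is an illustrative heuristic under stated assumptions, not one of the methods referred to above. The function name and the `max_gap_s` threshold are hypothetical.

```python
def burst_rate_estimate(arrivals, max_gap_s=0.005):
    """Estimate throughput from bursts of closely spaced arrivals.

    `arrivals` is a list of (time_s, size_bytes) sorted by time. Arrivals
    separated by more than `max_gap_s` (an assumed threshold) are treated as
    belonging to different bursts, so idle periods where the sender had
    nothing to send do not drag the estimate down. Returns the highest
    per-burst rate in bits per second, or None if no burst has at least two
    arrivals.
    """
    best = None
    start = 0
    for i in range(1, len(arrivals) + 1):
        end_of_burst = (
            i == len(arrivals)
            or arrivals[i][0] - arrivals[i - 1][0] > max_gap_s
        )
        if end_of_burst:
            duration = arrivals[i - 1][0] - arrivals[start][0]
            if i - start >= 2 and duration > 0:
                # The first arrival only marks the start of the interval,
                # so its bytes are not counted.
                bits = 8 * sum(size for _, size in arrivals[start + 1:i])
                rate = bits / duration
                best = rate if best is None else max(best, rate)
            start = i
    return best
```

Taking the maximum is optimistic, and cross traffic or pacing at the sender can distort burst spacing, so a real estimator would smooth over many bursts. The point is only that an average over the whole session, idle time included, underestimates the path capacity.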

acbegen commented 1 year ago

> Another one of my requirements is that renditions do not need to be aligned. This means renditions could have different GoP sizes, including an extremely frequent one like HESP. Just because the current track has an independently decodable segment does not mean that the requested track will also have one at the same timestamp.

GoP durations don't need to be the same, but the segments (in CMAF) need to be aligned for seamless switching. Putting I-frames at different positions for the same content would be illogical.
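A short Python sketch of what alignment means in practice: given the keyframe timestamps of each rendition, the points where a seamless switch is possible are the timestamps they all share. Integer milliseconds are used to avoid floating-point comparison issues. The function name and the example GoP durations are hypothetical.

```python
def common_switch_points(*tracks_keyframes_ms):
    """Return the sorted timestamps (ms) where every track has a keyframe."""
    if not tracks_keyframes_ms:
        return []
    common = set(tracks_keyframes_ms[0])
    for times in tracks_keyframes_ms[1:]:
        common &= set(times)
    return sorted(common)


# Hypothetical renditions with 1 s and 2 s GoPs over an 8 s window.
GOP_1S = list(range(0, 8000, 1000))
GOP_2S = list(range(0, 8000, 2000))
```

Here the two renditions have different GoP durations yet share a switching point every 2 seconds. If one rendition's keyframes were offset by, say, 500 ms, the only shared points would disappear and every switch would require waiting or re-sending.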

jordicenzano commented 1 year ago

Should we mention ABR in the draft? I missed some comments about it and its implications. I think this could have big implications for MOQT overall, for instance:

VMatrix1900 commented 1 year ago

Does the MOQT draft prevent server-side ABR? Publishers are free to adjust the bitrate of any track, right?

acbegen commented 1 year ago

> Does the MOQT draft prevent server-side ABR? Publishers are free to adjust the bitrate of any track, right?

The sender can do whatever it wants with its tracks as long as the receivers are notified of the changes (one way or another) or the changes are within acceptable limits per the catalog/manifest. None of this removes the need for client-side rate adaptation, which we still need for a variety of reasons.