Support multi-variant HLS streams

olafal0 commented 1 month ago

Fixes https://github.com/livekit/ingress/issues/310. Allows multi-variant HLS streams to work for URL ingresses. This fixes two issues:

1. decodebin3 handles variant selection automatically, but sends the notify::caps signal when the source resolution changes. This caused the pipeline to attempt to create a new video output bin and link it to the input bin's video src pad, which fails. Then, the whole pipeline fails.

2. WebRTCSink.AddTrack creates a capsfilter for each layer with caps: fmt.Sprintf("video/x-raw,width=%d,height=%d", layer.Width, layer.Height). The width and height used for these layers come from the video's initial resolution, which can be very low if it's a low-bitrate variant. So, even if a higher-resolution variant is selected, it will be scaled back down to whatever it was at first.

Changes:

In Pipelines, atomically track whether video and audio output bins have already been created. If they have, skip adding and linking them.
In WebRTCSinks, use the video encoding option's high layer resolution instead of the source resolution, if the video encoding option's width or height is greater than source width or height.
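The first change can be illustrated with a minimal Go sketch. It assumes an atomic compare-and-swap flag per output bin; the `outputBins` type and its method names are hypothetical and are not taken from the actual Pipeline code.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// outputBins is a hypothetical stand-in for the pipeline state that records
// whether each output bin has already been created and linked.
type outputBins struct {
	videoAdded atomic.Bool
	audioAdded atomic.Bool
}

// tryAddVideo returns true only for the first caller. Later callers, such as
// repeated notify::caps handlers, get false and should skip adding and linking.
func (o *outputBins) tryAddVideo() bool {
	return o.videoAdded.CompareAndSwap(false, true)
}

// tryAddAudio is the audio counterpart of tryAddVideo.
func (o *outputBins) tryAddAudio() bool {
	return o.audioAdded.CompareAndSwap(false, true)
}

func main() {
	var bins outputBins
	// Simulate three caps notifications for the same video stream.
	for i := 0; i < 3; i++ {
		if bins.tryAddVideo() {
			fmt.Println("creating and linking video output bin")
		} else {
			fmt.Println("video output bin already exists, skipping")
		}
	}
}
```

Compare-and-swap makes the check and the update a single step, so two signal handlers running concurrently cannot both decide to create the bin.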

I've tested with this HLS file: https://devstreaming-cdn.apple.com/videos/streaming/examples/img_bipbop_adv_example_fmp4/master.m3u8 (Note that this may still fail on main: the stream contains subtitle tracks, so the source is rejected with unsupported mime type (application/x-subtitle-vtt). As a workaround, adding application/x-subtitle-vtt to supportedMimeTypes in pkg/media/urlpull/source.go fixes this, and variant selection then works correctly.)

CLAassistant commented 1 month ago

All committers have signed the CLA.

biglittlebigben commented 1 month ago

Thanks for submitting this. Glad to see there is a way to make this work with decodebin3. The added logic to upscale the video to the largest layer would, however, break existing functionality: we drop layers that are bigger than the source, and match the biggest layer size to the source if smaller. You can see this logic in the filterAndSortLayersByQuality function.
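As a rough illustration of the layer filtering described above, here is a minimal Go sketch. It is not the project's filterAndSortLayersByQuality: the `layer` type, the `filterLayers` name, the example ladder, and the fallback to a single source-sized layer are all assumptions, and it reads "match the biggest layer size to the source" as resizing the largest surviving layer to the source resolution.

```go
package main

import (
	"fmt"
	"sort"
)

// layer is a hypothetical simplification of a simulcast video layer.
type layer struct {
	Name          string
	Width, Height int
}

// filterLayers drops layers larger than the source, sorts the rest from
// smallest to largest, and sets the largest surviving layer to the source size.
func filterLayers(layers []layer, srcW, srcH int) []layer {
	var kept []layer
	for _, l := range layers {
		if l.Width <= srcW && l.Height <= srcH {
			kept = append(kept, l)
		}
	}
	sort.Slice(kept, func(i, j int) bool {
		return kept[i].Width*kept[i].Height < kept[j].Width*kept[j].Height
	})
	if len(kept) == 0 {
		// Assumption: if every layer is too large, publish one source-sized layer.
		return []layer{{Name: "LOW", Width: srcW, Height: srcH}}
	}
	top := &kept[len(kept)-1]
	top.Width, top.Height = srcW, srcH
	return kept
}

func main() {
	// Assumed nominal ladder, for illustration only.
	ladder := []layer{{"LOW", 480, 270}, {"MEDIUM", 960, 540}, {"HIGH", 1280, 720}}
	for _, l := range filterLayers(ladder, 1024, 576) {
		fmt.Printf("%s: %dx%d\n", l.Name, l.Width, l.Height)
	}
}
```

With a 1024x576 source, the 1280x720 layer is dropped and the 960x540 layer is resized to 1024x576, so nothing is ever upscaled.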

It is important that this functionality is not lost in most cases, as upscaling is wasteful and can degrade quality: it decreases the number of bits available per macroblock without encoding any extra detail.

Does gstreamer with decodebin3 provide any way to get the expected list of variants from the manifest, anywhere in the pipeline? If not, we may want to ensure that the upscaling code is only triggered on multivariant sources.

olafal0 commented 1 month ago

Ah, makes sense. Unfortunately I didn't find a way to access the list of variants—the manifest is obviously being parsed, and I can see them in the gstreamer logs, but STREAM_COLLECTION messages didn't contain other variants in my tests. Same with the decodebin3 select-stream signal.

We could potentially recalculate layer sizes and change the caps property of the capsfilter when decodebin3 changes variants, since we definitely have that information. Then layer sizing can remain the same, just with updates when the source resolution changes. I'm not sure how the downstream elements will handle that, but I'll try it out.

olafal0 commented 1 month ago

Update: changing the caps on the capsfilter does work, and avoids upscaling. This does introduce a separate issue, however: we can't discard layers when the source is too small, since those layers need to exist for use later. As an example: an HLS stream is started, and defaults to 320x180. We then use the layers:

LOW: 320x180
MEDIUM: 320x180
HIGH: 320x180

Later, hlsdemux2 selects a higher-resolution stream, the queue in the video output bin is notified of the new caps, we recalculate layer sizes, and then change the caps of the capsfilter. Now, the layers are:

LOW: 480x270
MEDIUM: 960x540
HIGH: 1280x720

The downside is, of course, that if the video remains 320x180, then we're pushing 3 duplicate streams for the lifetime of the input.
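The recalculation in this example can be sketched in Go as follows. This is a minimal illustration, not the code in the PR: the nominal ladder (480x270, 960x540, 1280x720), the `layer` type, and the function names are assumptions. Only the caps format string comes from the PR description.

```go
package main

import "fmt"

// layer is a hypothetical simplification of a simulcast video layer.
type layer struct {
	Name          string
	Width, Height int
}

// nominal is an assumed layer ladder, chosen to mirror the example above.
var nominal = []layer{{"LOW", 480, 270}, {"MEDIUM", 960, 540}, {"HIGH", 1280, 720}}

// recalcLayers keeps every layer but clamps any layer that is larger than the
// source down to the source size, so no layer is ever upscaled.
func recalcLayers(srcW, srcH int) []layer {
	out := make([]layer, len(nominal))
	for i, l := range nominal {
		if l.Width > srcW || l.Height > srcH {
			l.Width, l.Height = srcW, srcH
		}
		out[i] = l
	}
	return out
}

// capsFor builds the caps string that would be set on a layer's capsfilter.
func capsFor(l layer) string {
	return fmt.Sprintf("video/x-raw,width=%d,height=%d", l.Width, l.Height)
}

func main() {
	// First the initial low-resolution variant, then the higher one.
	for _, src := range [][2]int{{320, 180}, {1280, 720}} {
		fmt.Printf("source %dx%d\n", src[0], src[1])
		for _, l := range recalcLayers(src[0], src[1]) {
			fmt.Printf("  %s: %s\n", l.Name, capsFor(l))
		}
	}
}
```

Running the recalculation again on each source resolution change keeps all three layers alive, which is what produces the three identical 320x180 streams in the first case.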

It would be best if we could skip creating lower layers when they're duplicates, and then create them when needed. I'll look into that next. Maybe we could block output bins that aren't needed yet?

biglittlebigben commented 1 month ago

Thanks for looking into this further. The livekit protocol doesn't allow changing the layers after initial publication. However, it is possible to:

Send video smaller than the nominal layer size (with the caveat that some stream level APIs will return wrong dimensions)
Pause sending media on the largest layers. The SFU should handle this properly, provided there is media coming on the smaller layers.

So, indeed, one approach would be to block the output of the layers that are duplicates of the smaller ones, and change the dimensions of layers dynamically as needed.
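One way to decide which layers to block is sketched below in Go. This is an illustration under assumptions, not an implementation from the PR: the `layer` type and `pausedLayers` are hypothetical, and the layers are assumed to be ordered from smallest to largest after their sizes have been recalculated for the current source resolution.

```go
package main

import "fmt"

// layer is a hypothetical simplification of a simulcast video layer.
type layer struct {
	Name          string
	Width, Height int
}

// pausedLayers marks a layer as paused when it has the same dimensions as the
// layer directly below it, so only the smallest copy of each size keeps sending.
// The input is assumed to be ordered from smallest to largest.
func pausedLayers(layers []layer) []bool {
	paused := make([]bool, len(layers))
	for i := 1; i < len(layers); i++ {
		prev := layers[i-1]
		paused[i] = layers[i].Width == prev.Width && layers[i].Height == prev.Height
	}
	return paused
}

func main() {
	// A 640x360 source against an assumed ladder: the top two layers collapse.
	layers := []layer{{"LOW", 480, 270}, {"MEDIUM", 640, 360}, {"HIGH", 640, 360}}
	for i, p := range pausedLayers(layers) {
		fmt.Printf("%s %dx%d paused=%v\n", layers[i].Name, layers[i].Width, layers[i].Height, p)
	}
}
```

Because the lowest layer is never marked as paused, media keeps flowing on the smaller layers, which is the condition given above for the SFU to cope with paused larger layers.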

I'm also curious: what is the behavior of the x264enc gstreamer module when the caps change on its sink pad? The underlying x264 encoding library doesn't support changing video size on the fly. Does the gstreamer module recreate an encoder context as needed on caps change?

livekit / ingress

Support multi-variant HLS streams #311