kaltura / nginx-vod-module

NGINX-based MP4 Repackager
GNU Affero General Public License v3.0

Multi-Audio HLS manifest doesn't have any DEFAULT=YES audio and iOS/Safari picks wrong #1534

Closed benweidig closed 3 months ago

benweidig commented 4 months ago

Hello!

We use nginx-vod in mapped mode with multi-audio.

This JSON mapping is for a file with five audio tracks, of which we only want a1 (deu) and a5 (eng):

{
    "durations": [
        1677120
    ],
    "sequences": [
        {
            "clips": [
                {
                    "type": "source",
                    "clipFrom": 0,
                    "path": "/cdn/7100/7100-001-360.m4v",
                    "tracks": "v1-a1-a5"
                }
            ]
        },
        {
            "clips": [
                {
                    "type": "source",
                    "clipFrom": 0,
                    "path": "/cdn/7100/7100-001-270.m4v",
                    "tracks": "v1-a1-a5"
                }
            ]
        }
    ]
}

The HLS manifest contains both audio tracks, but neither is marked AUTOSELECT=YES or DEFAULT=YES (slightly reformatted for readability):

#EXTM3U

#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="audio0",NAME="Deutsch",LANGUAGE="de",AUTOSELECT=NO,DEFAULT=NO,CHANNELS="2",URI="https://xxx/index-f1-a1.m3u8"

#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="audio0",NAME="English",LANGUAGE="en",AUTOSELECT=NO,DEFAULT=NO,CHANNELS="2",URI="https://xxx/index-f1-a5.m3u8"

#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=1206335,RESOLUTION=640x360,FRAME-RATE=25.000,CODECS="avc1.64001e,mp4a.40.2",AUDIO="audio0"
https://xxx/index-f1-v1-a5.m3u8

#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=775359,RESOLUTION=480x270,FRAME-RATE=25.000,CODECS="avc1.640015,mp4a.40.2",AUDIO="audio0"
https://xxx/index-f2-v1-a5.m3u8

#EXT-X-I-FRAME-STREAM-INF:BANDWIDTH=253576,RESOLUTION=640x360,CODECS="avc1.64001e",URI="https://xxx/iframes-f1-v1-a5.m3u8"

#EXT-X-I-FRAME-STREAM-INF:BANDWIDTH=154341,RESOLUTION=480x270,CODECS="avc1.640015",URI="https://xxx/iframes-f2-v1-a5.m3u8"
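
To check many playlists for this symptom, a small script can help. The following is a minimal sketch, not part of nginx-vod-module: the function names `parse_media_tags` and `groups_without_default` are hypothetical, and the attribute parsing is deliberately simple (it assumes one tag per line, as in the manifest above).

```python
import re

# Matches KEY=VALUE pairs where VALUE is a quoted string or a bare token.
ATTR_RE = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')


def parse_media_tags(manifest: str) -> list[dict[str, str]]:
    """Return the attributes of every #EXT-X-MEDIA line in a master playlist."""
    tags = []
    for line in manifest.splitlines():
        line = line.strip()
        if not line.startswith("#EXT-X-MEDIA:"):
            continue
        attr_text = line.split(":", 1)[1]
        attrs = {key: value.strip('"') for key, value in ATTR_RE.findall(attr_text)}
        tags.append(attrs)
    return tags


def groups_without_default(manifest: str, media_type: str = "AUDIO") -> list[str]:
    """List the GROUP-IDs of the given TYPE in which no rendition has DEFAULT=YES."""
    has_default: dict[str, bool] = {}
    for attrs in parse_media_tags(manifest):
        if attrs.get("TYPE") != media_type:
            continue
        group = attrs.get("GROUP-ID", "")
        is_yes = attrs.get("DEFAULT") == "YES"
        has_default[group] = has_default.get(group, False) or is_yes
    return [group for group, ok in has_default.items() if not ok]
```

Called on the manifest above, `groups_without_default` would report the `audio0` group, since both renditions carry DEFAULT=NO.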

Is there an option to declare an audio rendition as the default?
As far as I understand it, a sequence can be marked as default, but not an individual audio track within a sequence.

Is it correct that the EXT-X-STREAM-INF and EXT-X-I-FRAME-STREAM-INF entries only point to the ...-a5.m3u8 playlists?

I believe https://github.com/kaltura/nginx-vod-module/blob/26f06877b0f2a2336e59cda93a3de18d7b23a3e2/vod/hls/m3u8_builder.c#L1017-L1021 is the location that writes DEFAULT=NO/YES, but I have trouble following the code.

In https://github.com/kaltura/nginx-vod-module/blob/26f06877b0f2a2336e59cda93a3de18d7b23a3e2/vod/media_set_parser.c#L901 the value of is_default is set to -1

For closed captions, the value is set in https://github.com/kaltura/nginx-vod-module/blob/26f06877b0f2a2336e59cda93a3de18d7b23a3e2/vod/media_set_parser.c#L884-L892, but I can't find the corresponding code for other sequences.

Nevertheless, the value should be -1 in m3u8_builder.c, so the first adaptation set should be set to default. Wouldn't that mean that the first audio should be DEFAULT=YES?

Or am I going down the wrong rabbit hole?
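
The behaviour expected in the reasoning above can be written down as a short sketch. This is not the module's actual C code; it only models the assumption that `is_default` is a tri-state (-1 unset, 0 explicit NO, 1 explicit YES) and that an unset value falls back to "first rendition in its group wins". The `Rendition` type and `resolve_defaults` function are hypothetical names used for illustration.

```python
from dataclasses import dataclass

UNSET = -1  # assumed sentinel, mirroring the -1 mentioned above


@dataclass
class Rendition:
    media_type: str          # e.g. "AUDIO"
    group_id: str            # e.g. "audio0"
    name: str                # e.g. "Deutsch"
    is_default: int = UNSET  # -1 unset, 0 explicit NO, 1 explicit YES


def resolve_defaults(renditions: list[Rendition]) -> list[tuple[str, str]]:
    """Return (name, "YES" or "NO") for each rendition, in input order."""
    seen_groups: set[tuple[str, str]] = set()
    result = []
    for rendition in renditions:
        key = (rendition.media_type, rendition.group_id)
        if rendition.is_default == UNSET:
            # No explicit value: only the first rendition of a group is default.
            value = "YES" if key not in seen_groups else "NO"
        else:
            value = "YES" if rendition.is_default else "NO"
        seen_groups.add(key)
        result.append((rendition.name, value))
    return result
```

Under this model, the two audio renditions from the manifest would come out as Deutsch with DEFAULT=YES and English with DEFAULT=NO, which is not what the manifest shows.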

benweidig commented 4 months ago

Additional information about the files

7100-001-360.m4v
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '7100-001-360.m4v':
  Metadata:
    major_brand     : M4V 
    minor_version   : 512
    compatible_brands: M4V isomiso2avc1
    encoder         : Lavf58.76.100
  Duration: 00:27:57.12, start: 0.000000, bitrate: 1750 kb/s
  Stream #0:0(deu): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 640x360 [SAR 1:1 DAR 16:9], 1075 kb/s, 25 fps, 25 tbr, 12800 tbn, 50 tbc (default)
    Metadata:
      handler_name    : VideoHandler
      vendor_id       : [0][0][0][0]
  Stream #0:1(deu): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 130 kb/s (default)
    Metadata:
      handler_name    : SoundHandler
      vendor_id       : [0][0][0][0]
  Stream #0:2(deu): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 128 kb/s
    Metadata:
      handler_name    : SoundHandler
      vendor_id       : [0][0][0][0]
  Stream #0:3(fra): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 132 kb/s
    Metadata:
      handler_name    : SoundHandler
      vendor_id       : [0][0][0][0]
  Stream #0:4(ita): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 132 kb/s
    Metadata:
      handler_name    : SoundHandler
      vendor_id       : [0][0][0][0]
  Stream #0:5(eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 131 kb/s
    Metadata:
      handler_name    : SoundHandler
      vendor_id       : [0][0][0][0]
7100-001-270.m4v
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '7100-001-270.m4v':
  Metadata:
    major_brand     : M4V 
    minor_version   : 512
    compatible_brands: M4V isomiso2avc1
    encoder         : Lavf58.76.100
  Duration: 00:27:57.12, start: 0.000000, bitrate: 1319 kb/s
  Stream #0:0(deu): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 480x270 [SAR 1:1 DAR 16:9], 644 kb/s, 25 fps, 25 tbr, 12800 tbn, 50 tbc (default)
    Metadata:
      handler_name    : VideoHandler
      vendor_id       : [0][0][0][0]
  Stream #0:1(deu): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 130 kb/s (default)
    Metadata:
      handler_name    : SoundHandler
      vendor_id       : [0][0][0][0]
  Stream #0:2(deu): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 128 kb/s
    Metadata:
      handler_name    : SoundHandler
      vendor_id       : [0][0][0][0]
  Stream #0:3(fra): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 132 kb/s
    Metadata:
      handler_name    : SoundHandler
      vendor_id       : [0][0][0][0]
  Stream #0:4(ita): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 132 kb/s
    Metadata:
      handler_name    : SoundHandler
      vendor_id       : [0][0][0][0]
  Stream #0:5(eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 131 kb/s
    Metadata:
      handler_name    : SoundHandler
      vendor_id       : [0][0][0][0]

There are two German audio tracks per file, but only one is selected for the clip.

Safari on iOS and macOS picks the English track for initial playback (on German-language systems, and in an app that only supports German).

Android chooses German.

For some reason, VLC shows three audio tracks: German, English, and an unnamed one that is English again. The unnamed track is picked for initial playback (although VLC was tested on a system with LANG=en_US.UTF-8).

The module was built from the latest commit, but we also tried e1a8f0fb45b49966623daff282363a0131e3eeea (before the addition of default to sequences). It runs in OpenResty 1.21.4.2.

erankor commented 3 months ago

I don't understand why you didn't get any DEFAULT=YES; the code returns DEFAULT=YES for the first EXT-X-MEDIA of a given type. In any case, if you want to explicitly set is_default in the mapping JSON, you can simply specify the same file path twice: once with tracks=v1-a1 and once with tracks=a5. For example:

{
    "durations": [
        1677120
    ],
    "sequences": [
        {
            "clips": [
                {
                    "type": "source",
                    "clipFrom": 0,
                    "path": "/cdn/7100/7100-001-360.m4v",
                    "tracks": "v1-a1"
                }
            ]
        },
        {
            "default": true,
            "clips": [
                {
                    "type": "source",
                    "clipFrom": 0,
                    "path": "/cdn/7100/7100-001-360.m4v",
                    "tracks": "a5"
                }
            ]
        }
    ]
}

benweidig commented 3 months ago

Thanks for the quick response!

I've checked other multi-audio playlists, and none of them has an audio track with DEFAULT=YES, which is weird...

The problem might be that we use multi-resolution sequences with multi-clips that have multi-audio.

IIRC, we couldn't get it working in any other way than specifying the track per clip with multiple audio tracks and using one sequence per resolution.

We know it's a weird setup, and (most likely) not even fully supported, but it saves us a lot of storage space and has worked fine so far.

I'm trying to create a simpler setup to test your example more easily, as we generate the mapping JSON with Lua, and it needs a quite specific file/folder layout and depends on pre-generated media info from ffprobe...

benweidig commented 3 months ago

Just FYI, I ended up using OpenResty's body_filter_by_lua to set AUTOSELECT/DEFAULT to YES for the requested audio stream.

It was the quickest fix, but I might look into the actual underlying issue in our setup at another time.
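
The rewrite performed by such a body filter can be illustrated as follows. This is a sketch of the string transformation only, written in Python rather than Lua, and it is not the filter actually deployed; `mark_default_audio` is a hypothetical name, and matching the rendition by its LANGUAGE attribute is an assumption about how the "requested audio stream" is identified.

```python
import re


def mark_default_audio(manifest: str, language: str) -> str:
    """Set AUTOSELECT=YES and DEFAULT=YES on audio EXT-X-MEDIA lines of one language."""
    rewritten = []
    for line in manifest.split("\n"):
        if (
            line.startswith("#EXT-X-MEDIA:")
            and "TYPE=AUDIO" in line
            and f'LANGUAGE="{language}"' in line
        ):
            # Flip both attributes from NO to YES; other attributes are untouched.
            line = re.sub(r"\b(AUTOSELECT|DEFAULT)=NO\b", r"\1=YES", line)
        rewritten.append(line)
    return "\n".join(rewritten)
```

For example, `mark_default_audio(manifest, "de")` would leave the English rendition as it is and change only the Deutsch line. In a real body filter, the response may arrive in several chunks, so the manifest would likely need to be buffered in full before rewriting, and the Content-Length header adjusted or cleared because the body grows by one character per changed attribute.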