kaltura / nginx-vod-module

NGINX-based MP4 Repackager
GNU Affero General Public License v3.0

Option to remove unmuxed audio from HLS playlist #1039

Open gryphon opened 5 years ago

gryphon commented 5 years ago

We have a number of files in an adaptation set; each comes with one video and one audio track. Each audio track is matched to its video by bitrate. We have the vod_hls_force_unmuxed_segments option set to off.

The problem is that Kaltura adds separate audio tracks to the master playlist even though our video files already contain audio: https://vod.silatv.ru/videos/3431_43046/master.m3u8

Kaltura also sets AUTOSELECT for the first audio in the list, and it always uses the "audio0" group for all included audio tracks.

This leads to auto-selecting the first (in our case the worst) audio track, even if the main chunklist has audio (v1-a1).

What should we do if we do not need separate audio tracks?

erankor commented 5 years ago

I can't download the manifest, getting 403... But from your description, the cause is most likely that you have multiple audio codecs (e.g. AAC & AC-3). In that case, the module outputs each codec as a separate audio group, in order to enable the player to choose a group that it can play. For example, many devices support AAC but not AC-3; such a manifest lets devices that support AC-3 use that codec, while the ones that do not fall back to AAC.

gryphon commented 5 years ago

Here is a similar example without the 403:

https://zhopa.gcdn.co/videos/676_spWibBX6isNTtjzm/master.m3u8

No, we have just one audio codec (AAC), only with different bitrates. Here we got 3 separate audio tracks, 2 of which are absolutely identical.

erankor commented 5 years ago

Please paste the mediainfo of one of the MP4's + nginx.conf. I may need the MP4's themselves to reproduce it, but let's start with this...

erankor commented 5 years ago

Ah, and also the mapping JSON...

gryphon commented 5 years ago

There is nothing special regarding setup or files. Please check:

nginx.conf:

http {

  vod_base_url "";
  vod_segments_base_url "";

  vod_mode                           mapped;
  vod_upstream_location /storage_api_proxy;
  vod_remote_upstream_location /storage_proxy/;

  #vod_metadata_cache                 metadata_cache 16m;
  #vod_response_cache                 response_cache 2048m;

  vod_max_mapping_response_size      10K;
  vod_metadata_cache                 metadata_cache 2000m;
  vod_response_cache                 response_cache 128m 300s;
  vod_mapping_cache                  mapping_cache 24m 10s;
  vod_initial_read_size              1m;
  vod_cache_buffer_size              10m;

  vod_hls_force_unmuxed_segments     off;

  vod_force_sequence_index on;

  vod_last_modified_types            *;

  vod_segment_duration               6000;
  vod_align_segments_to_key_frames   on;
  vod_dash_fragment_file_name_prefix "segment";
  vod_hls_segment_file_name_prefix   "segment";

  vod_manifest_segment_durations_mode accurate;

  vod_expires 10m;
  vod_expires_live 2s;
  vod_expires_live_time_dependent 1s;

  open_file_cache          max=1000 inactive=5m;
  open_file_cache_valid    2m;
  open_file_cache_min_uses 1;
  open_file_cache_errors   on;

  vod_open_file_thread_pool default;

  vod_hls_mpegts_align_frames off;
  vod_hls_mpegts_interleave_frames on;

  vod_performance_counters vodperf;

  aio on;

  server {
        ....

    location /videos/ {
        vod hls;
        alias /opt/static/videos/;
        add_header Access-Control-Allow-Headers '*';
        add_header Access-Control-Allow-Origin '*';
        add_header Access-Control-Allow-Methods 'GET, HEAD, OPTIONS';

        #vod_bootstrap_segment_durations 2000;
        #vod_bootstrap_segment_durations 2000;
        #vod_bootstrap_segment_durations 2000;
        vod_segment_duration 6000;

    }

    location /playlists/ {
        vod hls;
        alias /opt/static/videos/;
        add_header Access-Control-Allow-Headers '*';
        add_header Access-Control-Allow-Origin '*';
        add_header Access-Control-Allow-Methods 'GET, HEAD, OPTIONS';
    }

    location /storage_proxy {
      internal;
      rewrite /storage_proxy/(.*) $1 break;
      resolver 10.254.0.140 10.254.0.141;
      proxy_pass $1;
    }

    location /storage_api_proxy/videos {

      rewrite /storage_api_proxy/videos/(.*) /api/videos/$1/kaltura.json  break;
      proxy_pass http://admin:3001;

      proxy_set_header content-type "application/json";
    }
  }
}

mapping response:

{"sequences":[{"label":"360","bitrate":{"v":400000,"a":64000},"id":"360p","clips":[{"type":"source","path":"..."}]},{"label":"480","bitrate":{"v":800000,"a":128000},"id":"480p","clips":[{"type":"source","path":"..."}]},{"label":"720","bitrate":{"v":2000000,"a":128000},"id":"720p","clips":[{"type":"source","path":"..."}]}]}

ffmpeg output of 3 files:

first

Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '1.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf58.12.100
  Duration: 00:02:11.75, start: 0.000000, bitrate: 872 kb/s
    Stream #0:0(und): Video: h264 (Main) (avc1 / 0x31637661), yuv420p, 854x480 [SAR 32880:32879 DAR 137:77], 739 kb/s, SAR 32767:32766 DAR 990519:556715, 25 fps, 25 tbr, 12800 tbn, 50 tbc (default)
    Metadata:
      handler_name    : VideoHandler
    Stream #0:1(eng): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 127 kb/s (default)
    Metadata:
      handler_name    : SoundHandler

second

Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '2.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf58.12.100
  Duration: 00:02:11.75, start: 0.000000, bitrate: 447 kb/s
    Stream #0:0(und): Video: h264 (Main) (avc1 / 0x31637661), yuv420p, 640x360 [SAR 1:1 DAR 16:9], 378 kb/s, 25 fps, 25 tbr, 12800 tbn, 50 tbc (default)
    Metadata:
      handler_name    : VideoHandler
    Stream #0:1(eng): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 64 kb/s (default)
    Metadata:
      handler_name    : SoundHandler

third

Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '3.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf58.12.100
  Duration: 00:02:11.75, start: 0.000000, bitrate: 2021 kb/s
    Stream #0:0(und): Video: h264 (Main) (avc1 / 0x31637661), yuv420p, 1280x720 [SAR 1:1 DAR 16:9], 1889 kb/s, 25 fps, 25 tbr, 12800 tbn, 50 tbc (default)
    Metadata:
      handler_name    : VideoHandler
    Stream #0:1(eng): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 127 kb/s (default)
    Metadata:
      handler_name    : SoundHandler

erankor commented 5 years ago

Just to make sure - you're running latest master without any local changes?

gryphon commented 5 years ago

Yes:

ENV NGINX_VERSION 1.16.0
ENV VOD_MODULE_VERSION master

RUN curl -sL https://nginx.org/download/nginx-${NGINX_VERSION}.tar.gz | tar -C /nginx --strip 1 -xz

RUN curl -sL https://github.com/kaltura/nginx-vod-module/archive/${VOD_MODULE_VERSION}.tar.gz | tar -C /nginx-vod-module --strip 1 -xz

erankor commented 5 years ago

Ok, I see why, I missed it before... you are setting label in the JSON. The module sees there are multiple audio tracks with different labels, and assumes you want to provide the user the option to choose between them. Label is currently used only for audio & subtitles, so setting the label to the video resolution is quite meaningless. If you remove the label from the JSON, you won't get the EXT-X-MEDIA tags.
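A minimal sketch of that change, assuming the mapping response has the shape pasted earlier in this thread. The helper name `strip_labels` is hypothetical and not part of the module; it only drops the `label` key from each sequence before the JSON is handed back to nginx.

```python
import json

def strip_labels(mapping: dict) -> dict:
    """Return a copy of the mapping with "label" removed from every sequence."""
    cleaned = dict(mapping)
    cleaned["sequences"] = [
        {key: value for key, value in seq.items() if key != "label"}
        for seq in mapping.get("sequences", [])
    ]
    return cleaned

# Shape taken from the mapping response in this thread; the path stays elided.
mapping = json.loads(
    '{"sequences":[{"label":"360","bitrate":{"v":400000,"a":64000},'
    '"id":"360p","clips":[{"type":"source","path":"..."}]}]}'
)
print(json.dumps(strip_labels(mapping)))
```

The `id` and `bitrate` fields are left untouched, and the input dictionary is not modified.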

gryphon commented 5 years ago

Okay, thank you, I will try. We use this field to show it in the video player so the user can change video quality in a friendly way (i.e. "UHD" instead of 1080).

mlevkov commented 5 years ago

@erankor, what would be the way to create an audio group in HLS and in DASH? Like @gryphon, we have coupled audio/video segments. However, I'd be interested in selecting the audio from just one of the video files into a separate group and carrying it as a separate payload. This would avoid carrying an audio/video payload in every segment, reducing the payload size, and would also give users selectable audio when there are multiple languages.

Additionally, we have noticed a problem: with vod_manifest_segment_durations_mode accurate, as in the configuration above, mapped mode does not create segments that adhere to the GOP boundaries. As a result, the audio drifts over time and, with enough "repeated" playback attempts, ends up out of sync.

For example, the configuration above has the following elements: vod_segment_duration 6000; vod_align_segments_to_key_frames on; vod_manifest_segment_durations_mode accurate

however, the underlying manifest subvariant (https://zhopa.gcdn.co/videos/676_spWibBX6isNTtjzm/index-s360p-v1-a1.m3u8) produces:

#EXTM3U
#EXT-X-TARGETDURATION:10
#EXT-X-ALLOW-CACHE:YES
#EXT-X-PLAYLIST-TYPE:VOD
#EXT-X-VERSION:3
#EXT-X-MEDIA-SEQUENCE:1
#EXTINF:10.000,
segment-1-s360p-v1-a1.ts
#EXTINF:10.000,
segment-2-s360p-v1-a1.ts
#EXTINF:10.000,
segment-4-s360p-v1-a1.ts
#EXTINF:10.000,
segment-6-s360p-v1-a1.ts
#EXTINF:10.000,
segment-7-s360p-v1-a1.ts
#EXTINF:10.000,
segment-9-s360p-v1-a1.ts
#EXTINF:10.000,
segment-11-s360p-v1-a1.ts
#EXTINF:10.000,
segment-12-s360p-v1-a1.ts
#EXTINF:10.000,
segment-14-s360p-v1-a1.ts
#EXTINF:10.000,
segment-16-s360p-v1-a1.ts
#EXTINF:10.000,
segment-17-s360p-v1-a1.ts
#EXTINF:10.000,
segment-19-s360p-v1-a1.ts
#EXTINF:10.000,
segment-21-s360p-v1-a1.ts
#EXTINF:1.720,
segment-22-s360p-v1-a1.ts
#EXT-X-ENDLIST

Coincidentally, I have experienced the same thing. Not certain how to address such case and looking for your guidance. As a side point, the remote mode under the same circumstances does produce GOP accurate results.

erankor commented 5 years ago

Not sure what you mean, if you use mapped mode, you can select the audio of some MP4, and assign it some label/language. If you end up with multiple labels, the module will output them as separate EXT-X-MEDIA. If what you meant is to force the creation of multiple groups, the answer is that the module does not provide control over it. The module creates a separate audio group per codec (e.g. AAC vs. AC-3), it will not create multiple groups for AAC, for example.

Regarding the issue with the segment durations - the conf section you pasted has duration=6 while the manifest has segments of 10 seconds, so I think there may be some confusion here... maybe nginx refused to reload due to some error in the conf or something like that.
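One way to spot this kind of mismatch is to compare the #EXTINF values of a media playlist against the configured vod_segment_duration. The sketch below is only an illustration: the function names and the 50% tolerance are assumptions, and the sample is a shortened copy of the playlist quoted above.

```python
import re
from typing import List

def extinf_durations(playlist_text: str) -> List[float]:
    """Collect the duration in seconds of every #EXTINF entry."""
    matches = re.findall(r"^#EXTINF:([0-9.]+)", playlist_text, re.MULTILINE)
    return [float(value) for value in matches]

def exceeds_configured(durations: List[float], segment_duration_ms: int,
                       tolerance: float = 0.5) -> List[float]:
    """Return durations longer than the configured target plus a tolerance.

    The tolerance is an arbitrary allowance for keyframe alignment.
    """
    limit = (segment_duration_ms / 1000.0) * (1 + tolerance)
    return [d for d in durations if d > limit]

sample = """#EXTM3U
#EXT-X-TARGETDURATION:10
#EXTINF:10.000,
segment-1-s360p-v1-a1.ts
#EXTINF:10.000,
segment-2-s360p-v1-a1.ts
#EXTINF:1.720,
segment-22-s360p-v1-a1.ts
#EXT-X-ENDLIST
"""

durations = extinf_durations(sample)
suspicious = exceeds_configured(durations, 6000)  # vod_segment_duration 6000
print(durations, suspicious)
```

If the second list is not empty, the running server is probably not using the segment duration from the pasted conf.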

On a different subject - did you see my comment about S3 authentication (https://github.com/kaltura/nginx-vod-module/issues/717#issuecomment-508566778)? I didn't get any feedback...

mlevkov commented 5 years ago

@erankor, Thank you for the link, I've commented.

On the point of keyframe positioning: what I was trying to say is that mapped mode, for some reason, does not make segments as accurately as remote mode. It simply slices them at the time position indicated by the target duration, whereas remote mode tries to make them as close to a keyframe as possible, even though both modes are in the same config (in different locations, of course). I'm not sure what makes the behavior of mapped mode different from remote mode (a bug? or a feature?). Do I need to signal additional details for mapped mode to advise it of the keyframe positions, with an option such as "keyFrameDurations"?

erankor commented 5 years ago

When the map result is a single MP4, it should behave exactly the same as remote. However, if the map response contains the durations element (indicating a playlist), the module falls back to returning an estimate. The reason is that in order to return accurate segment durations for a playlist, the module would need to load a potentially high number of MP4s, and therefore I decided not to implement it. If you are using a playlist, and you set keyFrameDurations in the JSON, the module will use it when it calculates the durations of the segments.
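To illustrate why keyframe positions matter for segment durations, here is a rough sketch. It is not the module's actual algorithm: the function name `snap_segments` and the nearest-keyframe rule are assumptions made for illustration, and the input is simply a list shaped like keyFrameDurations (gaps in milliseconds between consecutive keyframes).

```python
from bisect import bisect_left
from itertools import accumulate
from typing import List

def snap_segments(key_frame_durations: List[int], segment_duration: int) -> List[int]:
    """Split a timeline into segments whose boundaries fall on keyframes.

    key_frame_durations: positive gaps in ms between consecutive keyframes.
    segment_duration: nominal segment length in ms (must be positive).
    Returns the duration in ms of each resulting segment.
    """
    # Absolute keyframe positions, starting at 0.
    positions = [0] + list(accumulate(key_frame_durations))
    end = positions[-1]
    boundaries = [0]
    target = segment_duration
    while target < end:
        i = bisect_left(positions, target)
        before, after = positions[i - 1], positions[i]
        # Pick whichever neighbouring keyframe is closer to the nominal boundary.
        snapped = before if target - before <= after - target else after
        if boundaries[-1] < snapped < end:
            boundaries.append(snapped)
        target += segment_duration
    boundaries.append(end)
    return [b - a for a, b in zip(boundaries, boundaries[1:])]

# Made-up input: a keyframe every 4 seconds, nominal 6-second segments.
print(snap_segments([4000] * 5, 6000))
```

With keyframes that do not line up with the nominal duration, the resulting segments differ in length, which is why an estimate that ignores keyframes can disagree with the real segments.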

mlevkov commented 5 years ago

Hm, interesting point. Allow me to think about your answer a bit more. Also, I've tried to implement the keyFrameDurations, and I must have done something incorrectly. Do you have an example that incorporates keyFrameDurations in JSON? Do I pair it with a dynamic clip in order to enable use of the keyFrameDurations?

erankor commented 5 years ago

We are currently using it only in live, here is a sample -

{
    "durations": [56900],
    "playlistType": "live",
    "firstClipTime": 1569238625240,
    "segmentBaseTime": 946684800000,
    "discontinuity": true,
    "clipTimes": [1569238625240],
    "sequences": [{
        "id": "32",
        "clips": [{
            "type": "mixFilter",
            "sources": [{
                "durations": [8266,
                7000,
                8334,
                8333,
                8300,
                8333,
                8334],
                "paths": ["07/xxx-1.mp4",
                "07/xxx-2.mp4",
                "07/xxx-3.mp4",
                "07/xxx-4.mp4",
                "07/xxx-5.mp4",
                "07/xxx-6.mp4",
                "07/xxx-7.mp4"],
                "type": "concat",
                "tracks": "v1",
                "basePath": "/web/content/kLive/live/z/yyy/32/"
            },
            {
                "durations": [8290,
                7036,
                8359,
                8289,
                8290,
                8359,
                8360],
                "paths": ["07/xxx-1.mp4",
                "07/xxx-2.mp4",
                "07/xxx-3.mp4",
                "07/xxx-4.mp4",
                "07/xxx-5.mp4",
                "07/xxx-6.mp4",
                "07/xxx-7.mp4"],
                "type": "concat",
                "tracks": "a1",
                "basePath": "/web/content/kLive/live/z/yyy/32/"
            }],
            "keyFrameDurations": [8266,
            7000,
            8334,
            8333,
            8300,
            8333],
            "firstKeyFrameOffset": 0
        }],
        "bitrate": {
            "v": 2560000,
            "a": 95000
        }
    }],
    "initialClipIndex": 1,
    "firstClipStartOffset": 0,
    "presentationEndTime": 1884598665648,
    "liveWindowDuration": 150000,
    "expirationTime": 1569238716566
}

mlevkov commented 5 years ago

Huge Thank You! This is exactly what I was looking for.

mlevkov commented 5 years ago

@erankor Thank you for providing the sample. I've tried to formulate a simple case where I can take the same clip and just insert "discontinuity" by providing "slice" sections. However, it does not appear that I'm formulating the structure correctly, thus failing the parsing process. I'm effectively trying to make sure that I have an accurate ad-break position while allowing "segments" to be segmented at the targeted GOP boundaries. Here is the link to the playlist -> https://gist.github.com/mlevkov/823945007954871a8c799543a4a163eb.

erankor commented 5 years ago

From a quick look at this -

  1. the top level durations array should have multiple elements, the internal durations are ignored
  2. clips should be an array of objects
  3. mixFilter is used for mixing several audio streams together / mixing audio + video - the elements in its sources array will play at the same time, not one after the other

flavioribeiro commented 5 years ago

Just to add to this, I'm not using the keyFrameDurations feature yet, and I'm thinking of implementing the logic to find the right spot to split the files in the sidecar. Currently, this is the response payload we get when we pass timestamps as a query string ?breaks=[timestamps] to our gcs_helper fork:

$ http http://gcs_helper/map/encodes/ea9b4c8885554d1b/COLBERT?breaks=00:04,00:20
Response: https://gist.github.com/flavioribeiro/969f4bbc586020629e7888ebd82838d3

The last item in the durations array is the duration reported by mediainfo minus all the ranges passed, ensuring that we'll play the video until the end.
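The arithmetic described here can be sketched as follows. This is not the gcs_helper code: the function names are hypothetical, the break timestamps are assumed to be in MM:SS form, and the total duration in the example is a made-up value standing in for what mediainfo would report.

```python
from typing import List

def parse_timestamp_ms(timestamp: str) -> int:
    """Convert an "MM:SS" string into milliseconds (assumed format)."""
    minutes, seconds = timestamp.split(":")
    return (int(minutes) * 60 + int(seconds)) * 1000

def durations_from_breaks(breaks: List[str], total_ms: int) -> List[int]:
    """Split total_ms at each break; the last entry is whatever remains."""
    points = sorted(parse_timestamp_ms(b) for b in breaks)
    durations = []
    previous = 0
    for point in points:
        durations.append(point - previous)
        previous = point
    # Remainder: total duration minus everything covered by the breaks.
    durations.append(total_ms - previous)
    return durations

# Same query string as in the request above; 131750 ms is a made-up total.
print(durations_from_breaks("00:04,00:20".split(","), 131750))
```

Because the final entry is computed as a remainder, the entries always add up to the total duration.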

mlevkov commented 5 years ago

> From a quick look at this -
>
> 1. the top level `durations` array should have multiple elements, the internal `durations` are ignored
> 2. `clips` should be an array of objects
> 3. `mixFilter` is used for mixing several audio streams together / mixing audio + video - the elements in its sources array will play at the same time, not one after the other

I've done that with the following (https://gist.github.com/mlevkov/a1257858e4b265564d4d132b05cf8a52). It works as you described; however, it does not have accurate GOP allocations. How and where do I put the keyFrameDurations, since the version I sent earlier does not work? Here are the GOP durations with a target of 2 sec -> (https://gist.github.com/mlevkov/a1257858e4b265564d4d132b05cf8a52#file-keyframedurations)

@erankor A more fundamental question, does "keyFrameDurations" work in a "vod" setting?