shaka-project / shaka-packager

A media packaging and development framework for VOD and Live DASH and HLS applications, supporting Common Encryption for Widevine and other DRM Systems.
https://shaka-project.github.io/shaka-packager/

Shaka-packager's subtitle output and timing are broken for real-time flows when the input has "gaps/periods" without subtitle streams. #1401

Open Brainiarc7 opened 1 month ago

Brainiarc7 commented 1 month ago

System info

Operating System: Ubuntu 22.04 LTS
Shaka Packager Version: git master, releases 3.1.0 and 3.2.0.

Issue and steps to reproduce the problem

As stated in the title, with the following addition: when content returns to the subtitle track, the packager generates all pending subtitle segments at once, which breaks playback.

Packager Command:

A trivial example demonstrating the issue: using inotify to record the creation time of each subtitle file, a gap appears between 12:24:48 and 12:25:05, and then 3 subtitle files are dumped to the filesystem within about a second:

inotifywait --format '%w%f %T' --timefmt '%Y-%m-%d %H:%M:%S' -e create -m /opt/data/streams/ingest-content/tvc-dash/ | while read dir file event; do if [[ "$file" =~ $ ]]; then echo "Event: $dir$file $event " | grep webvtt;     fi; done
Setting up watches.
Watches established.
Event: /opt/data/streams/ingest-content/tvc-dash/_live_webvtt_1714467460-24442560000.m4s2024-05-03 12:24:24 
Event: /opt/data/streams/ingest-content/tvc-dash/_live_webvtt_1714467460-24443100000.m4s2024-05-03 12:24:30 
Event: /opt/data/streams/ingest-content/tvc-dash/_live_webvtt_1714467460-24443640000.m4s2024-05-03 12:24:39 
Event: /opt/data/streams/ingest-content/tvc-dash/_live_webvtt_1714467460-24444180000.m4s2024-05-03 12:24:41 
Event: /opt/data/streams/ingest-content/tvc-dash/_live_webvtt_1714467460-24444720000.m4s2024-05-03 12:24:48 
Event: /opt/data/streams/ingest-content/tvc-dash/_live_webvtt_1714467460-24445260000.m4s2024-05-03 12:25:05 
Event: /opt/data/streams/ingest-content/tvc-dash/_live_webvtt_1714467460-24445800000.m4s2024-05-03 12:25:05 
Event: /opt/data/streams/ingest-content/tvc-dash/_live_webvtt_1714467460-24446340000.m4s2024-05-03 12:25:06 
Event: /opt/data/streams/ingest-content/tvc-dash/_live_webvtt_1714467460-24446880000.m4s2024-05-03 12:25:10 
Event: /opt/data/streams/ingest-content/tvc-dash/_live_webvtt_1714467460-24447420000.m4s2024-05-03 12:25:17 
Event: /opt/data/streams/ingest-content/tvc-dash/_live_webvtt_1714467460-24447960000.m4s2024-05-03 12:25:24 
Event: /opt/data/streams/ingest-content/tvc-dash/_live_webvtt_1714467460-24448500000.m4s2024-05-03 12:25:31 
Event: /opt/data/streams/ingest-content/tvc-dash/_live_webvtt_1714467460-24449040000.m4s2024-05-03 12:25:33 
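The gap can be made explicit by computing the interval between consecutive creation times. This is only a small sketch: the times are copied by hand from the first nine log lines above, and the variable names are illustrative.

```python
from datetime import datetime

# Creation times copied from the first nine events in the log above.
creation_times = [
    "12:24:24", "12:24:30", "12:24:39", "12:24:41", "12:24:48",
    "12:25:05", "12:25:05", "12:25:06", "12:25:10",
]

parsed = [datetime.strptime(t, "%H:%M:%S") for t in creation_times]

# Seconds elapsed between each file and the one before it.
deltas = [int((b - a).total_seconds()) for a, b in zip(parsed, parsed[1:])]

print(deltas)
print("largest gap (s):", max(deltas))
```

The 17-second interval is followed by intervals of 0 and 1 seconds, which matches the burst of pending files described in this report.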

This issue appears in webvtt, webvtt+mp4 and ttml+mp4 based outputs.

What is the expected result?

Shaka packager's subtitle handling should ideally generate empty output segments even when there is no input data or subtitle heartbeat, and then go on to generate validly timed subtitle segments once input resumes. That would resolve what we're experiencing here.

What happens instead?

Shaka-packager does not generate subtitle files during a period without content in the input subtitle track. Once content returns to the input subtitle track, shaka-packager generates all the pending files at once, and this breaks playback for live content.

Related: #1355 and #1254

tobbee commented 1 month ago

@cosmin I'm trying to solve this problem of text segment generation when there is no output. It seems that you are working quite a lot on improving Shaka-packager, so maybe you have some thoughts about the best approach?

I think that the core problem is that all text segment generation is triggered by cue or text samples.

For example, looking at the OutputsEmptySegments unit test there should be three output segments:

There are then events simulating text samples being read. The first one happens at 50ms and stores a text sample. The second one happens at 250ms, and it is only at this time that segment 0 and the empty segment 1 are generated. The segments are generated by a loop in TextChunker::OnTextSample that iterates over all old segments and generates them with appropriate segment timing.

If text sample number 2 had arrived even later, segment 0 and all intermediate empty segments would have been generated even later.

Please correct me if I have misunderstood something.
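The behavior described above can be illustrated with a toy model. This is a simplified Python sketch, not the actual Shaka Packager C++ code: the class `ToyTextChunker`, the 100ms segment duration, and the sample end times (90ms and 290ms) are assumptions made for illustration. Only the sample start times (50ms and 250ms) come from the description above.

```python
class ToyTextChunker:
    """Toy model of cue-driven segmentation (hypothetical, for illustration)."""

    def __init__(self, segment_duration_ms):
        self.segment_duration_ms = segment_duration_ms
        self.segment_start_ms = 0
        self.pending = []  # (start_ms, end_ms) samples not yet fully written
        self.emitted = []  # (trigger_ms, seg_start_ms, seg_end_ms, cues)

    def on_text_sample(self, start_ms, end_ms):
        # Segments are only closed here, i.e. when a new sample arrives.
        while start_ms >= self.segment_start_ms + self.segment_duration_ms:
            self._flush(trigger_ms=start_ms)
        self.pending.append((start_ms, end_ms))

    def _flush(self, trigger_ms):
        seg_start = self.segment_start_ms
        seg_end = seg_start + self.segment_duration_ms
        # Cues overlapping the segment are written into it.
        cues = [s for s in self.pending if s[0] < seg_end and s[1] > seg_start]
        self.emitted.append((trigger_ms, seg_start, seg_end, cues))
        # Keep only cues that continue past this segment.
        self.pending = [s for s in self.pending if s[1] > seg_end]
        self.segment_start_ms = seg_end


chunker = ToyTextChunker(segment_duration_ms=100)
chunker.on_text_sample(50, 90)    # stored; nothing is emitted yet
chunker.on_text_sample(250, 290)  # closes segment 0 and empty segment 1

for trigger_ms, seg_start, seg_end, cues in chunker.emitted:
    print(f"segment {seg_start}-{seg_end} emitted at {trigger_ms}ms, cues={cues}")
```

In this model both segments carry a trigger time of 250ms, so the output delay grows with the distance between samples, which is the effect reported in this issue.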

I see three types of possible remedies (for the case of an incoming MPEG TS stream with teletext or DVB subtitles):

  1. Rely on some incoming data on the text PID with PTS timestamps, so that it is possible to detect time progress in the teletext parser and generate empty cues/samples
  2. Let the video track trigger text segment generation if there is no data (possibly with some delay)
  3. Introduce a timer in the text_chunker that triggers empty chunks if there is no data for some finite time, such as 500ms
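Remedies 1 and 2 both amount to advancing the segment clock from a source other than a cue (PTS values on the text PID, or the video track). A minimal sketch of that idea follows; `segments_due` and its parameters are hypothetical names, not part of Shaka Packager, and `delay_ms` stands in for the optional delay mentioned in remedy 2.

```python
def segments_due(next_start_ms, segment_duration_ms, clock_ms, delay_ms=0):
    """Return (start, end) for every segment that ends at or before
    clock_ms - delay_ms, starting from next_start_ms."""
    due = []
    start = next_start_ms
    while start + segment_duration_ms <= clock_ms - delay_ms:
        due.append((start, start + segment_duration_ms))
        start += segment_duration_ms
    return due


# The clock value could come from a text-PID PTS or from a video timestamp.
print(segments_due(0, 100, clock_ms=250))
print(segments_due(0, 100, clock_ms=250, delay_ms=100))
```

With a non-zero delay, segments are closed later, which leaves room for text that arrives after the corresponding video.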

Do you have a take on the preferred method, or maybe some other way of tackling this problem?

cosmin commented 1 month ago

I think some heartbeat approach to generate and pass through empty cues from the input to the output to indicate absence of text seems right.

As to how to generate these heartbeat cues: to the extent it's possible to know, when parsing the input, that there is no text data, that would be ideal. When text and audio/video arrive separately, having some configurable buffering delay for how long to wait for text after video or audio is received would be good.

So at a high level #1 where possible otherwise #2, but probably not #3.

tobbee commented 1 month ago

@cosmin Thanks for your quick response. I just checked a live TS stream that @Brainiarc7's team suggested as a test source, and it has PES packets with a PTS timestamp every 40ms on the teletext PID.

With those triggers I'll try to enhance the teletext parser to generate a zero-duration TextSample every 500ms, starting when a TextSample has ended but no new one has started. By setting the duration to zero, I think it should be fairly easy to update the TextChunker::OnTextSample method to only write chunks whose end time has passed but NOT add the TextSample itself as a non-rendered sample.
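A rough sketch of this plan, in Python rather than the packager's C++ and under stated assumptions: `heartbeat_samples` and `apply_sample` are hypothetical helpers, the first heartbeat is assumed to fire one interval after the last cue ends, and the 100ms segment duration is chosen only for the example.

```python
def heartbeat_samples(pts_values_ms, last_cue_end_ms, interval_ms=500):
    """Turn PES timestamps seen while no cue is active into zero-duration
    (start == end) samples, at most one per interval_ms."""
    beats = []
    next_beat_ms = last_cue_end_ms + interval_ms
    for pts in pts_values_ms:
        if pts >= next_beat_ms:
            beats.append((pts, pts))
            next_beat_ms = pts + interval_ms
    return beats


def apply_sample(segment_start_ms, segment_duration_ms, pending, sample):
    """Close every segment that ends at or before the sample start.
    A zero-duration sample only advances time and is not stored."""
    start, end = sample
    closed = []
    while start >= segment_start_ms + segment_duration_ms:
        closed.append((segment_start_ms, segment_start_ms + segment_duration_ms))
        segment_start_ms += segment_duration_ms
    if end > start:
        pending.append(sample)
    return segment_start_ms, closed


# PES packets every 40ms, last real cue ended at 90ms.
beats = heartbeat_samples(range(0, 2001, 40), last_cue_end_ms=90)
print(beats)

pending = []
next_start, closed = apply_sample(0, 100, pending, beats[0])
print(next_start, closed, pending)
```

The heartbeat closes the segments that have elapsed without leaving anything in `pending`, so no empty cue would be rendered.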

One could implement alternative 2 as a fallback, but I leave that for later.