livepeer / lpms

Livepeer media server
MIT License

Fix occasional DTS overlap #423

Open j0sh opened 2 months ago

j0sh commented 2 months ago

Fix an occasional DTS overlap by closing the filtergraph after each segment and re-creating it at the beginning of the next one, instead of attempting to persist the filtergraph between segments.

This overlap occurred mostly when flip-flopping segments between transcoders, or processing non-consecutive segments within a single transcoder. This was due to drift in adjusting input timestamps to match the fps filter's expectation of mostly consecutive timestamps while adjusting output timestamps to remove accumulated delay from the filter.

There is roughly a 1% performance hit on my machine from re-creating the filtergraph.

Because we are now resetting the filter after each segment, we can remove a good chunk of the special-cased timestamp handling code before and after the filtergraph since we no longer need to handle discontinuities between segments.

However, we do need to keep some filter flushing logic in order to accommodate low-fps or low-frame content.

This does change our outputs, usually by one fewer frame. Sometimes we seem to produce an additional frame - it is unclear why. However, as the test cases note, this actually clears up a number of long-standing oddities around the expected frame count, so it should be seen as an improvement.


It is important to note that while this fixes DTS overlap in a (rather unpredictable) general case, there is another overlap bug in one very specific case.

These are the conditions for the bug:

  1. First and second segments of the stream are being processed. This could be the same transcoder or different ones.

  2. The first segment starts at or near zero PTS.

  3. mpegts is the output format.

  4. B-frames are being used.

What happens is we may see DTS < PTS for the very first frames in the very first segment, potentially starting with PTS = 0, DTS < 0. This is expected for B-frames.

However, if mpegts is in use, it cannot take negative timestamps. To accommodate negative DTS, the muxer will set PTS = -DTS, DTS = 0 and delay (offset) the rest of the packets in the segment accordingly.

Unfortunately, subsequent transcodes will not know about this delay! This typically leads to an overlap between the first and second segments (but segments after that would be fine).
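To make the mechanism concrete, here is a minimal Go sketch of the shift described above. It is not LPMS or FFmpeg code: the `Packet` type and `shiftNegative` helper are hypothetical, and the model assumes the muxer simply delays every packet in a segment by the amount needed to bring the first DTS to zero. The timestamps are the sample values from the table at the end of this description.

```go
package main

import "fmt"

// Packet holds timestamps in arbitrary time-base units (hypothetical type).
type Packet struct {
	DTS, PTS int64
}

// shiftNegative is a simplified model of the behaviour described above:
// if the first DTS in a segment is negative, every packet in that segment
// is delayed by the same offset so that the first DTS lands on zero.
// It returns the shifted copy and the offset that was applied.
func shiftNegative(seg []Packet) ([]Packet, int64) {
	out := make([]Packet, len(seg))
	copy(out, seg)
	if len(seg) == 0 || seg[0].DTS >= 0 {
		return out, 0 // nothing to shift
	}
	offset := -seg[0].DTS
	for i := range out {
		out[i].DTS += offset
		out[i].PTS += offset
	}
	return out, offset
}

func main() {
	// Encoder output for the first two segments (B-frames, so DTS < PTS).
	seg1 := []Packet{{-20, 0}, {-10, 10}, {0, 20}, {10, 30}}
	seg2 := []Packet{{20, 40}, {30, 50}, {40, 60}}

	m1, off1 := shiftNegative(seg1)
	m2, off2 := shiftNegative(seg2)

	fmt.Println("segment 1 offset:", off1, "last DTS:", m1[len(m1)-1].DTS)
	fmt.Println("segment 2 offset:", off2, "first DTS:", m2[0].DTS)
}
```

In this model only the first segment is shifted (by 20), so it ends at DTS 30 while the second segment, muxed independently with no knowledge of that offset, still starts at DTS 20.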

The normal way to fix this would be to add a constant delay to all segments - ffmpeg adds 1.4s to mpegts by default.

However, introducing a delay right now feels a little odd since we don't really offer any other knobs to control the timestamp (re-transcodes would accumulate the delay) and there is some concern about falling out of sync with the source segment since we have historically tried to make timestamps follow the source as closely as possible.
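The trade-off can be sketched in a few lines of Go. This is an illustration only, not the commented-out code mentioned in this PR: `Packet`, `addDelay`, and `overlaps` are hypothetical helpers, the delay of 20 units is an arbitrary value chosen to cover the sample data, and a boundary is treated as overlapping when the next segment's first DTS is at or before the previous segment's last DTS.

```go
package main

import "fmt"

// Packet holds timestamps in arbitrary time-base units (hypothetical type).
type Packet struct{ DTS, PTS int64 }

// addDelay returns a copy of seg with a constant delay added to every
// DTS and PTS. The same delay is meant to be applied to all segments.
func addDelay(seg []Packet, delay int64) []Packet {
	out := make([]Packet, len(seg))
	for i, p := range seg {
		out[i] = Packet{DTS: p.DTS + delay, PTS: p.PTS + delay}
	}
	return out
}

// overlaps reports whether next starts at or before the last DTS of prev.
func overlaps(prev, next []Packet) bool {
	if len(prev) == 0 || len(next) == 0 {
		return false
	}
	return next[0].DTS <= prev[len(prev)-1].DTS
}

func main() {
	const delay = 20 // assumed value; must cover the largest negative DTS

	seg1 := []Packet{{-20, 0}, {-10, 10}, {0, 20}, {10, 30}}
	seg2 := []Packet{{20, 40}, {30, 50}, {40, 60}}

	d1, d2 := addDelay(seg1, delay), addDelay(seg2, delay)
	fmt.Println("first DTS non-negative:", d1[0].DTS >= 0)
	fmt.Println("segments overlap:", overlaps(d1, d2))

	// Feeding the delayed output through the same step again doubles the
	// offset relative to the source, which is the accumulation concern.
	again := addDelay(d1, delay)
	fmt.Println("PTS drift after two passes:", again[0].PTS-seg1[0].PTS)
}
```

With a uniform delay no segment needs a muxer-side shift, so the boundary stays consistent, but each additional pass moves the timestamps further from the source.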

So we're leaving this particular bug as-is for now. There is some commented-out code that adds this delay in case we feel that we would need it in the future.

Note that the FFmpeg CLI has the exact same problem when the muxer delay is removed, so this is not an LPMS-specific issue. This is exercised in the test cases.

Example of non-monotonic DTS after encoding and after muxing:

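| Segment.Frame | Encoder DTS | Encoder PTS | Muxer DTS | Muxer PTS |
|---------------|-------------|-------------|-----------|-----------|
| 1.1           | -20         | 0           | 0         | 20        |
| 1.2           | -10         | 10          | 10        | 30        |
| 1.3           | 0           | 20          | 20        | 40        |
| 1.4           | 10          | 30          | 30        | 50        |
| 2.1           | 20          | 40          | 20        | 40        |
| 2.2           | 30          | 50          | 30        | 50        |
| 2.3           | 40          | 60          | 40        | 60        |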
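A check like the following could be used to spot the overlap in the values above. It is a small Go sketch, not part of the LPMS test suite: `firstNonMonotonic` is a hypothetical helper, and it assumes DTS must be strictly increasing across the concatenated segments.

```go
package main

import "fmt"

// firstNonMonotonic returns the index of the first DTS that is not strictly
// greater than the one before it, or -1 if the sequence is strictly increasing.
func firstNonMonotonic(dts []int64) int {
	for i := 1; i < len(dts); i++ {
		if dts[i] <= dts[i-1] {
			return i
		}
	}
	return -1
}

func main() {
	// DTS columns from the example, with segments 1 and 2 concatenated.
	encoderDTS := []int64{-20, -10, 0, 10, 20, 30, 40}
	muxerDTS := []int64{0, 10, 20, 30, 20, 30, 40}

	fmt.Println("encoder:", firstNonMonotonic(encoderDTS))
	fmt.Println("muxer:", firstNonMonotonic(muxerDTS))
}
```

The encoder column is monotonic across the boundary, while the muxer column first goes backwards at index 4, which corresponds to frame 2.1.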