Reducing latency caused by long segments

Abstract

This is a proposal to reduce the latency introduced by extra-long segments, which result from variable GOP sizes under bad network conditions. The idea is to force a small, fixed segment size when segmenting the RTMP stream, and to have the Broadcaster node maintain a buffer of all segments since the last seen keyframe.

Motivation

Many end-user broadcast encoders (OBS/x264) use variable keyframe intervals by default. This is usually not a problem, but under bad network conditions it can result in extremely long GOPs (>= 8s). This is problematic because we segment the incoming RTMP stream on keyframe boundaries to make each segment individually decodable, so a long GOP becomes an equally long segment.

We already support a fixed output GOP size through a configuration option (see https://github.com/livepeer/lpms/pull/198). This does make it easier to start playback from any point in the output renditions, but we might still have extremely long segments travelling across the network, adding latency.

Proposed Solution

A simple initial fix to decrease the latency caused by long-GOPs is as follows:

The segmenter (Mist or LPMS) uses a fixed segment size (e.g. 2s) regardless of keyframe positions in the RTMP stream.
The segmenter also marks each segment with a bool hasKeyframe indicating whether it contains a keyframe.
B (the Broadcaster) receives each segment, marked as keyframe or non-keyframe, through the HTTP Push API.
B maintains a buffer of all segments since the last-seen segment with hasKeyframe == true. NB: in the 8s GOP example with 2s segments, the maximum buffer length is ~4 segments.
Happy path: B sends segments one by one to the O (Orchestrator). As segments are guaranteed to have a small, fixed size, long GOPs no longer cause buffering.
Failover path: if B has to switch to a different O mid-stream, it replays all the segments in its buffer to the new O. NB: this is ~1-2 segments unless the failover happens during a long GOP.
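
The buffering rule in the steps above can be sketched in Go as follows. This is a minimal illustration under stated assumptions, not the go-livepeer implementation: the `Segment` and `GOPBuffer` types and their methods are hypothetical names introduced here, and only the `hasKeyframe` flag comes from the proposal.

```go
package main

import "fmt"

// Segment is a hypothetical stand-in for one fixed-duration chunk of the
// RTMP stream as pushed to B; the field names are assumptions.
type Segment struct {
	SeqNo       uint64
	HasKeyframe bool
	Data        []byte
}

// GOPBuffer keeps every segment since the last one with HasKeyframe == true.
type GOPBuffer struct {
	segs []Segment
}

// Push records a newly received segment. A keyframe segment starts a new
// decodable run, so everything buffered before it is dropped.
func (b *GOPBuffer) Push(s Segment) {
	if s.HasKeyframe {
		b.segs = nil
	}
	b.segs = append(b.segs, s)
}

// Len reports how many segments a failover would currently replay.
func (b *GOPBuffer) Len() int { return len(b.segs) }

// Replay returns a copy of the buffered segments, oldest first, for a new O.
func (b *GOPBuffer) Replay() []Segment {
	out := make([]Segment, len(b.segs))
	copy(out, b.segs)
	return out
}

func main() {
	buf := &GOPBuffer{}
	// An 8s GOP cut into 2s segments: only the first carries a keyframe.
	for seq := uint64(1); seq <= 4; seq++ {
		buf.Push(Segment{SeqNo: seq, HasKeyframe: seq == 1})
	}
	fmt.Println("segments to replay on failover:", buf.Len())

	// The next keyframe segment resets the buffer.
	buf.Push(Segment{SeqNo: 5, HasKeyframe: true})
	fmt.Println("segments to replay after next keyframe:", buf.Len())
}
```

On the happy path B would call `Push` for each incoming segment and forward it to the O immediately. `Replay` is only consulted when a failover happens.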

This should remove most of the latency we currently see from long GOPs. Latency remains hard to reduce only when two independent, rare events coincide: a long GOP and an O failover.

Future consideration: if this rare case still occurs often enough, we can iterate on this in a few possible ways:

When B switches over to a new O, it might already have received transcoded renditions for the first few segments in its buffer. The new O needs those segments only to bootstrap its decoder and doesn't actually need to perform a full filter + transcode on them. B can easily communicate this to the new O on failover by comparing its buffer with the renditions that were successfully returned.
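
One way B could derive that hint is sketched below. It assumes B tracks, per sequence number, whether renditions were already returned. `ReplayItem`, `BootstrapOnly` and `PlanReplay` are hypothetical names for illustration and do not correspond to existing go-livepeer fields or network messages.

```go
package main

import "fmt"

// ReplayItem describes how the new O should treat one replayed segment.
// BootstrapOnly is a hypothetical flag: decode to warm up the decoder,
// but skip the full filter + transcode.
type ReplayItem struct {
	SeqNo         uint64
	BootstrapOnly bool
}

// PlanReplay walks the buffered sequence numbers (oldest first) and marks
// those whose renditions B has already received from the previous O.
func PlanReplay(buffered []uint64, transcoded map[uint64]bool) []ReplayItem {
	plan := make([]ReplayItem, 0, len(buffered))
	for _, seq := range buffered {
		plan = append(plan, ReplayItem{SeqNo: seq, BootstrapOnly: transcoded[seq]})
	}
	return plan
}

func main() {
	// Segments 10 and 11 came back from the old O before it failed; 12 did not.
	plan := PlanReplay([]uint64{10, 11, 12}, map[uint64]bool{10: true, 11: true})
	for _, item := range plan {
		fmt.Printf("seq=%d bootstrapOnly=%v\n", item.SeqNo, item.BootstrapOnly)
	}
}
```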

B may also choose not to replay its buffer completely if it has already received a segment containing the next keyframe. Instead, it may start sending from that segment, losing a few frames in the process.

Alternatively, the Transcoder node can be improved by looking into bootstrapping the decoder from P/B frames only (this is theoretically possible but might require experimental, decoder-implementation-specific features).

Alternatives

We can also take this opportunity to explore a more continuous approach where the O/T node processes a segment while it is still downloading, i.e. a continuous streaming-decoding workflow.

The simplest way to achieve this is by piping the incoming segment via stdin to the (ffmpeg) decoder. To significantly reduce latency here, we will still need to break a long input segment into multiple output segments around the new GOP size, and return them to B on the fly as they become ready.
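
As a rough sketch of the piping idea, the Go snippet below feeds a reader into a child process's stdin as data arrives. It uses the ffmpeg command-line tool purely for illustration, whereas the actual transcoder goes through Cgo bindings. The helper names (`chunkReader`, `streamToCommand`, `ffmpegArgs`), the output file pattern, and the choice of libx264 are assumptions. Collecting the output segments and returning them to B is not shown.

```go
package main

import (
	"fmt"
	"io"
	"os/exec"
	"strings"
)

// chunkReader simulates a segment that is still downloading: each Read
// returns data from at most one chunk.
type chunkReader struct {
	chunks [][]byte
}

func (c *chunkReader) Read(p []byte) (int, error) {
	if len(c.chunks) == 0 {
		return 0, io.EOF
	}
	n := copy(p, c.chunks[0])
	c.chunks[0] = c.chunks[0][n:]
	if len(c.chunks[0]) == 0 {
		c.chunks = c.chunks[1:]
	}
	return n, nil
}

// streamToCommand starts the named program, copies src into its stdin as
// data becomes available, and returns whatever the program wrote to stdout.
func streamToCommand(src io.Reader, name string, args ...string) ([]byte, error) {
	cmd := exec.Command(name, args...)
	cmd.Stdin = src
	return cmd.Output()
}

// ffmpegArgs builds an illustrative argument list: read from stdin, force a
// keyframe every segSeconds, and cut the output into segSeconds-long files.
func ffmpegArgs(segSeconds int) []string {
	return []string{
		"-i", "pipe:0",
		"-c:v", "libx264",
		"-force_key_frames", fmt.Sprintf("expr:gte(t,n_forced*%d)", segSeconds),
		"-f", "segment",
		"-segment_time", fmt.Sprint(segSeconds),
		"out_%03d.ts",
	}
}

func main() {
	// Only print the command here; a caller would pass the download stream
	// (for example an HTTP request body) to streamToCommand with these args.
	fmt.Println("ffmpeg " + strings.Join(ffmpegArgs(2), " "))
}
```

Because `streamToCommand` takes an `io.Reader`, the decoder can start consuming bytes before the whole input segment has been received, which is the property this alternative relies on.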

This approach is quite intrusive wrt the low-level transcoding Cgo code and will also need significant changes in the node-to-node network calls to support one-input-many-output segments.

But in general, exploring a streaming workflow for go-livepeer v2 can be prioritized higher as that is the only way forward for ultra low-latency streaming.
