Accept out-of-order segments

rkersche commented 6 years ago

In distributed encoding workflows, it is natural that segments arrive out of order. As the input is chunked and the encoding is distributed over multiple nodes, different chunks can take different amounts of time to encode, e.g. depending on the complexity of the input. Additionally, a distributed instance that is encoding a segment may shut down unexpectedly, so that the encoding job has to be restarted (e.g. spot / preempted instances). For these reasons, the encoding of segment n+1 may finish before segment n.

In addition, the distributed nodes should not be synchronized with each other. Each node should send its segment to the publishing endpoint as fast as possible, so that no additional process coordinating out-of-order segments introduces extra latency. As a result, segment n+1 may be sent to the endpoint before segment n.

In section 6.2, point 4 states:

The fragment decode timestamps "tfdt" of fragments in the fragmentedMP4stream and the indexes base_mediadecode time SHOULD arrive in increasing order for each of the different tracks/streams that are ingested.

Our understanding is that this means segments must be sent in order, i.e. segment n must be sent before segment n+1. That imposes a huge synchronization effort on encoding services and also makes the encoding process less efficient.

Our suggestion is to remove that part so that the segments can be sent out-of-order. Obviously, segment n+1 will only have decode timestamps higher than segment n.

Our proposal would be that the decode timestamp of each segment is used to impose a total order on the segments sent to the packager*. If the transcoder must send a discontinuous segment, this is actively signaled via an HTTP header indicating that the segment is discontinuous in timestamps (this mechanism is likely desirable anyway).

*That is, for each segment starting at time t with duration d, the next segment must start at time t+d or signal a discontinuity. (Note that this still holds if the clock wraps around, i.e. timestamps are computed modulo the clock range. Given a 33-bit 90 kHz clock, with t_n-1 = 8589757592 and d = 180000, the start of the next segment must be t_n = (t_n-1 + d) % 2^33 = 3000.)
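The ordering rule in this note can be sketched in a few lines of Python. This is only an illustration of the proposal, not something the ingest spec defines: the function names, the (start, duration) tuple layout, and the 33-bit wrap point are assumptions taken from the example above.

```python
WRAP = 2 ** 33  # assumed wrap point of a 33-bit, 90 kHz clock


def expected_next_start(start: int, duration: int, wrap: int = WRAP) -> int:
    """Start time the next contiguous segment is expected to carry."""
    return (start + duration) % wrap


def is_contiguous(prev_start: int, prev_duration: int, next_start: int,
                  wrap: int = WRAP) -> bool:
    """True if next_start directly follows the previous segment."""
    return next_start == expected_next_start(prev_start, prev_duration, wrap)


def order_segments(segments, first_start: int, wrap: int = WRAP):
    """Chain (start, duration) pairs into timeline order.

    Starts at first_start and stops at the first gap, i.e. where a
    discontinuity would have to be signaled.
    """
    by_start = {start: duration for start, duration in segments}
    ordered = []
    t = first_start
    while t in by_start:
        duration = by_start.pop(t)  # pop guarantees termination
        ordered.append((t, duration))
        t = expected_next_start(t, duration, wrap)
    return ordered


# Numbers from the note: the sum crosses 2^33 and wraps.
print(expected_next_start(8589757592, 180000))  # 3000
```

Because every segment names its successor through t+d, the receiver can rebuild the order from arrival in any order, as long as it knows where the chain starts.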

unifiedstreaming commented 6 years ago

"The encoding of segment n+1 may finish before segment n" makes sense in a VOD workflow, but for live it is problematic.

Could you provide insight into why out-of-order arrival may occur with a live stream? For instance, a live signal implies sequential processing: as the signal arrives from a camera (or feed), there is no n+1 yet (as that is the future).

The reason the spec says SHOULD is that out-of-order arrival introduces discontinuities. When a player requests a fragment that has not arrived yet, it gets a 404, which the CDN will cache, so the player will never get the fragment, even if it does arrive later.

It also spells trouble for low latency, where there is no room to 'miss' chunks.
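A toy model may make the caching concern concrete. Everything here is hypothetical: `ToyCdn`, its `negative_ttl` parameter and the fragment names are invented for illustration and do not describe any real CDN's behavior or configuration.

```python
class ToyCdn:
    """Hypothetical edge cache that also caches 404 responses for a fixed TTL."""

    def __init__(self, origin, negative_ttl):
        self.origin = origin              # dict standing in for the origin: name -> bytes
        self.negative_ttl = negative_ttl  # assumed lifetime of a cached 404
        self.cache = {}                   # name -> (status, expires_at or None)

    def get(self, name, now):
        """Return the HTTP status a player would see at time `now`."""
        entry = self.cache.get(name)
        if entry is not None and (entry[1] is None or now < entry[1]):
            return entry[0]               # served from cache, origin not consulted
        if name in self.origin:
            self.cache[name] = (200, None)
            return 200
        self.cache[name] = (404, now + self.negative_ttl)
        return 404


origin = {"seg_n+1": b"..."}          # segment n+1 was posted first
cdn = ToyCdn(origin, negative_ttl=10)
print(cdn.get("seg_n", now=0))        # 404: segment n has not arrived yet
origin["seg_n"] = b"..."              # segment n arrives late
print(cdn.get("seg_n", now=5))        # 404: the cached error is still served
print(cdn.get("seg_n", now=10))       # 200: only after the cached 404 expires
```

In this sketch the late fragment stays invisible for as long as the cached 404 lives, which is the window in which players see errors.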

rkersche commented 6 years ago

The live stream ingested into our encoder is chunked on the master node, e.g. into 4-second segments. These segments are then processed by the worker nodes. As the complexity of a segment may vary, it happens that segment n+1 is finished before segment n. In addition, the failure of a node processing a chunk (e.g. when using spot instances) may trigger a retry of that chunk. Such a retry will often mean that the chunk is completed after subsequent chunks. Thus segment n will be pushed to the Unified ingest after segment n+1.

So the “out-of-order” situation doesn’t come from the ingest into the encoder (e.g. camera feed) but from the distributed encoding and the different complexity/encoding time of each individual segment. For many, many broadcasters quality and reliability of the transcode is more important than being close to the live point. Restricting use cases which trade live point latency for performance seems counterproductive to us.

unifiedstreaming commented 6 years ago

Latency is becoming a more important issue for broadcasters (see low latency DASH). Out-of-order posting will result in timeline discontinuities, resulting in 404s that will also be cached in the CDN, and it will lead to increased latency. This is why it is not recommended in the specification. If there are cases where latency is not an issue, you can apply a reorder process and post the segments in order to the ingest point, which solves the issue as well.
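A reorder process on the encoder side could look roughly like the sketch below. It is a minimal illustration under stated assumptions, not part of the ingest spec: `ReorderBuffer` and its methods are hypothetical names, segments are identified by their decode start time and duration in timescale ticks, and clock wraparound and retry timeouts are ignored.

```python
import heapq


class ReorderBuffer:
    """Holds finished segments until they can be posted in timeline order."""

    def __init__(self, first_start):
        self.next_start = first_start  # start time of the next segment to post
        self.pending = []              # min-heap of (start, duration, payload)

    def push(self, start, duration, payload):
        """Add a finished segment; return the segments that may be posted now."""
        heapq.heappush(self.pending, (start, duration, payload))
        released = []
        while self.pending:
            head_start, head_duration, head_payload = self.pending[0]
            if head_start < self.next_start:
                heapq.heappop(self.pending)   # stale duplicate, e.g. from a retry
            elif head_start == self.next_start:
                heapq.heappop(self.pending)
                released.append((head_start, head_payload))
                self.next_start = head_start + head_duration
            else:
                break                         # gap: keep waiting for the missing segment
        return released


# 4-second segments on a 90 kHz timescale are 360000 ticks long.
buf = ReorderBuffer(first_start=0)
print(buf.push(360000, 360000, "segment n+1"))  # [] because segment n is still missing
print(buf.push(0, 360000, "segment n"))         # both segments, in timeline order
```

The price is the one discussed in this thread: every segment behind a slow or retried one is held back, so the delay of the slowest chunk becomes added latency for the stream.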

ioreper commented 5 years ago

Negative error caching configuration and origin retry logic are handled by the CDNs, so it's expected that any tuning/optimization for specific use cases, like low latency live linear, will be done by the service provider.

Segments posted to the origin from an encoder can arrive out of order for many reasons, including distributed encoding but also error/failure scenarios which are unavoidable (e.g. temporary network issues or pre-empted compute instances).

The requirement itself acknowledges this, as it reads that the segments SHOULD arrive in increasing order. This means that there may exist valid reasons in particular circumstances to ignore this requirement while understanding the potential implications of doing so.

Is that valid or are you saying that this is a MUST requirement meaning that it is absolutely required in all cases? If it is indeed a SHOULD requirement, then I think we've discussed and understood both perspectives on this issue.

RufaelDev commented 5 years ago

This issue was discussed in the call on October 3rd: https://github.com/unifiedstreaming/fmp4-ingest/blob/master/conference-calls/media-ingest-call-october-3.txt
Discussion on media track requirement 4, that fragments should arrive with increasing baseMediaDecodeTime in the tfdt. Solution proposed:

introduce encoder delay buffering to push segments in order

arguments:
gap is cached as 404 in the CDN, resulting in many erroneous requests
the SHOULD condition is required for conformance
The 404 is problematic as it will be cached in the CDN, giving erroneous responses to all clients requesting the segment. Will Law confirms that this is still the case even in Akamai; new logic is under investigation to solve this issue in the future, but it is not available yet. Akamai recommends that the encoder produces and posts the segments in order.
out-of-order delivery, or making segments available out of order to a client, is not correct behavior in DASH.
it breaks the simplicity of the protocol (MS): in the current protocol, segments can be fetched and storing them will always result in a valid fMP4 file, which would not be the case with out-of-order arrival. This would break archival features of the protocol.
Another argument is that the encoder has more context on the nature of the out-of-order arrival that can be exploited, for example a timed window. The media processing entity can keep a buffer/time window but does not know what the maximum out-of-order distance is. As the encoder has more knowledge, it would be the best place to fix the ordering, instead of further downstream in the media processing, CDN or player. Out-of-order arrival will potentially also confuse clients. For an origin it is strange, as it would need to retrospectively fix a dynamic MPD.
in DASH, sequential order of segments in the manifest is a requirement, and low latency would easily fail. The 404 would get cached and many clients would try to resolve the BaseURLs as a failover behavior, which is undesirable. Hence the argument that this can be easily solved in the player is not good.

Therefore, in this regard the ingest spec is accurate and for conformance this test should be passed.
