unifiedstreaming / fmp4-ingest

Repository on shared work on developing a fragmented MPEG-4 ingest specification

Accept out-of-order segments #11

Open rkersche opened 6 years ago

rkersche commented 6 years ago

In distributed encoding workflows, it is natural that segments arrive out of order. Because the input is chunked and the encoding is distributed over multiple nodes, different chunks can take different amounts of time to encode, e.g. depending on the complexity of the input. Additionally, a distributed instance that is encoding a segment may shut down unexpectedly (e.g. spot / preempted instances), so that encoding job has to be restarted. For these reasons, the encoding of segment n+1 may finish before that of segment n.

In addition, the distributed nodes should not have to be synchronized: each should send its segment to the publishing endpoint as fast as possible, so that no additional latency is introduced by another process that coordinates out-of-order segments. As a result, segment n+1 may be sent to the endpoint before segment n.

In section 6.2, point 4 states:

The fragment decode timestamps "tfdt" of fragments in the fragmentedMP4stream and the indexes base_mediadecode time SHOULD arrive in increasing order for each of the different tracks/streams that are ingested.

Our understanding is that this means segments must be sent in order, i.e. segment n must be sent before segment n+1. That causes a huge synchronization effort for encoding services and also makes the encoding process less efficient.

Our suggestion is to remove that part so that the segments can be sent out-of-order. Obviously, segment n+1 will only have decode timestamps higher than segment n.

Our proposal would be that the decode timestamp of each segment is used to impose a total order on the segments sent to the packager*. If the transcoder must send a discontinuous segment, this is actively signaled via an HTTP header indicating that the segment is discontinuous in timestamps (this mechanism is likely desirable anyway).

*That is, for each segment starting at time t with duration d, the next segment must start at time t+d or signal a discontinuity. (Note that this still holds if the clock wraps around, i.e. timestamps are taken modulo 2^33. Given a 33-bit 90 kHz clock with t_(n-1) = 8589757592 and d = 180000, the start of the next segment must be t_n = (t_(n-1) + d) % 2^33 = 3000.)
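The continuity rule in the footnote can be sketched in a few lines of Python. This is only an illustration, not part of the spec or of any packager: the function names and the `CLOCK_MODULUS` constant are assumptions made for the example, and the 2^33 wrap point is taken from the footnote.

```python
CLOCK_MODULUS = 2 ** 33  # assumed wrap point of the 33-bit clock in the footnote


def expected_next_start(start: int, duration: int,
                        modulus: int = CLOCK_MODULUS) -> int:
    """Return the start time the following segment must have, with wraparound."""
    return (start + duration) % modulus


def is_continuous(prev_start: int, prev_duration: int, next_start: int,
                  modulus: int = CLOCK_MODULUS) -> bool:
    """True if next_start directly follows the previous segment on the wrapped timeline."""
    return next_start == expected_next_start(prev_start, prev_duration, modulus)
```

For instance, a segment that starts 90000 ticks before the wrap point and lasts 180000 ticks must be followed by a segment starting at 90000: `expected_next_start(2**33 - 90000, 180000)` gives 90000. Any segment for which `is_continuous` is false would have to signal a discontinuity under this proposal.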

unifiedstreaming commented 6 years ago

"The encoding of segment n+1 may finish before segment n" makes sense in a VOD workflow, but for Live it's problematic.

Could you provide insight into why out-of-order segments may occur with a live stream? For instance, a live signal implies sequential processing: as the signal arrives from a camera (or feed), there is no n+1 (as that is the future).

The reason the spec says SHOULD is that out-of-order arrival introduces discontinuities. When a player requests a fragment that has not yet arrived, it gets a 404, which the CDN will cache, so the player will never be updated, even if the fragment does arrive later.

It also spells trouble for low latency, where there is no room to 'miss' chunks.

rkersche commented 6 years ago

The live stream ingested into our encoder is chunked on the master node, e.g. into 4-second segments. These segments are then processed by the worker nodes. As the complexity of a segment may vary, it happens that segment n+1 is finished before segment n. In addition, the failure of a node processing a chunk (e.g. when using spot instances) may trigger a retry of that chunk. Such a retry will often mean the chunk is completed after subsequent chunks. Thus segment n will be pushed to the Unified ingest after segment n+1.

So the “out-of-order” situation doesn’t come from the ingest into the encoder (e.g. camera feed) but from the distributed encoding and the different complexity/encoding time of each individual segment. For many, many broadcasters quality and reliability of the transcode is more important than being close to the live point. Restricting use cases which trade live point latency for performance seems counterproductive to us.

unifiedstreaming commented 6 years ago

Latency is becoming a more important issue for broadcasters (see low-latency DASH). Out-of-order posting will result in timeline discontinuities, resulting in 404s that will also be cached in the CDN, and it will lead to increased latency; this is why it is not recommended in the specification. If there are cases where latency is not an issue, you can apply a reorder process and post the segments in order to the ingest point as well, solving the issue.
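A reorder process of the kind mentioned here could look roughly like the Python sketch below. It is a minimal illustration under stated assumptions, not a reference implementation: the `(start, duration, payload)` segment shape and the `reorder` function are hypothetical, the start time of the first segment is assumed to be known, and timestamp wraparound and timeouts for segments that never arrive are deliberately left out.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

# Hypothetical segment shape: (start time, duration, payload bytes).
Segment = Tuple[int, int, bytes]


def reorder(segments: Iterable[Segment], first_start: int) -> Iterator[Segment]:
    """Yield segments in timeline order, holding back any that arrive early."""
    pending: List[Segment] = []  # min-heap ordered by start time
    next_start = first_start     # start time of the segment we are waiting for
    for seg in segments:
        heapq.heappush(pending, seg)
        # Release every buffered segment that continues the timeline.
        while pending and pending[0][0] == next_start:
            start, duration, payload = heapq.heappop(pending)
            yield start, duration, payload
            next_start = start + duration
```

Segments arriving with start times 4, 0, 12, 8 (each of duration 4) would be released in the order 0, 4, 8, 12. The cost is the one raised in this thread: every segment behind a late one waits in the buffer, which adds latency.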

ioreper commented 5 years ago

Negative error caching configuration and origin retry logic are handled by the CDNs, so it's expected that any tuning/optimization for specific use cases, like low latency live linear, will be done by the service provider.

Segments posted to the origin from an encoder can arrive out of order for many reasons, including distributed encoding but also error/failure scenarios which are unavoidable (e.g. temporary network issues or pre-empted compute instances).

The requirement itself acknowledges this, as it reads that the segments SHOULD arrive in order. This means that there may exist valid reasons in particular circumstances to ignore this requirement while understanding the potential implications of doing so.

Is that valid or are you saying that this is a MUST requirement meaning that it is absolutely required in all cases? If it is indeed a SHOULD requirement, then I think we've discussed and understood both perspectives on this issue.

RufaelDev commented 5 years ago

This issue was discussed in the call on October 3rd: https://github.com/unifiedstreaming/fmp4-ingest/blob/master/conference-calls/media-ingest-call-october-3.txt
Discussion on media track requirement 4, that fragments should arrive with increasing baseMediaDecodeTime in the tfdt. Solution proposed:

Therefore, in this regard the ingest spec is accurate and for conformance this test should be passed.