unifiedstreaming / fmp4-ingest

Repository for shared work on developing a fragmented MPEG-4 ingest specification

MPEG DASH as an ingest format #1

Closed wilaw closed 6 years ago

wilaw commented 6 years ago

The proposed spec opens with the statement that "unfortunately MPEG-DASH [2] is a client only protocol. MPEG DASH seems not suitable for pushing content from a live encoder to a media processing function, as it only supports pull based requests based on existing manifest".

DASH is a presentation description. It is agnostic as to whether it is interpreted by a client or a server. It describes URLs where content can be retrieved and, importantly, the relationships between the media segments (whether they are part of a switching set, for example), their resolution, codec, MIME type, frame rate, etc. In general it describes in precise detail a collection of media segments that an encoder may be producing. In this respect it seems a good candidate to describe the output of an encoder which is being sent to an origin ingest point. If not, then we need to reinvent something very similar in order to describe the relationships between the segments that are being sent, which is basically what the proposed document does. This seems unnecessary in my opinion.

The encoder already knows how to produce a DASH manifest and segments. The only change it would need to make is to POST the segments to the origin instead of (or perhaps in addition to) writing them to disk. The origin, upon receiving a manifest, can then prep itself to receive the content defined by that manifest. If we set the rules that

  1. only relative URLs are used, and
  2. unique paths are used for each new presentation,

then it would work like this:

The encoder POSTs a manifest to some origin, e.g. http://someoriginservice.com/live/customerA/manifest.mpd

The origin receives this and checks to see if it is an update to an existing manifest. If not, then it cues up a new session using "http://someoriginservice.com/live/customerA/" as the keyID.

The encoder then starts POSTing segments, for example: http://someoriginservice.com/live/customerA/adaptionset1/representation1/segment1.cmfv

As each segment is received, the origin pattern-matches against its open sessions. In this case, it sees that this POST matches the ID "http://someoriginservice.com/live/customerA/" and that therefore this segment must belong to something described by that manifest. It looks through its stored copy of the manifest and figures out that "adaptionset1/representation1/segment1.cmfv" represents content for a particular adaptation set and representation. Furthermore, the manifest contains all the info it might need to know about this segment (codec, duration, resolution, etc.).

The origin can then do what it needs to do with this segment. That may be to send it off for further transcoding (if this input stream is an SBR mezz stream), or simply to pass it through. Pass-through becomes simple, because all the origin has to do is rewrite the URL templates in the manifest to use the actual delivery hostname and path configuration.
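
To make the flow above concrete, here is a minimal sketch of the encoder side only, assuming the hypothetical origin endpoint and paths from the example above and using Python's `requests` library; it illustrates the proposed push pattern rather than any existing specification:

```python
import requests

ORIGIN = "http://someoriginservice.com/live/customerA/"  # hypothetical publishing point

def post_manifest(mpd_path):
    # The manifest is POSTed first so the origin can set up a session
    # keyed on the unique presentation path prefix.
    with open(mpd_path, "rb") as f:
        r = requests.post(ORIGIN + "manifest.mpd", data=f.read(),
                          headers={"Content-Type": "application/dash+xml"})
    r.raise_for_status()

def post_segment(relative_path, segment_path):
    # Segments are POSTed to relative paths under the same prefix,
    # e.g. "adaptionset1/representation1/segment1.cmfv".
    with open(segment_path, "rb") as f:
        r = requests.post(ORIGIN + relative_path, data=f.read(),
                          headers={"Content-Type": "video/mp4"})
    r.raise_for_status()

post_manifest("manifest.mpd")
post_segment("adaptionset1/representation1/segment1.cmfv", "segment1.cmfv")
```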

Akamai has something very similar to this for HLS ingest today and we will be adding DASH support in the future. I'd prefer to use an MPEG standard that already exists for media ingest rather than create a new one.

unifiedstreaming commented 6 years ago

Hi Will, thanks for the feedback, these are good points. Re-use of existing standards like DASH or fragmented MP4 is certainly key for the draft specification. The one thing we are not sure about is whether the encoder has enough information to create the manifest (anticipating the future, in a way). It could also be challenging to keep track of the manifest and the streams in a synchronized way. In addition, DASH does not define at what pace segments should be sent, or how to handle failovers and so on when using POST; we see this as a problem with this approach. To be honest, the draft specification was just meant to trigger the discussion; for the upcoming conference call it will also be important to nail down the requirements, but it is good to consider different options/directions for the specification. It will be interesting to hear the views of others on this, but for us this approach was quite problematic. Some of the larger encoder vendors will join the discussion later today. Talk to you soon, best Rufael

KevinAmazon commented 6 years ago

AWS Elemental endorses this position as well. Basing the contribution specification on the MPEG-DASH specification (multiple segments, multiple POST requests) seems like a valuable way to go.

unifiedstreaming commented 6 years ago

DASH is a client-only protocol: it does not have any parts related to ingest towards processing nodes such as Unified Origin, so if we want to go for this, MPEG DASH still needs to be updated and ingest needs to be added to the MPEG DASH standard. Is this the path we want to follow?

Regarding the approach proposed by Will: this may work for ingesting to a CDN, but an origin/media processing node typically comes before the CDN and has other requirements, and ingesting DASH raises many issues for video streaming platforms. I have uploaded a document (under ingest-docs) describing some of these issues and the responsibilities of the different components in the workflow. For us these issues need to be discussed and addressed if we want to pursue this approach. Second, if we pursue this approach we need a starting point: does anyone have text for this specification? It needs to be uploaded in order to discuss it in the next call; if we have text we can use it as a starting point, provided we can address the issues raised in our document. If these issues can be addressed, Unified Streaming can support such an approach, which would be good as we would like to be aligned with your preferred approach. However, the current state of DASH ingest is not suitable for us, as it needs a lot of improvement to be suitable for the processing done. Some of the issues encountered with DASH ingest are the following:

  1. Consistency between the manifest and the fmp4: A/V information is needed in both, but this is often not the case, leading to inconsistencies and problems; information may be missing from the manifest or from the MP4. Which one is leading? We recommend the fmp4, as CMAF showed that the most important information can be contained in the fmp4 file.
  2. This leads to redundant representation (having the same information in the manifest and in the MP4 file).
  3. Lack of knowledge at the encoder of the complete media presentation offered to the client may lead to non-conformant manifests, and media-processing-specific cues may not be supported in DASH.
  4. Media processing needs to be supported; this is not well supported in DASH, which aims at playback by clients. It is important to have markers in the stream that also apply to HLS or HDS; CMAF or fragmented MP4 is better for this.
  5. Reconnect and fault tolerance: do you want to keep re-sending the manifest? How to handle multiple short-running POSTs? How do you detect a disconnect?
  6. Synchronization of multiple encoders posting to a single publishing point/origin media processing: how is this achieved in this approach? How is it checked that different encoders produce the same manifests, especially if a manifest can change over time as specified in your approach?
  7. The manifest needs to be available before any of the segments are posted; how can this be guaranteed? DASH ingest needs to specify that the MPD is sent before the segment streams.
  8. How to handle manifest changes when posting from multiple encoders?
  9. Availability of the segment: what to do after N hours, delete the segment?
  10. DASH offers too many degrees of freedom for client-related information that is not relevant for a media processing entity; how to limit the DASH specification for ingest?
  11. The encoder does not have full information about the client presentation, resulting in non-compliant manifests.
  12. How to support HLS from DASH ingest without active processing of the data in DASH? The manifest needs to be regenerated in any case, so why start with a full DASH manifest?
  13. Streams are not synchronized and can arrive out of order as they are separate POSTs; how to detect disconnection as streams are separate POSTs?
  14. Manifest coupling: the manifest cannot always have the exact time of segments as it may not be known in advance, and manifests may be received out of order as they are sent separately; this can raise a lot of timing issues. Please also see the document under ingest-docs.

To summarize: the DASH manifest adds a lot of undesirable overhead to video streaming workflows. We think the DASH manifest can be added but should not lead the ingest; the manifest should just serve as a helper for CDN pass-through, while most information should be contained in the MP4 streams, which carry both the metadata and the media data relevant for the media processing. Thanks
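
As a rough illustration of why the fmp4 can be leading, the track-level metadata lives in the ISO BMFF box structure of the init and media segments themselves. A minimal sketch of walking those boxes (standard library only; box names and sizes, no codec-specific parsing, and the filename is hypothetical) might look like this:

```python
import struct

# Container boxes that only hold other boxes; recurse into them.
CONTAINERS = {b"moov", b"trak", b"mdia", b"minf", b"stbl", b"moof", b"traf", b"mvex"}

def walk_boxes(buf, start=0, end=None, depth=0):
    end = len(buf) if end is None else end
    offset = start
    while offset + 8 <= end:
        size, box_type = struct.unpack(">I4s", buf[offset:offset + 8])
        header = 8
        if size == 1:                       # 64-bit largesize follows the type
            size = struct.unpack(">Q", buf[offset + 8:offset + 16])[0]
            header = 16
        elif size == 0:                     # box runs to the end of the data
            size = end - offset
        print("  " * depth + box_type.decode("ascii", "replace"), size)
        if box_type in CONTAINERS:          # e.g. moov -> trak -> mdia carries codec setup
            walk_boxes(buf, offset + header, offset + size, depth + 1)
        offset += size

with open("init.cmfv", "rb") as f:          # hypothetical CMAF init segment
    walk_boxes(f.read())
```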

mattjpoole commented 6 years ago

The wider problem for broadcasters is that we have two layers of contribution: mezzanine transport stream to distribution encoder, and distribution encoder to packager/origin.

CDN to broadcaster-owned origin is a well-established pattern.

There is a path where uncompressed HD-SDI into contribution encode simply becomes transport streams contributed directly from broadcast systems (given that transport stream is how broadcast itself is transmitted). Time-based metadata in Transport Stream is a well tried and tested mechanism for contribution.

For me the first question should be where the 'IP push' layer resides; I see two possibilities:

Distribution encode push: Distribution encode converges 'closer' to broadcast systems to become the new contribution layer (I think this is the approach under discussion) - this does seem to require some standardisation if the approach carries on.

Mezzanine encode push: The final push step is effectively at the mezzanine layer and the distribution encoder converges into the broadcaster-owned origin, where it is simply a matter of writing transcodes to the file system of an origin operated by the same entity. The need for an interchange standard here would seem less pressing, as 'both sides' are owned by the origin vendor.

I'd be interested to hear views on this.

wilaw commented 6 years ago

Replying to the last Unified Streaming post (and trying not to fork away from the question Matt is asking, which is valuable), I wanted to address some of the questions around using DASH as an ingest format. Firstly, I agree that DASH as it stands today is clearly intended as a client playback format. To use it for contribution, we would need to apply additional restrictions that are appropriate for contribution (such as retry logic, a limit to relative URLs, unique paths and maybe event addressing schemes). I provide some answers below to the issues raised. These are not intended to be authoritative in any sense and I am open to debate on any of them:

  1. Consistency between the manifest and the fmp4: A/V information is needed in both, but this is often not the case, leading to inconsistencies; information may be missing from the manifest or from the MP4. Which one is leading? We recommend the fmp4, as CMAF showed that the most important information can be contained in the fmp4 file. This leads to redundant representation (having the same information in the manifest and in the MP4 file). [WL] - bad manifests, as well as manifests that don't match the content they purport to describe, are obviously out of spec. The same argument can be made against CMAF brands that don't match what is encoded in the mdat. Neither is desirable. Concerning all information being contained in a CMAF file, what about license URLs for DRM-protected content? Also, how would the language of the audio track be specified? If the encoder is sending 3 audio streams, how does the receiving entity know which is the English/Dutch/German stream? The current proposal allows for an identifier to be added to the POST, but this would then require a 3rd-party DB lookup to tie the identifier to the language. The DASH manifest, on the other hand, provides a standard for defining content characteristics such as language, role, group, etc., all of which are necessary for eventually constructing the player-facing presentation (a small MPD-reading sketch after this list illustrates this).
  2. Lack of knowledge at the encoder of the complete media presentation offered to the client may lead to non-conformant manifests, and media-processing-specific cues may not be supported in DASH. [WL] - the encoder need only describe what it is producing and sending to the publishing point. It should have the knowledge to fully describe this. For media processing cues, since both CMAF and DASH support EMSG, I would be in favor of standardizing around that, and having the encoder convert all incoming ID3 and SCTE markers into their EMSG equivalents (see the rough emsg sketch after this list).
  3. Media processing needs to be supported; this is not well supported in DASH, which aims at playback by clients. It is important to have markers in the stream that also apply to HLS or HDS; CMAF or fragmented MP4 is better for this. [WL] - does standardizing on EMSG help resolve this issue?
  4. Reconnect and fault tolerance: do you want to keep re-sending the manifest? How to handle multiple short-running POSTs? How do you detect a disconnect? [WL] - these are all valid points. These are things that an ingest spec which uses DASH as its base would need to specify.
  5. Synchronization of multiple encoders posting to a single publishing point/origin media processing: how is this achieved in this approach? How is it checked that different encoders produce the same manifests, especially if a manifest can change over time as specified in your approach? [WL] - no different from requiring encoders to produce and POST the same CMAF files at the same time. Yes, we would require synchronized encoders to produce bit-equivalent manifests and segments.
  6. The manifest needs to be available before any of the segments are posted; how can this be guaranteed? DASH ingest needs to specify that the MPD is sent before the segment streams. [WL] - yes, the ingest spec would specify that the manifest must be sent before segments.
  7. How to handle manifest changes when posting from multiple encoders? [WL] - Encoders should be synced, and manifests should be consistent between encoders. This is true of DASH used with CDNs today.
  8. Availability of the segment: what to do after N hours, delete the segment? [WL] - this comment is a good one, since it exposes a large valid concern about using DASH for ingest, which is that the concept of availability is no longer applicable, nor is the need to describe a URL to request the content (since the receiving entity will never make the request, but rather receive the POST). The segment could be deleted instantly after the POST successfully completes. This raises the question of what to put in the manifests for the @media attribute. It is there to identify the POSTed content, but will never be used to generate a request. It does not, therefore, need to be a valid URL.
  9. DASH offers too many degrees of freedom for client-related information that is not relevant for a media processing entity; how to limit the DASH specification for ingest? [WL] - absolutely agree. DASH offers 3 ways to do everything. It would be great for an ingest spec to restrict DASH to the minimum necessary to meet ingest needs.
  10. The encoder does not have full information about the client presentation, resulting in non-compliant manifests. [WL] - not sure what you are getting at here. The encoder need only describe what it is sending.
  11. How to support HLS from DASH ingest without active processing of the data in DASH? The manifest needs to be regenerated in any case, so why start with a full DASH manifest? [WL] - because DASH is a ready-made means of describing the content. The alternative is inventing another means of describing presentation data (using added fields on the POST), which feels like making a new standard when we have one that would work.
  12. Streams are not synchronized and can arrive out of order as they are separate POSTs; how to detect disconnection as streams are separate POSTs? [WL] - firstly, don't we have the same problem if MP4 segments are posted directly? And secondly, the DASH manifest describes the order of segments to be expected (and whether anything is missing), which is information we lose if we just post fmp4s directly.
  13. Manifest coupling: the manifest cannot always have the exact time of segments as it may not be known in advance, and manifests may be received out of order as they are sent separately; this can raise a lot of timing issues (please also see the document under ingest-docs). [WL] - I agree that manifest precision and timing is a concern, but I think these are solvable issues, both in specification and in practice.
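
On point 1 above: as a hedged sketch of what a receiving entity could read from the manifest side, here is a small MPD-reading example using only the standard DASH namespace; the MPD file name and attribute usage are assumptions for illustration, not text from the draft spec:

```python
import xml.etree.ElementTree as ET

NS = {"dash": "urn:mpeg:dash:schema:mpd:2011"}

def describe_adaptation_sets(mpd_xml):
    # Collect per-AdaptationSet characteristics such as language, content type
    # and codecs (these may also appear at the Representation level in practice).
    root = ET.fromstring(mpd_xml)
    sets = []
    for period in root.findall("dash:Period", NS):
        for aset in period.findall("dash:AdaptationSet", NS):
            reps = aset.findall("dash:Representation", NS)
            sets.append({
                "contentType": aset.get("contentType"),
                "lang": aset.get("lang"),  # e.g. "en", "nl", "de" audio tracks
                "codecs": [r.get("codecs") for r in reps],
                "bandwidths": [r.get("bandwidth") for r in reps],
            })
    return sets

with open("manifest.mpd") as f:            # hypothetical ingested manifest
    print(describe_adaptation_sets(f.read()))
```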
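
On points 2 and 3: a rough sketch of wrapping an incoming SCTE-35 cue as a DASH/CMAF 'emsg' (version 1) box; the field order follows my reading of the emsg definition, and the binary SCTE-35 scheme URI and cue bytes are placeholders, so treat this as illustrative only:

```python
import struct

def make_emsg_v1(scheme_id_uri, value, timescale, presentation_time,
                 duration, event_id, message_data):
    # Build an 'emsg' (event message) box, version 1.
    body = bytes([1, 0, 0, 0])                                   # version=1, flags=0
    body += struct.pack(">IQII", timescale, presentation_time,   # fixed-size fields
                        duration, event_id)
    body += scheme_id_uri.encode("utf-8") + b"\x00"              # null-terminated strings
    body += value.encode("utf-8") + b"\x00"
    body += message_data                                         # opaque payload
    return struct.pack(">I", 8 + len(body)) + b"emsg" + body

# Placeholder splice cue: carry the raw SCTE-35 section as the message payload.
scte35_section = b"\xfc..."                                      # placeholder bytes
box = make_emsg_v1("urn:scte:scte35:2013:bin", "", 90000,
                   900000, 0, 1, scte35_section)
```
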
unifiedstreaming commented 6 years ago

I think DASH ingest will work for the passive processing entities after addressing some of these issues, as incorporated in the updated draft spec. For media ingest to active processing entities like Unified Origin, we reviewed the proposal and counter-propose the following to get this working with USP, which makes more sense to us than putting a lot of work into resolving this long list of issues, which is only partially resolved by Will's comments:

unifiedstreaming commented 6 years ago

OK, we discussed many of the issues in the call. We will continue to work with separate issues addressing specific points in the draft specification, keeping it in line with what Will initially proposed but also fitting USP requirements and the work done in the past with different encoder vendors. I will close this issue in a few days.