shaka-project / shaka-player

JavaScript player library / DASH & HLS client / MSE-EME player
Apache License 2.0

Thumbnail track support #559

Closed pakerfeldt closed 3 years ago

pakerfeldt commented 8 years ago

This was previously discussed briefly in Google Groups, but I figure it's better suited here, so I'm moving it.

The initial question from @wader:

Hello

I'm researching different ways of embedding custom data in DASH manifests. In my case I would like to add custom information for seek thumbnails. For example, one way would be to add an AdaptationSet with a contentType like image/jpeg. But currently Shaka seems quite picky about what it finds in the manifest, so I run into various errors (DASH_EMPTY_ADAPTATION_SET, DASH_UNSUPPORTED_CONTAINER, ...). This would also require some kind of API for accessing the raw manifest.

Is there already a way of doing this? If not, would something like this be interesting for other people and fit into Shaka?

-Mattias

And the response from @joeyparrish:

Hi Mattias,

It seems to me that AdaptationSets are not a great place for a custom extension because they are already used for something standard. It is common for DASH extensions to use their own unique elements in their own namespace to avoid conflicting with standard parts of DASH.

For example, you might do something like this:

<MPD xmlns="urn:mpeg:dash:schema:mpd:2011"
    xmlns:myApp="http://foo.bar/myApp"
    minBufferTime="PT2S" type="static">
  <Period id="0" duration="PT60S">
    <AdaptationSet id="0" contentType="video" mimeType="video/mp4">
      <myApp:seekThumbs mimeType="image/jpeg">
        <myApp:thumb time="0" img="thumb-0.jpg" />
        <myApp:thumb time="2" img="thumb-2.jpg" />
        <myApp:thumb time="4" img="thumb-4.jpg" />
        ...
        <myApp:thumb time="56" img="thumb-56.jpg" />
        <myApp:thumb time="58" img="thumb-58.jpg" />
      </myApp:seekThumbs>
      <Representation ... />
      <Representation ... />
      <Representation ... />
    </AdaptationSet>
  </Period>
</MPD>

As for accessing that from Shaka Player, we don't currently have an API that is exactly for this, but we do have network request and response filters. You could listen for manifest responses and get access to the XML before it goes to the manifest parser, but correlating that to the internal manifest representation might get tricky.

Another approach would be to create your own manifest parser that inherits from our DASH parser. All manifest parsers are plugins, so you can easily register one that replaces ours. You can also customize the build to include your plugin: http://shaka-player-demo.appspot.com/docs/api/tutorial-plugins.html
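
For illustration, here is a minimal Python sketch of the custom-namespace idea, working on the raw manifest text outside the player (for instance on a response body captured before it reaches the manifest parser). It assumes the `myApp` namespace URI and the `thumb` element from the example above. `read_seek_thumbs` and `thumb_for_time` are hypothetical helper names, not Shaka Player API, and the manifest is a trimmed copy with the elided parts left out.

```python
import bisect
import xml.etree.ElementTree as ET

# Namespace URI from the example manifest above (an illustrative value).
APP_NS = "http://foo.bar/myApp"

# Trimmed copy of the example manifest, without the elided parts.
SAMPLE_MPD = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011"
    xmlns:myApp="http://foo.bar/myApp"
    minBufferTime="PT2S" type="static">
  <Period id="0" duration="PT60S">
    <AdaptationSet id="0">
      <myApp:seekThumbs mimeType="image/jpeg">
        <myApp:thumb time="0" img="thumb-0.jpg" />
        <myApp:thumb time="2" img="thumb-2.jpg" />
        <myApp:thumb time="4" img="thumb-4.jpg" />
      </myApp:seekThumbs>
    </AdaptationSet>
  </Period>
</MPD>"""


def read_seek_thumbs(mpd_text):
    """Return a time-sorted list of (seconds, image) pairs from the manifest."""
    root = ET.fromstring(mpd_text)
    thumbs = [
        (float(el.get("time")), el.get("img"))
        for el in root.iter("{%s}thumb" % APP_NS)
    ]
    return sorted(thumbs)


def thumb_for_time(thumbs, seconds):
    """Return the image of the latest thumbnail at or before `seconds`."""
    times = [t for t, _ in thumbs]
    index = bisect.bisect_right(times, seconds) - 1
    # Positions before the first thumbnail fall back to the first one.
    return thumbs[max(index, 0)][1]


if __name__ == "__main__":
    print(thumb_for_time(read_seek_thumbs(SAMPLE_MPD), 3.5))
```

The hard part mentioned above, correlating these entries with the player's internal manifest representation, is not addressed by this sketch.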

pakerfeldt commented 8 years ago

So, we are apparently looking for ways to encode thumbnail metadata into the manifest itself and figured that AdaptationSet actually fits pretty well. The DASH spec says:

Within a Period, material is arranged into Adaptation Sets (see 5.3.3). An Adaptation Set represents a set of interchangeable encoded versions of one or several media content components (see 5.3.4).

To me, adding images here does not really break the spec. They could be argued to be another set of media content components. The spec seems pretty permissive. Whether they are used or not would be up to the client. Our current approach looks something like this:

<AdaptationSet id="10" contentType="image" mimeType="image/jpeg">
     <Representation id="0" width="1280" height="720" framerate="100">
         <BaseURL>http://cdn.example.com/c760d497c55344118cccfbe477e85315/dc806700ec1e4b33d0935f85a36bbf13/sprite_1280_720_100</BaseURL>
     </Representation>
     <Representation id="1" width="1920" height="1080" framerate="100">
         <BaseURL>http://cdn.example.com/c760d497c55344118cccfbe477e85315/dc806700ec1e4b33d0935f85a36bbf13/sprite_1920_1280_100</BaseURL>
     </Representation>
</AdaptationSet>

This works fine on Android using ExoPlayer. However, Shaka bails out, since it seems to try to parse that media content. I would expect Shaka to ignore any adaptation sets it does not understand and proceed to play back the video/audio media defined.

We could of course, as suggested previously, go with a separate namespace in the manifest but given that ExoPlayer already handles this nicely (and even parses it out for us) it would be nice to at least have Shaka not fail playback completely. Another approach for us is to move away from DASH completely and use a proprietary format since we control both server and client side.
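
The behaviour asked for here, skipping what the client does not understand instead of failing playback, could look roughly like the Python sketch below. It is not how Shaka Player or ExoPlayer is implemented. `PLAYABLE_TYPES`, `content_type_of` and `split_adaptation_sets` are names invented for this illustration, and the manifest is a cut-down stand-in for the one above.

```python
import xml.etree.ElementTree as ET

MPD_NS = "urn:mpeg:dash:schema:mpd:2011"

# Assumption for this sketch: the content types the client can handle.
PLAYABLE_TYPES = {"video", "audio", "text"}

SAMPLE_MPD = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static">
  <Period id="0">
    <AdaptationSet id="1" contentType="video" mimeType="video/mp4" />
    <AdaptationSet id="2" mimeType="audio/mp4" />
    <AdaptationSet id="10" contentType="image" mimeType="image/jpeg" />
  </Period>
</MPD>"""


def content_type_of(adaptation_set):
    """Use @contentType, falling back to the top-level type of @mimeType."""
    content_type = adaptation_set.get("contentType")
    if content_type:
        return content_type
    return adaptation_set.get("mimeType", "").split("/")[0]


def split_adaptation_sets(mpd_text):
    """Return (playable_ids, skipped_ids) instead of raising on unknown sets."""
    root = ET.fromstring(mpd_text)
    playable, skipped = [], []
    for aset in root.iter("{%s}AdaptationSet" % MPD_NS):
        known = content_type_of(aset) in PLAYABLE_TYPES
        (playable if known else skipped).append(aset.get("id"))
    return playable, skipped


if __name__ == "__main__":
    print(split_adaptation_sets(SAMPLE_MPD))
```

The skipped list is kept rather than discarded so that an application could still inspect the image sets it cares about.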

joeyparrish commented 8 years ago

You're right, the spec is very permissive. However, it is very difficult to make a generic client that supports everything in the spec. Instead, we try to stick to the guidelines in the DASH Interoperability Points (IOP), which are much more restrictive and reasonable to implement for a generic client such as ours.

I believe the best thing for thumbnails would be to follow the model of trick mode. Here's the relevant text on trick mode from IOP v3.3:

3.2.9. Trick Mode Support

Trick Modes are used by DASH clients in order to support fast forward, seek, rewind and other operations in which typically the media, especially video, is displayed in a speed other than the normal playout speed. In order to support such operations, it is recommended that the content author adds Representations at lower frame rates in order to support faster playout with the same decoding and rendering capabilities.

However, Representations targeted for trick modes are typically not suitable for regular playout. If the content author wants to explicitly signal that a Representation is only suitable for trick mode cases, but not for regular playout, the following is recommended:

  • add an Adaptation Set that only contains trick mode Representations
  • annotate the Adaptation Set with an EssentialProperty descriptor or SupplementalProperty descriptor with URL "http://dashif.org/guidelines/trickmode" and the @value the value of @id attribute of the Adaptation Set to which these trick mode Representations belong. The trick mode Representations must be time-aligned with the Representations in the main Adaptation Set. The value may also be a white-space separated list of @id values. In this case the trick mode Adaptation Set is associated to all Adaptation Sets with the values of the @id.
  • signal the playout capabilities with the attribute @maxPlayoutRate for each Representation in order to indicate the accelerated playout that is enabled by the signaled codec profile and level.

If an Adaptation Set is annotated with the EssentialProperty descriptor with URI "http://dashif.org/guidelines/trickmode" then the DASH client shall not select any of the contained Representations for regular playout.

Similarly, thumbnails should be in an AdaptationSet which is labeled as not appropriate for normal playback, but associated with specific AdaptationSets which are for normal playback. That would allow clients to identify and skip thumbnails if they don't support them, or to invoke special logic if they do.
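
A minimal Python sketch of that identify-and-skip logic, using the trick mode scheme URI quoted above. It only looks at EssentialProperty descriptors (the SupplementalProperty variant is left out for brevity), and `find_trick_mode_sets` and `regular_playout_ids` are hypothetical helper names, not part of any player.

```python
import xml.etree.ElementTree as ET

MPD_NS = "urn:mpeg:dash:schema:mpd:2011"
TRICK_MODE_SCHEME = "http://dashif.org/guidelines/trickmode"

SAMPLE_MPD = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static">
  <Period id="0">
    <AdaptationSet id="1" contentType="video" />
    <AdaptationSet id="2" contentType="video" />
    <AdaptationSet id="3" contentType="video">
      <EssentialProperty schemeIdUri="http://dashif.org/guidelines/trickmode"
                         value="1 2" />
    </AdaptationSet>
  </Period>
</MPD>"""


def find_trick_mode_sets(mpd_text):
    """Map each trick mode AdaptationSet id to the ids it is associated with."""
    root = ET.fromstring(mpd_text)
    result = {}
    for aset in root.iter("{%s}AdaptationSet" % MPD_NS):
        for prop in aset.findall("{%s}EssentialProperty" % MPD_NS):
            if prop.get("schemeIdUri") == TRICK_MODE_SCHEME:
                # @value is a whitespace-separated list of @id values.
                result[aset.get("id")] = prop.get("value", "").split()
    return result


def regular_playout_ids(mpd_text):
    """Return AdaptationSet ids that are not flagged as trick mode only."""
    trick_mode = find_trick_mode_sets(mpd_text)
    root = ET.fromstring(mpd_text)
    return [
        aset.get("id")
        for aset in root.iter("{%s}AdaptationSet" % MPD_NS)
        if aset.get("id") not in trick_mode
    ]


if __name__ == "__main__":
    print(find_trick_mode_sets(SAMPLE_MPD))
    print(regular_playout_ids(SAMPLE_MPD))
```

A thumbnail scheme modelled on this would presumably need only a different scheme URI constant and the same association lookup.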

I think the best thing to do is to create a proposal for the formatting of the content and the description of it in the manifest, as well as some sample content for demo purposes. The client-side support could be incubated in Shaka Player under an experimental branch until we feel good about the specifics. The solution could be proposed to the DASH-IF for inclusion in the IOP. All of this is assuming, of course, that there is no similar work already underway in the DASH-IF.

@baconz, I know you were interested in something similar. Any thoughts?

@wilaw, is there any work underway in the DASH-IF for thumbnail tracks already?

wilaw commented 8 years ago

@Joey - the current test vectors for trick mode http://testassets.dashif.org/#feature/details/57cd83dfb626efae4d44d450 are all sized 1280x720, so not suitable for thumbnail purposes ☹

I raised this issue with our test vector group and they are going to create some new vectors suitable for thumbnails. The vectors will have pure thumbnail-sized tracks, but there will also be some which have multiple trick mode tracks at different sizes, to exercise player logic in selecting the best trick mode track to use as a thumbnail. Waqar (cc’d) is managing the creation of these new test vectors.

Cheers

Will

joeyparrish commented 7 years ago

Sorry, Will, that's not quite what I meant. Let me clarify.

I believe the use-case here is that people want to show a thumbnail from the video as part of the scrubbing UI. For example, as you drag the scrubber, a thumbnail of that part of the video would hover above the scrubber, and when you release the scrubber, the video seeks to that point and hides the thumb.

I don't think we should use trick play tracks as thumbnails. Trick play tracks are videos that must be decoded by the browser in a media element, so they can't be easily used for the UI scenario I just described. Further, they may be encrypted, which would prevent the frames from being copied to a canvas even if the track was loaded in an off-screen video element.

Instead, I think we should come up with another EssentialProperty URI, like the one for trick play, but specifically for thumbnails. The thumbs would be images like jpegs or pngs that could be easily used in the UI. Something along these lines:

    <AdaptationSet id="12">
      <EssentialProperty schemeIdUri="http://dashif.org/guidelines/thumbnails" value="10" />
      <Representation id="12" mimeType="image/jpeg" width="100" height="75" frameRate="1/5">
        <SegmentTemplate media="thumb$Number$.jpeg" duration="5" />
      </Representation>
    </AdaptationSet>

Is that crazy? Is that something the DASH-IF would be interested in incorporating if we incubate the concept here first?
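
To make the addressing concrete, here is a rough Python sketch of how a client might turn a seek position into a thumbnail URL for a template like the one above. It assumes the DASH defaults of `startNumber` 1 and `timescale` 1 when those attributes are absent, and it handles only the `$Number$` identifier. `thumbnail_url` is a made-up name for this example.

```python
def thumbnail_url(media_template, duration, seconds, start_number=1, timescale=1):
    """Expand $Number$ for the image segment that covers `seconds`.

    `duration` is the SegmentTemplate @duration in timescale units.
    """
    if seconds < 0:
        raise ValueError("seek position must not be negative")
    # Zero-based index of the segment containing the seek position.
    segment_index = int(seconds * timescale // duration)
    number = start_number + segment_index
    return media_template.replace("$Number$", str(number))


if __name__ == "__main__":
    # With duration="5", seconds 10 to 15 map to the third image.
    print(thumbnail_url("thumb$Number$.jpeg", 5, 12.5))
```

Resolving the result against a BaseURL and clamping to the Period duration are left out of this sketch.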

wilaw commented 7 years ago

Hey Joey

The DASH IF view has been that thumbnails should be generated from trick view tracks, rather than adding hundreds of jpegs to a package. Yes, the thumbnail would need to be rendered by a videoElement, either offscreen (and copied to a canvas element) or on-screen, with the thumb video element superimposed over the primary element. The point about encryption is a fair point, although it would apply equally to the jpeg images themselves. In reality, small thumb videos that are keyframe-only would not require encryption.

Your proposal for a new EssentialProperty looks clean and not the least bit crazy ☺ My only comment is that the @frameRate attribute would not be required. I will take it to DASH IF. I realize that playing another video requires essentially another player instance and this has a lot more overhead than loading and displaying some jpegs, so it’s pretty clear to me personally that in an MSE player I would rather implement the jpeg approach than the second video element approach.

Have you made a sample video with this adaptation set? If not, we can collaborate on a Big Buck Bunny sample.

wilaw commented 7 years ago

Actually it was pretty easy and fast to make a sample. I changed the manifest declaration a bit to make it more consistent. Let me know if this is what you had in mind:

http://dash.edgesuite.net/akamai/bbb_30fps/bbb_with_thumbnails.mpd

This sample content has frame and timecode burnt in which is handy for checking positional consistency.

joeyparrish commented 7 years ago

Thanks, Will! Does anybody have feedback on this proposal or the sample media provided by Will?

waqarz commented 7 years ago

All,

Added a few trick-mode vectors here:

http://testassets.dashif.org/#testvector/list

In the search box for test cases, please look for "Thumbnail Trickmode". There are 4 vectors there, with a description on each. Hope this helps.

joeyparrish commented 7 years ago

@waqarz, thanks! Those will be useful in #538. I'll copy the link over there for reference.

wilaw commented 7 years ago

@TobbeEdgeware has created a proposal (or rather two) for how to provide thumbnail images for scrollbars. DASH IF would like to standardize this and is open to input from @joeyparrish and Shaka.

The presentation from the dashjs f2f and the proposal are available at https://github.com/Dash-Industry-Forum/DASH-IF-IOP/blob/master/thumbnails/README.md

Please provide comments in the issue thread https://github.com/Dash-Industry-Forum/DASH-IF-IOP/issues/119, or make pull requests towards the files if there is anything you’d like to change.

Cheers Will

chrisfillmore commented 7 years ago

I was curious if there was any news about this topic. I see that the DASH-IF ticket 119 is closed and the spec defines support for tiled thumbnails. This solution is good news (Tizen TVs, including 2017 models, do not support multiple video elements).

We expect to be able to provide keyframes for our content in the near-ish future (I think within the next few months). I'm a bit fuzzy on the player's role in handling thumbnail data from the manifest. Will the player fetch the tiles, and display them over the timeline? Or just fetch, and expose it to the client application to draw as they see fit?

Does Shaka plan to add this support (whatever it looks like)? If so, any estimate of the effort required? Thanks.

wilaw commented 7 years ago

@chrisfillmore - yes, the thumbnail proposal was accepted into DASH IF IOP 4.1 (see section 6.2.6 in http://dashif.org/wp-content/uploads/2017/09/DASH-IF-IOP-v4.1-clean.pdf) and will also be contributed back to MPEG. dash.js has thumbnail support on track for the next sprint, 2.6.1. It would be great if Shaka could provide a parallel implementation.
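
For readers wondering what a client has to compute for tiled thumbnails, the arithmetic could be sketched in Python as follows. The sketch assumes that each image segment is a grid of columns by rows thumbnails, ordered left to right and then top to bottom, with the segment duration split evenly across the tiles. Those assumptions should be checked against section 6.2.6 before relying on them. `locate_tile` and its parameters are hypothetical, not dash.js or Shaka Player API.

```python
def locate_tile(seconds, segment_duration, columns, rows,
                image_width, image_height, start_number=1):
    """Return (segment_number, x, y, width, height) for the tile at `seconds`.

    `image_width` and `image_height` describe the whole tiled image; the
    returned rectangle is the crop region for one thumbnail, in pixels.
    """
    tiles_per_segment = columns * rows
    tile_duration = segment_duration / tiles_per_segment
    segment_index = int(seconds // segment_duration)
    offset = seconds - segment_index * segment_duration
    # Clamp so rounding can never select a tile past the end of the grid.
    tile_index = min(int(offset // tile_duration), tiles_per_segment - 1)
    tile_width = image_width // columns
    tile_height = image_height // rows
    x = (tile_index % columns) * tile_width
    y = (tile_index // columns) * tile_height
    return (start_number + segment_index, x, y, tile_width, tile_height)


if __name__ == "__main__":
    # A 10x1 strip of 320x180 thumbnails covering 100 seconds per image.
    print(locate_tile(125, 100, 10, 1, 3200, 180))
```

The crop rectangle is what an application would apply when drawing the tiled image, for example as a background offset or a canvas source rectangle.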

chrisfillmore commented 7 years ago

Thanks @wilaw. I have a question (cc @joeyparrish on this): Looking at media events, it's not obvious to me that there is any event that signals a scrub. So I assume that the client UI application will need to listen for their own scrub events and ask the player for the thumbnail information.

On TV platforms, there is no scrubbing. Our content does not have trick mode tracks, so in order to support "fast forward", we enable the following behaviour:

We would like to add thumbnails over the playhead during this fake fast forward. It's not totally clear to me what the sequence of interactions will be between the client application and the player. Do you think our use case is supported by the spec?

I appreciate any insight you can offer. Let me know if I can provide more info.

joeyparrish commented 7 years ago

It would seem to me that showing thumbnails while scrubbing already necessitates a custom UI. You couldn't do that with built-in controls as far as I can tell. So the best way to trigger it, I think, is on some UI event like input, change, mousedown, touchstart, etc. I haven't worked out the details for our own demo UI yet, but I'm sure we will implement this in the near future.

joeyparrish commented 7 years ago

@wilaw, I'm sorry I haven't looked at this closely or provided feedback on what landed in the latest IOP. I'm grateful that you and your colleagues in the DASH-IF worked out the details, and I look forward to implementing in Shaka Player once we make time for it.

@chrisfillmore, we haven't looked at the details and we don't have a time estimate. It's a lower priority item for us than some of the HLS work we're doing, so this is still on the backlog. If you are interested in contributing, I would be happy to talk design and review a pull request.

avelad commented 6 years ago

Sample streams:

http://dash.edgesuite.net/akamai/bbb_30fps/bbb_with_tiled_thumbnails.mpd
http://dash.edgesuite.net/akamai/bbb_30fps/bbb_with_4_tiles_thumbnails.mpd
http://dash.edgesuite.net/akamai/bbb_30fps/bbb_with_tiled_thumbnails_2.mpd
http://dash.edgesuite.net/akamai/bbb_30fps/bbb_with_multiple_tiled_thumbnails.mpd
http://vm2.dashif.org/livesim-dev/testpic_2s/Manifest_thumbs.mpd

avelad commented 4 years ago

Another sample stream:

https://image.roku.com/ZHZscHItc2Ft/roku/trickplay/bbb-with-multiple-tiled-thumbnails.mpd