shaka-project / shaka-player

JavaScript player library / DASH & HLS client / MSE-EME player
Apache License 2.0

DASH hard of hearing subtitle distinction #5211

Closed DDeis closed 1 year ago

DDeis commented 1 year ago

Have you read the FAQ and checked for duplicate open issues? Closed issue 2734 seems partially similar

What version of Shaka Player are you using? 4.3.6 (Not the UI version)

Can you reproduce the issue with our latest release version? Yes

Can you reproduce the issue with the latest code from main? Not tested

Are you using the demo app or your own custom app? Custom app

If custom app, can you reproduce the issue using our demo app? Yes

What browser and OS are you using? MacOS Ventura 13.3 / Chrome 113.0.5672.63

What did you do?

When loading a manifest with the following text adaptation set:

<!-- French -->

<AdaptationSet id="7" group="8" bitstreamSwitching="true" segmentAlignment="true" contentType="text" mimeType="application/mp4" lang="fre">
  <Role schemeIdUri="urn:mpeg:dash:role:2011" value="main"/>
  <SegmentTemplate timescale="10000000" media="S!d2ENZGFzaF9keW5fd2lkZRICVP7...8BFgSf/QualityLevels($Bandwidth$)/Fragments(fre_0=$Time$)" initialization="S!d2ENZGFzaF9keW5fd2lkZRICVP7...8BFgSf/QualityLevels($Bandwidth$)/Fragments(fre_0=Init)">
    <SegmentTimeline>
      <S t="16832248012912070" d="20000000" r="412"/>
    </SegmentTimeline>
  </SegmentTemplate>
  <Representation id="dxFknw.." bandwidth="100" codecs="stpp"/>
</AdaptationSet>

<AdaptationSet id="8" group="8" bitstreamSwitching="true" segmentAlignment="true" contentType="text" mimeType="application/mp4" lang="fre">
  <Accessibility schemeIdUri="urn:tva:metadata:cs:AudioPurposeCS:2007" value="2"/>
  <Role schemeIdUri="urn:mpeg:dash:role:2011" value="main"/>
  <SegmentTemplate timescale="10000000" media="S!d2ENZGFzaF9keW5fd2lkZRICVP7...8BFgSf/QualityLevels($Bandwidth$)/Fragments(fre_1=$Time$)" initialization="S!d2ENZGFzaF9keW5fd2lkZRICVP7...8BFgSf/QualityLevels($Bandwidth$)/Fragments(fre_1=Init)">
    <SegmentTimeline>
      <S t="16832248012912070" d="20000000" r="412"/>
    </SegmentTimeline>
  </SegmentTemplate>
  <Representation id="dxFlnw.." bandwidth="101" codecs="stpp"/>
</AdaptationSet>

<!-- German -->

<AdaptationSet id="9" group="8" bitstreamSwitching="true" segmentAlignment="true" contentType="text" mimeType="application/mp4" lang="ger">
  <Role schemeIdUri="urn:mpeg:dash:role:2011" value="alternate"/>
  <SegmentTemplate timescale="10000000" media="S!d2ENZGFzaF9keW5fd2lkZRICVP7...8BFgSf/QualityLevels($Bandwidth$)/Fragments(ger_2=$Time$)" initialization="S!d2ENZGFzaF9keW5fd2lkZRICVP7...8BFgSf/QualityLevels($Bandwidth$)/Fragments(ger_2=Init)">
    <SegmentTimeline>
      <S t="16832248012912070" d="20000000" r="412"/>
    </SegmentTimeline>
  </SegmentTemplate>
  <Representation id="dxFmnw.." bandwidth="102" codecs="stpp"/>
</AdaptationSet>

<AdaptationSet id="10" group="8" bitstreamSwitching="true" segmentAlignment="true" contentType="text" mimeType="application/mp4" lang="ger">
  <Accessibility schemeIdUri="urn:tva:metadata:cs:AudioPurposeCS:2007" value="2"/>
  <Role schemeIdUri="urn:mpeg:dash:role:2011" value="alternate"/>
  <SegmentTemplate timescale="10000000" media="S!d2ENZGFzaF9keW5fd2lkZRICVP7...8BFgSf/QualityLevels($Bandwidth$)/Fragments(ger_3=$Time$)" initialization="S!d2ENZGFzaF9keW5fd2lkZRICVP7...8BFgSf/QualityLevels($Bandwidth$)/Fragments(ger_3=Init)">
    <SegmentTimeline>
      <S t="16832248012912070" d="20000000" r="412"/>
    </SegmentTimeline>
  </SegmentTemplate>
  <Representation id="dxFnnw.." bandwidth="103" codecs="stpp"/>
</AdaptationSet>

What did you expect to happen?

As per https://dvb.org/wp-content/uploads/2022/07/A168r6_MPEG-DASH-Profile-for-Transport-of-ISO-BMFF-Based-DVB-Services_Interim-Draft-ts_103-285-v141_October_2022.pdf (page 57, table 19), <Accessibility schemeIdUri="urn:tva:metadata:cs:AudioPurposeCS:2007" value="2"/> identifies hard of hearing subtitle tracks, so the player should expose that distinction.
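To make the expectation concrete, here is a minimal sketch of the check, assuming the Accessibility elements of an AdaptationSet have already been read into plain {schemeIdUri, value} objects. The helper name isHardOfHearing and the descriptor object shape are hypothetical and are not part of Shaka Player's API.

```javascript
// Scheme URI copied from the manifest in this issue.
const AUDIO_PURPOSE_SCHEME = 'urn:tva:metadata:cs:AudioPurposeCS:2007';

// Hypothetical helper (not a Shaka Player function).
// `descriptors` is assumed to be an array of plain objects, one per
// <Accessibility> element found on an AdaptationSet.
function isHardOfHearing(descriptors) {
  // Value "2" under this scheme marks the hard of hearing track.
  return descriptors.some(
      (d) => d.schemeIdUri === AUDIO_PURPOSE_SCHEME && d.value === '2');
}

// AdaptationSet 7 has no Accessibility element; AdaptationSet 8 has one.
const basicFrench = [];
const hardOfHearingFrench = [
  {schemeIdUri: AUDIO_PURPOSE_SCHEME, value: '2'},
];

console.log(isHardOfHearing(basicFrench));          // false
console.log(isHardOfHearing(hardOfHearingFrench));  // true
```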

What actually happened?

The text tracks returned by Shaka provide no way to distinguish the hard of hearing track from the basic track of the same language. In the output below, the two tracks of each language differ only in id and originalTextId.

[
    {
        "id": 19,
        "active": false,
        "type": "text",
        "bandwidth": 0,
        "language": "fr",
        "label": null,
        "kind": "subtitle",
        "width": null,
        "height": null,
        "frameRate": null,
        "pixelAspectRatio": null,
        "hdr": null,
        "mimeType": "application/mp4",
        "audioMimeType": null,
        "videoMimeType": null,
        "codecs": "stpp",
        "audioCodec": null,
        "videoCodec": null,
        "primary": true,
        "roles": [
            "main"
        ],
        "audioRoles": null,
        "forced": false,
        "videoId": null,
        "audioId": null,
        "channelsCount": null,
        "audioSamplingRate": null,
        "spatialAudio": false,
        "tilesLayout": null,
        "audioBandwidth": null,
        "videoBandwidth": null,
        "originalVideoId": null,
        "originalAudioId": null,
        "originalTextId": "dxFknw..",
        "originalImageId": null
    },
    {
        "id": 20,
        "active": false,
        "type": "text",
        "bandwidth": 0,
        "language": "fr",
        "label": null,
        "kind": "subtitle",
        "width": null,
        "height": null,
        "frameRate": null,
        "pixelAspectRatio": null,
        "hdr": null,
        "mimeType": "application/mp4",
        "audioMimeType": null,
        "videoMimeType": null,
        "codecs": "stpp",
        "audioCodec": null,
        "videoCodec": null,
        "primary": true,
        "roles": [
            "main"
        ],
        "audioRoles": null,
        "forced": false,
        "videoId": null,
        "audioId": null,
        "channelsCount": null,
        "audioSamplingRate": null,
        "spatialAudio": false,
        "tilesLayout": null,
        "audioBandwidth": null,
        "videoBandwidth": null,
        "originalVideoId": null,
        "originalAudioId": null,
        "originalTextId": "dxFlnw..",
        "originalImageId": null
    },
    {
        "id": 21,
        "active": false,
        "type": "text",
        "bandwidth": 0,
        "language": "de",
        "label": null,
        "kind": "subtitle",
        "width": null,
        "height": null,
        "frameRate": null,
        "pixelAspectRatio": null,
        "hdr": null,
        "mimeType": "application/mp4",
        "audioMimeType": null,
        "videoMimeType": null,
        "codecs": "stpp",
        "audioCodec": null,
        "videoCodec": null,
        "primary": false,
        "roles": [
            "alternate"
        ],
        "audioRoles": null,
        "forced": false,
        "videoId": null,
        "audioId": null,
        "channelsCount": null,
        "audioSamplingRate": null,
        "spatialAudio": false,
        "tilesLayout": null,
        "audioBandwidth": null,
        "videoBandwidth": null,
        "originalVideoId": null,
        "originalAudioId": null,
        "originalTextId": "dxFmnw..",
        "originalImageId": null
    },
    {
        "id": 22,
        "active": false,
        "type": "text",
        "bandwidth": 0,
        "language": "de",
        "label": null,
        "kind": "subtitle",
        "width": null,
        "height": null,
        "frameRate": null,
        "pixelAspectRatio": null,
        "hdr": null,
        "mimeType": "application/mp4",
        "audioMimeType": null,
        "videoMimeType": null,
        "codecs": "stpp",
        "audioCodec": null,
        "videoCodec": null,
        "primary": false,
        "roles": [
            "alternate"
        ],
        "audioRoles": null,
        "forced": false,
        "videoId": null,
        "audioId": null,
        "channelsCount": null,
        "audioSamplingRate": null,
        "spatialAudio": false,
        "tilesLayout": null,
        "audioBandwidth": null,
        "videoBandwidth": null,
        "originalVideoId": null,
        "originalAudioId": null,
        "originalTextId": "dxFnnw..",
        "originalImageId": null
    }
]

_I'm willing to send a PR. I suppose it would involve handling urn:tva:metadata:cs:AudioPurposeCS:2007 in dashparser.parseAdaptationSet, but I don't know what the most convenient way to map the value onto shaka.extern.Track would be._
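One possible mapping, sketched below purely as an illustration, is to translate the descriptor value into a nullable string on the track. The field name accessibilityPurpose, the string 'hard of hearing', and both helper functions are assumptions made for this sketch, not existing Shaka Player API.

```javascript
// Scheme URI copied from the manifest in this issue.
const TVA_AUDIO_PURPOSE_SCHEME = 'urn:tva:metadata:cs:AudioPurposeCS:2007';

// Assumed value-to-purpose table; only value "2" is covered here because
// it is the only one discussed in this issue.
const PURPOSE_BY_VALUE = new Map([['2', 'hard of hearing']]);

// Hypothetical helper: returns the purpose string, or null when no
// recognised descriptor is present.
function purposeFromDescriptors(descriptors) {
  for (const d of descriptors) {
    if (d.schemeIdUri === TVA_AUDIO_PURPOSE_SCHEME &&
        PURPOSE_BY_VALUE.has(d.value)) {
      return PURPOSE_BY_VALUE.get(d.value);
    }
  }
  return null;
}

// Hypothetical helper: returns a copy of a track-like object with an
// assumed `accessibilityPurpose` field added.
function withAccessibilityPurpose(track, descriptors) {
  return Object.assign({}, track, {
    accessibilityPurpose: purposeFromDescriptors(descriptors),
  });
}

// Trimmed-down versions of tracks 19 and 20 from the output above.
const basic = withAccessibilityPurpose(
    {id: 19, language: 'fr', originalTextId: 'dxFknw..'}, []);
const hoh = withAccessibilityPurpose(
    {id: 20, language: 'fr', originalTextId: 'dxFlnw..'},
    [{schemeIdUri: TVA_AUDIO_PURPOSE_SCHEME, value: '2'}]);

console.log(basic.accessibilityPurpose);  // null
console.log(hoh.accessibilityPurpose);    // hard of hearing
```

A nullable field keeps tracks without an Accessibility descriptor unchanged apart from the extra property, which mirrors how the other optional fields in the track output are reported as null.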

avelad commented 1 year ago

@theodab can you help here? Thanks!

theodab commented 1 year ago

It's called "urn:tva:metadata:cs:AudioPurposeCS:2007" but it's also used for subtitles? How wacky. Anyway, this shouldn't be too hard.