google / ExoPlayer

This project is deprecated and stale. The latest ExoPlayer code is available in https://github.com/androidx/media
https://developer.android.com/media/media3/exoplayer
Apache License 2.0
21.74k stars 6.03k forks source link

AudioPurpose parsing of subtitle track from DASH manifest #5529

Closed szaboa closed 5 years ago

szaboa commented 5 years ago

Issue description

Request for enhancement Parse <Accessibility> tag of subtitle tracks from mpd (DASH), more specifically the AudioPurpose part. Use case: display HOH (hard of hearing) suffix after the subtitle track's language in the chooser. Useful when you have two subtitle tracks in the same language, but one of them is HOH.

Link to test content

We have some production streams, but trying to find a sample stream where this is present.

Version of ExoPlayer being used

I've searched whether this is already implemented or not in version 2.9.5. I didn't find support for this.

Note. If this request is accepted, then I would like to work on this and submit a pull request when it's done.

ojw28 commented 5 years ago

Where is AudioPurpose defined? I can't find it in the DASH spec, or in the DASH-IF guidelines. The only relevant result when I try and search is this issue.

szaboa commented 5 years ago

Accessibility is a metadata (or descriptor?) of a <AdaptationSet> which can contain this AudioPurpose.

I guess it should be defined here: https://dashif.org/identifiers/role-and-accessibility but this link gives 404.

A small description can be found here: https://www.brendanlong.com/common-informative-metadata-in-mpeg-dash.html

A more official doc is this, where the AudioPurpose is described: https://www.etsi.org/deliver/etsi_ts/103200_103299/103285/01.01.01_60/ts_103285v010101p.pdf

ojw28 commented 5 years ago

Could you please paste a relevant snippet from one of your manifests (it doesn't need to be a fully functioning piece of test content)?

ojw28 commented 5 years ago

As an additional point, is it not possible for your manifests to use a DASH-IF recommended way of doing this? Which I think is probably to use:

@schemeIdUri="urn:mpeg:dash:role:2011" @value="enhanced-audio-intelligibility"

Similarly I think:

@schemeIdUri="urn:mpeg:dash:role:2011" @value="description"

is probably equivalent to "visually impaired" in the scheme you're referring to. There's not going to be a good DASH interoperability story if everyone is using different schemes for the same thing.

szaboa commented 5 years ago

Relevant snippet:

<!--Text-->
<AdaptationSet id="2" lang="nor" segmentAlignment="true">
      <Role schemeIdUri="urn:mpeg:dash:role:2011" value="subtitle"/>
      <SegmentTemplate timescale="10000000" duration="20000000" startNumber="1" initialization="..." media="..."/>
      <Representation id="subtitle_06" mimeType="application/mp4" codecs="stpp" bandwidth="9600" startWithSAP="1"/>
    </AdaptationSet>
    <!--Text-->
    <AdaptationSet id="3" lang="nor" segmentAlignment="true">
      <Accessibility schemeIdUri="urn:tva:metadata:cs:AudioPurposeCS:2007" value="2"/>
      <Role schemeIdUri="urn:mpeg:dash:role:2011" value="subtitle"/>
      <SegmentTemplate timescale="10000000" duration="20000000" startNumber="1" initialization="..." media="..."/>
      <Representation id="subtitle_07" mimeType="application/mp4" codecs="stpp" bandwidth="9600" startWithSAP="1"/>
    </AdaptationSet>

As an additional point, is it not possible for your manifests to use a DASH-IF recommended way of doing this?

I am not sure if this can be changed in the manifest, I don't think so unfortunately.

If you say this shouldn't be a supported scheme, then there is a possibility to extend some class and write our parsing without forking the repo and use a "modified" version locally?

szaboa commented 5 years ago

It seems the Accessiblity descriptor is already parsed, but we would need to pass that somehow into the TrackGroup. Do you think this is possible?

See com.google.android.exoplayer2.source.dash.manifest.DashManifestParser#parseAdaptationSet.

ojw28 commented 5 years ago

There are two things that I think need to be done:

  1. Add role and accessibility fields to Format. I'd propose we stick to having well defined values that these fields are allowed to take. In other words I do not think we should allow arbitrary schemes and values.
  2. Parse role and accessibility fields in the manifest. We should definitely support parsing the urn:mpeg:dash:role:2011 scheme for both, as defined in the DASH specification and DASH-IF IOP. There's probably no harm in us parsing urn:tva:metadata:cs:AudioPurposeCS:2007 values 1 and 2 as well, given how simple they are. Although we should make sure the code is written in a way that makes it possible to parse other schemes by extending DashManifestParser as well.
szaboa commented 5 years ago

Got it, good idea!

I will work on this if that's ok.

szaboa commented 5 years ago

@ojw28, I've created an initial pull request.

Add role and accessibility fields to Format.

Should I add the scheme also, or just the value is enough? If needs scheme, then should I add a Descriptor?

I'd propose we stick to having well defined values that these fields are allowed to take. In other words I do not think we should allow arbitrary schemes and values.

Currently we add the value to the Format only if it has urn:mpeg:dash:role:2011 as scheme, or urn:tva:metadata:cs:AudioPurposeCS:2007 - in this case the value can be only 1 or 2.

Although we should make sure the code is written in a way that makes it possible to parse other schemes by extending DashManifestParser as well.

Not sure how it would be the best, but the easiest would be to just put parseRole and parseAccessibility to non-static, in that case everyone could override those and provide custom scheme parsing, no?

I intend to extend the DashManifestParserTest class when the changes are finalized.

ojw28 commented 5 years ago

Should I add the scheme also, or just the value is enough? If needs scheme, then should I add a Descriptor?

When I said we should stick to well defined values, what I really meant was that role and accessibility should be integers that have their allowed values defined by IntDefs. We don't want strings where it's unclear what the allowed values are.

The IntDef's themselves should probably live in the C class. I'd propose we initially define values corresponding to what DASH-IF IOP allows. So for @Role we'd have ROLE_UNSET, ROLE_MAIN, ROLE_ALTERNATE, ROLE_SUPPLEMENTARY, ROLE_COMMENTARY, ROLE_DUB, ROLE_EMERGENCY, ROLE_CAPTION, ROLE_SIGN.

For @Accessibility we'd have ACCESSIBILITY_UNSET, ACCESSIBILITY_ENHANCED_AUDIO_INTELLIGIBILITY, ACCESSIBILITY_DESCRIPTION, ACCESSIBILITY_CAPTION, ACCESSIBILITY_SIGN.

Definitions for all of these can be found in ISO 23009-1.

Currently we add the value to the Format only if it has urn:mpeg:dash:role:2011 as scheme, or urn:tva:metadata:cs:AudioPurposeCS:2007 - in this case the value can be only 1 or 2.

The scheme specific values should be mapped onto the appropriate IntDef values, once those are defined.

szaboa commented 5 years ago

I see, I am on it.

szaboa commented 5 years ago

So for @Role we'd have ROLE_UNSET, ROLE_MAIN, ROLE_ALTERNATE, ROLE_SUPPLEMENTARY, ROLE_COMMENTARY, ROLE_DUB, ROLE_EMERGENCY, ROLE_CAPTION, ROLE_SIGN.

In ISO-23009-1:2014 - 5.8.5.5 DASH role scheme, I can't find ROLE_EMERGENCYand ROLE_SIGN. Where did you find those? Also I can't find the accessibility values.

However, the standard says this:

One Adaptation Set or one media content component may have assigned multiple roles even within the same scheme.

For both role and accessibility, should we have an int array inside just parsing the first one?

szaboa commented 5 years ago

Ok, I found it in https://dashif.org/docs/DASH-IF-IOP-v4.2-clean.htm.

szaboa commented 5 years ago

Currently we add the value to the Format only if it has urn:mpeg💨role:2011 as scheme, or urn:tva:metadata:cs:AudioPurposeCS:2007 - in this case the value can be only 1 or 2.

The scheme specific values should be mapped onto the appropriate IntDef values, once those are defined.

Not sure how to map value "1", "2" of urn:tva:metadata:cs:AudioPurposeCS:2007 to one of these ACCESSIBILITY_UNSET, ACCESSIBILITY_ENHANCED_AUDIO_INTELLIGIBILITY, ACCESSIBILITY_DESCRIPTION, ACCESSIBILITY_CAPTION, ACCESSIBILITY_SIGN.

"1" stands for for the visually impaired, and "2" is hard of hearing.

I didn't find any docs which says what are the correspondent values of these in urn:mpeg:dash:role:2011 scheme. But I would map those like this:

"1" -> ACCESSIBILITY_ENHANCED_AUDIO_INTELLIGIBILITY

"2" -> ACCESSIBILITY_CAPTION

Do you think we can go with this?

ojw28 commented 5 years ago

In ISO-23009-1:2014 - 5.8.5.5 DASH role scheme, I can't find ROLE_EMERGENCY and ROLE_SIGN. Where did you find those? Also I can't find the accessibility values.

I'm looking in a draft third edition of 23009-1. "sign" is:

Visual media component representing a sign-language interpretation of an audio component.

"emergency" is:

Experience that provides information, about a current emergency, that is intended to enable the protection of life, health, safety, and property, and may also include critical details regarding the emergency and how to respond to the emergency.

For both role and accessibility, should we have an int array inside just parsing the first one?

I'd prefer to define the IntDefs with flag = true and then OR in more than one into the value, if present. I hope there will never be more than 32 of them... :). If we do this, I wonder whether we should bother having accessibility and role as separate fields at all, as opposed to a single roleFlags field that can incorporate both. That would be more consistent with how urn:mpeg:dash:role:2011 is defined in 23009-1. It's weird that the DASH-IF doc splits its use across Role and Accessibility elements. In ExoPlayer's model I don't think we lose anything by merging them back together into a single roleFlags field when parsing the manifest. What do you think? If we go with this approach, lets prefix all the constants with ROLE_FLAGS_ and call the int def RoleFlags.

Not sure how to map value "1", "2" of urn:tva:metadata:cs:AudioPurposeCS:2007 to one of these

According to the ETSI document:

 <Term termID="1">
 <Name xml:lang="en">Audio description for the visually impaired</Name>
 </Term>
 <Term termID="2">
 <Name xml:lang="en">Audio description for the hard of hearing</Name>
 </Term>

So I think the mapping is "1" -> "description" and "2" -> "enhanced-audio-intelligibility". You could also map "3" -> "supplementary", "4" -> "commentary" and "6" -> "main". Which also kind of suggests we should have a single roleFlags, rather than separating them out.

ojw28 commented 5 years ago

Let's move all further discussion to the pull request, to avoid splitting the conversation. Thanks!

szaboa commented 5 years ago

PR #5578 was merged in, probably this can be closed?