Closed szaboa closed 5 years ago
Where is AudioPurpose
defined? I can't find it in the DASH spec, or in the DASH-IF guidelines. The only relevant result when I try and search is this issue.
Accessibility
is a metadata (or descriptor?) of a <AdaptationSet>
which can contain this AudioPurpose
.
I guess it should be defined here: https://dashif.org/identifiers/role-and-accessibility but this link gives 404.
A small description can be found here: https://www.brendanlong.com/common-informative-metadata-in-mpeg-dash.html
A more official doc is this, where the AudioPurpose
is described: https://www.etsi.org/deliver/etsi_ts/103200_103299/103285/01.01.01_60/ts_103285v010101p.pdf
Could you please paste a relevant snippet from one of your manifests (it doesn't need to be a fully functioning piece of test content)?
As an additional point, is it not possible for your manifests to use a DASH-IF recommended way of doing this? Which I think is probably to use:
@schemeIdUri="urn:mpeg:dash:role:2011"
@value="enhanced-audio-intelligibility"
Similarly I think:
@schemeIdUri="urn:mpeg:dash:role:2011"
@value="description"
is probably equivalent to "visually impaired" in the scheme you're referring to. There's not going to be a good DASH interoperability story if everyone is using different schemes for the same thing.
Relevant snippet:
<!--Text-->
<AdaptationSet id="2" lang="nor" segmentAlignment="true">
<Role schemeIdUri="urn:mpeg:dash:role:2011" value="subtitle"/>
<SegmentTemplate timescale="10000000" duration="20000000" startNumber="1" initialization="..." media="..."/>
<Representation id="subtitle_06" mimeType="application/mp4" codecs="stpp" bandwidth="9600" startWithSAP="1"/>
</AdaptationSet>
<!--Text-->
<AdaptationSet id="3" lang="nor" segmentAlignment="true">
<Accessibility schemeIdUri="urn:tva:metadata:cs:AudioPurposeCS:2007" value="2"/>
<Role schemeIdUri="urn:mpeg:dash:role:2011" value="subtitle"/>
<SegmentTemplate timescale="10000000" duration="20000000" startNumber="1" initialization="..." media="..."/>
<Representation id="subtitle_07" mimeType="application/mp4" codecs="stpp" bandwidth="9600" startWithSAP="1"/>
</AdaptationSet>
As an additional point, is it not possible for your manifests to use a DASH-IF recommended way of doing this?
I am not sure if this can be changed in the manifest, I don't think so unfortunately.
If you say this shouldn't be a supported scheme, then there is a possibility to extend some class and write our parsing without forking the repo and use a "modified" version locally?
It seems the Accessiblity
descriptor is already parsed, but we would need to pass that somehow into the TrackGroup
. Do you think this is possible?
See com.google.android.exoplayer2.source.dash.manifest.DashManifestParser#parseAdaptationSet
.
There are two things that I think need to be done:
Format
. I'd propose we stick to having well defined values that these fields are allowed to take. In other words I do not think we should allow arbitrary schemes and values.urn:mpeg:dash:role:2011
scheme for both, as defined in the DASH specification and DASH-IF IOP. There's probably no harm in us parsing urn:tva:metadata:cs:AudioPurposeCS:2007
values 1 and 2 as well, given how simple they are. Although we should make sure the code is written in a way that makes it possible to parse other schemes by extending DashManifestParser
as well.Got it, good idea!
I will work on this if that's ok.
@ojw28, I've created an initial pull request.
Add role and accessibility fields to
Format
.
Should I add the scheme also, or just the value is enough? If needs scheme, then should I add a Descriptor
?
I'd propose we stick to having well defined values that these fields are allowed to take. In other words I do not think we should allow arbitrary schemes and values.
Currently we add the value to the Format
only if it has urn:mpeg:dash:role:2011
as scheme, or urn:tva:metadata:cs:AudioPurposeCS:2007
- in this case the value can be only 1 or 2.
Although we should make sure the code is written in a way that makes it possible to parse other schemes by extending
DashManifestParser
as well.
Not sure how it would be the best, but the easiest would be to just put parseRole
and parseAccessibility
to non-static, in that case everyone could override those and provide custom scheme parsing, no?
I intend to extend the DashManifestParserTest
class when the changes are finalized.
Should I add the scheme also, or just the value is enough? If needs scheme, then should I add a Descriptor?
When I said we should stick to well defined values, what I really meant was that role
and accessibility
should be integers that have their allowed values defined by IntDef
s. We don't want strings where it's unclear what the allowed values are.
The IntDef
's themselves should probably live in the C
class. I'd propose we initially define values corresponding to what DASH-IF IOP allows. So for @Role
we'd have ROLE_UNSET
, ROLE_MAIN
, ROLE_ALTERNATE
, ROLE_SUPPLEMENTARY
, ROLE_COMMENTARY
, ROLE_DUB
, ROLE_EMERGENCY
, ROLE_CAPTION
, ROLE_SIGN
.
For @Accessibility
we'd have ACCESSIBILITY_UNSET
, ACCESSIBILITY_ENHANCED_AUDIO_INTELLIGIBILITY
, ACCESSIBILITY_DESCRIPTION
, ACCESSIBILITY_CAPTION
, ACCESSIBILITY_SIGN
.
Definitions for all of these can be found in ISO 23009-1.
Currently we add the value to the Format only if it has urn:mpeg:dash:role:2011 as scheme, or urn:tva:metadata:cs:AudioPurposeCS:2007 - in this case the value can be only 1 or 2.
The scheme specific values should be mapped onto the appropriate IntDef
values, once those are defined.
I see, I am on it.
So for @Role we'd have ROLE_UNSET, ROLE_MAIN, ROLE_ALTERNATE, ROLE_SUPPLEMENTARY, ROLE_COMMENTARY, ROLE_DUB, ROLE_EMERGENCY, ROLE_CAPTION, ROLE_SIGN.
In ISO-23009-1:2014 - 5.8.5.5 DASH role scheme, I can't find ROLE_EMERGENCY
and ROLE_SIGN
. Where did you find those? Also I can't find the accessibility values.
However, the standard says this:
One Adaptation Set or one media content component may have assigned multiple roles even within the same scheme.
For both role and accessibility, should we have an int array inside just parsing the first one?
Ok, I found it in https://dashif.org/docs/DASH-IF-IOP-v4.2-clean.htm.
Currently we add the value to the Format only if it has urn:mpeg💨role:2011 as scheme, or urn:tva:metadata:cs:AudioPurposeCS:2007 - in this case the value can be only 1 or 2.
The scheme specific values should be mapped onto the appropriate IntDef values, once those are defined.
Not sure how to map value "1", "2" of urn:tva:metadata:cs:AudioPurposeCS:2007
to one of these ACCESSIBILITY_UNSET
, ACCESSIBILITY_ENHANCED_AUDIO_INTELLIGIBILITY
, ACCESSIBILITY_DESCRIPTION
, ACCESSIBILITY_CAPTION
, ACCESSIBILITY_SIGN
.
"1" stands for for the visually impaired, and "2" is hard of hearing.
I didn't find any docs which says what are the correspondent values of these in urn:mpeg:dash:role:2011
scheme. But I would map those like this:
"1" -> ACCESSIBILITY_ENHANCED_AUDIO_INTELLIGIBILITY
"2" -> ACCESSIBILITY_CAPTION
Do you think we can go with this?
In ISO-23009-1:2014 - 5.8.5.5 DASH role scheme, I can't find ROLE_EMERGENCY and ROLE_SIGN. Where did you find those? Also I can't find the accessibility values.
I'm looking in a draft third edition of 23009-1. "sign" is:
Visual media component representing a sign-language interpretation of an audio component.
"emergency" is:
Experience that provides information, about a current emergency, that is intended to enable the protection of life, health, safety, and property, and may also include critical details regarding the emergency and how to respond to the emergency.
For both role and accessibility, should we have an int array inside just parsing the first one?
I'd prefer to define the IntDef
s with flag = true
and then OR in more than one into the value, if present. I hope there will never be more than 32 of them... :). If we do this, I wonder whether we should bother having accessibility
and role
as separate fields at all, as opposed to a single roleFlags
field that can incorporate both. That would be more consistent with how urn:mpeg:dash:role:2011
is defined in 23009-1. It's weird that the DASH-IF doc splits its use across Role and Accessibility elements. In ExoPlayer's model I don't think we lose anything by merging them back together into a single roleFlags
field when parsing the manifest. What do you think? If we go with this approach, lets prefix all the constants with ROLE_FLAGS_
and call the int def RoleFlags
.
Not sure how to map value "1", "2" of urn:tva:metadata:cs:AudioPurposeCS:2007 to one of these
According to the ETSI document:
<Term termID="1">
<Name xml:lang="en">Audio description for the visually impaired</Name>
</Term>
<Term termID="2">
<Name xml:lang="en">Audio description for the hard of hearing</Name>
</Term>
So I think the mapping is "1" -> "description" and "2" -> "enhanced-audio-intelligibility". You could also map "3" -> "supplementary", "4" -> "commentary" and "6" -> "main". Which also kind of suggests we should have a single roleFlags
, rather than separating them out.
Let's move all further discussion to the pull request, to avoid splitting the conversation. Thanks!
PR #5578 was merged in, probably this can be closed?
Issue description
Request for enhancement Parse
<Accessibility>
tag of subtitle tracks from mpd (DASH), more specifically theAudioPurpose
part. Use case: display HOH (hard of hearing) suffix after the subtitle track's language in the chooser. Useful when you have two subtitle tracks in the same language, but one of them is HOH.Link to test content
We have some production streams, but trying to find a sample stream where this is present.
Version of ExoPlayer being used
I've searched whether this is already implemented or not in version 2.9.5. I didn't find support for this.
Note. If this request is accepted, then I would like to work on this and submit a pull request when it's done.