w3c / ttml2

Timed Text Markup Language 2 (TTML2)
https://w3c.github.io/ttml2/

Allow markup in metadata element contents, e.g. for direction #1266

Open nigelmegitt opened 12 months ago

nigelmegitt commented 12 months ago

See w3c/dapt#164. Metadata elements whose content is specified only as #PCDATA cannot contain markup, yet permitting it would be an improvement for internationalisation. Unicode control characters can signal changes in direction (ltr/rtl) within metadata element contents, but they cannot express other styling changes such as ruby or color. The ability to do so could be useful in, for example, authoring environments where authors need to make notes; some editing tools allow rich text to be used.

My suggestion would be to permit limited versions of <span> and <br>, but not timings, and therefore not animations. It seems like unnecessary complexity to include <div> and <p>, but I could be persuaded otherwise.
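As a rough illustration of what "limited versions of <span> and <br>, but not timings" could mean in practice, here is a minimal Python sketch of a checker. It is not specification text: the function name `check_limited_inline`, the allowed-element set, and the sample documents are assumptions made for this example only.

```python
import xml.etree.ElementTree as ET

TT_NS = "http://www.w3.org/ns/ttml"
TTM_NS = "http://www.w3.org/ns/ttml#metadata"
TTS_NS = "http://www.w3.org/ns/ttml#styling"

# Assumption for this sketch: only <span> and <br> would be permitted.
ALLOWED_INLINE = {f"{{{TT_NS}}}span", f"{{{TT_NS}}}br"}
# Timing attributes are in no namespace in TTML.
TIMING_ATTRS = {"begin", "end", "dur"}


def check_limited_inline(elem):
    """Return a list of problems found among the descendants of elem."""
    problems = []
    for node in elem.iter():
        if node is elem:
            continue  # iter() yields the element itself first
        if node.tag not in ALLOWED_INLINE:
            problems.append(f"element not allowed: {node.tag}")
        for attr in node.attrib:
            if attr in TIMING_ATTRS:
                problems.append(f"timing attribute not allowed: {attr}")
    return problems


NS_DECLS = f'xmlns:ttm="{TTM_NS}" xmlns:tt="{TT_NS}" xmlns:tts="{TTS_NS}"'

# Styled note using only span and br, with no timing.
GOOD = (
    f"<ttm:desc {NS_DECLS}>Check "
    '<tt:span tts:color="red">this line</tt:span><tt:br/>'
    "before delivery.</ttm:desc>"
)

# A timed span and a <p>, both outside the suggested subset.
BAD = (
    f"<ttm:desc {NS_DECLS}>"
    '<tt:span begin="1s">timed</tt:span><tt:p>para</tt:p>'
    "</ttm:desc>"
)

good_problems = check_limited_inline(ET.fromstring(GOOD))
bad_problems = check_limited_inline(ET.fromstring(BAD))
```

Because animation elements are not in the allowed set, they would be rejected by the same element check, which matches the "and therefore not animations" part of the suggestion.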

See also https://www.w3.org/TR/international-specs/#bidi_inline_change for the motivation behind allowing markup. In particular, if and while we do not allow markup, we should require support for Unicode bidirectional control characters in elements whose character content is only #PCDATA. No attributes appear to be affected by this constraint, since none of them contains unconstrained text content that needs to support scripts with different directions.
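For comparison, the control-character approach that works today with #PCDATA-only content might look like the following Python sketch. The helper names `isolate_rtl` and `make_desc` and the sample note are hypothetical; the code points U+2067 (RIGHT-TO-LEFT ISOLATE) and U+2069 (POP DIRECTIONAL ISOLATE) are the standard Unicode isolate controls.

```python
import xml.etree.ElementTree as ET

TTM_NS = "http://www.w3.org/ns/ttml#metadata"

RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def isolate_rtl(text):
    """Wrap a right-to-left run so it is isolated from surrounding text."""
    return f"{RLI}{text}{PDI}"


def make_desc(text):
    """Build a ttm:desc element whose content is character data only."""
    desc = ET.Element(f"{{{TTM_NS}}}desc")
    desc.text = text
    return desc


# Hebrew "shalom", written with escapes so the source stays ASCII.
rtl_word = "\u05e9\u05dc\u05d5\u05dd"
note = "Speaker says " + isolate_rtl(rtl_word) + " twice."

desc = make_desc(note)
serialized = ET.tostring(desc, encoding="unicode")
reparsed = ET.fromstring(serialized)
```

The element still has no child elements after a round trip, so it stays within a #PCDATA content model. The limitation raised in this issue remains: the isolates carry direction only, not ruby, color, or any other styling.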

skynavga commented 12 months ago

IMO, the content specification for <metadata/> should not be expanded from #PCDATA to mixed content since it breaks downward compatibility going back to TTML 1. An alternative that doesn't suffer from this problem would be to introduce a new element type, e.g., <styledMetadata/>, with an expanded content model that supports non-plaintext markup.

nigelmegitt commented 12 months ago

Thanks @skynavga - just to check I understand, breaking downward compatibility with TTML1 would occur at the parser level because a parser expecting TTML1 document structure would not know how to process this non-PCDATA content? In other words it is not just a vocabulary issue, e.g. not knowing an attribute or element name, it's a "not expecting a tag at all here" issue?

I see that the (non-normative) XSD specifies that the ttm:desc element type is mixed content, at https://github.com/w3c/ttml2/blob/63d7c2f3f9dc2edd5f669a846b4df85dbf3ab632/spec/xsd/ttml2-metadata-items.xsd#L57 so at least from an XSD perspective, a parser should not fall over here. Our RNC schema is a bit clearer that nothing other than text is expected - see https://github.com/w3c/ttml2/blob/63d7c2f3f9dc2edd5f669a846b4df85dbf3ab632/spec/rnc/ttml2-metadata-items.rnc#L54-L66

skynavga commented 12 months ago

> In other words it is not just a vocabulary issue, e.g. not knowing an attribute or element name, it's a "not expecting a tag at all here" issue?

Correct.
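To make the "not expecting a tag at all here" case concrete, here is a minimal Python sketch of a hypothetical reader written against a #PCDATA-only content model. The names `read_pcdata` and `UnexpectedMarkup` are invented for this example, and real TTML1 processors may well react differently, for instance by ignoring the child or by rejecting the whole document.

```python
import xml.etree.ElementTree as ET

TT_NS = "http://www.w3.org/ns/ttml"
TTM_NS = "http://www.w3.org/ns/ttml#metadata"


class UnexpectedMarkup(ValueError):
    """Raised when a child element appears where only text is expected."""


def read_pcdata(elem):
    """Hypothetical strict reader: any child element is treated as an error."""
    if len(elem) > 0:
        raise UnexpectedMarkup(
            f"unexpected child element {elem[0].tag} in {elem.tag}"
        )
    return elem.text or ""


NS_DECLS = f'xmlns:ttm="{TTM_NS}" xmlns:tt="{TT_NS}"'

plain = ET.fromstring(f"<ttm:desc {NS_DECLS}>Plain note</ttm:desc>")
mixed = ET.fromstring(
    f"<ttm:desc {NS_DECLS}>Note: <tt:span>styled</tt:span> text</ttm:desc>"
)

plain_text = read_pcdata(plain)

try:
    read_pcdata(mixed)
    mixed_rejected = False
except UnexpectedMarkup:
    mixed_rejected = True

# A lenient reader that only looks at .text would silently lose content.
lenient_text = mixed.text
```

The failure does not depend on whether the reader recognises `span`: any element at all triggers it. The lenient variant shows the other risk, where everything after the first tag is dropped without an error.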

Re: XSD vs RNC, yes, that is a discrepancy. I would have to do some historical research to learn how that happened. I do know that we (I) have paid less attention to the RNC schema than to the XSD schema as the language has evolved.