w3c / ttml2

Timed Text Markup Language 2 (TTML2)
https://w3c.github.io/ttml2/

LWSP in attribute and value expressions? #315

Closed palemieux closed 6 years ago

palemieux commented 7 years ago

I apparently missed the three day window on https://github.com/w3c/ttml2/pull/310.

As I understand it, and please correct me if I am wrong, expressions of the form rgba( 233, 1,0 ) are now permitted.

If so, this will require TTML1 implementations to be revised and tested, without stated benefits as far as I can tell.

skynavga commented 7 years ago

It turns out that XSL-FO and CSS2 permit this. Furthermore, this is consistent with behavior regarding LWSP appearing around delimiters that goes back to the original CSS2 YACC grammar. However, I'm open to having a discussion about this point.

palemieux commented 7 years ago

To minimize impact on implementations and divergence from TTML1, I would not introduce LWSP if another delimiter is already present between adjacent non-terminal components.

The TTML1 test file FontFamily009.ttml is the first TTML document I have seen with extra spaces added where another delimiter is already present between adjacent non-terminal components -- it is not clear if the space was intended or not.

palemieux commented 7 years ago

Furthermore, TTML1 explicitly states in 8.3 that "in the syntax representations defined in this section, no linear whitespace (LWSP) is implied or permitted between tokens unless explicitly specified".

TTML2 should do the same unless the absence of LWSP has proven an obstacle to the use of TTML, something I have seen no evidence of.

skynavga commented 7 years ago

@palemieux please review the language found in the preambles of TTML2 8.3, 9.3, 10.3, and 13.3; the restriction in these sections

andreastai commented 7 years ago

Although there is a benefit in better compatibility with other specs, the change could make (as @palemieux commented) a document that uses an otherwise unchanged feature of TTML1 non-conformant against TTML1. A TTML processor could fail to process a document where whitespace is added around a "non-space" separator. If, for example, @tts:color is set to rgb(255 ,255,255), this should also fail validation (e.g. it fails validation by the ttv module of the Timed Text Toolkit), so the document may already be rejected during a QC check.

Apart from that, the affected features in TTML2 (such as #color) could not have the same feature designator as in TTML1 (but this currently seems to be the case).

Taking the missing backward compatibility into account, the change brings more disadvantages than advantages and should possibly be reverted.
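As a rough illustration of the kind of QC check described in this comment, the Python sketch below flags attribute values that contain whitespace next to a COMMA or SEMICOLON delimiter. The function name and the set of whitespace characters (space, tab, CR, LF) are assumptions made for this example; it does not reflect how ttv or any other validator is implemented, and it ignores quoted strings.

```python
import re

# Assumed LWSP set for this sketch: space, tab, carriage return, line feed.
# Matches whitespace immediately before or after a COMMA or SEMICOLON.
_LWSP_AROUND_DELIMITER = re.compile(r"[ \t\r\n]+[,;]|[,;][ \t\r\n]+")


def has_lwsp_around_delimiter(value: str) -> bool:
    """Return True if whitespace touches a comma or semicolon in value.

    Hypothetical helper: quoted strings (e.g. quoted font family names)
    are not treated specially, so a comma inside quotes is also checked.
    """
    return _LWSP_AROUND_DELIMITER.search(value) is not None


if __name__ == "__main__":
    for sample in ["rgb(255,255,255)", "rgb(255 ,255,255)", "Arial, Helvetica"]:
        print(repr(sample), has_lwsp_around_delimiter(sample))
```

A strict QC step could reject values for which this returns True, while a tolerant processor could simply log them.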

skynavga commented 7 years ago

My position is that we should allow a less restricted syntax for whitespace in all COMMA separated lists in TTML2. Let's review what uses a comma separated list:

TTML1

In both these cases, it can be argued that lwsp is permitted, based on XSL-FO and CSS2 syntax; however, there is scant evidence that implementations support lwsp in tts:color, but there is evidence of support for lwsp in tts:fontFamily. As a consequence, TTML1 may be considered both ambiguous and inconsistent in treatment of lwsp around COMMA as a delimiter.

TTML2

In addition, TTML2 makes use of a number of SEMICOLON separated lists:

We need to ensure that we resolve any ambiguities about lwsp in these contexts, and we need to apply a single set of rules for consistency (and reduce special cases in code and test content).

nigelmegitt commented 7 years ago

We need to ensure that we resolve any ambiguities about lwsp in these contexts, and we need to apply a single set of rules for consistency (and reduce special cases in code and test content).

+1

nigelmegitt commented 7 years ago

This is somewhat related to #191 - one of the proposals I made there is to add feature designators for the support/non-support in documents/processors of additional lwsp. I'm not really delighted by that bifurcation but it would at least allow profiles and processors to be clear about what they do and do not deal with.

andreastai commented 7 years ago

We need to ensure that we resolve any ambiguities about lwsp in these contexts, and we need to apply a single set of rules for consistency (and reduce special cases in code and test content).

For sure it is not a question with a straightforward answer. To be more consistent is always good. To break existing implementations may be worse. To have an additional "whitespace" feature as @nigelmegitt suggests may be a way out (depending on what exactly goes into it).

In any case, the use of whitespace in comma-separated value lists for attributes needs to be clarified for TTML1.

@skynavga @nigelmegitt Do you also propose an erratum on this for TTML1 and, if yes, what would it say?

palemieux commented 7 years ago

We need to ensure that we resolve any ambiguities about lwsp in these contexts

TTML2 is not broken without LWSP around delimiters AFAIK, so I do not see a clear motivation to add them. The downside is a significant amount of additional testing and a bifurcation with TTML1.

andreastai commented 7 years ago

Regarding the use of spaces in tts:fontFamily: If you take the following text from Section 8.2 in TTML...

Unless explicitly stated otherwise, linear white-space (LWSP) must appear between adjacent non-terminal components of a value of a TT Style or TT Style Extension Property value unless some other delimiter is permitted and used.

...I can see how you may come to the conclusion that if other delimiters are used (e.g. the COMMA) then spaces are not allowed (because they are neither required nor explicitly allowed).

Unfortunately the TTML1 spec falls short here and is ambiguous. But regardless of the "legal" discussion, it is clear that it led to different interpretations.

At least for tts:fontFamily, I had always assumed that spaces in comma-separated lists of atomic values are allowed. Others had the same interpretation. So, for example, the TTML profile EBU-TT-D-Basic-DE has samples with spaces (pages 3 and 16). Following this specification, the open source framework SCF also uses spaces (see the corresponding XSLT, code line 66). As some broadcasters use this framework for the subtitles of their online playout, there are already a lot of files with spaces in this attribute.

But others also had the same interpretation. After a first check I found, for example, the BBC Subtitle Guidelines, where you can find an example with three font family names in one attribute and, in addition to the COMMA separator, there are spaces (28.5.1 tts:fontFamily, Document Requirements).
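To make the two readings of tts:fontFamily concrete, here is a minimal Python sketch of a tolerant parser that splits on commas and discards surrounding whitespace. The function name is hypothetical, the handling of quoted names is simplified (no escape sequences), and nothing here is taken from SCF or any other implementation mentioned above.

```python
from typing import List


def split_font_families(value: str) -> List[str]:
    """Split a comma-separated font family list, tolerating LWSP.

    Hypothetical helper: commas inside single- or double-quoted names
    are kept, and escape sequences inside quotes are not handled.
    """
    families: List[str] = []
    current: List[str] = []
    quote = None  # the active quote character, if inside a quoted name
    for ch in value:
        if quote is not None:
            current.append(ch)
            if ch == quote:
                quote = None
        elif ch in "\"'":
            quote = ch
            current.append(ch)
        elif ch == ",":
            families.append("".join(current).strip(" \t\r\n"))
            current = []
        else:
            current.append(ch)
    families.append("".join(current).strip(" \t\r\n"))
    return families


if __name__ == "__main__":
    print(split_font_families("Arial, Helvetica ,sansSerif"))
```

A processor written against the strict reading would instead reject any value in which the stripped and unstripped items differ.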

nigelmegitt commented 7 years ago

Thanks @tairt for the additional data points. I'm beginning to feel that there are enough examples of spaces around commas in tts:fontFamily that we cannot now prohibit them in TTML1 or TTML2.

How about this proposal:

?

palemieux commented 7 years ago

Re: TTML1, I see two options:

Feedback on both options should be sought on public-tt.

palemieux commented 7 years ago

in TTML2 we explicitly permit spaces around list delimiters.

Do you mean LWSP around commas of tts:fontFamily only, or something more?

nigelmegitt commented 7 years ago

Do you mean LWSP around commas of tts:fontFamily only, or something more?

My preference would be a single rule applicable across all style attributes, and possibly parameter attributes if applicable, but I could accept a per-attribute restriction if there's a good reason for it.

palemieux commented 7 years ago

but I could accept a per-attribute restriction if there's a good reason for it.

The reason is to avoid burdening implementers and creating confusion by introducing new requirements that do not solve an actual interoperability and/or use case.

The absence of LWSP between rgba components is not a documented problem AFAIK.

nigelmegitt commented 7 years ago

The requirement for no LWSP between rgba components is an interoperability issue with CSS.

CSS3 Color Module Level 3 specifically permits spaces:

White space characters are allowed around the numerical values.

It is a burden on document authors used to CSS to have to learn a new extra restriction that, for example, may prevent CSS color values from being pasted into TTML color style attributes. I expect there are more document authors than implementers, so I would rank their needs higher.
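The copy-and-paste scenario can be sketched in Python as a small normalizer that removes whitespace inside an rgb()/rgba() expression, so that a CSS-style value also satisfies the strict no-LWSP reading. The function name is hypothetical and the code does not validate the components themselves; it only illustrates the whitespace difference under discussion.

```python
import re

_LWSP = " \t\r\n"  # assumed set of linear whitespace characters


def strip_lwsp_in_color(value: str) -> str:
    """Remove whitespace inside rgb(...) or rgba(...); leave other values alone.

    Hypothetical helper: component count and range are not checked.
    """
    match = re.fullmatch(r"(rgba?)\(([^()]*)\)", value.strip(_LWSP))
    if match is None:
        return value  # not a functional color expression
    components = [part.strip(_LWSP) for part in match.group(2).split(",")]
    return match.group(1) + "(" + ",".join(components) + ")"


if __name__ == "__main__":
    print(strip_lwsp_in_color("rgb(255 ,255,255)"))
```

An authoring tool could apply such a step on export if it wants to produce documents that strict TTML1 processors accept.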

palemieux commented 7 years ago

It is a burden on document authors used to CSS to have to learn a new extra restriction that, for example, may prevent CSS color values from being pasted into TTML color style attributes. I expect there are more document authors than implementers, so I would rank their needs higher.

If compatibility with CSS is truly a requirement, then we have much bigger issues than LWSP since we will need to examine every new feature in that light.

Compatibility with CSS has not been an issue re: rgba values so far. I suggest we do not make it an issue now.

skynavga commented 7 years ago

On Wed, May 24, 2017 at 6:56 AM, Andreas Tai notifications@github.com wrote:

Regarding the use of spaces in tts:fontFamily: If you take the following text from Section 8.2 in TTML...

Unless explicitly stated otherwise, linear white-space (LWSP) must appear between adjacent non-terminal components of a value of a TT Style or TT Style Extension Property value unless some other delimiter is permitted and used.

...I can see how you may come to the conclusion that if other delimiters are used (e.g. the COMMA) then spaces are not allowed (because they are neither required nor explicitly allowed).

Unfortunately the TTML1 spec falls short here and is ambiguous. But regardless of the "legal" discussion, it is clear that it led to different interpretations.

At least for tts:fontFamily, I had always assumed that spaces in comma-separated lists of atomic values are allowed. Others had the same interpretation. So, for example, the TTML profile EBU-TT-D-Basic-DE http://www.irt.de/webarchiv/showdoc.php?z=NjMzOSMxMDA2MDE4I3BkZg== has samples with spaces (pages 3 and 16). Following this specification, the open source framework SCF also uses spaces (see the corresponding XSLT https://github.com/IRT-Open-Source/scf/blob/master/modules/EBU-TT-D2EBU-TT-D-Basic-DE/EBU-TT-D2EBU-TT-D-Basic-DE.xslt, code line 66). As some broadcasters use this framework for the subtitles of their online playout, there are already a lot of files with spaces in this attribute.

But others also had the same interpretation. After a first check I found, for example, the BBC Subtitle Guidelines, where you can find an example with three font family names in one attribute and, in addition to the COMMA separator, there are spaces (28.5.1 tts:fontFamily, Document Requirements http://bbc.github.io/subtitle-guidelines/#tts-fontFamily ).

SKYNAV has always assumed whitespace is permitted around COMMA in tts:fontFamily, which is borne out by at least four implementations and numerous test files. However, as I reported previously, we did not make that assumption about tts:color, even though in retrospect we see it is permitted by the same grammar constructs in XSL-FO and CSS2.

skynavga commented 7 years ago

On Wed, May 24, 2017 at 8:42 AM, Nigel Megitt notifications@github.com wrote:

Thanks @tairt https://github.com/tairt for the additional data points. I'm beginning to feel that there are enough examples of spaces around commas in tts:fontFamily that we cannot now prohibit them in TTML1 or TTML2.

How about this proposal:

  • in TTML1 we add an informative note saying that both interpretations exist, and

Agreed.

  • in TTML2 we explicitly permit spaces around list delimiters.

There is a somewhat broader question here, which is whether to extend this to all COMMA and SEMICOLON separated lists, including, in particular tts:color. As I have pointed out, we are going from two contexts of use in TTML1 to nine contexts of use in TTML2 (as currently drafted). My position is that we should have a consistent approach in TTML2 meaning that whitespace be allowed around delimiters in all of these nine contexts, including tts:color.

?

skynavga commented 7 years ago

On Wed, May 24, 2017 at 9:01 AM, Pierre-Anthony Lemieux < notifications@github.com> wrote:

Re: TTML1, I see two options:

  • forbid LWSP around commas in documents (SHALL NOT be present), but recommend that processors accept them (SHOULD accept them). The idea is to encourage the authoring of the most compatible documents, but also to encourage processors to be tolerant given the existence of documents with LWSP in them

This is clearly not feasible since it would invalidate numerous existing TTML1 documents, i.e., would make conforming documents non-conforming.

  • explicitly permit LWSP around commas of tts:fontFamily components only, and recommend not using them (SHOULD NOT be present) since some implementations might not accept them

Yes to the first part, no to the second; I prefer Nigel's suggestion to add a note indicating that both types of implementation exist. I would also suggest recommending (in a note) that content processors accept whitespace here even though the original specification was ambiguous.

Feedback on both options should be sought on public-tt.

Presumably, readers of public-tt already have visibility to this conversation and can chime in at any time.

skynavga commented 7 years ago

On Wed, May 24, 2017 at 9:49 AM, Nigel Megitt notifications@github.com wrote:

The requirement for no LWSP between rgba components is an interoperability issue with CSS.

CSS3 Color Module Level 3 https://www.w3.org/TR/css3-color/#rgb-color specifically permits spaces:

White space characters are allowed around the numerical values.

It is a burden on document authors used to CSS to have to learn a new extra restriction that, for example, may prevent CSS color values from being pasted into TTML color style attributes. I expect there are more document authors than implementers, so I would rank their needs higher.

Agreed

palemieux commented 7 years ago

the same grammar constructs in XSL-FO and CSS2.

Let's be factual here: they are not the same.

CSS2 has extensive grammar and lexical scanner specifications (Appendix D); there is no equivalent in TTML1.

They are perhaps similar.

skynavga commented 7 years ago

On Wed, May 24, 2017 at 11:38 AM, Pierre-Anthony Lemieux < notifications@github.com> wrote:

the same grammar constructs in XSL-FO and CSS2.

Let's be factual here: they are not the same.

What is your point here? Of course they are not the same documents. But TTML1 makes normative reference to XSL-FO and is clearly based on certain sub-grammars of XSL-FO, which is, in turn, based on sub-grammars of CSS2. The syntax of rgb() expressions is one such sub-grammar.

CSS2 has extensive grammar and lexical scanner specifications (Appendix D https://www.w3.org/TR/REC-CSS2/grammar.html ); there is no equivalent in TTML1.

So what? It makes direct or indirect reference to Appendix D.

They are perhaps similar.

cconcolato commented 6 years ago

In discussing this with @palemieux and @skynavga, we recommend that the group consider the following:

skynavga commented 6 years ago

@cconcolato suggests that, when incorporating whitespace syntax rules into 2.3, we should subdivide 2.3 into multiple subsections, since 2.3 is getting rather long.

css-meeting-bot commented 6 years ago

The Working Group just discussed LWSP in rgba expressions? ttml2#315, and agreed to the following resolutions:

The full IRC log of that discussion
<nigel> Topic: LWSP in rgba expressions? ttml2#315
<nigel> github: https://github.com/w3c/ttml2/issues/315
<nigel> Cyril: I'll create an issue for TTML1 for the equivalent of this, in color.
<nigel> .. It's w3c/ttml1#322.
<nigel> Nigel: When you say "allow lwsp everywhere", what exactly is "everywhere"? Is it all tta, ttp and tts attributes?
<nigel> Glenn: Pretty much, yes. I'm not proposing to allow lwsp between value names and ( for example.
<nigel> Chris: That matches what is in CSS too.
<nigel> Glenn: I should change `<color>` to combine the `"rgb" "("` into `"rgb("` and `"rgba" "("` into `"rgba("`
<nigel> Nigel: In `<number>`, `<non-negative-number>` and `<percentage>` there shouldn't be any lwsp allowed.
<nigel> Chris: Implementers prefer no space between a number and %.
<nigel> Nigel: So that applies between number and pitch-units in `<pitch>` too.
<nigel> Chris: Yes
<nigel> Cyril: It may be clearer and easier to put `<lwsp>` explicitly everywhere it is allowed.
<nigel> .. We should do that.
<nigel> Glenn: I may have to accept that.
<nigel> Nigel: `<position>` already does it.
<nigel> Glenn: `<condition>` does too.
<nigel> Glenn: There are some animation value expressions too that need it.
<nigel> RESOLUTION: Specify explicitly where lwsp is permitted.
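One way to picture the resolution is a grammar in which every permitted whitespace position is spelled out. The Python sketch below writes `rgb(` and `rgba(` as single tokens, allows optional LWSP only around each numeric component, and allows none between a number and `%`. The pattern names, the LWSP character set, and the simplified number syntax are assumptions for illustration, not the productions of the specification.

```python
import re

LWSP = r"[ \t\r\n]*"      # optional whitespace, written explicitly where allowed
COMPONENT = r"[0-9]{1,3}"  # simplified color component; range is not checked


def _arguments(count: int) -> str:
    # Each argument is <lwsp> component <lwsp>; arguments are joined by COMMA.
    return ",".join([LWSP + COMPONENT + LWSP] * count)


# "rgb(" and "rgba(" are single tokens: no LWSP between the name and "(".
COLOR_FUNCTION = re.compile(
    r"(?:rgb\(" + _arguments(3) + r"\)|rgba\(" + _arguments(4) + r"\))"
)

# No LWSP appears in this production, so none is permitted before "%".
PERCENTAGE = re.compile(r"[0-9]+(?:\.[0-9]+)?%")


if __name__ == "__main__":
    print(bool(COLOR_FUNCTION.fullmatch("rgb( 233, 1,0 )")))
    print(bool(COLOR_FUNCTION.fullmatch("rgb (233,1,0)")))
    print(bool(PERCENTAGE.fullmatch("50 %")))
```

Because whitespace is never implied, anything not written as `LWSP` in a production is rejected, which keeps the rule set uniform across attributes.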

nigelmegitt commented 6 years ago

Note that a partial change was made that does not address this issue, in #525.