w3c / imsc

TTML Profiles for Internet Media Subtitles and Captions (IMSC)
https://w3c.github.io/imsc/

foreign namespace usage is underspecified #213

Closed mikedo closed 6 years ago

mikedo commented 7 years ago

The only conformance text on the subject is:

6.2 Foreign Element and Attributes

A Document Instance MAY contain elements and attributes that are neither specifically permitted nor forbidden by a profile. A transformation processor SHOULD preserve such elements or attributes whenever possible.

G and H have some brief informative material.

The XML schemas are informative and make various assumptions not traceable to the spec.

FYI, TTML has no conformance language like either the above or the following additional provisions that I believe are needed for document interop when extending it with foreign namespaces.

  1. Where exactly can they be used? Everywhere?
     a. Specifically for , do they follow, lead, or can they be placed between any elements?
  2. What namespaces are permitted? Is ##other assumed or ##any?
  3. What processContents setting is to be used (this affects the validation behavior on future schemas and extended documents)?
  4. (probably other things I have not thought about like union)
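For readers unfamiliar with the XSD terms in questions 2 and 3: on an `xs:any` or `xs:anyAttribute` wildcard, the `namespace` attribute controls which namespaces match, and `processContents` (`strict`, `lax`, or `skip`) controls how strictly the matched content is validated. The following is a minimal Python sketch of the XSD 1.0 namespace test for the two values named in question 2 only. The function name and the example namespace are hypothetical, and nothing here is taken from the IMSC or TTML schemas.

```python
ABSENT = None  # stands for "no namespace" (an unqualified name)


def wildcard_allows(namespace_attr, target_namespace, candidate_namespace):
    """Sketch of the XSD 1.0 wildcard namespace test for ##any and ##other."""
    if namespace_attr == "##any":
        # ##any matches names in any namespace, including no namespace.
        return True
    if namespace_attr == "##other":
        # ##other matches only namespace-qualified names whose namespace
        # differs from the schema's target namespace.
        return (candidate_namespace is not ABSENT
                and candidate_namespace != target_namespace)
    raise ValueError("only ##any and ##other are modelled in this sketch")


TT_NS = "http://www.w3.org/ns/ttml"
EXAMPLE_NS = "http://example.com/ns"  # hypothetical foreign namespace
```

Under `##other`, an extension element in `EXAMPLE_NS` matches the wildcard, while an element in the TT namespace or in no namespace does not. Under `##any`, all three match.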

skynavga commented 7 years ago

On Fri, Feb 10, 2017 at 11:59 AM, Michael A Dolan notifications@github.com wrote:

  1. Where exactly can they be used? Everywhere?
     a. Specifically for , do they follow, lead, or can they be placed between any elements?

I believe the current specification is clear. They can be placed anywhere. That is, foreign namespace elements can appear anywhere as long as they are descendants of tt:tt (and not the root). And foreign namespace attributes can appear on any element.

I say this because TTML content validity is assessed only after pruning foreign namespaces as defined in [1][2].

[1] https://www.w3.org/TR/ttml1/#conformance-content [2] https://www.w3.org/TR/ttml1/#doctypes

  2. What namespaces are permitted? Is ##other assumed or ##any?

Any element in any namespace as long as they are "not members of the collection of element types defined by the associated Abstract Document Type".

Any attribute in any namespace as long as "the namespace URI of the expanded names are not listed in Table 1 – Namespaces https://www.w3.org/TR/ttml1/#namespace-vocab-table".

  3. What processContents setting is to be used (this affects the validation behavior on future schemas and extended documents)?

Undefined. Schemas are informative, but note under [3] the following:

Note:

The schemas referenced by this specification do not validate all syntactic constraints defined by this specification, and, as such, represent a superset of conformant TTML Content. In particular, performing validation with one of the above referenced schemas may result in a false positive indication of validity. For example, both the RNC and XSD schemas specify that a tts:fontFamily attribute must satisfy the xs:string XSD data type; however, this data type is a superset of the values permitted to be used with the tts:fontFamily attribute.

In addition, the RNC schema may produce a false negative indication of validity when using the xml:id attribute with an element in a foreign namespace, thus representing a subset of conformant TTML Content. This is due to a specific limitation in expressing wildcard patterns involving xsd:ID typed attributes in Relax NG schemas. Note that this specification defines the formal validity of a Document Instance to be based on an Abstract Document Instance from which all foreign namespace elements and attributes have been removed. Therefore, the exceptional reporting of this false negative does not impact the formal assessment of Document Instance validity.

[3] https://www.w3.org/TR/ttml1/#ttml-content-doctype

  4. (probably other things I have not thought about like union)


mikedo commented 7 years ago

The cited text from TTML at best hints that foreign namespaces were contemplated. There is zero conformance language to support their inclusion in TTML. And, if it is already clearly enabled in TTML, then there would have been no need for the existing IMSC1 conformance language I cited. And, the fact that there is a disclaimer about XML Schemas being imperfect for validation, although true, is not relevant. This issue is about where foreign namespaces can be placed and what happens to derivative work schema processing when IMSC1 is extended (processContents affects the addition of foreign namespace extensions, not the base TTML and IMSC1). All this said, I suspect we may agree on the final answer. I will try to draft some specific "clarifying" text to test that theory...

nigelmegitt commented 7 years ago

In TTML1 there is conformance language in §3.1, §3.2 and §4, and also relevant conformance statements in §4.1:

  1. pruning all element information items whose names are not members of the collection of element types defined by the associated Abstract Document Type, then

and

  1. pruning all attribute information items having expanded names such that the namespace URI of the expanded names are not listed in Table 1 – Namespaces,

Together this means that when processing documents foreign namespace elements and attributes are ignored.

There is something much more obvious for attributes: the description of permitted attributes for every element includes the following text:

{any attribute not in default or any TT namespace}

I'm not convinced that any further clarification is necessary, but maybe I haven't fully understood the issue that @mikedo is raising.
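The two pruning steps quoted above can be sketched in Python with the standard library's `xml.etree.ElementTree`. This is a minimal sketch under stated assumptions, not a conformant implementation: membership in the Abstract Document Type's element collection is approximated by namespace alone, the set of TT namespaces is written out by hand rather than taken from Table 1, keeping `xml:` attributes is a choice made for this sketch, and foreign children of metadata are pruned like any other foreign element. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

TT_NS = "http://www.w3.org/ns/ttml"
# Assumption: the TT namespaces that carry elements or attributes.
TT_NAMESPACES = {TT_NS, TT_NS + "#parameter", TT_NS + "#styling", TT_NS + "#metadata"}
XML_NS = "http://www.w3.org/XML/1998/namespace"


def namespace_of(name):
    """Return the namespace URI of an ElementTree '{uri}local' name, or None."""
    return name[1:].split("}", 1)[0] if name.startswith("{") else None


def prune_foreign(element):
    """Remove foreign-namespace attributes and child elements, in place."""
    for attr in list(element.attrib):
        ns = namespace_of(attr)
        # Unqualified attributes and xml: attributes are kept in this sketch.
        if ns is not None and ns != XML_NS and ns not in TT_NAMESPACES:
            del element.attrib[attr]
    kept = []
    for child in list(element):
        if namespace_of(child.tag) in TT_NAMESPACES:
            prune_foreign(child)
            kept.append(child)
        else:
            # Text following the pruned element belongs to the parent's
            # content, so carry it over instead of dropping it.
            tail = child.tail or ""
            if kept:
                kept[-1].tail = (kept[-1].tail or "") + tail
            else:
                element.text = (element.text or "") + tail
    element[:] = kept
    return element
```

A validator following this reading would run `prune_foreign` on the parsed document first and only then check what remains against the TTML content model.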

mikedo commented 7 years ago

By "conformance" statements, I mean sentences that use conformance terms (e.g. "shall", "should", or "may") to enable foreign namespaces. The citations that you and Glenn keep providing are about instance document conformance steps, including the removal of foreign namespaces; they only imply that foreign namespaces could exist, they do not enable them. Where does it say: "Foreign namespace elements may be present anywhere (including every position in a )"? In fact the situation is worse than this, since there is one instance where a foreign element is explicitly permitted (metadata, 12.1.1), thus more clearly communicating that they are not, in fact, permitted elsewhere. I simply don't see how a reader could infer that foreign namespace elements can be used everywhere.

Yes, foreign attributes are clearly permitted where that statement is present. I include attributes in this discussion since they are affected by processContents. Sorry for not being more clear.

skynavga commented 7 years ago

On Mon, Feb 13, 2017 at 12:02 PM, Michael A Dolan notifications@github.com wrote:

By "conformance" statements, I mean sentences that use conformance terms (e.g. "shall", "should", or "may") to enable foreign namespaces.

If some expression is syntactically permitted, and it is not explicitly excluded with a shall not or should not, then it is in fact permitted (syntactically).

TTML1 content conformance is defined in [1], and validity is defined in the following step of that section:

  1. The Reduced XML Infoset that corresponds to the Document Instance is a Valid Abstract Document Instance of the associated Abstract Document Type.

[1] https://www.w3.org/TR/ttml1/#conformance-content

Because all foreign elements and foreign attributes have been pruned from the Document Instance [2] by the time this language applies, then nothing need be said either way about their presence in a concrete representation of an Abstract Document Instance.

[2] https://www.w3.org/TR/ttml1/#doctypes

Note well that TTML1 (and TTML2) do not mandate any particular concrete representation (encoding) of a TTML document instance [3].

You are effectively asking for conformance language about such a concrete representation, which has been historically outside the scope of definition of TTML.

[3] https://www.w3.org/TR/ttml1/#concrete-encoding

nigelmegitt commented 7 years ago

TTWG Meeting 2017-02-23: Agreed that @mikedo will draft a proposed informative paragraph explaining that foreign namespace elements are permitted on content elements.

palemieux commented 7 years ago

I had the opportunity to review TTML1 following the telecon.

I am now thinking TTML1 is unambiguous: the XML Representation and prose for content elements clearly indicate that {any attribute not in default or any TT namespace} is permitted, but that child elements are restricted to a very specific set, e.g. Metadata.class*, Animation.class*, div* for body. Only metadata allows ({any element in TT Metadata namespace}|{any element not in any TT namespace})*.

This is compatible with extension attributes and metadata elements introduced by SMPTE-TT and EBU-TT-D and IMSC1, AFAIK.

[EDIT: There is apparently an error in ST 2052-1 that implies that smpte:information is a direct child of head]

If anything, this means that the prose in IMSC1 ("A Document Instance MAY contain elements and attributes that are neither specifically permitted nor forbidden by a profile") is perhaps misleading, since it implies that foreign elements can be present anywhere.
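The stricter reading described in this comment, where only metadata admits foreign child elements, could be checked with something like the following Python sketch. It assumes that element membership can be approximated by namespace, and the function name and namespace set are hypothetical. It does not descend into foreign elements, since their content would be governed by their own vocabulary.

```python
import xml.etree.ElementTree as ET

TT_NS = "http://www.w3.org/ns/ttml"
# Assumption: namespaces in which TTML1 defines element types.
TT_NAMESPACES = {TT_NS, TT_NS + "#parameter", TT_NS + "#metadata"}
METADATA_TAG = "{" + TT_NS + "}metadata"


def namespace_of(tag):
    """Return the namespace URI of an ElementTree '{uri}local' tag, or None."""
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else None


def foreign_outside_metadata(element):
    """List tags of foreign elements whose parent is not tt:metadata."""
    parent_is_metadata = element.tag == METADATA_TAG
    found = []
    for child in element:
        if namespace_of(child.tag) in TT_NAMESPACES:
            found.extend(foreign_outside_metadata(child))
        elif not parent_is_metadata:
            found.append(child.tag)
    return found
```

An empty result means every foreign element in the document sits directly under a metadata element, which is what this reading of TTML1 would require.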

nigelmegitt commented 7 years ago

I have also reviewed TTML1 and IMSC, and have led myself down the following logical path (which I am happy to be challenged on!):

TTML1 §4 says that the first step in establishing validity is to prune element information items not in the collection of elements defined by the associated abstract document type.

The abstract document type for, say, the tt element or the head element, excludes elements defined in non-TTML namespaces. Therefore such elements are not in the collection of elements and are pruned for validity checking. This is not a prohibition of them being present, just a defined behaviour (prune them) when they are present.

The metadata element differs however in that those "foreign" namespace elements are included in the collection of elements in the abstract document type and are therefore not pruned. However it is left unstated how their validity should be assessed. The clause in §4 "the descendants of the document element satisfy their respective element type's content specifications," suggests that foreign namespace element types have some content specification somewhere but goes no further, which is probably reasonable. It does mean that validity checking is required on foreign namespace elements if and only if they are descendants of metadata elements.

Presumably any validator should inform when pruning foreign namespace element content but not warn about it, since it is excluded from the validity checking, but should warn about non-pruned foreign namespace elements (descendants of metadata) for which there is no content specification and therefore no validity checking is possible.

Presumably also any presentation processor will also ignore foreign namespace elements and also any metadata elements, since they are not expected to affect presentation semantics.
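The validator reporting behaviour suggested above can be sketched in Python as follows. This is an illustration under assumptions, not a description of any existing validator: the severity labels, the function name, and the `specified_namespaces` parameter (namespaces for which a content specification is available) are all hypothetical.

```python
import xml.etree.ElementTree as ET

TT_NS = "http://www.w3.org/ns/ttml"
# Assumption: namespaces in which TTML1 defines element types.
TT_NAMESPACES = {TT_NS, TT_NS + "#parameter", TT_NS + "#metadata"}
METADATA_TAG = "{" + TT_NS + "}metadata"


def namespace_of(tag):
    """Return the namespace URI of an ElementTree '{uri}local' tag, or None."""
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else None


def report_foreign(root, specified_namespaces=frozenset()):
    """Return (severity, tag) pairs for the foreign elements in a document."""
    messages = []

    def visit(element):
        parent_is_metadata = element.tag == METADATA_TAG
        for child in element:
            ns = namespace_of(child.tag)
            if ns in TT_NAMESPACES:
                visit(child)
            elif not parent_is_metadata:
                # Pruned before validity checking: inform, do not warn.
                messages.append(("info", child.tag))
            elif ns not in specified_namespaces:
                # Not pruned, but no content specification to validate against.
                messages.append(("warning", child.tag))

    visit(root)
    return messages
```

Passing the namespace of a known extension vocabulary in `specified_namespaces` suppresses the warning for its elements under metadata, on the assumption that a separate step validates them.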

Now in the case of IMSC1 we are no longer talking about an abstract document instance but a concrete encoding. So the question becomes: should we permit within a concretely encoded document foreign namespace elements? And secondarily, if we do permit them, what should a processor do?

On the first question I am a little torn each way.

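On the "yes, permit foreign namespace elements anywhere" side:

  • permitting foreign namespace elements allows for "mix ins" where that's deemed appropriate, which seems like a helpful thing (and is something that I would in general like to support, e.g. the use of EmotionML to add more information about elements, which could be in or out of the metadata space);
  • there is one data point of an example where this is explicitly specified, in SMPTE-TT. The SMPTE ST2052-1-2010 specification defines a smpte:information element that is required to be a child of the head element. So saying yes means it may be possible to create SMPTE-TT documents with this element that are also conformant IMSC1 documents.
  • since IMSC 1 already says "A Document Instance may contain elements and attributes that are neither specifically permitted nor forbidden by a profile." it follows that conformant processors should already deal with this scenario.

On the "no, only permit foreign namespace elements under metadata" side:

  • there may be (non-conformant?) IMSC 1 processors out there that will fail on unexpected content;
  • there are certainly profiles that prohibit it explicitly, such as EBU-TT and EBU-TT-D;
  • it does not preclude conformant processors from being forgiving by pruning or ignoring foreign namespace elements elsewhere if they find them.
  • it is slightly simpler to implement since no pruning is required.

I'm swayable on this but on balance it seems like the best thing is to continue with the existing phrasing in IMSC 1 §6.2 Foreign Elements and Attributes, i.e. make no substantive change, but possibly to strengthen it by adding some clarification points, in notes or normative text:

  • Foreign elements and attributes are excluded from validation as IMSC 1 document by processors except when present as descendants of the metadata element.
  • When validating foreign namespace element descendants of the metadata element a content specification should be provided for validation purposes.
  • When validating foreign namespace attributes a content specification should be provided for validation purposes.
  • Foreign namespace elements and attributes shall be ignored by presentation processors for the purpose of providing strictly conformant IMSC 1 presentation processing semantics; they may be used to modify the presentation where IMSC 1 presentation processing semantics allow for such freedom.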

An example of the freedom referred to in the last bullet is if someone wants to add metadata to control the block progression scrolling of lines of subtitles that are added word by word, as opposed to making it a system setting. They could do so in principle by adding foreign namespace elements. I would not support defining them in the TTML or IMSC 1 specs however. This freedom of interpretation is explicitly called out in https://www.w3.org/TR/ttml1/#semantics-smooth-scrolling-recommendation by the way.

skynavga commented 7 years ago

Allowing metadata content to affect presentation is the beginning of a very slippery slope. I strongly discourage this. I would argue that any foreign elements or attributes that do affect presentation (in some processor) should not place those elements/attributes in a metadata wrapper or metadata namespace.

On Wed, Mar 1, 2017 at 6:42 PM, Nigel Megitt notifications@github.com wrote:

I have also reviewed TTML1 and IMSC, and have led myself down the following logical path (which I am happy to be challenged on!):

TTML1 §4 says that the first step in establishing validity is to prune element information items not in the collection of elements defined by the associated abstract document type.

The abstract document type for, say, the tt element or the head element, excludes elements defined in non-TTML namespaces. Therefore such elements are not in the collection of elements and are pruned for validity checking. This is not a prohibition of them being present, just a defined behaviour (prune them) when they are present.

The metadata element differs however in that those "foreign" namespace elements are included in the collection of elements in the abstract document type and are therefore not pruned. However it is left unstated how their validity should be assessed. The clause in §4 "the descendants of the document element satisfy their respective element type's content specifications," suggests that foreign namespace element types have some content specification somewhere but goes no further, which is probably reasonable. It does mean that validity checking is required on foreign namespace elements if and only if they are descendants of metadata elements.

Presumably any validator should inform when pruning foreign namespace element content but not warn about it, since it is excluded from the validity checking, but should warn about non-pruned foreign namespace elements (descendants of metadata) for which there is no content specification and therefore no validity checking is possible.

Presumably also any presentation processor will also ignore foreign namespace elements and also any metadata elements, since they are not expected to affect presentation semantics.

Now in the case of IMSC1 we are no longer talking about an abstract document instance but a concrete encoding. So the question becomes: should we permit within a concretely encoded document foreign namespace elements? And secondarily, if we do permit them, what should a processor do?

On the first question I am a little torn each way.

On the "yes, permit foreign namespace elements anywhere" side:

  • permitting foreign namespace elements allows for "mix ins" where that's deemed appropriate, which seems like a helpful thing (and is something that I would in general like to support, e.g. the use of EmotionML to add more information about elements, which could be in or out of the metadata space);
  • there is one data point of an example where this is explicitly specified, in SMPTE-TT. The SMPTE ST2052-1-2010 specification defines a smpte:information element that is required to be a child of the head element. So saying yes means it may be possible to create SMPTE-TT documents with this element that are also conformant IMSC1 documents.
  • since IMSC 1 already says "A Document Instance may contain elements and attributes that are neither specifically permitted nor forbidden by a profile." it follows that conformant processors should already deal with this scenario.

On the "no, only permit foreign namespace elements under metadata" side:

  • there may be (non-conformant?) IMSC 1 processors out there that will fail on unexpected content;
  • there are certainly profiles that prohibit it explicitly, such as EBU-TT and EBU-TT-D;
  • it does not preclude conformant processors from being forgiving by pruning or ignoring foreign namespace elements elsewhere if they find them.
  • it is slightly simpler to implement since no pruning is required.

I'm swayable on this but on balance it seems like the best thing is to continue with the existing phrasing in IMSC 1 §6.2 Foreign Elements and Attributes, i.e. make no substantive change, but possibly to strengthen it by adding some clarification points, in notes or normative text:

  • Foreign elements and attributes are excluded from validation as IMSC 1 document by processors except when present as descendants of the metadata element.
  • When validating foreign namespace element descendants of the metadata element a content specification should be provided for validation purposes.
  • When validating foreign namespace attributes a content specification should be provided for validation purposes.
  • Foreign namespace elements and attributes shall be ignored by presentation processors for the purpose of providing strictly conformant IMSC 1 presentation processing semantics; they may be used to modify the presentation where IMSC 1 presentation processing semantics allow for such freedom.

An example of the freedom referred to in the last bullet is if someone wants to add metadata to control the block progression scrolling of lines of subtitles that are added word by word, as opposed to making it a system setting. They could do so in principle by adding foreign namespace elements. I would not support defining them in the TTML or IMSC 1 specs however. This freedom of interpretation is explicitly called out in https://www.w3.org/TR/ttml1/#semantics-smooth-scrolling-recommendation by the way.

mikedo commented 7 years ago

In principle, I agree with Glenn. That said, I (still) believe that TTML1 is quite clear that only metadata elements can contain foreign namespace elements. I do not subscribe to the position that silence is permission when the TTML1 spec is so very, very prescriptive on this topic. There are no known extensions by TTWG or others that violate this today and we should stay the course, clarifying it in IMSC1. I look forward to continuing this live in 23 minutes...

mikedo commented 7 years ago

With s/children/descendants, it is not universally true. This explicitly enables nested use of foreign namespaces within foreign namespaces. This may or may not be true depending on each foreign namespace that is defined. For example, METADATA/FOREIGN1:META/FOREIGN2:META is only valid if FOREIGN1 itself permits foreign namespaces.

I'm a bit unclear what this proposal fixes, but in any event, I think it is now too broad without further qualifying text.

palemieux commented 7 years ago

@mikedo See revised PR

nigelmegitt commented 6 years ago

Following discussion in F2F today reopening since the group consensus now appears to be that foreign namespace elements are pruned prior to validation processing and therefore are in fact permitted anywhere by TTML. See w3c/ttml1#251.