qt4cg / qtspecs

QT4 specifications
https://qt4cg.org/

Support processing HTML 5 template element content #75

Open rhdunn opened 3 years ago

rhdunn commented 3 years ago

The Problem

The HTML 5 specification introduces a template element [1], [2] whose content does not represent children of the element, but is instead part of its content property. The root node of the content property is a DocumentFragment, which is a lightweight document node. These specifications provide some non-normative guidelines for interacting with XSLT and XPath [3].

The DocumentFragment interface is defined in the HTML DOM 4.1 [4] as an instance of a Node. Within the HTML 5 specification, it is only referenced in relation to the template element.

This affects the proposed fn:parse-html (issue #74) function as well as databases and query processors that support storing and accessing HTML5 content via fn:doc and other APIs.

Requirements

  1. Accurately represent the contents of the template element in the DOM/data model.
  2. Allow a conforming implementation to process the template content as if it was XML content -- i.e. using the child:: axis to access the content.
  3. Allow a conforming implementation to process the template content separately from child content -- e.g. if the implementation has support for the HTML DOM.
  4. Allow authors to select the content of a template element.
  5. Minimize changes to the data model specification. [*]

[*] I don't believe it is possible to support this without some changes to the data model (see the Design section below).

Design

Storing the content of the template

There are 3 options for handling the content of a template element.

1. As children

Store the content as child elements of the template element.

This is how conforming processors that only understand XML content will process and view the document.
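As a minimal sketch of option 1, the snippet below parses the markup as plain XML with Python's standard xml.etree.ElementTree, which knows nothing about HTML5 template contents. The sample markup and variable names are illustrative assumptions, not part of the proposal.

```python
import xml.etree.ElementTree as ET

# Parse as plain XML: the parser has no notion of an HTML5 "content" property,
# so the template's content simply becomes its children.
doc = ET.fromstring(
    "<html><body><template><p>pT</p></template><p>p1</p><p>p2</p></body></html>"
)

# Rough equivalent of $html//template/* on an XML-only processor.
template_children = [child.tag for child in doc.findall(".//template/*")]
print(template_children)  # ['p']
```

Under this option the child:: axis is the only way to reach the template content, and no data model change is needed.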

2. As a document node

Store the content as children of a document node, where the parent of the document node is the template element.

This would require the fewest changes to make the HTML5 model work. The only change I can see is that this won't conform to section 6.1.2 Accessors of the data model, in that:

dm:parent Returns the empty sequence

becomes:

dm:parent If this is a document fragment for a template element, returns the template element. Otherwise, returns the empty sequence.

Implementors using the HTML DOM would need to map DocumentFragment nodes to document-node().

3. As a new document-fragment node

Store the content as children of a new document-fragment node type, where the parent of the document-fragment node is the template element.

This is the option that is most compatible with the HTML DOM as it mirrors the DocumentFragment interface from that, but is also the one that is the most invasive. It will require (among other things):

  1. Defining rules in section 6. Nodes of the data model for Document Fragment Nodes -- accessors, construction from infoset and PSVI, and infoset mapping.
  2. Adding a new document-fragment() KindTest to the supported node/item types.
  3. Adding subtype-itemtype rules for the document fragment nodes.
  4. Adding a new document-fragment { ... } computed constructor for XQuery.

Selecting template content

A new forward axis should be added that supports selecting fragment nodes. Some of the possible names include:

  1. fragment:: -- following the pattern defined by the attribute:: axis; or
  2. content:: -- following the nomenclature from the HTML specification for the template element contents.

The behaviour will depend on which of the 3 options above is selected for storing the content type:

  1. If an implementation only supports XML (option 1), the new axis will work the same as child::. The principal node kind is element.
  2. If option 2 is chosen (reuse the document node), the new axis will match document nodes whose parent is a template element. The principal node kind is document. Note: This has an ambiguity with the reverse axes, as it is checking the parent of the node as well as the node type.
  3. If option 3 is chosen (create a document fragment node), the new axis will match any document fragment nodes. The principal node kind is document fragment. Note: This makes more sense when the fragment:: name is used for the axis, and would be more generally applicable, such as for computed constructor created fragments, or HTML DocumentFragments created from a JavaScript or web browser XPath/XSLT/XQuery binding such as Saxon-JS.

References

[1] https://www.w3.org/TR/html52/semantics-scripting.html#the-template-element
[2] https://html.spec.whatwg.org/#the-template-element
[3] https://www.w3.org/TR/html52/semantics-scripting.html#interaction-of-template-elements-with-xslt-and-xpath
[4] https://www.w3.org/TR/dom41/#documentfragment

rhdunn commented 3 years ago

My preferences would be for a) adding a new document-fragment node type, and b) using fragment:: as the axis name.

martin-honnen commented 3 years ago

Why is a document fragment node needed? Isn't XQuery/XPath already able to parse e.g. parse-xml-fragment('<p>p1</p><p>p2</p>') into a document-node() with two element child nodes? I am not familiar enough with the HTML5 template syntax and semantics, but fragments are already representable in the XDM, based on my knowledge of XSLT and XPath 2 and 3 and XQuery 3.

AlainCouthures commented 3 years ago

I still think that adding support for user-defined new axes in XML (http://lists.xml.org/archives/xml-dev/200802/msg00419.html) would be a great feature.

As in Fore (https://github.com/Jinntec/Fore), HTML5 templates is an interesting feature for XForms, too.

I would also appreciate an "action::" axis to separate children elements within an XForms control.

It could also be useful for meta data to be associated with nodes because we cannot always serialize them currently, such as the data type of an attribute content. It could be serialized like this: <attribute::duration>P1M<meta::xsi:type>xs:yearMonthDuration</meta::xsi:type></attribute::duration>

rhdunn commented 3 years ago

> Why is a document fragment node needed? Isn't XQuery/XPath already able to parse e.g. parse-xml-fragment('<p>p1</p><p>p2</p>') into a document-node() with two element child nodes? I am not familiar enough with the HTML5 template syntax and semantics, but fragments are already representable in the XDM, based on my knowledge of XSLT and XPath 2 and 3 and XQuery 3.

Using a document-node to represent this is detailed in the 2. As a document node option in this proposal. See also the references for the links to the relevant parts of the HTML/DOM specifications.

The issue is that fn:parse-html("<template><p>pT</p></template><p>p1</p><p>p2</p>") results in the following HTML tree:

#document
  html
    body
      template
        #document-fragment
          p
            #text "pT"
      p
        #text "p1"
      p
        #text "p2"

Here, there are 2 differences to the current data model:

  1. template's document fragment is not visible as a child element of the tree, but is a content property of the template element, so using $html//template/* should (on an HTML5 conforming processor) yield an empty sequence instead of the <p>pT</p> element.
  2. the document fragment node can have a parent node (what the HTML/DOM specifications call the host).

With option 2 (using a document-node()), the data model must be modified so that dm:parent($html//template/document-node()) returns the template node and not (per spec) an empty sequence. An implementor could map the HTML DocumentFragment interface to a document-node, but would need some additional logic to make something like <template>{ document-node { <p>pT</p> } }</template> work, as a Document is not a DocumentFragment.

With option 3 (using a new document-fragment() node), the node is modelled on the DocumentFragment interface in the HTML DOM specification, which is defined as a node and is distinct from a document-node() in that specification. As I describe in this proposal, this may make implementing the new axis cleaner and avoid the "it's both a forward axis (selecting the document fragment down in the tree) and a reverse axis (checking the parent node is a template node)" problem. That is, with a document-node the axis would work like child::document-node()[./parent::template].
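To make the two differences concrete, here is a rough Python sketch of a toy tree in which the template content hangs off a content property rather than the child list. The Node class, the content_axis function, and the rule that a fragment's parent is its host template are all illustrative assumptions; they are neither the XDM nor the HTML DOM.

```python
class Node:
    """Toy node: just enough structure to compare the axes (hypothetical)."""

    def __init__(self, kind, name="", parent=None):
        self.kind = kind        # "element", "document-fragment" or "text"
        self.name = name
        self.parent = parent    # for a fragment: the host template (assumption)
        self.children = []
        self.content = None     # only set on template elements

    def append(self, child):
        child.parent = self
        self.children.append(child)
        return child


def child_axis(node):
    """Rough equivalent of child::* (element children only)."""
    return [c for c in node.children if c.kind == "element"]


def content_axis(node):
    """Hypothetical content::/fragment:: axis: the template's fragment, if any."""
    return [node.content] if node.content is not None else []


def ancestor_elements(node):
    """Names for ancestor::*, nearest first, walking through the fragment's host."""
    names = []
    node = node.parent
    while node is not None:
        if node.kind == "element":
            names.append(node.name)
        node = node.parent
    return names


body = Node("element", "body")
template = body.append(Node("element", "template"))
template.content = Node("document-fragment", parent=template)
p_t = template.content.append(Node("element", "p"))
body.append(Node("element", "p"))

print([n.name for n in child_axis(template)])                   # []
print([n.name for n in child_axis(content_axis(template)[0])])  # ['p']
print(ancestor_elements(p_t))                                   # ['template', 'body']
```

In this sketch $html//template/* is empty, the content is only reachable through the separate axis, and the fragment still has a parent (its host), which is the part that conflicts with the current dm:parent rule for document nodes.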

martin-honnen commented 3 years ago

BTW: At https://software.hixie.ch/utilities/js/live-dom-viewer/?%3C!DOCTYPE%20html%3E%0A%3Ctemplate%3E%3Cp%3EpT%3C%2Fp%3E%3C%2Ftemplate%3E%3Cp%3Ep1%3C%2Fp%3E%3Cp%3Ep2%3C%2Fp%3E the tree is different from the one you show, it seems the template element ends up in a parser inserted head element.

rhdunn commented 3 years ago

Ah, yes, it is added to the head element if it occurs before any other content (https://html.spec.whatwg.org/#parsing-main-inhead), otherwise it is included in the body element, so fn:parse-html("<p>p1</p><template><p>pT</p></template><p>p2</p>") would place the template in the body element.

That DOM viewer does not include the document fragment, but if you inspect template examples in a web browser (e.g. from https://developer.mozilla.org/en-US/docs/Web/HTML/Element/template, or setting the address bar URL to data:text/html,<p>p1</p><template><p>pT</p></template><p>p2</p>), it shows #document-fragment as the child of template and the content as a child of that document fragment. This is how browsers show the content property/data of the template element, but from the specification (referenced in the proposal and detailed in the MDN doc) the document fragment is accessed via the content property of the HTMLTemplateElement DOM interface.

michaelhkay commented 3 years ago

I think it would be much simpler to solve this without data model or syntax changes. The template element in the XDM model should be given an @href attribute containing a system-generated URI, and a call on doc() supplying that URI should be guaranteed to return a document node representing the content of the template element.

rhdunn commented 3 years ago

@michaelhkay That could work for adding documents in a database. For the case of fn:parse-html and when the html file is bound to the context item (e.g. the input for an XSLT document), representing that is a bit more complex. Also, it makes things like fn:serialize and custom serialization logic more complex. -- A user would need to know that, in the XSLT/XPath/XQuery context, there is a special href attribute.

rhdunn commented 3 years ago

I see how that would work for fn:parse-html and the context item, as those would create the system-generated URIs automatically and make them available via doc as you said. I think my point about fn:serialize is still valid, as that would need specialist logic to handle the document lookup, and if using an XML serialization may write out the template with the href attribute (especially if the html does not have a namespace).

It would complicate traversal, where instead of $html//template/content::p, you would need to do something like $html//template ! doc(./@href)/p.

I would also prefer to keep the generated data/content model as accurate to the HTML5 spec as possible. -- While it is unfortunate that they decided to make changes to XSLT/XPath without consulting the working group for those specifications, doing the same in reverse would only increase tensions/animosity between the two groups. It would also mean that web browsers are even less likely to support the mechanisms we provide, as they differ from the HTML specification which is more important to them than XSLT/XPath.

rhdunn commented 3 years ago

Another point -- doing something like $html//template//p/ancestor::body would not work with the separate document implementation, as the root element of the template document will not have the template as a parent node.
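A minimal sketch of why the ancestor walk stops short under the separate-document approach, assuming a hypothetical registry that maps system-generated URIs to parentless document nodes. The URI scheme, the function names, and the Node class are invented for illustration only.

```python
import itertools


class Node:
    """Toy node with a name, a parent pointer and attributes (hypothetical)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.attrs = {}


_documents = {}                # hypothetical registry: generated URI -> document
_counter = itertools.count(1)


def register_template_content(template):
    """Give the template an href and store its content as a separate document."""
    uri = f"urn:example:template-content:{next(_counter)}"  # made-up URI scheme
    document = Node("#document")  # parentless, as dm:parent requires today
    _documents[uri] = document
    template.attrs["href"] = uri
    return document


def doc(uri):
    """Stand-in for fn:doc() resolving the system-generated URI."""
    return _documents[uri]


def ancestor_names(node):
    """Names of all ancestors, nearest first."""
    names = []
    node = node.parent
    while node is not None:
        names.append(node.name)
        node = node.parent
    return names


body = Node("body")
template = Node("template", parent=body)
content_doc = register_template_content(template)
p = Node("p", parent=content_doc)

print(doc(template.attrs["href"]) is content_doc)  # True
print(ancestor_names(p))                           # ['#document']
```

The lookup through doc() works, but the walk upwards from p ends at the separate document node, so body is never reached.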

michaelhkay commented 3 years ago

Anything that involves changing the data model puts it pretty much out of scope as far as I'm concerned; it's far too disruptive and I don't think we'd get the spec finished, let alone implemented, in my lifetime. It's also far too specialised a requirement to justify doing that. Extending the HTML serialisation spec to handle the situation specially is far more feasible.

rhdunn commented 3 years ago

I'm happy to go with option 1 in that case (treating the template sub-items as children) for now, and deferring this to a future XSLT/XPath/XQuery version. That would mean that nothing needs changing for XSLT/XPath/XQuery 4.0 (including anything like a non-standard href attribute).

michaelhkay commented 1 year ago

I think my preferred approach here would be to prototype support in the form of SaxonJS extensions, and gain some field experience, before trying to enshrine anything in a language standard. I'm not convinced it's possible to do anything useful in XSLT to support this without considering the question of HTML5 support much more comprehensively - and that includes processing model (including event handling) as well as data model.

sashafirsov commented 1 year ago

The HTML5 template element and XSLT are competing solutions that are most likely mutually exclusive, as they address similar coding needs. Hence they can be handled on their own layers without interfering with each other. That makes interpreting the template from XSLT a minority use case, if not a completely imaginary one.

There is definitely a niche of using template as either

For the 1st case HTML gives the usual namespaced XML inclusion, perhaps safeguarded by hidden attribute. As for last, the XSLT template can reside in own tag, also hidden: custom-element sample

For an FO kind of transformation it is definitely a challenge, as there is a need for browser simulation. Simulating the template without shadow DOM is not a simple algorithm. Perhaps a JS workaround can help. See the Light DOM API in css-chain as a sample. JS source