Open rhdunn opened 3 years ago
My preferences would be for a) adding a new document-fragment node type, and b) using fragment:: as the axis name.
Why is a document fragment node needed? Isn't XQuery/XPath already able to parse e.g. parse-xml-fragment('<p>p1</p><p>p2</p>') into a document-node() with two element child nodes? I am not familiar enough with the HTML5 template syntax and semantics, but fragments are already representable in the XDM, based on my knowledge of XSLT and XPath 2 and 3 and XQuery 3.
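To make the question above concrete, here is a minimal Python sketch (Python is used purely for illustration and is not part of the proposal). The standard library's ElementTree has no fragment parser, so the hypothetical helper parse_xml_fragment wraps the input in a synthetic doc-node element that stands in for the XDM document-node(); that wrapper is an assumption of the sketch, not a description of how an XQuery processor works.

```python
import xml.etree.ElementTree as ET


def parse_xml_fragment(text: str) -> ET.Element:
    """Hypothetical stand-in for fn:parse-xml-fragment.

    ElementTree requires a single root element, so the fragment is wrapped
    in a synthetic <doc-node> element playing the role of document-node().
    """
    return ET.fromstring(f"<doc-node>{text}</doc-node>")


doc = parse_xml_fragment("<p>p1</p><p>p2</p>")

# The stand-in document node ends up with two element children.
child_names = [child.tag for child in doc]
child_texts = [child.text for child in doc]
print(child_names, child_texts)
```

The point of the sketch is only that a fragment with several top-level elements fits under a single document-like parent; it says nothing about the HTML5 template semantics discussed in the replies.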
I still think that adding support for user-defined new axes in XML (http://lists.xml.org/archives/xml-dev/200802/msg00419.html) would be a great feature.
As in Fore (https://github.com/Jinntec/Fore), HTML5 templates are an interesting feature for XForms, too.
I would also appreciate an "action::" axis to separate children elements within an XForms control.
It could also be useful for meta data to be associated with nodes because we cannot always serialize them currently, such as the data type of an attribute content. It could be serialized like this:
<attribute::duration>P1M<meta::xsi:type>xs:yearMonthDuration</meta::xsi:type></attribute::duration>
Using a document-node to represent this is detailed in the "2. As a document node" option in this proposal. See also the references for the links to the relevant parts of the HTML/DOM specifications.
The issue is that fn:parse-html("<template><p>pT</p></template><p>p1</p><p>p2</p>") results in the following HTML tree:
#document
  html
    body
      template
        #document-fragment
          p
            #text "pT"
      p
        #text "p1"
      p
        #text "p2"
Here, there are 2 differences to the current data model:
The template's content is held in the content property of the template element rather than as its children, so using $html//template/* should (on an HTML5-conforming processor) yield an empty sequence instead of the <p>pT</p> element.
With option 2 (using a document-node()), the data model has to be modified so that dm:parent($html//template/document-node()) returns the template node and not (per spec) an empty sequence. It would be possible for an implementor to map the HTML DocumentFragment interface to a document-node, but some additional logic would be needed to make something like <template>{ document-node { <p>pT</p> } }</template> work, as a Document is not a DocumentFragment.
With option 3 (using a new document-fragment() node), the node is modelled on the DocumentFragment interface in the HTML DOM specification, which is defined as a node and is distinct from a document-node() in that specification. As I describe in this proposal, this may make implementing the new axis cleaner and avoid the "it's both a forward axis (selecting the document fragment down in the tree) and a reverse axis (checking the parent node is a template node)" problem -- that is, with a document-node the axis would work like child::document-node()[./parent::template].
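The difference between options 2 and 3 can be sketched with a toy tree model in Python. Everything here (the Node class, build_template, child_axis, content_axis, and the link_parent flag) is hypothetical and exists only for illustration; it is not the XDM or the HTML DOM. The sketch assumes the template's content lives in a separate holder node, and shows that the per-spec document node has no parent while the modified document node and the document fragment both point back at the template.

```python
from typing import List, Optional


class Node:
    """A tiny tree node with only what the sketch needs (not the XDM)."""

    def __init__(self, kind: str, name: str = "") -> None:
        self.kind = kind  # "element", "document" or "document-fragment"
        self.name = name
        self.parent: Optional["Node"] = None
        self.children: List["Node"] = []
        self.content: Optional["Node"] = None  # set only on template elements

    def append(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def build_template(content_kind: str, link_parent: bool) -> Node:
    """Build <template><p/></template> with the <p> held in a content node."""
    template = Node("element", "template")
    holder = Node(content_kind)
    holder.append(Node("element", "p"))
    template.content = holder
    if link_parent:
        # Assumption: the holder reports the template as its parent.
        holder.parent = template
    return template


def child_axis(node: Node) -> List[str]:
    """Names of the ordinary children (what child:: would see)."""
    return [c.name for c in node.children]


def content_axis(template: Node) -> List[str]:
    """Names of the nodes reachable through the content holder."""
    holder = template.content
    return [] if holder is None else [c.name for c in holder.children]


option2_per_spec = build_template("document", link_parent=False)
option2_modified = build_template("document", link_parent=True)
option3 = build_template("document-fragment", link_parent=True)

print(child_axis(option3))                       # the <p> is not a child
print(content_axis(option3))                     # it is reached via content
print(option2_per_spec.content.parent is None)   # document node, no parent
print(option3.content.parent is option3)         # fragment points at template
```

In this model the only thing separating option 2 from option 3 is the kind of the holder node and whether its parent link is allowed, which is the data model change discussed above.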
BTW: At https://software.hixie.ch/utilities/js/live-dom-viewer/?%3C!DOCTYPE%20html%3E%0A%3Ctemplate%3E%3Cp%3EpT%3C%2Fp%3E%3C%2Ftemplate%3E%3Cp%3Ep1%3C%2Fp%3E%3Cp%3Ep2%3C%2Fp%3E the tree is different from the one you show; it seems the template element ends up in a parser-inserted head element.
Ah, yes, it is added to the head element if it occurs before any other content (https://html.spec.whatwg.org/#parsing-main-inhead), otherwise it is included in the body element, so fn:parse-html("<p>p1</p><template><p>pT</p></template><p>p2</p>") would place the template in the body element.
That DOM viewer does not include the document fragment, but if you inspect template examples in a web browser (e.g. from https://developer.mozilla.org/en-US/docs/Web/HTML/Element/template, or by setting the address bar URL to data:text/html,<p>p1</p><template><p>pT</p></template><p>p2</p>), it shows #document-fragment as the child of template and the content as a child of that document fragment. This is how browsers show the content property/data of the template element, but from the specification (referenced in the proposal and detailed in the MDN doc) the document fragment is accessed via the content property of the HTMLTemplate DOM element.
I think it would be much simpler to solve this without data model or syntax changes. The template element in the XDM model should be given an @href attribute containing a system-generated URI, and a call on doc() supplying that URI should be guaranteed to return a document node representing the content of the template element.
@michaelhkay That could work for adding documents in a database. For the case of fn:parse-html and when the html file is bound to the context item (e.g. the input for an XSLT document), representing that is a bit more complex. Also, it makes things like fn:serialize and custom serialization logic more complex. -- A user would need to know that in the XSLT/XPath/XQuery context there is a special href attribute.
I see how that would work for fn:parse-html and the context item, as those would create the system-generated URIs automatically and make them available via doc as you said. I think my point about fn:serialize is still valid, as that would need specialist logic to handle the document lookup, and an XML serialization may write out the template with the href attribute (especially if the html does not have a namespace).
It would complicate traversal, where instead of $html//template/content::p, you would need to do something like $html//template ! doc(./@href)/p.
I would also prefer to keep the generated data/content model as accurate to the HTML5 spec as possible. -- While it is unfortunate that they decided to make changes to XSLT/XPath without consulting the working group for those specifications, doing the same in reverse would only increase tensions/animosity between the two groups. It would also mean that web browsers are even less likely to support the mechanisms we provide, as they differ from the HTML specification which is more important to them than XSLT/XPath.
Another point -- doing something like $html//template//p/ancestor::body would not work with the separate document implementation, as the root element of the template document will not have the template as a parent node.
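The traversal concerns with the href-based approach can be sketched in Python. The Node class, the DOCUMENTS registry, the doc and ancestors helpers, and the URI urn:example:template-1 are all made up for this illustration; a real processor would generate its own URIs and resolve them through its own doc() implementation.

```python
from typing import Dict, List, Optional


class Node:
    """Minimal tree node with a name, attributes, a parent and children."""

    def __init__(self, name: str, attrs: Optional[Dict[str, str]] = None) -> None:
        self.name = name
        self.attrs = attrs or {}
        self.parent: Optional["Node"] = None
        self.children: List["Node"] = []

    def append(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


# Hypothetical registry standing in for whatever backs doc().
DOCUMENTS: Dict[str, Node] = {}


def doc(uri: str) -> Node:
    """Look up the separate document registered for a template."""
    return DOCUMENTS[uri]


def ancestors(node: Node) -> List[str]:
    """Names on the ancestor axis, nearest ancestor first."""
    names = []
    current = node.parent
    while current is not None:
        names.append(current.name)
        current = current.parent
    return names


# Main document: #document > html > body > template[@href]
main = Node("#document")
body = main.append(Node("html")).append(Node("body"))
template = body.append(Node("template", {"href": "urn:example:template-1"}))

# The template content lives in its own document, reachable only by URI.
template_doc = Node("#document")
p = template_doc.append(Node("p"))
DOCUMENTS[template.attrs["href"]] = template_doc

# Equivalent in spirit to: $html//template ! doc(./@href)/p
found = [c.name for c in doc(template.attrs["href"]).children]
p_ancestors = ancestors(p)
print(found, p_ancestors)
```

The lookup through the registry finds the p element, but walking up from that p stops at the separate document node, so nothing like ancestor::body can reach the main document in this model.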
Anything that involves changing the data model puts it pretty much out of scope as far as I'm concerned; it's far too disruptive and I don't think we'd get the spec finished, let alone implemented, in my lifetime. It's also far too specialised a requirement to justify doing that. Extending the HTML serialisation spec to handle the situation specially is far more feasible.
I'm happy to go with option 1 in that case (treating the template sub-items as children) for now, and deferring this to a future XSLT/XPath/XQuery version. That would mean that nothing needs changing for XSLT/XPath/XQuery 4.0 (including anything like a non-standard href attribute).
I think my preferred approach here would be to prototype support in the form of SaxonJS extensions, and gain some field experience, before trying to enshrine anything in a language standard. I'm not convinced it's possible to do anything useful in XSLT to support this without considering the question of HTML5 support much more comprehensively - and that includes processing model (including event handling) as well as data model.
The template element from HTML5 and XSLT are competing solutions that would most likely be mutually exclusive, since they serve similar coding needs. Hence they can be handled on their own layers without interfering, which makes interpreting the template by XSLT a minority case, if not a completely imaginary one.
There is definitely a niche of using template as either
For the 1st case, HTML gives the usual namespaced XML inclusion, perhaps safeguarded by the hidden attribute.
As for the last, the XSLT template can reside in its own tag, also hidden: custom-element sample
For an FO kind of transformation it is definitely a challenge, as there is a need for browser simulation. Simulating the template without shadow DOM is not a simple algorithm. Perhaps a JS workaround can help. See the Light DOM API in css-chain as a sample. JS source
The Problem
The HTML 5 specification introduces a template element [1], [2] where the content of that element doesn't represent children of it, but is part of a content property. The root node of the content property is a DocumentFragment, which is a light-weight document node. These specifications provide some non-normative guidelines for interacting with XSLT and XPath [3].
The DocumentFragment interface is defined in the HTML DOM 4.1 [4] as an instance of a Node. Within the HTML 5 specification, it is only referenced in relation to the template element.
This affects the proposed fn:parse-html (issue #74) function as well as databases and query processors that support storing and accessing HTML5 content via fn:doc and other APIs.
Requirements
template element in the DOM/data model.
template content as if it was XML content -- i.e. using the child:: axis to access the content.
template content separately from child content -- e.g. if the implementation has support for the HTML DOM.
template element.
[*] I don't believe it is possible to support this without some changes to the data model (see the Design section below).
Design
Storing the content of the template
There are 3 options for handling the content of a template element.
1. As children
This is how conforming processors that only understand XML content will process and view the document.
2. As a document node
This would be the minimal amount of changes needed to make the HTML5 model work. The only change I can see is that this won't conform to section 6.1.2 Accessors of the data model, in that:
becomes:
Implementors using the HTML DOM would need to map DocumentFragment nodes to document-node().
3. As a new document-fragment node
This is the option that is most compatible with the HTML DOM as it mirrors the DocumentFragment interface from that, but is also the one that is the most invasive. It will require (among other things):
document-fragment() KindTest to the supported node/item types.
subtype-itemtype rules for the document fragment nodes.
document-fragment { ... } computed constructor for XQuery.
Selecting template content
A new forward axis should be added that supports selecting fragment nodes. Some of the possible names include:
fragment:: -- following the pattern defined by the attribute:: axis; or
content:: -- following the nomenclature from the HTML specification for the template element contents.
The behaviour will depend on which of the 3 options above is selected for storing the content type:
child::. The principal node kind is element.
template element. The principal node kind is document. Note: This has an ambiguity with the reverse axes, as it is checking the parent of the node as well as the node type.
fragment:: name is used for the axis, and would be more generally applicable, such as for computed constructor created fragments, or HTML DocumentFragments created from a JavaScript or web browser XPath/XSLT/XQuery binding such as Saxon-JS.
References
[1] https://www.w3.org/TR/html52/semantics-scripting.html#the-template-element
[2] https://html.spec.whatwg.org/#the-template-element
[3] https://www.w3.org/TR/html52/semantics-scripting.html#interaction-of-template-elements-with-xslt-and-xpath
[4] https://www.w3.org/TR/dom41/#documentfragment