qt4cg / qtspecs

QT4 specifications
https://qt4cg.org/

Namespace nodes and the namespace axis #1340

Open michaelhkay opened 1 month ago

michaelhkay commented 1 month ago

It would be nice to bring XSLT, XPath, and XQuery into line here.

The current state of play seems to be:

XQuery: the namespace axis is not supported. Namespace nodes can be constructed, but they exist only as detached orphans; they can never be attached to a parent element.

XPath: the namespace axis is deprecated and support is optional. There is no mechanism for constructing namespace nodes.

XSLT: the namespace axis is mandatory. Namespace nodes can be constructed and can be attached to elements.

I believe that the only reason for the differences is that XQuery implementors were concerned that it would be difficult to implement namespace nodes efficiently. I think XSLT has clearly demonstrated that this concern is unjustified.

However, there are implementation complexities, primarily around the fact that namespace nodes have identity and parentage, so if a namespace is declared on a root element, then every element in the document has a namespace node for this namespace, and these have distinct identity. To implement this efficiently, the implementation has to instantiate namespace nodes lazily on demand, and then has to ensure that if the "same" namespace node is instantiated again, it has the same "identity".
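To make the identity point concrete, here is a small illustration of what the current data model rules imply on a processor that supports the namespace axis. The document, the `p` prefix, and its URI are made up for this sketch.

```xml
<!-- Hypothetical input: a single declaration on the root element -->
<doc xmlns:p="http://example.com/p">
  <a/>
  <b><c/></b>
</doc>

<!-- XPath expressions evaluated against that document:

     count(//*/namespace::p)
       One namespace node per element (doc, a, b, c), so the result is 4.

     //a/namespace::p is //b/namespace::p
       false: same prefix and URI, but distinct nodes with different parents.
-->
```

An implementation that creates these nodes lazily has to give the same answers every time the expressions are evaluated, which is where the bookkeeping cost comes from.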

I suggest a solution along the following lines, applied to all three languages:

(a) the namespace axis is supported and delivers namespace nodes

(b) operations that depend on the ordering, identity, or parentage of namespace nodes are deprecated and implementation-defined.

(c) the data model says that the in-scope namespaces of an element are in the form of a (prefix, URI) map. The semantics of the namespace axis are described in terms of constructing transient namespace nodes from this map.
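As a rough user-level picture of point (c), the (prefix, URI) map for an element can already be built from existing functions. This is only a sketch, not proposed spec text: it assumes XSLT 3.0, an element as the context item, and the `map` and `xs` prefixes bound as noted in the comment. The variable names `e` and `in-scope` are illustrative.

```xml
<!-- Assumes these declarations on xsl:stylesheet:
     xmlns:map="http://www.w3.org/2005/xpath-functions/map"
     xmlns:xs="http://www.w3.org/2001/XMLSchema" -->
<xsl:variable name="e" select="."/>
<!-- One entry per in-scope prefix; the default namespace, if any,
     appears under the zero-length string key -->
<xsl:variable name="in-scope" as="map(xs:string, xs:anyURI)"
    select="map:merge(
              for $p in in-scope-prefixes($e)
              return map:entry($p, namespace-uri-for-prefix($p, $e)))"/>
```

Under the proposal, the namespace axis would then amount to manufacturing a transient namespace node for each entry of such a map.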

ChristianGruen commented 1 month ago

> I suggest a solution along the following lines, applied to all three languages:

+1 Sounds like a good compromise.

michaelhkay commented 1 month ago

I've been trying to write this up as a PR, and struggling a little bit with some of the finer points.

Firstly, I've changed the data model spec so the in-scope namespaces of an element are now defined by a prefix-to-URI map, and not by namespace nodes. Namespace nodes are now never part of a document tree; they are only used transiently (a) during tree construction in XQuery and XSLT, and (b) in the result of expressions using the namespace axis, which is now optional in all three languages (but required when 1.0 compatibility mode is in force). So far so good.

Where it gets tricky is the properties and behaviour of these namespace nodes, in particular those returned by the namespace axis (if supported). I would like to relax the rules on node identity, parentage, and document order, since these rules greatly constrain the implementation options and deliver very little user value. But what exactly should the new rules be, and how much can we relax them without having too much effect on compatibility?

We could start by saying that namespace nodes have no parent property. I think it's rather unlikely that any real applications (as distinct from test cases) will be affected by that change. But an XSLT stylesheet that uses match="abc/namespace::x" in an attempt to change namespace prefixes would be affected, so I could be wrong.

With no parent property, it becomes possible to share and reuse namespace nodes: for example, if all elements in a document have the same in-scope namespaces, then they can all return the same nodes when the namespace axis is used. But then we have to define new rules for document order and identity of namespace nodes, and it's not clear what those rules should be.

We could be more radical and drop the namespace axis entirely. It's been deprecated for a while, after all. And dropping it would cause a clean compile-time failure for applications that use it, which is a lot safer than having subtle changes to the semantics. I'm leaning towards that - even in 1.0 compatibility mode.

ChristianGruen commented 1 month ago

> We could be more radical and drop the namespace axis entirely.

Fine as well (I cannot remember users missing it).

michaelhkay commented 1 month ago

> Fine as well (I cannot remember users missing it).

Perhaps I have a longer memory. The problem is that we do have users upgrading directly from XSLT 1.0, and in 1.0 the namespace axis was the only way of achieving certain things.

But I think I'm prepared to bite the bullet on this. Users upgrading from a language defined in 1999 to one defined in 2026 (?) should surely expect a few glitches.

ChristianGruen commented 1 month ago

> Perhaps I have a longer memory.

I would never object ;·)

ndw commented 1 month ago

I'm a little gobsmacked by the suggestion that the namespace axis be removed. I've certainly got stylesheets that use it. The DocBook stylesheets for example include this template:

<xsl:template match="ls:group">
  <l:group>
    <xsl:copy-of select="@*,namespace::*[local-name(.) != '']"/>
    <xsl:apply-templates select="ls:template"/>
  </l:group>
</xsl:template>

Is there some obvious alternative to the namespace axis that I'm not immediately seeing, or are we considering removing a substantial function from the languages?

michaelhkay commented 1 month ago

Theoretically this code has been non-portable since 2.0, when support for the namespace axis was made optional. Perhaps it has actually been portable in practice since all XSLT processors have chosen to support it.

The portable way to write this in 2.0 seems to be

<xsl:copy-of select="@*"/>
<xsl:variable name="e" select="."/>
<xsl:for-each select="in-scope-prefixes(.)[. != '']">
   <xsl:namespace name="{.}" select="namespace-uri-for-prefix(., $e)"/>
</xsl:for-each>

which isn't exactly an improvement.

The main thing I'd like to achieve is to get rid of the indefensible differences between the different specs: deprecated in XPath, optional but effectively required in XSLT, not allowed in XQuery.

Also, if we're going to retain the namespace axis, can we get rid of the parent property in the returned namespace nodes, so that namespace nodes can become reusable prefix/uri pairs rather than being replicated for every element?

davidcarlisle commented 1 month ago

> I'm a little gobsmacked by the suggestion that the namespace axis be removed. I've certainly got stylesheets that use it. The DocBook stylesheets for example include this template:
>
> <xsl:template match="ls:group">
>   <l:group>
>     <xsl:copy-of select="@*,namespace::*[local-name(.) != '']"/>
>     <xsl:apply-templates select="ls:template"/>
>   </l:group>
> </xsl:template>
>
> Is there some obvious alternative to the namespace axis that I'm not immediately seeing, or are we considering removing a substantial function from the languages?

Another example "in the wild":

https://github.com/davidcarlisle/web-xslt/blob/main/htmlparse/htmlparse.xsl#L244

That htmlparse stylesheet seems to be more or less exactly 20 years old, but surprisingly it still comes up from time to time, e.g. this from Martin last year:

https://www.biglist.com/lists/lists.mulberrytech.com/xsl-list/archives/202306/msg00027.html

Breaking that wouldn't be the end of the world, but surely Norm and I can't be the only people ever to have used this?

Note that the stylesheet is XSLT 2, not 1 (and was mostly written to explore the "new" XSLT 2 features), so it's not just people upgrading from 1.0 who might be impacted.