w3c / qtspecs

XSLT and XQuery Specifications - the source used to build the specs, and the errata

Add default element/type namespace to XPath 1.0 #63

Open ByteEater-pl opened 1 month ago

ByteEater-pl commented 1 month ago

I suggest releasing a second edition of XPath 1.0 supporting the default element/type namespace feature (added in XPath 2.0).

Most systems running an implementation of XPath 1.0, namely Web browsers, already implement it, thereby deviating from XPath 1.0, as prescribed by a "willful violation" in the HTML5 spec. (And unfortunately they generally don't intend to implement any higher version of XPath.) It seems like low-hanging fruit to just add it, allowing one of those infamous willful violations to be removed. Existing implementations needn't change anything to claim conformance, as IIUC there may be no default element/type namespace initially, and it cannot be changed (unlike in XQuery).

michaelhkay commented 1 month ago

The test of whether a specification is useful is whether implementors take any notice of it. Since the browser vendors have been ignoring anything the XML community does for a quarter of a century now, I very much doubt this would change anything. And users don't need a spec that documents the known limitations of fossilised products, they need some improved functionality.

ByteEater-pl commented 1 month ago

I believe it would be changed in HTML5, quickly even. They've been willing so far to remove other willful violations once they became moot, and some of them express the sentiment that the more can (from their point of view) be removed, the better. Indeed, several years ago there was much disdain, and even hostility, from them towards the XML community and XML technologies in general (resulting in incompatibilities and interoperability blockers being introduced on purpose). But now, having secured their victory in the realm of browsers and the content they support, to the point of taking the reins of HTML and DOM from W3C (which now just rubber-stamps those specs), they (especially newcomers) no longer feel the need to be so vicious. We can do better than them (so far we have, I believe) and not reciprocate, after so many years. That's especially so given that the nastiest bulldogs among them (I'm not going to mention names here), whose excesses Mr Last Week so aptly documented, are either no longer active or only participate sporadically. As hard as it may be to believe, the climate has changed for the better (look at their cooperation with ECMA TC39 to convince yourself; albeit not without hiccups, I wish it had looked similar with the XML folks in the 00s), and if we hope, even decades from now, to overcome the chasm they (though mostly their predecessors) tore in the fabric of Web technologies, this would be a solid step towards that goal.

More technically, I disagree with your assessment of this willful violation as a limitation. It seems for this very purpose that XPath 2.0 introduced this feature, and were some major browser team to undertake the huge task of upgrading to at least XPath 2.0, they'd naturally reach for it. And the range of what's possible, i.e. the expressiveness, is the same with http://www.w3.org/1999/xhtml as the default namespace as without any. It's a default, and each setting of it makes some potentially desirable things easier at the cost of complicating others.
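A rough way to see the expressiveness point: in XPath 2.0 an unprefixed element name test is expanded using the default element/type namespace from the static context, while a prefixed one uses the in-scope prefix bindings. The Python toy model below is only a sketch; the `Element` tuple and the `expand_name_test`, `select` and `select_no_namespace` helpers are hypothetical, not any real XPath API. It shows that `p` with the XHTML namespace as the default selects the same nodes as `h:p` with `h` bound to that namespace and no default, while no-namespace elements stay reachable through a `local-name()`/`namespace-uri()` predicate. It models element name tests on the child axis only; attribute name tests never use the default element namespace.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple

XHTML_NS = "http://www.w3.org/1999/xhtml"


class Element(NamedTuple):
    ns: Optional[str]  # namespace URI; None stands for "no namespace"
    local: str


def expand_name_test(qname: str, prefixes: Dict[str, str],
                     default_element_ns: Optional[str]) -> Tuple[Optional[str], str]:
    """Expand an element name test to (namespace URI, local name).

    Prefixed names use the in-scope prefix bindings; unprefixed names use the
    default element namespace (None reproduces plain XPath 1.0 behaviour).
    """
    if ":" in qname:
        prefix, local = qname.split(":", 1)
        if prefix not in prefixes:
            raise ValueError(f"undeclared prefix: {prefix}")
        return prefixes[prefix], local
    return default_element_ns, qname


def select(children: List[Element], qname: str, prefixes: Dict[str, str],
           default_element_ns: Optional[str]) -> List[Element]:
    # child::qname over a flat list of element children (hypothetical helper)
    ns, local = expand_name_test(qname, prefixes, default_element_ns)
    return [e for e in children if e.ns == ns and e.local == local]


def select_no_namespace(children: List[Element], local: str) -> List[Element]:
    # Models *[local-name() = 'p' and namespace-uri() = ''], which does not
    # depend on the default element namespace at all
    return [e for e in children if e.ns is None and e.local == local]


children = [Element(XHTML_NS, "p"), Element(None, "p")]

with_default = select(children, "p", {}, XHTML_NS)            # XHTML p only
with_prefix = select(children, "h:p", {"h": XHTML_NS}, None)  # XHTML p only
plain_xpath1 = select(children, "p", {}, None)                # no-namespace p only
reach_no_ns = select_no_namespace(children, "p")              # no-namespace p only
```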

michaelhkay commented 1 month ago

It seems for this very purpose that XPath 2.0 introduced this feature

The feature in XPath 2.0 is very different from the "wilful violation" [despite WhatWG's claim to the contrary]. In XPath 2.0 the default namespace for elements and types is part of the static context. In the "wilful violation" it depends on whether the context node is an XML or HTML node, which isn't known until runtime. And the spec is very unclear as to which "context node" it is talking about - is it the context node for the particular axis step, or the context node for the XPath expression as a whole? In XPath 1.0 this isn't often going to make a difference, because the facilities for handling multiple documents are very limited; but for an expression like $n1[x = $n2/y] where $n1 and $n2 are bound externally to nodes in different documents, it's rather significant.
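To make the `$n1[x = $n2/y]` edge case concrete, here is a toy Python model of the two readings of "context node": one where the default element namespace is fixed once per expression from the initial context node (as it would be in a static context), and one where it is recomputed from the context node of each step. The `Doc`/`Node` classes, the `is_html` flag and `filter_expr` are hypothetical; the document-kind rule (XHTML default for nodes in an HTML document, none otherwise) is the behaviour described above, and the comparison follows XPath 1.0 node-set = node-set semantics (true if some pair of string-values is equal).

```python
from dataclasses import dataclass, field
from typing import List, Optional

XHTML_NS = "http://www.w3.org/1999/xhtml"


@dataclass
class Doc:
    is_html: bool  # hypothetical flag: HTML document vs XML document


@dataclass
class Node:
    doc: Doc
    ns: Optional[str] = None  # namespace URI, None = no namespace
    local: str = ""
    text: str = ""  # string-value (simplified: leaf text only)
    children: List["Node"] = field(default_factory=list)


def default_ns_for(node: Node) -> Optional[str]:
    # XHTML default for nodes in an HTML document, no default otherwise
    return XHTML_NS if node.doc.is_html else None


def child_step(ctx: Node, local: str, default_ns: Optional[str]) -> List[Node]:
    # child::local with an unprefixed name test expanded via default_ns
    return [c for c in ctx.children if c.ns == default_ns and c.local == local]


def filter_expr(n1: Node, n2: Node, initial_ctx: Node, per_step: bool) -> List[Node]:
    """Evaluate $n1[x = $n2/y] under one of two readings of 'context node'."""
    if per_step:
        # Inside the predicate, x is evaluated with n1 as context, y with n2
        ns_x, ns_y = default_ns_for(n1), default_ns_for(n2)
    else:
        # One default for the whole expression, taken from the initial context
        ns_x = ns_y = default_ns_for(initial_ctx)
    xs = {c.text for c in child_step(n1, "x", ns_x)}
    ys = {c.text for c in child_step(n2, "y", ns_y)}
    # node-set = node-set: true if some pair of string-values is equal
    return [n1] if xs & ys else []


html_doc, xml_doc = Doc(is_html=True), Doc(is_html=False)
n1 = Node(html_doc, XHTML_NS, "div", children=[Node(html_doc, XHTML_NS, "x", "42")])
n2 = Node(xml_doc, None, "root", children=[Node(xml_doc, None, "y", "42")])

by_expression = filter_expr(n1, n2, initial_ctx=n1, per_step=False)  # []
by_step = filter_expr(n1, n2, initial_ctx=n1, per_step=True)  # [n1]
```

With `$n1` in an HTML document and `$n2` in an XML document, the same expression and bindings yield an empty result under one reading and `$n1` under the other.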

Anyone trying to write a new spec would have to spend a lot of time studying such edge cases, and it would probably end up being one of those reverse-engineered specs where you end up saying that different browsers handle edge cases differently.

I disagree with your assessment of this willful violation as a limitation

I wasn't specifically referring to the wilful violation as a limitation. I was referring to the entire set of known limitations in XPath 1.0 that are fixed in subsequent versions, for example the inability to do any kind of join query.

ByteEater-pl commented 1 month ago

It is a little bit more intricate, but given the context (the HTML5 spec), which defines some of the underpinnings, I'm not sure it can be called even slightly underspecified, or said to require reverse engineering. I might be wrong about that. But they almost certainly do know OTOH and would be willing to explain if asked. If it would help for me to do it, I volunteer; otherwise maybe somebody in a more official capacity (a CG member, an XPath spec editor, former or current…) should. I bet that even if their spec does turn out to be underspecified, they'd consider it appropriate to fix it.

Namely, IIUC, the HTML5 spec defines an SGML-inspired (though conformant to neither SGML nor XML) syntax for HTML documents, whereas for documents parsed from XML syntax and for Document objects created dynamically it defines when they're considered HTML documents and when XML documents. You're right that a call to the evaluate method (defined in the Document Object Model (DOM) Level 3 XPath Specification and adapted in WHATWG's DOM Standard) only determines dynamically whether the Document object on which it's called is an XMLDocument, and the special handling ("willful violation") applies only if it isn't. But that's outside XPath's purview, and is defined and used for a lot of things in the HTML5 spec and a plethora of dependent specs (unfortunately with much too tight coupling, lack of orthogonality and layering violations).

The two modes may therefore be thought of as browsers having two implementations of XPath 1.0: a compliant one, for Documents which are XMLDocuments, and one with the default element/type namespace feature from XPath 2.0 added and the namespace set to http://www.w3.org/1999/xhtml, for other (i.e. treated as HTML5) Documents. For each call to evaluate, the rules are chosen beforehand by the implementation of WHATWG's DOM (according to rules therein or elsewhere, quite often in the HTML5 spec) and stay fixed until control passes back from the XPath side to the WHATWG side on return. Therefore, unless I'm mistaken, the only thing required to eliminate the willful violation is for a spec like XPath 1.0 (Second Edition, a profile with just one paragraph, as a Note even, or whatever's feasible) with the default element/type namespace feature to exist.