Discussion: On the implementability of the specs and helping implementors

Functions and Operators

There are 4 classes of function here:

functions that have to be implemented natively -- e.g. fn:parse-html;
functions that are implemented in terms of native operations -- i.e. the dm:* and op:* functions;
functions that can be implemented in XSLT or XQuery but can be done more efficiently natively;
functions that can be implemented in XSLT or XQuery as efficiently as they can natively.

It could be useful to generate a function library of the form namespace/function.xqy and namespace/function.xsl containing implementations of the functions that can be written in XSLT and XQuery. Implementors could then import or include those implementations in their processors/engines. This is more flexible than providing them all in a single file: implementors can pick up just the functions they have no implementation for, without having to edit the files every time the spec changes.
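
As a rough sketch of what one such per-function file might contain: the file name `fn/reverse.xqy`, the `poly` prefix, and the namespace URI below are all placeholders invented for illustration, not anything defined by the specs or agreed in this discussion. A library module for a function simple enough to write in XQuery could look something like this:

```xquery
xquery version "3.1";

(: Hypothetical file: fn/reverse.xqy.
   The namespace URI and the "poly" prefix are illustrative placeholders. :)
module namespace poly = "http://example.com/polyfill/fn";

(: A portable equivalent of fn:reverse, written only in terms of
   fn:count and positional access. A native implementation would
   normally be more efficient than this. :)
declare function poly:reverse($input as item()*) as item()* {
  let $n := count($input)
  for $i in 1 to $n
  return $input[$n - $i + 1]
};
```

A processor without a native implementation could then pull in just this one file, e.g. `import module namespace poly = "http://example.com/polyfill/fn" at "fn/reverse.xqy";`, and leave the rest of the library untouched.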

Note: JavaScript supports polyfill files for new classes/functions so that engines that don't support those features can get a functioning implementation of that function/class.

Note: Many JavaScript engines implement various functions in JavaScript itself.

XPath and XQuery

We could make the EBNF available as a separate file in addition to the iXML grammar that has been discussed/worked on. This would help implementors with the lexer and parser at least. There's not much else we can do here as the language is custom.

XSLT

We have the XMLSchema and RelaxNG grammars to help with validation. Implementors could use these in their build systems to provide API bindings to the data model.

XDM

We could provide the XDM/XPath specific XMLSchema extensions as a separate XMLSchema definition to give implementors access to the type information for these, for example xs:numeric.

See also #666 and #652

I did in fact explore this avenue, hoping that it would be a good way to get portable implementations of functions that worked across SaxonJ and SaxonJS. What I found was that the functions that could be easily implemented in XSLT were also trivial to implement in Java or JavaScript. The functions that are expensive for implementors are those that either have a lot of complexity in the specification, or that have external dependencies.

Some of the performance results were a little surprising. Basically, we place more trust in the implementation of system functions than we do in user functions: for example, we trust a system function to deliver a result of the declared type, and we trust it to accept an input sequence in a streamable form that can only be read in a forwards direction. We found that when functions were implemented in XSLT, we had to treat them as if they were user-written, which meant a higher function-calling overhead. Since the functions that are suitable for implementing in XSLT are often the simplest of functions, the calling overhead is often higher than the actual execution cost. A potential counter-argument is that an XSLT/XPath implementation should allow function inlining, which permits further optimisations; but for the commonest functions such as count() we're effectively inlining them anyway.

qt4cg / qtspecs