Open line-o opened 4 months ago
See also issue #490 and issue #305
Other options related to XML parsing include:
Related: #285
I am in the process of writing a draft to extend fn:doc with an options argument.
If we were to specify, and thus allow, formats other than XML, at least some of these would not return a document-node() but rather a function item. For JSON and CSV we would ideally just return the result of parse-json and parse-csv.
That would change the return type of fn:doc() to ( document-node() | array(*) | map(*) )?
Would this be considered a change that breaks backwards compatibility?
I did think about having doc("/my/example/api", { "format": "json" }) return a document-node() containing the parse result transformed to XML. It just seems unintuitive to me.
It wouldn't break backwards compatibility to widen the result type (we did that with fn:collection()), but I'm not sure it's a good idea. A new function resource() that handles different kinds of resource might be better.
Perhaps resource("/my/example/api", {"content-type": "application/json"})?
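To make the idea concrete, here is a minimal sketch of how such a dispatching function could be emulated in user code with XQuery 3.1 today. The name local:resource, the set of supported content types, and the default of "application/xml" are assumptions made for illustration only; none of this is part of any specification or draft.

```xquery
xquery version "3.1";

(: Hypothetical helper: dispatches on a "content-type" option.
   The function name, the supported types and the default are
   illustrative assumptions, not part of any specification. :)
declare function local:resource(
  $href    as xs:string,
  $options as map(*)
) as item()? {
  (: fall back to XML when no content type is given :)
  let $type := ($options?content-type, "application/xml")[1]
  return
    switch ($type)
      (: JSON: returns a map, an array or an atomic value :)
      case "application/json" return fn:json-doc($href)
      (: plain text: returns a string :)
      case "text/plain"       return fn:unparsed-text($href)
      (: everything else is treated as XML :)
      default                 return fn:doc($href)
};

local:resource("/my/example/api", map { "content-type": "application/json" })
```

Note that the declared return type has to be item()? here, which illustrates why a separate function may be a better fit than widening the return type of fn:doc.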
That is an interesting approach. I would still like fn:doc to have an options parameter with the properties we gathered here.
This new function should have its own separate issue so that we can close this one once the required changes are merged.
fn:doc, fn:collection and fn:uri-collection currently expect only a single argument, a URI. There is no way of adding additional parameters to those functions.
Several implementations of XPath have worked around that limitation in one of two ways:

- passing parameters via a query string as part of the URI:
  - uri-collection works similarly
- creating custom functions in other namespaces that add an options map as a second parameter:
  - saxon:doc in Saxon https://www.saxonica.com/documentation10/index.html#!changes/extensions/9.7-9.8
  - fetch:doc in BaseX https://docs.basex.org/wiki/Fetch_Module#fetch:doc

While both approaches work well, they fall flat in terms of interoperability and discoverability. A script written for Saxon that leverages saxon:doc will not work on BaseX, and vice versa, even though the two offer options with some overlap. And a developer looking at the language specification will not discover that these options even exist.

I would like to add a second signature to the above functions with an options map as a second argument.
NOTE: Looking at the other two functions below I believe the first parameter should be defined as
$href as xs:string? := ()
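As a rough illustration of what a two-argument form could look like from the caller's side, the following XQuery 3.1 sketch emulates it in user code. The names local:doc and local:strip, the "whitespace" key with the value "strip", and the file name books.xml are all hypothetical; the sketch only assumes that an empty $href yields an empty sequence and that unknown options are ignored.

```xquery
xquery version "3.1";

(: Hypothetical helper: copies a tree and drops whitespace-only text nodes. :)
declare function local:strip($n as node()) as node() {
  typeswitch ($n)
    case document-node() return
      document { $n/node() ! local:strip(.) }
    case element() return
      element { node-name($n) } {
        $n/@*,
        $n/node()[not(self::text()[normalize-space() = ""])] ! local:strip(.)
      }
    default return $n
};

(: Hypothetical two-argument emulation of fn:doc.
   Only the assumed "whitespace" option is interpreted;
   any other key in $options is silently ignored. :)
declare function local:doc(
  $href    as xs:string?,
  $options as map(*)
) as document-node()? {
  if (empty($href)) then ()
  else
    let $doc := fn:doc($href)
    return
      if ($options?whitespace = "strip")
      then local:strip($doc)
      else $doc
};

(: "books.xml" is a placeholder document URI :)
local:doc("books.xml", map { "whitespace": "strip" })
```

A native implementation would apply such an option while parsing rather than copying the tree afterwards, so this is only meant to show the shape of the call.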
Since a lot of those options depend on the current runtime, most of them will be "free" options. This will also help us get to a specification quickly and circumvent long infighting about some very specific details.
I do see, however, a good chance of specifying a small set of options that would work across implementations.
Possible standard options
For fn:doc:

- validation: whether and how to validate the input files against a schema
- whitespace: (strip-space, stripws) what to do with whitespace in the input document
- parser: could be used to define a different parser (for HTML documents)

For collection and uri-collection I see the following:

- recurse: traverse collection trees down into their subcollections
- stable: this is already vaguely mentioned in the spec and would benefit from a clearer specification
- type: (aka media-type or content-type) while the allowed values will be implementation-defined, the key should be standardised

This would bring the above functions in line with a pattern developers are already familiar with (see fn:serialize and others).

Thanks for initial input by @ChristianGruen, Liam Quin and @michaelhkay