Closed xatapult closed 5 years ago
Example: <p:variable name="count" as="xs:integer" collection="true" select="count(collection())"/>
⇒ 3, if 3 documents are on the DRP.
Ok. Aha. So this also means that you can access the n-th document by writing collection()[n]; that's nice.
But what happens when non-XML documents are on the DRP?
I think non-XML documents will just be an empty document node (citation needed), or a text node within a document node for text documents.
Here’s a more elaborate definition of the context item: http://spec.xproc.org/master/head/xproc/#err.inline.D0008
JSON documents are represented by their XDM representation, that is, an array or a map? So if collection()[2] is of content type text/json, it is not a text node wrapped in a document node, but instead an array or map?
Binary documents are implementation-defined. So contrary to what I believed, they are not necessarily represented as a document node. However, document-properties(collection()[3]) should return the document property map if the third document is a binary file.
You should be able to do the following, irrespective of the representation:
<p:variable name="binary-doc" as="???" collection="true" select="collection()[3]"/>
<p:store href="myfile.bin">
  <p:with-input port="source" select="$binary-doc">
    <p:empty/>
  </p:with-input>
</p:store>
Should you be able to give a document in a select attribute? The context for p:store here is still the DRP with 3 documents on it. So could we also say, if collection were allowed on p:with-input, <p:with-input port="source" select="collection()[3]" collection="true"/>?
What do we (interoperably) specify as the as attribute value if the variable is supposed to hold a binary document?
As always, complications...
May I propose the following:
<aside>I don't think collection()[2] is going to do what you want; the order of documents in the collection may not be stable.</aside>
The question of what to do with maps is an interesting one. We want JSON to be able to flow through the pipeline. We want to represent JSON as XDM maps. XDM maps aren't nodes. So I think we've just painted ourselves into a corner that says what flows between steps are XDM instances not documents. Bah, humbug.
Non-node values can't go into collections so either we have to serialize them and make them nodes or we have to leave them out of collections. Bah, double humbug.
Non-node values can't go into collections so either we have to serialize them and make them nodes or we have to leave them out of collections.
Why not? The XPath 3.1 specification says:
Default collection. This is the sequence of items that would result from calling the fn:collection function with no arguments.
So in my reading, any instance of item (document nodes, text nodes etc, and maps) can be part of the default collection. What did I miss?
So I think we've just painted ourselves into a corner that says what flows between steps are XDM instances not documents.
Yes, we actually use "document" in a double sense. This is why I introduced the term "XProc document" in my London paper in June: what flows between steps in XProc is an (XProc) document.
An XProc document is a pair of properties and a representation. A representation may be an XDM document or a map.
@gimsieke
What do we (interoperably) specify as the as attribute value if the variable is supposed to hold a binary document?
Answer: item()*
Sorry. My bad. I was looking at the XPath 3.0 functions and operators spec where fn:collection() returns node()*. I see that in 3.1 it returns item()*. Ignore that bit.
Ok, looks fine. So summarizing:
Ok. I'm unsure about 4. @xml-project, is that what you meant?
@ndw We'll have to say something about the order of documents. But why wouldn't that be stable? Documents flow in a certain order, right?
@eriksiegel
Ok. I'm unsure about 4. @xml-project, is that what you meant?
Yes. You will get what you get, because we define the behavior of binary documents only on the XProc level, not on the XPath level where we are now.
I think your conclusion for JSON is not quite right: For documents with content-type application/json we decided to use fn:parse-json() and I think this is also true for collection().
The function specs say:
JSON-object -> Map
JSON-array -> Array
JSON-string -> xs:string
JSON-number -> xs:double
JSON-boolean -> xs:boolean
JSON-null -> EMPTY-Sequence
So IMHO the correct answer (expressed as SequenceType) for JSON is item()?.
About order:
In
<p:identity>
  <p:with-input port="source">
    <p:document href="doc1.xml"/>
    <p:document href="doc2.json"/>
    <p:document href="image.png"/>
  </p:with-input>
</p:identity>
<p:variable name="png" select="collection()[3]" collection="true"/>
$png is guaranteed to contain the image.png document. This is stated in the note that immediately precedes http://spec.xproc.org/master/head/xproc/#documentation.
Order would not be guaranteed if you connect to the secondary port of a p:xslt step and, for example, expect the text document to be the first output document on this port, see https://github.com/xproc/1.0-specification/issues/17
@gimsieke Sorry, but I thought we were talking about the order in which the XPath-function collection() returns the documents, not about the order on an XProc port (the passage you have quoted).
I agree with @ndw that the spec of the XPath function collection() does not define an order for the sequence, so you can NOT be sure that image.png is returned by collection()[3].
I think that is why XPath has the function fn:collection(arg as xs:string?) (arg is interpreted as a URI), and the function will return the document (in the default collection) with this URI (if any).
Implementations should be required to let collection() return the documents in the order in which they appear on the port. Is there a reason not to stipulate this?
@gimsieke
Is there a reason not to stipulate this?
We are not in a position to stipulate this, because we are not the XPath next community group. collection() is an XPath function defined in their specs. How can we change their specs?
I don’t see anything in https://www.w3.org/TR/xpath-functions-31/#func-collection that would prevent us from returning the default collection in a specific order.
Sorry @gimsieke, I failed to make my point: We (which in this case means the XProc implementors) do not return anything here. We call an XPath processor to execute the XPath expression containing "fn:collection()", and the XPath processor evaluates the expression according to the XPath specs. Since the specs do not guarantee order, there might be order or not.
I do not see what we (the XProc next community group) could do about this.
Saxon, for example, has no built-in default collection. If I read this code correctly, @ndw constructs a default collection that he passes to net.sf.saxon.lib.CollectionURIResolver. This is for XSLT. For XProc 3.0 constructs that accept @collection, I assume that Norm will continue to use Saxon as the XPath processor. For these XPath expressions (outside of XSLT), you have your own XPath processor. What prevents you from defining the default collection in a specific way?
As far as I remember, "CollectionURIResolver" has been deprecated since 9.7. I looked up the APIs yesterday to see whether there is any information on this, but there is none. There is a new interface "CollectionFinder", but there is also no hint about order (and stability).
I do not think the Saxon API can count as an argument, because we are not building "XProc on Saxon".
I do not think the problem is worth the whole discussion because you could easily use p:split-sequence to solve the problem. So IMHO there is no need to deviate from XPath standards or tie our specification to a specific XPath processor.
I have no reason to believe that the collection() function returns the documents in the same order that I passed them to the collection URI resolver (or whatever the new interface is).
It's called collection not sequence because it's an unordered collection, I believe.
I am not suggesting to tie XProc to a specific XPath processor. I am just proposing that each implementation be required to return the default collection in the order that the documents already have on the corresponding port. In certain circumstances, the order in which they appear is already specified by the XProc spec.
And I’m asserting that this does not deviate from the XPath spec.
That is not within my control. I pass a bunch of documents off to Saxon to put in a collection. I don't know how Saxon keeps track of those. Maybe Michael puts them in a map and the insertion-order is lost. Maybe he doesn't. Whether or not they come back in the order I added them is at best implementation-dependent.
I think @ndw's comment should be the bottom line of the "order" discussion, gentlemen.
Ok. Returning to @eriksiegel’s comment, maybe we should add a note to the default collection. Something like: “A specific XProc processor in a specific version might return collection items in a certain order, and maybe it is the order that the items appeared on a port. However, you should not rely on accessing collection items by position (for example, collection()[3]). Use other criteria, such as base URIs and other document properties, top-level element names or namespaces, or map keys in order to select specific items from a collection.”
Fine with me. I'll add some more prose to this to clarify.
@eriksiegel proposes that #565 also fixes this. I'm happy with that.
I think I misinterpreted the meaning of @collection on p:variable and p:with-option. The spec's description is currently rather sparse and IMHO needs some clarification: I'll try to write some more prose on this if somebody can explain what is meant and/or send me a simple code example?