xproc / 3.0-specification

A community-driven effort to define an XProc 3.0 specification (formerly 1.1)
http://spec.xproc.org/

Collections must not contain duplicates #1094

Closed: ndw closed this issue 5 months ago

ndw commented 6 months ago

In various places related to expression evaluation, we say that if the collection attribute is true, then the context item is undefined and all of the documents that appear on the port providing the context item are available in the default collection.

What we don't say is that if two or more documents that appear on that port have the same document-uri(), only one of them can appear in the collection. We also have tests in the test suite that rely on violating this constraint. The constraint isn't ours; it comes from XPath:

For every document node D that is in the target of a mapping in available collections, or that is the root of a tree containing such a node, the document-uri property of D must either be absent, or must be a URI U such that available documents contains a mapping from U to D.

That's not the clearest prose in the world, but I think it establishes that there is a single mapping from U to D, so you can't have more than one document with the same document-uri() (or, consequently, the same document more than once).
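To make the quoted constraint concrete, here is a minimal Python sketch of one literal reading of it. It is an illustration only, not XPath or XProc code: the `Document` class, the `satisfies_constraint` function, and the example URI are hypothetical, "available documents" is modeled as a plain dictionary from URI to document, and the "root of a tree containing such a node" clause is ignored for simplicity.

```python
from typing import Dict, List, Optional


class Document:
    """Hypothetical stand-in for a document node; uri models document-uri()."""

    def __init__(self, uri: Optional[str]) -> None:
        self.uri = uri  # None models an absent document-uri property


def satisfies_constraint(
    collection: List[Document],
    available_documents: Dict[str, Document],
) -> bool:
    """Check the quoted rule for every document D in a collection:
    document-uri(D) is absent, or available documents maps that URI to D itself."""
    for doc in collection:
        if doc.uri is None:
            continue  # absent document-uri is always allowed
        # Identity check: the mapping must point at this very document.
        if available_documents.get(doc.uri) is not doc:
            return False
    return True


# Two distinct documents that report the same URI (hypothetical example URI).
first = Document("http://example.com/a.xml")
second = Document("http://example.com/a.xml")
anonymous = Document(None)

# A dictionary can map the URI to only one of them.
available = {first.uri: first}

print(satisfies_constraint([first, anonymous], available))  # True
print(satisfies_constraint([first, second], available))     # False
```

Because the mapping holds one document per URI, a collection containing two distinct documents with the same URI cannot pass this check under the stated assumptions.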

I expect we should clarify this in an erratum.

The question is: should it be an error to attempt to construct a collection that contains two documents with the same document-uri(), or should we say that the implementation must avoid this by including only one such document? I think the latter is better, since the user may have no obvious way to fix the error. But it does potentially change the behavior of existing pipelines.

A "filter out duplicates" step might be something to add.
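The two options discussed here (raise an error, or silently keep only one document per URI) can be sketched in Python as follows. This is a rough illustration, not a proposed step definition: `Doc`, `dedupe_by_uri`, and the `on_duplicate` parameter are hypothetical names, and keeping the first document in port order is an assumption, since nothing above says which duplicate should survive. Documents with no document-uri() are always kept.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Doc:
    """Hypothetical stand-in for a document arriving on a port."""
    uri: Optional[str]  # models document-uri(); None means absent
    content: str


def dedupe_by_uri(docs: Iterable[Doc], on_duplicate: str = "keep-first") -> List[Doc]:
    """Build a collection with at most one document per document-uri.

    on_duplicate="keep-first" drops later documents with an already-seen URI
    (assumed policy); on_duplicate="error" raises instead.
    """
    seen = set()
    result = []
    for doc in docs:
        if doc.uri is None:
            # No URI, so it cannot clash with any other document.
            result.append(doc)
            continue
        if doc.uri in seen:
            if on_duplicate == "error":
                raise ValueError(f"duplicate document-uri in collection: {doc.uri}")
            continue  # keep-first: skip the later duplicate
        seen.add(doc.uri)
        result.append(doc)
    return result


port = [Doc("a.xml", "one"), Doc(None, "two"), Doc("a.xml", "three")]
print([d.content for d in dedupe_by_uri(port)])  # ['one', 'two']
```

The "keep-first" branch shows why this choice can change the behavior of existing pipelines: the third document silently disappears from the collection, whereas the "error" branch stops the pipeline and leaves the user to resolve the clash.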

ndw commented 6 months ago

All hope is not lost. Mike observes that the definition of fn:collection says explicitly:

There is no requirement that any nodes in the result should be in document order, nor is there a requirement that the result should contain no duplicates.

So maybe my reading of the definition of "available collections" was too narrow.

ndw commented 5 months ago

This was a Saxon bug, fixed in 12.5.