Closed rdeltour closed 9 years ago
[ Not sure how best to manage the split conversation. Here's what I sent in reply to Jim ]
Yes, this proposal definitely mixes things in p:input. And p:filter is probably a poor choice of names given that we already have a p:filter step. So, imagine we'll call it something else eventually.
I was being lazy about the content model; for the sake of Romain's question about ordering, let's force all the filters to be at the end of the content model.
The filter element is a child of p:input because there's nowhere else to put it, really. The idea is that it applies (they apply) to the sequence of documents appearing on that input port.
Now that we have non-XML documents in the pipeline, I can imagine that there will be more streams of "mixed" documents (contents of ZIP files, contents of directories globbed, etc.). Some steps will want to process only the images, some only the XML, etc. Rather than having to filter the streams in separate steps, my thinking is that this simplifies a common case.
It was inspired by Ant and Gradle features that allow you to grab a bunch of files, because that simplifies the selection, and then explicitly exclude some. I suppose exclude is all you really need, but I liked the parallelism of include/exclude.
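To make the idea concrete, here is a rough sketch of how such filters might look. The p:filter name is only the placeholder discussed above; the include and exclude attributes, their XPath values, and the step name unzip are all assumptions for illustration, not syntax taken from the draft.

```xml
<!-- Hypothetical sketch: attribute names and expressions are assumptions. -->
<p:identity xmlns:p="http://www.w3.org/ns/xproc"
            xmlns:map="http://www.w3.org/2005/xpath-functions/map">
  <p:input port="source">
    <!-- A mixed sequence, e.g. everything unpacked from a ZIP file -->
    <p:pipe step="unzip" port="result"/>
    <!-- Filters sit at the end of the content model and apply to the
         whole sequence of documents arriving on this port -->
    <p:filter include="contains(map:get(p:document-properties(.), 'content-type'), 'xml')"/>
    <p:filter exclude="contains(map:get(p:document-properties(.), 'content-type'), 'svg')"/>
  </p:input>
</p:identity>
```

Read this way, the include grabs everything whose content type mentions XML and the exclude then drops the SVG images that the include also matched, which is the grab-then-exclude pattern from the build tools.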
Using the select attribute on p:input only solves the very simple case. But maybe that's enough.
(The case it doesn't handle is when you want to use @select to process some interior portion of the documents because then you can't (easily) make @select do both.)
Actually, I don't think this can work at all:
<p:input port="source" select="collection()[contains(
map:get(p:document-properties(.),'content-type'),
'xml')]">
The select expression on p:input applies to each document in turn. The collection() function isn't meaningfully defined.
I suppose select=".[contains(...)]/expr" would work, but it's a little subtle.
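Spelled out, that last form might look like the following. This is only a sketch: it assumes an XPath 2.0 or later processor (so that `.` can take a predicate), reuses the content-type test from the example above, and uses //html:div as a stand-in for expr.

```xml
<!-- Sketch only: //html:div stands in for "expr"; someSource is a placeholder step. -->
<p:identity xmlns:p="http://www.w3.org/ns/xproc"
            xmlns:map="http://www.w3.org/2005/xpath-functions/map"
            xmlns:html="http://www.w3.org/1999/xhtml">
  <!-- select is evaluated once per document, with that document as "." -->
  <p:input port="source"
           select=".[contains(map:get(p:document-properties(.), 'content-type'), 'xml')]
                   //html:div">
    <p:pipe step="someSource" port="result"/>
  </p:input>
</p:identity>
```

The subtle part is that the predicate on `.` either keeps the current document or yields the empty sequence, so the rest of the path selects nothing from documents that fail the test.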
[ Not sure how best to manage the split conversation. Here's what I sent in reply to Jim ]
(yeah, sorry again for this duplication. The list is not open to public posting though, so I'll keep using the tracker.)
Using the select attribute on p:input only solves the very simple case. But maybe that's enough.
(The case it doesn't handle is when you want to use @select to process some interior portion of the documents because then you can't (easily) make @select do both.)
I'm curious to see concrete examples of where it would be limited. IMO using @select is more powerful than either-or; it can do both because XPath can: it's a matter of applying a predicate to the collection sequence and then selecting nodes for each item in the sequence.
<p:input port="source" select="collection()[contains(
map:get(p:document-properties(.),'content-type'),
'xml')]">
<p:pipe step="someSource" port="result"/>
</p:input>
<p:input port="source" select="//html:div">
<p:pipe step="someSource" port="result"/>
</p:input>
which would be specified as being equivalent to
<p:input port="source" select="collection()//html:div">
<p:pipe step="someSource" port="result"/>
</p:input>
<p:input port="source" select="collection()[f:my-filter-expression()]//html:div">
<p:pipe step="someSource" port="result"/>
</p:input>
As before, there would be rules to specify what kind of result sequence is allowed, how nodes are wrapped in documents, etc.
Instead of thinking about "filter" as a new child element of p:input, what about considering it a new type of binding/connection, in addition to p:document, p:pipe, p:data etc.? For instance, if it is defined as follows:
<p:filter
  include? = XPathExpression
  exclude? = XPathExpression>
    (p:document |
     p:inline |
     p:pipe |
     p:data |
     p:filter)+
</p:filter>
then it can be a very transparent feature with the added advantage that you could use it anywhere where you can use the other bindings.
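A sketch of how that could nest, assuming include and exclude are each evaluated once per document with the document as context item (the definition above does not say). The step name unzip, the href, and the predicates are made up for illustration.

```xml
<!-- Hypothetical sketch of p:filter used as a connection, per the definition above. -->
<p:identity xmlns:p="http://www.w3.org/ns/xproc"
            xmlns:map="http://www.w3.org/2005/xpath-functions/map">
  <p:input port="source">
    <!-- Outer filter: drop SVG from everything gathered inside it -->
    <p:filter exclude="contains(map:get(p:document-properties(.), 'content-type'), 'svg')">
      <!-- Inner filter: keep only the XML documents coming from the unzip step -->
      <p:filter include="contains(map:get(p:document-properties(.), 'content-type'), 'xml')">
        <p:pipe step="unzip" port="result"/>
      </p:filter>
      <!-- Other connections can sit alongside a nested filter -->
      <p:document href="extra.xml"/>
    </p:filter>
  </p:input>
</p:identity>
```

Because the filter wraps the connections it applies to, there is no question about which siblings it covers, and the same construct would work wherever the other bindings are allowed.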
(oops, I hadn't seen that you replied in between)
The select expression on p:input applies to each document in turn.
Right, although the v2 spec might be able to change that?
The collection() function isn't meaningfully defined.
The idea was to define this as the default collection of the XPath context, consistent with what is proposed in #137.
At the 10 June 2015 face-to-face, we determined that the editor's current draft of input filtering was poorly conceived and decided to abandon it.
(sorry to hijack the thread on the public-xml-processing-model-wg list by creating an issue here, but that was the easiest way I found to comment).
I had a quick look at @ndw's proposal for filtering documents in ndw/specification@f4e6b74a9ee51539bbb7b5684625a62879c8c973; some comments:
p:filter element being used for a step in the standard library, there's a risk of confusion using it in this context (the 1.0 step is a bit of a false friend IMO; I always find myself trying to use it to filter a sequence of documents when it isn't capable of that. But it's probably too late to step back on this naming).
p:input introduces an inconsistency; there's also a risk of confusion with regard to where it is inserted (e.g. does the p:filter only apply to documents produced by previous-sibling connectors, or to all?)
input/@select attribute for filtering input documents? It would be possible if the sequence of connected documents was available in the default collection of the XPath context, in a similar fashion to what is proposed in #137.
Instead of what is currently proposed with a p:filter:
You'd have:
I'm surely overlooking things, but wanted to jot that down while it's fresh...