Specification of p:run - Githubissues

xatapult commented 4 years ago

The description of the p:run step still starts with the text:

THIS IS UNREVIEWED PLACHOLDER TEXT

I'm not sure what to make of this: is the step description unfinished, or did someone forget to remove this line?

xml-project commented 4 years ago

I am pretty sure that the description is not finished yet. For example, how are parameters passed to the step?

xatapult commented 4 years ago

Let me give the specification of p:run a try and let's try to discuss some things before creating the final prose. I've looked at the Calabash cx:eval step for some inspiration, but in XProc 3, some things are rather different...

<p:declare-step type="p:run">
  <p:input port="source" primary="true" sequence="true" content-types="any"/>
  <p:input port="step" sequence="false" content-types="xml"/>
  <p:output port="result" primary="true" sequence="true" content-types="any"/>
  <p:option name="options" as="map(xs:QName, item()*)?"/>
  <p:option name="static-options" as="map(xs:QName, item()*)?"/>
  <p:option name="parameters" as="map(xs:QName, item()*)?"/>
  <p:option name="step-type" as="xs:QName?"/>
</p:declare-step>

- The source port connects to the primary input port of the step-to-run, whatever its name. If the step-to-run has no primary input port, whatever appears on the source port is thrown away.
- The step port receives the step-to-run or a step library (see the step-type option).
- The result port emits whatever comes out of the primary output port of the step-to-run, whatever its name. If the step-to-run has no primary output port, the result port stays empty.
- The purpose of the options and static-options options is self-documenting, I think.
- parameters is for any additional processor-dependent flags and parameter settings.
- step-type can specify the type of the step to run. It is mainly there to allow running a specific step from a library. If it is specified for a non-library step, it must be the same as the type of that step. If the input is a library and step-type is absent, the first public step in the library is run.
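To make the proposal concrete, an invocation could look roughly like the sketch below. It follows the signature proposed in this comment only, not any released specification. The library file my-library.xpl, the step type ex:cleanup, its namespace, and the indent option are all made up for illustration.

```xml
<!-- Sketch only: p:run as proposed in this comment; all ex:* names are hypothetical -->
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:ex="http://example.org/ns/run-sketch"
                version="3.0">
  <p:input port="source" sequence="true" content-types="any"/>
  <p:output port="result" sequence="true" content-types="any"/>

  <p:run>
    <!-- "source" is primary, so it is bound implicitly to the pipeline's own source port -->
    <!-- The step port receives a library; step-type picks the step to run from it -->
    <p:with-input port="step" href="my-library.xpl"/>
    <p:with-option name="step-type"
                   select="QName('http://example.org/ns/run-sketch', 'ex:cleanup')"/>
    <!-- Options for the step-to-run, keyed by QName as in the proposed map type -->
    <p:with-option name="options"
                   select="map{ QName('', 'indent'): true() }"/>
  </p:run>
</p:declare-step>
```

Whatever ex:cleanup writes to its primary output port would then show up on result. Any other ports of ex:cleanup stay out of reach with this signature.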

This might be a basic setup. What we now cannot do is use additional ports. See next comment.

xatapult commented 4 years ago

How to handle additional input and output ports on the step-to-run? Some options:

1. We do some syntax magic and allow connecting to ports on a p:run that are not defined. Nogo as far as I'm concerned. Feature creep.
2. We disallow them; only primary ports are allowed.
3. We ignore them as far as this does not lead to an error. This means that additional input ports must have a default. Results on additional output ports disappear.
4. We do something nifty... For instance (this is just a wild first idea):
   - We add an additional input port called additional-inputs (or whatever). Documents appearing on this port must have a document-property called p:port (or whatever) with the QName of the port they're for. The p:run step internally splits this document stream and sends them to the right input port(s).
   - Likewise there is an output port called additional-outputs. Documents appearing on this port have this document-property with the name of the step's output port that produced them.

For my current personal use-case, option 3 would be enough. But it would be nice to have at least something for additional input and output ports.
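As a rough illustration of the "nifty" idea, the caller would have to tag each document with its target port before sending it to additional-inputs. The sketch below does only that tagging, using the standard p:set-properties step. The property name ex:port (standing in for "p:port or whatever"), its namespace, and the file names are assumptions, and the additional-inputs port itself does not exist anywhere.

```xml
<!-- Sketch only: builds the tagged document stream; ex:port is a hypothetical property name -->
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:ex="http://example.org/ns/run-sketch"
                version="3.0">
  <!-- The combined stream that would be fed to the hypothetical additional-inputs port -->
  <p:output port="result" sequence="true" content-types="any">
    <p:pipe step="tag-stylesheet" port="result"/>
    <p:pipe step="tag-schema" port="result"/>
  </p:output>

  <!-- Mark my.xsl as destined for the "stylesheet" port of the step-to-run -->
  <p:set-properties name="tag-stylesheet">
    <p:with-input port="source" href="my.xsl"/>
    <p:with-option name="properties"
                   select="map{ QName('http://example.org/ns/run-sketch', 'ex:port'): 'stylesheet' }"/>
  </p:set-properties>

  <!-- Mark my.xsd as destined for the "xsd" port of the step-to-run -->
  <p:set-properties name="tag-schema">
    <p:with-input port="source" href="my.xsd"/>
    <p:with-option name="properties"
                   select="map{ QName('http://example.org/ns/run-sketch', 'ex:port'): 'xsd' }"/>
  </p:set-properties>
</p:declare-step>
```

The p:run step would then have to read the property from each document and route it to the matching port, which is the part that needs new machinery in the processor.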

gimsieke commented 4 years ago

How to handle additional input and output ports on the step-to-run? Some options:

We do some syntax magic and allow connecting to ports on a p:run that are not defined. Nogo as far as I'm concerned. Feature creep.

I think this is exactly what Norm proposed.

Rewriting the signature so that it hopefully reflects the changes that happened in the meantime:

<p:run
  name? = NCName
  p:pipeline? = NCName>
    (p:with-input | p:output | 
     p:with-option)*
</p:run>

The port called pipeline must be connected. It may have a different name if specified in the p:pipeline option. Apart from that, every connection given in the optional p:with-input elements will be connected to the supplied pipeline’s ports with the same names.

Usage example:

<p:run name="runme" xslt-parameters="{map{{'foo':'bar'}}}">
  <p:with-input port="pipeline">
    <p:inline expand-text="true">
      <p:declare-step name="transform-n-validate">
        <!-- Ports are declared with @port, not @name -->
        <p:input port="source" primary="true"/>
        <p:input port="stylesheet"/>
        <p:input port="xsd"/>
        <p:output port="result" primary="true"/>
        <p:output port="report" pipe="report@xsdval"/>
        <p:xslt parameters="{$xslt-parameters}">
          <p:with-input port="stylesheet" pipe="stylesheet@transform-n-validate"/>
        </p:xslt>
        <p:validate-with-xml-schema assert-valid="false" name="xsdval">
          <!-- Read the schema from the pipeline's own xsd input port -->
          <p:with-input port="schema" pipe="xsd@transform-n-validate"/>
        </p:validate-with-xml-schema>
      </p:declare-step>
    </p:inline>
  </p:with-input>
  <p:with-input port="source" href="my.xml"/>
  <p:with-input port="stylesheet" href="my.xsl"/>
  <p:with-input port="xsd" href="my.xsd"/>
  <p:output port="result" primary="true"/>
  <p:output port="report"/>
</p:run>

I think here the p:run/p:output elements don’t accept @pipe etc. The names of the declared output ports must match the names of the output ports in the pipeline. Primary status also needs to match.

If p:run/p:output refers to ports not present in the pipeline, you can still connect to that output from the outside, but no document will appear on the port.

All output ports of the pipeline that are not declared in p:run/p:output will not be visible from the outside.
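From the outside, reading a declared non-primary output would then look like any other connection. A minimal sketch, assuming the p:run named runme from the usage example above is a preceding sibling step and that it declares the report port; the file name validation-report.xml is made up:

```xml
<!-- Sketch only: assumes a preceding <p:run name="runme"> that declares a "report" output -->
<p:store href="validation-report.xml">
  <!-- Connect to the non-primary output by port name and step name -->
  <p:with-input port="source" pipe="report@runme"/>
</p:store>
```

If report were not declared via p:run/p:output, this connection would have nothing to bind to, in line with the visibility rule above.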

We disallow them, only primary ports are allowed

This would severely limit the utility of p:run.

We ignore them as far as this does not lead to an error. This means that additional input ports must have a default. Results on additional output ports disappear.

hmm…

We do something nifty... For instance (this is just a wild first idea):

We add an additional input port called additional-inputs (or whatever). Documents appearing on this port must have a document-property called p:port (or whatever) with the QName of the port they're for. The p:run step internally splits this document stream and sends them to the right input port(s).

Likewise there is an output port called additional-outputs. Documents appearing on this port have this document-property with the name of the step's output port that produced them.

This is similar to the multiplexing that XML Calabash 1 does. I think what Norm proposed is easier to use.

For my current personal use-case, option 3 would be enough. But it would be nice to have at least something for additional input and output ports.

xatapult commented 4 years ago

Ok. fine with me. Hadn't seen that older issue. Thanks.

It would mean that, for p:run, processors have to deal with a step whose input and output ports are dynamic rather than pre-declared. Up to now (AFAIK), when a step is invoked, the processor knows which input and output ports it has because they're declared. Not so for p:run, which can now have a different port configuration for every separate invocation.

I wonder how difficult that might be and whether this is stretching the whole framework of how XProc 3.0 works too much...

Opinions please. And let's discuss this on our next call.

gimsieke commented 4 years ago

A spec is there now. It is discussed in other issues.

xproc / 3.0-steps

Specification of p:run #331