microsoft / rats

Rats is a collection of tools to help researchers define and run experiments. It is designed to be a modular and extensible framework currently supporting building and running pipelines, integrating configs and services.
MIT License
13 stars 4 forks source link

Rethink how PipelineBuilder.combine 'inputs' and 'outputs' arguments #123

Open elonp opened 7 months ago

elonp commented 7 months ago

Inputs:

Current behaviour:

If inputs argument is not provided:

All sub-pipeline inputs that are not connected to some other sub-pipeline's output are exposed as inputs to the new pipeline. They are grouped by name, so if two sub-pipelines have an (unconnected) input called INPUT1 the new pipeline will have an input called INPUT1 wired to the two sub-pipelines.

If inputs argument is provided:

It needs to be a dictionary mapping name (str) to list of sub-pipeline's inputs. Each unconnected sub-pipeline input must appear exactly once.

Suggestion:

Remove the inputs argument, and stay with the existing default behaviour. All desired outcomes can be achieved by renaming inputs in the sub-pipelines before calling combine. That's the way I am building pipelines for about a year now.

Outputs:

Current behaviour:

If outputs argument is not provided:

All sub-pipeline outputs that are not connected to some other sub-pipeline's input are exposed as outputs of the new pipeline. If more than one sub-pipeline has an (unconnected) output of a particular name, a concatenation stage is added taking all those outputs and exposing a single output that is a list of the original ones.

If outputs argument is provided:

It needs to be a dictionary mapping name (str) to list of sub-pipeline's outputs. If the list is > 1 then a concatenation stage is added as above. The provided dictionary defines the outputs, and is not an addition to them.

Suggestion 1:

Remove the outputs argument. Add a drop_output and clone_output methods to Pipeline. All desired behaviour can be achieve by a combination of drops, clones and renames.

Suggestion 2:

Replace the outputs argument with an additional_outputs argument. Add a drop_output method to Pipeline.

Suggestion 3 (orthogonal to suggestions 1 and 2):

Remove the auto-concatenation of outputs when multiple sub-pipelines have an unconnected output of the same name. Instead error. If users want a concatenation stage, they can add it themselves. The auto-concatenation is confusing and not very helpful because it is not possible to know which of the outputs came from which sub-pipeline.

elonp commented 7 months ago

@jzazo