Rats is a collection of tools to help researchers define and run experiments. It is designed to be a modular and extensible framework currently supporting building and running pipelines, integrating configs and services.
MIT License
13
stars
4
forks
source link
Rethink how PipelineBuilder.combine 'inputs' and 'outputs' arguments #123
All sub-pipeline inputs that are not connected to some other sub-pipeline's output are exposed as inputs to the new pipeline. They are grouped by name, so if two sub-pipelines have an (unconnected) input called INPUT1 the new pipeline will have an input called INPUT1 wired to the two sub-pipelines.
If inputs argument is provided:
It needs to be a dictionary mapping name (str) to list of sub-pipeline's inputs. Each unconnected sub-pipeline input must appear exactly once.
Suggestion:
Remove the inputs argument, and stay with the existing default behaviour. All desired outcomes can be achieved by renaming inputs in the sub-pipelines before calling combine. That's the way I am building pipelines for about a year now.
Outputs:
Current behaviour:
If outputs argument is not provided:
All sub-pipeline outputs that are not connected to some other sub-pipeline's input are exposed as outputs of the new pipeline. If more than one sub-pipeline has an (unconnected) output of a particular name, a concatenation stage is added taking all those outputs and exposing a single output that is a list of the original ones.
If outputs argument is provided:
It needs to be a dictionary mapping name (str) to list of sub-pipeline's outputs. If the list is > 1 then a concatenation stage is added as above. The provided dictionary defines the outputs, and is not an addition to them.
Suggestion 1:
Remove the outputs argument. Add a drop_output and clone_output methods to Pipeline. All desired behaviour can be achieve by a combination of drops, clones and renames.
Suggestion 2:
Replace the outputs argument with an additional_outputs argument. Add a drop_output method to Pipeline.
Suggestion 3 (orthogonal to suggestions 1 and 2):
Remove the auto-concatenation of outputs when multiple sub-pipelines have an unconnected output of the same name. Instead error. If users want a concatenation stage, they can add it themselves. The auto-concatenation is confusing and not very helpful because it is not possible to know which of the outputs came from which sub-pipeline.
Inputs:
Current behaviour:
If
inputs
argument is not provided:All sub-pipeline inputs that are not connected to some other sub-pipeline's output are exposed as inputs to the new pipeline. They are grouped by name, so if two sub-pipelines have an (unconnected) input called INPUT1 the new pipeline will have an input called INPUT1 wired to the two sub-pipelines.
If
inputs
argument is provided:It needs to be a dictionary mapping name (str) to list of sub-pipeline's inputs. Each unconnected sub-pipeline input must appear exactly once.
Suggestion:
Remove the
inputs
argument, and stay with the existing default behaviour. All desired outcomes can be achieved by renaming inputs in the sub-pipelines before callingcombine
. That's the way I am building pipelines for about a year now.Outputs:
Current behaviour:
If
outputs
argument is not provided:All sub-pipeline outputs that are not connected to some other sub-pipeline's input are exposed as outputs of the new pipeline. If more than one sub-pipeline has an (unconnected) output of a particular name, a concatenation stage is added taking all those outputs and exposing a single output that is a list of the original ones.
If
outputs
argument is provided:It needs to be a dictionary mapping name (str) to list of sub-pipeline's outputs. If the list is > 1 then a concatenation stage is added as above. The provided dictionary defines the outputs, and is not an addition to them.
Suggestion 1:
Remove the
outputs
argument. Add adrop_output
andclone_output
methods toPipeline
. All desired behaviour can be achieve by a combination of drops, clones and renames.Suggestion 2:
Replace the
outputs
argument with anadditional_outputs
argument. Add adrop_output
method toPipeline
.Suggestion 3 (orthogonal to suggestions 1 and 2):
Remove the auto-concatenation of outputs when multiple sub-pipelines have an unconnected output of the same name. Instead error. If users want a concatenation stage, they can add it themselves. The auto-concatenation is confusing and not very helpful because it is not possible to know which of the outputs came from which sub-pipeline.