Problem
Now that REANA v0.3.0 does not distinguish between input data and code, we have amended the client scenarios and simplified reana.yaml accordingly.
However, there is one remaining thing that may be good to amend regarding the directory structure of REANA examples. Currently, it may be somewhat confusing for users to see:
Here, note how the term "inputs" is used somewhat recursively (inputs = code + inputs) to specify both the workflow inputs and the data inputs of the code.
We may consider amending the terms to make things clearer.
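The double meaning can be sketched as follows. This is a minimal illustration, assuming a specification whose top-level inputs section lists files, some of which live in a directory that is itself called inputs. The file names and the helpers `find_ambiguous_entries` and `rename_top_dir` are hypothetical and not taken from any REANA example or API.

```python
from pathlib import PurePosixPath

# Hypothetical specification, modelled loosely on an "inputs" section
# that lists files. File names are illustrative assumptions only.
spec = {
    "inputs": {
        "files": [
            "code/analysis.py",
            "inputs/data.csv",
        ],
    },
}


def find_ambiguous_entries(spec, section="inputs"):
    """Return listed files whose top-level directory shares the section name."""
    files = spec.get(section, {}).get("files", [])
    return [f for f in files if PurePosixPath(f).parts[:1] == (section,)]


def rename_top_dir(path, old="inputs", new="data"):
    """Rename the top-level directory of a relative path, e.g. inputs/ -> data/."""
    parts = PurePosixPath(path).parts
    if parts[:1] == (old,):
        return str(PurePosixPath(new, *parts[1:]))
    return path


# Entries where "inputs" appears both as the section and as the directory.
print(find_ambiguous_entries(spec))
# The same file list after renaming the directory to "data".
print([rename_top_dir(f) for f in spec["inputs"]["files"]])
```

Renaming only the directory leaves the workflow-level term "inputs" untouched, so the word keeps a single meaning.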
Proposal
An easy solution to the above double use of the term "inputs" could be to rename the latter "inputs" to "data". However, let us consider the broader perspective to be sure of our choices, so that we don't have to redo the examples yet again.
Consider a simple reusable analysis example consisting of some input data, some runtime code, and the computational workflow specification, which generates some filtered and simplified high-level output data and some output plots.
Here are several file organisation examples that illustrate various options as a basis for our RFC discussion.
Option A: fully flat structure
Everything in the same directory:
Option B: flat inside inputs/workflow/outputs
The overall structure is based on the concept of workflow with its separated inputs and outputs:
Option C: logical structure inside inputs/workflow/outputs
The overall structure allows some subdirectories in the inputs and outputs to make it easier to separate concepts or steps:
Option D: logical structure inside root
The overall structure does not clearly separate "inputs" from "outputs", which are considered self-explanatory:
Option E: many inputs/workflows/outputs for separate steps
The overall structure contains an arbitrarily nested variant of one of the above options:
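To compare the options side by side, the sketch below lays out the same small analysis under each scheme and prints it as an indented tree. Every file and directory name here is a hypothetical assumption chosen for illustration, and Option E is shown as just one possible nesting (Option B repeated per step).

```python
# Hypothetical layouts for one small analysis. No name below is taken
# from an actual REANA example; they only illustrate the five options.
LAYOUTS = {
    "A: fully flat structure": [
        "mydata.csv", "mycode.py", "workflow.cwl",
        "results.csv", "plot.png", "reana.yaml",
    ],
    "B: flat inside inputs/workflow/outputs": [
        "inputs/mydata.csv", "inputs/mycode.py", "workflow/workflow.cwl",
        "outputs/results.csv", "outputs/plot.png", "reana.yaml",
    ],
    "C: logical structure inside inputs/workflow/outputs": [
        "inputs/data/mydata.csv", "inputs/code/mycode.py",
        "workflow/workflow.cwl",
        "outputs/data/results.csv", "outputs/plots/plot.png", "reana.yaml",
    ],
    "D: logical structure inside root": [
        "data/mydata.csv", "code/mycode.py", "workflow/workflow.cwl",
        "results/results.csv", "plots/plot.png", "reana.yaml",
    ],
    "E: many inputs/workflows/outputs for separate steps": [
        "step1/inputs/mydata.csv", "step1/workflow/workflow.cwl",
        "step1/outputs/results.csv",
        "step2/inputs/mycode.py", "step2/workflow/workflow.cwl",
        "step2/outputs/plot.png", "reana.yaml",
    ],
}


def build_tree(paths):
    """Turn a list of slash-separated paths into a nested dict."""
    tree = {}
    for path in paths:
        node = tree
        for part in path.split("/"):
            node = node.setdefault(part, {})
    return tree


def render(tree, indent=0):
    """Return indented lines for a nested dict; directories get a trailing slash."""
    lines = []
    for name in sorted(tree):
        children = tree[name]
        lines.append("  " * indent + name + ("/" if children else ""))
        lines.extend(render(children, indent + 1))
    return lines


for label, paths in LAYOUTS.items():
    print(f"Option {label}")
    print("\n".join(render(build_tree(paths), indent=1)))
    print()
```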
Notes
Whatever we do in the REANA demo examples, we are fully committed to supporting whatever directory organisation structure the original researcher communities have a habit of using.
... but a good default choice is important, as there is a considerable probability that users will simply clone and emulate existing examples.
Different REANA examples could demonstrate different file organisation techniques mentioned above; we don't have to settle for the same structure everywhere.