RFC directory organisation and naming

Problem

Now that REANA v0.3.0 does not distinguish between input data and code, we have amended the client scenarios and simplified reana.yaml accordingly.

However, one thing remains that may be worth amending: the directory structure of REANA examples. Currently, it may be somewhat confusing for users to see:

$ git ls-files
code/worldpopulation.ipynb
inputs/World_historical_and_predicted_populations_in_percentage.csv
workflow/workflow.yaml

$ head -5 reana.yaml 
version: 0.3.0
inputs:
  files:
    - code/worldpopulation.ipynb
    - inputs/World_historical_and_predicted_populations_in_percentage.csv

Here, note how the term "inputs" is used somewhat recursively (inputs = code + inputs): it names both the workflow inputs as a whole and the input data consumed by the code.

We may consider amending the terms to make things clearer.

Proposal

An easy solution to the above double use of the term "inputs" could be to rename the latter "inputs" to "data". However, let us consider the broader perspective before choosing, so that we don't have to redo the examples yet again.
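For illustration, the simple rename could make the head of reana.yaml look roughly as follows. This is only a sketch: it assumes that nothing but the directory name changes, and the rest of the file is omitted.

```yaml
# Sketch only: head of reana.yaml if the inputs/ directory were renamed to data/.
# The rest of the file is assumed unchanged and is omitted here.
version: 0.3.0
inputs:
  files:
    - code/worldpopulation.ipynb
    - data/World_historical_and_predicted_populations_in_percentage.csv
```

With this layout the word "inputs" would appear only as the reana.yaml key, not as a directory name.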

Consider a simple reusable analysis example consisting of some input data, some runtime code, and the computational workflow specification, which generates some filtered and simplified high-level output data and some output plots.

Here are several file organisation examples that illustrate various options as a basis for our RFC discussion.

Option A: fully flat structure

Everything in the same directory:

mycode.C
mydata.csv
myworkflow.yaml
mystats.csv
myplot.png

Option B: flat inside inputs/workflow/outputs

The overall structure is based on the concept of workflow with its separated inputs and outputs:

inputs/mycode.C
inputs/mydata.csv
workflow/myworkflow.yaml
outputs/mystats.csv
outputs/myplot.png

Option C: logical structure inside inputs/workflow/outputs

The overall structure allows some subdirectories in the inputs and outputs to make it easier to separate concepts or steps:

inputs/code/mycode.C
inputs/data/mydata.csv
workflow/myworkflow.yaml
outputs/stats/mystats.csv
outputs/plots/myplot.png
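
As a rough sketch of what this option might mean for reana.yaml, assuming the same inputs/files layout shown in the Problem section and using the placeholder file names from the listing above:

```yaml
# Sketch only: input file list for Option C, with placeholder file names.
version: 0.3.0
inputs:
  files:
    - inputs/code/mycode.C
    - inputs/data/mydata.csv
```

Here the reana.yaml key and the top-level directory would share the name "inputs", while the nested code/ and data/ directories would no longer reuse the word.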

Option D: logical structure inside root

The overall structure does not clearly separate "inputs" from "outputs"; the directory names are considered self-explanatory:

code/mycode.C
data/mydata.csv
workflow/myworkflow.yaml
plots/myplot.png
stats/mystats.csv

Option E: many inputs/workflows/outputs for separate steps

The overall structure contains an arbitrarily nested variant of one of the above options:

fitting/code/myfit.C
fitting/data/mydata.csv
fitting/workflow/fitting-workflow.yaml
fitting/outputs/mystats.csv
plotting/code/myplot.C
plotting/workflow/plotting-workflow.yaml
plotting/outputs/myplot.png

Notes

Whatever we do in the REANA demo examples, we are fully committed to supporting whatever directory structure the research communities are used to.
However, a good default choice is important, as users will quite probably just clone and emulate the existing examples.
Different REANA examples could demonstrate different file organisation techniques mentioned above; we don't have to settle for the same structure everywhere.
