reanahub / reana-demo-worldpopulation

REANA example - parametrised Jupyter notebooks
MIT License

RFC directory organisation and naming #23

Closed: tiborsimko closed this issue 6 years ago

tiborsimko commented 6 years ago

Problem

Now that REANA v0.3.0 does not distinguish between input data and code, we have amended the client scenarios and simplified reana.yaml accordingly.

However, one remaining aspect may be worth amending: the directory structure of the REANA examples. It may currently be somewhat confusing for users to see:

$ git ls-files
code/worldpopulation.ipynb
inputs/World_historical_and_predicted_populations_in_percentage.csv
workflow/workflow.yaml

$ head -5 reana.yaml 
version: 0.3.0
inputs:
  files:
    - code/worldpopulation.ipynb
    - inputs/World_historical_and_predicted_populations_in_percentage.csv

Note how the term "inputs" is used recursively (inputs = code + inputs). It denotes both the workflow inputs as a whole (the inputs section of reana.yaml) and the data inputs within them (the inputs/ directory).

We may consider amending the terms to make things clearer.

Proposal

An easy solution to the double use of the term "inputs" could be to rename the inputs/ directory to "data". However, let us first consider a broader perspective so that we settle on choices we will not have to revisit, and do not have to redo the examples yet again.
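A minimal sketch of what this rename would mean for the inputs.files list shown above, assuming the data directory is simply moved from inputs/ to data/ and every listed path is updated accordingly. The helper rename_top_dir is hypothetical: it only rewrites path strings and does not touch the repository or call any REANA tooling.

```python
from pathlib import PurePosixPath


def rename_top_dir(paths, old, new):
    """Return paths whose leading `old` directory is replaced by `new` (hypothetical helper)."""
    renamed = []
    for path in paths:
        parts = PurePosixPath(path).parts
        if parts and parts[0] == old:
            # Swap only the top-level component; deeper components are kept as-is.
            parts = (new,) + parts[1:]
        renamed.append(str(PurePosixPath(*parts)))
    return renamed


# The current inputs.files entries from reana.yaml.
current_inputs = [
    "code/worldpopulation.ipynb",
    "inputs/World_historical_and_predicted_populations_in_percentage.csv",
]

for path in rename_top_dir(current_inputs, "inputs", "data"):
    print(path)
```

After the rename, the word "inputs" would appear in reana.yaml only as the section name, while the listed files live under code/ and data/.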

Consider a simple reusable analysis example that consists of some input data, some runtime code and a computational workflow specification, and that generates some filtered, simplified high-level output data and some output plots.

Here are several file organisation examples that illustrate various options as a basis for our RFC discussion.

Option A: fully flat structure

Everything in the same directory:

mycode.C
mydata.csv
myworkflow.yaml
mystats.csv
myplot.png

Option B: flat inside inputs/workflow/outputs

The overall structure is based on the concept of a workflow with separate inputs and outputs:

inputs/mycode.C
inputs/mydata.csv
workflow/myworkflow.yaml
outputs/mystats.csv
outputs/myplot.png

Option C: logical structure inside inputs/workflow/outputs

The overall structure allows subdirectories within the inputs and outputs, making it easier to separate concepts or steps:

inputs/code/mycode.C
inputs/data/mydata.csv
workflow/myworkflow.yaml
outputs/stats/mystats.csv
outputs/plots/myplot.png

Option D: logical structure inside root

The overall structure does not explicitly separate "inputs" from "outputs", which are considered self-explanatory:

code/mycode.C
data/mydata.csv
workflow/myworkflow.yaml
plots/myplot.png
stats/mystats.csv
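To compare Options B, C and D, here is a minimal sketch of how the inputs.files list of reana.yaml could be derived mechanically from each layout. It assumes that inputs are recognised purely by their top-level directory (inputs/ for Options B and C; code/ and data/ for Option D). The select_inputs helper and the layout lists are illustrative only and are not part of REANA.

```python
def select_inputs(paths, input_dirs):
    """Keep paths whose first directory component is one of `input_dirs` (hypothetical helper)."""
    return [p for p in paths if p.split("/", 1)[0] in input_dirs]


option_b = [
    "inputs/mycode.C", "inputs/mydata.csv", "workflow/myworkflow.yaml",
    "outputs/mystats.csv", "outputs/myplot.png",
]
option_c = [
    "inputs/code/mycode.C", "inputs/data/mydata.csv", "workflow/myworkflow.yaml",
    "outputs/stats/mystats.csv", "outputs/plots/myplot.png",
]
option_d = [
    "code/mycode.C", "data/mydata.csv", "workflow/myworkflow.yaml",
    "plots/myplot.png", "stats/mystats.csv",
]

# Options B and C need a single rule; Option D must enumerate every input directory.
print(select_inputs(option_b, {"inputs"}))
print(select_inputs(option_c, {"inputs"}))
print(select_inputs(option_d, {"code", "data"}))
```

Under these assumptions, Options B and C let one prefix identify every input (and outputs/ every output). Option D needs the set of input directories to be known in advance, which is the trade-off for its shorter paths. Option A would need an explicit file list.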

Option E: many inputs/workflows/outputs for separate steps

The overall structure nests one of the above options, arbitrarily deep, inside a separate directory per analysis step:

fitting/code/myfit.C
fitting/data/mydata.csv
fitting/workflow/fitting-workflow.yaml
fitting/outputs/mystats.csv
plotting/code/myplot.C
plotting/workflow/plotting-workflow.yaml
plotting/outputs/myplot.png
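For Option E, here is a minimal sketch of how the nested layout could be read back into per-step inputs, workflow and outputs. It assumes each step directory follows the Option D naming internally, with code/ and data/ as inputs, workflow/ for the specification and outputs/ for results, and exactly one level of nesting. The ROLES mapping and the group_by_step helper are hypothetical.

```python
from collections import defaultdict

# Assumed role of each second-level directory inside a step.
ROLES = {"code": "inputs", "data": "inputs", "workflow": "workflow", "outputs": "outputs"}


def group_by_step(paths):
    """Map step -> role -> paths for a layout of the form step/<dir>/file (hypothetical helper)."""
    steps = defaultdict(lambda: defaultdict(list))
    for path in paths:
        step, subdir, _ = path.split("/", 2)
        steps[step][ROLES[subdir]].append(path)
    return steps


option_e = [
    "fitting/code/myfit.C",
    "fitting/data/mydata.csv",
    "fitting/workflow/fitting-workflow.yaml",
    "fitting/outputs/mystats.csv",
    "plotting/code/myplot.C",
    "plotting/workflow/plotting-workflow.yaml",
    "plotting/outputs/myplot.png",
]

for step, roles in group_by_step(option_e).items():
    print(step, dict(roles))
```

Each step then resolves to its own small inputs/workflow/outputs triple. Deeper nesting would need a recursive variant of this idea.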

Notes

tiborsimko commented 6 years ago

Discussions with @diegodelemos:

Leaving RFC open for a few days...