reanahub / reana

REANA: Reusable research data analysis platform
https://docs.reana.io
MIT License

storage: use EOS as a staging-in/staging-out space #246

Closed diegodelemos closed 4 years ago

diegodelemos commented 4 years ago

REANA uses CephFS as scratch space for running workflows. This space is only a transient location: we cannot guarantee that runs won't be deleted, and we will in fact garbage-collect them to keep the system performant. We should therefore test a way of moving important files out of the workspace. One solution is to advise users to copy those files to their own private EOS workspace.

For example, the roofit REANA example could be extended with a final publish step:

version: 0.6.0
inputs:
  files:
    - code/gendata.C
    - code/fitdata.C
  parameters:
    events: 20000
    data: results/data.root
    plot: results/plot.png
workflow:
  type: serial
  specification:
    steps:
      - name: gendata
        environment: 'reanahub/reana-env-root6:6.18.04'
        commands:
        - mkdir -p results
        - root -b -q 'code/gendata.C(${events},"${data}")'
      - name: fitdata
        environment: 'reanahub/reana-env-root6:6.18.04'
        commands:
        - root -b -q 'code/fitdata.C("${data}","${plot}")'
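      - name: publish
        # Stage out: copy the plots to the user's private EOS directory.
        # Requires Kerberos credentials to be available to this step.
        kerberos: true
        environment: 'reanahub/krb5'
        commands:
        - cp results/*.png /my/eos/private/directory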
outputs:
  files:
    - results/plot.png

Keep in mind that for this to work, the Kerberos credentials have to be pushed to REANA when the workflow is created, and Kerberos usage has to be declared in reana.yaml (see kerberos: true in the publish step).
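
The plain cp in the publish step may fail in ways that are hard to read in the logs, for instance if the EOS directory is not reachable or the glob matches nothing. A minimal sketch of a more defensive staging-out helper is shown below. The function name stage_out is hypothetical and not part of REANA; it only assumes a POSIX shell and that the destination directory already exists.

```shell
#!/bin/sh
# Hypothetical helper (not part of REANA): copy workflow outputs to a
# staging-out directory and fail loudly if anything is missing.
# Usage: stage_out DEST FILE...
stage_out() {
    dest=$1
    shift
    # At least one file is required. An unmatched glob such as results/*.png
    # is passed through literally and is caught by the -f check below.
    if [ "$#" -eq 0 ]; then
        echo "stage_out: no files given" >&2
        return 1
    fi
    # The destination must already exist and be writable,
    # e.g. a private EOS directory visible inside the job.
    if [ ! -d "$dest" ] || [ ! -w "$dest" ]; then
        echo "stage_out: cannot write to $dest" >&2
        return 1
    fi
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "stage_out: missing file $f" >&2
            return 1
        fi
        cp -- "$f" "$dest"/ || return 1
    done
}
```

One way to use it would be to ship the function in a script listed under inputs and call it from the publish step, for example stage_out /my/eos/private/directory results/*.png, so that a failed copy makes the step fail instead of passing silently.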

diegodelemos commented 4 years ago

Confirmed to work on https://reana-qa.cern.ch:

reana.yaml:

version: 0.6.0
inputs:
  files:
    - code/gendata.C
    - code/fitdata.C
  parameters:
    events: 20000
    data: results/data.root
    plot: results/plot.png
workflow:
  type: serial
  specification:
    steps:
      - name: gendata
        environment: 'reanahub/reana-env-root6:6.18.04'
        commands:
        - mkdir -p results
        - root -b -q 'code/gendata.C(${events},"${data}")'
      - name: fitdata
        environment: 'reanahub/reana-env-root6:6.18.04'
        commands:
        - root -b -q 'code/fitdata.C("${data}","${plot}")'
      - name: publish
        kerberos: true
        environment: 'reanahub/krb5'
        commands:
        - cp results/*.png /eos/user/r/rodrigdi/
outputs:
  files:
    - results/plot.png

The copied file is accessible from lxplus and CERNBox:

$ ssh lxplus-cloud.cern.ch
...
[rodrigdi@lxplus733 ~]$ ls -la /eos/user/r/rodrigdi/plot.png 
-rw-r--r--. 1 rodrigdi it 15450 Feb  5 15:29 /eos/user/r/rodrigdi/plot.png
