reanahub / reana-job-controller

REANA Job Controller
http://reana-job-controller.readthedocs.io/
MIT License

submission: auxiliary one-off data as files mounted into job container #45

Open lukasheinrich opened 7 years ago

lukasheinrich commented 7 years ago

Some jobs need data mounted into the container that is not part of the workflow work directory but is self-contained information needed only by that specific job.

Examples are

1) Normally, we submit a container and a cmd to the job controller, where the cmd is prepared by the workflow controller (it constructs the cmd from a template and workflow-specific data, such as file paths that are only known at run time). Sometimes the cmd is pretty long and a one-off multi-line script is a better choice. The script can be constructed by the workflow controller, but needs to be mounted into the container by the job controller.

Example:

cat /path/only/known/at/runtime/by/wflowcontroller/input.txt
echo some
echo very
echo long
echo script
cat /path/only/known/at/runtime/by/wflowcontroller/output.txt

We would like this to be mounted at some well-defined location in the container, say /reana/script, such that we can submit a job with the command: bash /reana/script

The job manifest could look like this:

experiment: ATLAS
docker_img: my_atlas_analysis
cmd: bash /reana/script
aux_mounts: 
   -  mountpath: /reana/script
      data: |
         echo some
         echo very
         echo long
         echo script
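
As a rough illustration of the workflow-controller side, the sketch below renders the script from a template and places it in the manifest. The `aux_mounts` field is the one proposed in this issue, not an existing job controller feature, and `SCRIPT_TEMPLATE` and `build_job_manifest` are hypothetical names introduced only for this example.

```python
from string import Template

# Hypothetical script template; $workdir is only known at run time.
SCRIPT_TEMPLATE = Template(
    "cat $workdir/input.txt\n"
    "echo some\n"
    "echo very\n"
    "echo long\n"
    "echo script\n"
    "cat $workdir/output.txt\n"
)


def build_job_manifest(workdir, script_path="/reana/script"):
    """Return a job manifest dict carrying the rendered script in aux_mounts."""
    script = SCRIPT_TEMPLATE.substitute(workdir=workdir)
    return {
        "experiment": "ATLAS",
        "docker_img": "my_atlas_analysis",
        "cmd": "bash " + script_path,
        # Proposed field: one entry per one-off file to mount into the job.
        "aux_mounts": [{"mountpath": script_path, "data": script}],
    }
```

The cmd stays short and constant, while everything run-time specific travels in the `data` payload.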

2) A related example deals with situations where the command or script becomes too large and we'd like to mount some of the data into the container. Take the example of merging 500 ROOT files into a single output file. For a few files this is possible via hadd merged.root inputA.root inputB.root. For large lists (of absolute paths) this can become unworkable, and we'd rather write a script such as merge.py inputfiles.json merged.root. The inputfiles.json can be constructed by the workflow controller and submitted like so:

experiment: ATLAS
docker_img: my_atlas_analysis
cmd: merge.py /reana/inputfiles.json /workdir/location/merged.root
aux_mounts: 
   -  mountpath: /reana/inputfiles.json
      data: |
         {"inputsfiles": [
            "/one/very/long/path/to/a/file",
            "/one/very/long/path/to/a/file",
            "/one/very/long/path/to/a/file",
            "/one/very/long/path/to/a/file",
            ... 100s of more file paths
            "/one/very/long/path/to/a/file",
            "/one/very/long/path/to/a/file"
           ]
         }
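
For illustration only, a merge.py along these lines might read the mounted JSON file and hand the paths to hadd. This is a minimal sketch: it assumes the argument order of the cmd above and the `inputsfiles` key as spelled in the manifest, and `build_hadd_command` and `merge` are hypothetical names.

```python
import json
import subprocess


def build_hadd_command(inputs_json, output_path):
    """Read the mounted JSON file and return the hadd argument list."""
    with open(inputs_json) as handle:
        # Key spelled as in the example manifest above.
        inputs = json.load(handle)["inputsfiles"]
    if not inputs:
        raise ValueError("no input files listed in " + inputs_json)
    # hadd takes the output file first, followed by all input files.
    return ["hadd", output_path] + list(inputs)


def merge(inputs_json, output_path):
    """Run hadd and return its exit code (requires ROOT in the container)."""
    return subprocess.call(build_hadd_command(inputs_json, output_path))
```

Inside the job this would be invoked as `merge("/reana/inputfiles.json", "/workdir/location/merged.root")`, so the long list of paths never appears in the submitted cmd.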

Implementation:

Kubernetes should transparently support this via either Secrets or ConfigMaps.
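
One possible translation on the job-controller side, sketched with plain dicts shaped like Kubernetes manifests rather than any particular client library: each `aux_mounts` entry becomes one key in a ConfigMap and is mounted as a single file via `subPath`. The function name `aux_mounts_to_k8s` and the `<job_name>-aux` naming scheme are assumptions made for this example, and `job_name` is assumed to already be a valid Kubernetes object name.

```python
def aux_mounts_to_k8s(job_name, aux_mounts):
    """Translate aux_mounts entries into a ConfigMap plus pod volume settings.

    Returns (configmap, volume, volume_mounts) as plain dicts.
    Nothing is sent to a cluster here.
    """
    volume_name = job_name + "-aux"  # hypothetical naming scheme
    data = {}
    volume_mounts = []
    for index, mount in enumerate(aux_mounts):
        # ConfigMap keys may only contain alphanumerics, '-', '_' and '.',
        # so use a generated key instead of the mount path itself.
        key = "aux-{}".format(index)
        data[key] = mount["data"]
        volume_mounts.append({
            "name": volume_name,
            "mountPath": mount["mountpath"],
            "subPath": key,  # mount this one key as a single file
            "readOnly": True,
        })
    configmap = {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": volume_name},
        "data": data,
    }
    volume = {"name": volume_name, "configMap": {"name": volume_name}}
    return configmap, volume, volume_mounts
```

The ConfigMap would be created before the job, the volume added to the pod spec, and the mounts added to the job container. ConfigMaps and Secrets are limited in size (about 1 MiB), which should be enough for scripts and file lists but not for bulk data.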

lukasheinrich commented 6 years ago

this is also relevant for https://github.com/reanahub/reana-workflow-engine-serial/pull/17#issuecomment-400965211

if we could mount the desired stdin as a one-off file, we could have a nicer command without the base64 hack, e.g.

command: ['sh','-c','root < /my/mounted/script']
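
To make the difference concrete, here is a small sketch contrasting the two command shapes. The exact form of the base64 workaround in the linked PR is not reproduced here: `inline_command` only assumes it looks roughly like decoding an inline payload and piping it to the program. Both function names are hypothetical.

```python
import base64


def inline_command(program, stdin_text):
    """Assumed shape of the workaround: embed stdin as base64 in the command."""
    payload = base64.b64encode(stdin_text.encode("utf-8")).decode("ascii")
    return ["sh", "-c", "echo {} | base64 -d | {}".format(payload, program)]


def mounted_command(program, mounted_path):
    """Command when stdin is available as a one-off mounted file."""
    return ["sh", "-c", "{} < {}".format(program, mounted_path)]
```

With a mounted file, `mounted_command("root", "/my/mounted/script")` yields the command shown above, and the command length no longer grows with the size of the stdin payload.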