reanahub / reana-workflow-controller

REANA Workflow Controller
http://reana-workflow-controller.readthedocs.io/
MIT License

k8s: user home directory and WORKDIR management #273

Open tiborsimko opened 4 years ago

tiborsimko commented 4 years ago

Problem

User containers sometimes set a WORKDIR in the Dockerfile where temporary files may be written, or they assume that the workflow user can write temporary files to the user's home directory. Both are perfectly valid assumptions.

Previously, when REANA ran user workflows with superuser privileges, this was OK.

Now that REANA runs user workflows under a non-privileged user identity -- one that is "imposed" on the workflow without the workflow knowing about it -- this may cause write permission problems. For example, uid=0 containers often have / as home, but the REANA runtime uid=1000 user cannot write there.
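
To see the problem from inside a container, a small diagnostic along these lines can help. This is only a sketch: the helper name `write_access_report` and the choice of directories to probe are illustrative assumptions, not part of REANA.

```python
import os


def write_access_report(paths=None):
    """Map each candidate directory to whether the current user can write to it.

    By default this probes $HOME (falling back to "/" when HOME is unset)
    and the current working directory, i.e. the two places mentioned above.
    """
    if paths is None:
        paths = [os.environ.get("HOME", "/"), os.getcwd()]
    # W_OK | X_OK: creating a file in a directory needs write and search permission.
    return {str(p): os.access(p, os.W_OK | os.X_OK) for p in paths}


if __name__ == "__main__":
    print(f"running as uid={os.getuid()} gid={os.getgid()}")
    for path, writable in write_access_report().items():
        print(f"{path}: {'writable' if writable else 'NOT writable'}")
```

Running this as the first command of a workflow step shows which uid the step actually runs under and whether the image's home directory or WORKDIR is usable for temporary files.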

Solution

Several options:

(1) Improve documentation. Document all the best practices: workflows should not assume any particular directory or write permissions; the inputs and outputs of each step should be declared; temporary files created within a single step (one that runs multiple commands) should use the runtime workflow workspace as scratch space; REANA_WORKSPACE_PATH and friends are available; etc. In theory, this will make workflows more portable, even outside of REANA and outside of container systems, e.g. on HPC. In practice, relying on documentation alone would leave a hurdle for users.
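
A minimal sketch of the scratch-space practice from option (1), assuming that REANA_WORKSPACE_PATH is exported into the step's environment as mentioned above. The helper name `scratch_dir` and the fallback to the system temporary directory are illustrative choices, not REANA behaviour.

```python
import os
import tempfile
from pathlib import Path


def scratch_dir(prefix="step-"):
    """Create and return a fresh, writable scratch directory.

    Prefers the workflow workspace (REANA_WORKSPACE_PATH, if set) and
    falls back to the system temporary directory otherwise, so the same
    code also runs outside of REANA.
    """
    base = os.environ.get("REANA_WORKSPACE_PATH") or tempfile.gettempdir()
    return Path(tempfile.mkdtemp(prefix=prefix, dir=base))


if __name__ == "__main__":
    tmp = scratch_dir()
    (tmp / "intermediate.txt").write_text("temporary data\n")
    print(f"wrote scratch file under {tmp}")
```

Written this way, a step never depends on the image's home directory or WORKDIR being writable, which is what makes it portable across runtime users.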

(2) Improve user home directory management. When starting a workflow step as a user job pod, ensure that the runtime user's home directory is writable. There are several possibilities here, for example: (2a) create a writable home directory and start commands there, (2b) inject a writable WORKDIR into the container, (2c) switch to using the runtime workspace by default so that "undeclared" writes are carried out there, and more. Some of these are preferable to others, e.g. the last one may be messy with respect to best "isolation" practices.
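
Options (2a) and (2b) could look roughly like the pod manifest built below. This is a sketch under stated assumptions, not how reana-workflow-controller builds its job pods: the function name `job_pod_spec`, the volume name `home` and the path `/home/reana` are all hypothetical. The manifest fields themselves (`securityContext.runAsUser`, `workingDir`, `env`, `emptyDir`, `volumeMounts`) are standard Kubernetes Pod fields.

```python
import json


def job_pod_spec(image, command, uid=1000, home="/home/reana"):
    """Build a pod manifest whose container gets a writable home directory.

    All names here (function, volume name, home path) are hypothetical.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"generateName": "workflow-step-"},
        "spec": {
            "restartPolicy": "Never",
            # Run as the imposed non-privileged user; fsGroup asks the kubelet
            # to make supported volumes group-owned by this GID.
            "securityContext": {"runAsUser": uid, "fsGroup": uid},
            # (2a) a fresh, empty volume that will serve as the home directory
            "volumes": [{"name": "home", "emptyDir": {}}],
            "containers": [
                {
                    "name": "step",
                    "image": image,
                    "command": ["sh", "-c", command],
                    # (2b) override the image's WORKDIR with the writable path
                    "workingDir": home,
                    "env": [{"name": "HOME", "value": home}],
                    "volumeMounts": [{"name": "home", "mountPath": home}],
                }
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(job_pod_spec("python:3.8", "touch ~/scratch.txt"), indent=2))
```

Note that overriding `workingDir` changes where relative paths in the user's command resolve, so an image that relies on files located in its own WORKDIR would break; this is one of the trade-offs to weigh between (2a), (2b) and (2c).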

Let's muse about the various options IRL, together with the related UID/GID management for Kubernetes (#269).

Notes

P.S. Observed while updating the BSM search example https://github.com/reanahub/reana-demo-bsm-search/issues/14

Sinclert commented 4 years ago

I ended up in this issue after encountering a problem with permissions inside Docker containers being run by REANA.

I understand REANA performs an implicit user substitution, from root to, let's say, reana (the non-privileged user identity with UID 1000). I could not find this anywhere in the documentation, and I think it is an important aspect to highlight: the goal of a containerized environment is to control what code runs and how it runs, and this implicit substitution makes that impossible.

In addition to the necessary documentation changes, I would like to ask: what is the alternative for workflows that require write permissions in certain / folders?

I can think of three:

The third option resembles what is explained in this part of the User's guide (Supporting arbitrary user IDs section). However, it will not work if REANA substitutes the USER specified in the Dockerfile with its non-privileged one even when the specified user is not root (which is something I don't know).