ohsu-comp-bio / funnel

Funnel is a toolkit for distributed task execution via a simple, standard API.
https://ohsu-comp-bio.github.io/funnel
MIT License

storage/local: strategy for user/group permissions #66

Open buchanae opened 7 years ago

buchanae commented 7 years ago

Task outputs often become inputs to other tasks, such as when TES/Funnel is being used by a workflow engine. When containers write output files, those files are usually owned by root, which creates a tricky file-ownership problem on the host.

The best solution we've found is to use setgid on a directory to ensure the group owning the file is the group the Funnel server is in.

Funnel code should do the setgid automatically when creating a working directory, with logs explaining what is happening.

What happens when the working directory already exists? Do we modify it to have these permissions? Do we log a warning without modifying anything?
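Below is a minimal sketch of what creating the working directory could look like. It is written as a shell function for illustration only; Funnel itself is written in Go, and the function name `prepare_workdir` and the log wording are hypothetical. For an existing directory, it takes the "log a warning without modifying anything" route. That is just one of the two options raised above, not a decision.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not Funnel code): create a task working directory with
# the setgid bit so files created inside it inherit the directory's group.
set -euo pipefail

prepare_workdir() {
  local dir="$1"
  # Group to propagate; defaults to the caller's primary group.
  local group="${2:-$(id -gn)}"

  if [ -d "$dir" ]; then
    # Existing directory: warn, but leave its ownership and mode untouched.
    if [ ! -g "$dir" ]; then
      echo "WARN: $dir exists without setgid; new files may not get group $group" >&2
    fi
    return 0
  fi

  mkdir -p "$dir"
  chgrp "$group" "$dir"   # caller must be root or a member of $group
  chmod g+s "$dir"
  echo "INFO: created $dir with setgid; new files inside inherit group $group" >&2
}
```

Example usage: `prepare_workdir /path/to/work/task-1 funnel`, where the `funnel` group is an assumed name. Setgid only decides the group of newly created files. It does not make them group-writable.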

buchanae commented 7 years ago

Another issue here is that the file written by the container might not be writable by the group. This leads to errors when the owner is root and you/Funnel are trying to delete or modify the file on the host system. Is it possible to force such files to be group-writable?
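One partial answer, sketched under assumptions: if the task's command runs with umask 0002, the files and directories it creates come out group-writable. The helper `run_group_writable` is hypothetical. Inside a container, the equivalent would be wrapping the task command as `sh -c 'umask 0002; exec "$@"' sh <command...>`, which assumes the image ships `/bin/sh`. This does not help with programs that set modes explicitly, for example tar restoring archived permissions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command with umask 0002 so the files and
# directories it creates are group-writable (typically 0664 and 0775
# instead of 0644 and 0755). The subshell keeps the caller's umask unchanged.
set -euo pipefail

run_group_writable() {
  ( umask 0002 && "$@" )
}
```

For example, `run_group_writable mkdir out` creates a directory in which group members can also create and delete files.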

buchanae commented 7 years ago

Another issue:

If a tarball is extracted in the container as the root user, the setgid behavior doesn't apply and the files end up with the user/group ID of the files in the tarball.
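This matches GNU tar's documented behavior. When run as root, it defaults to `--same-owner` and restores the owners recorded in the archive. The `--no-same-owner` flag makes the extracting user own the files instead. A sketch follows, assuming GNU tar and assuming the task's own command can be changed, which Funnel generally doesn't control. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch assuming GNU tar: extract an archive without restoring the UID/GID
# recorded in it. As root, GNU tar defaults to --same-owner; --no-same-owner
# makes the extracting user own the files, leaving the group to be decided
# by the destination directory (e.g. a setgid one) rather than the archive.
set -euo pipefail

extract_without_owners() {
  local archive="$1" dest="$2"
  mkdir -p "$dest"
  tar --no-same-owner -xf "$archive" -C "$dest"
}
```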

buchanae commented 7 years ago

Switching the focus of this issue since it seems setgid isn't a complete solution.

buchanae commented 7 years ago

These are the options that have been discussed.

chown

This would run chown recursively on all the output files. This would either require elevated permissions for the Funnel worker, or a post-processing Docker container.
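Here is a sketch of the post-processing-container variant, assuming the worker can talk to the Docker daemon. A throwaway container running as root chowns the task's working directory back to the worker's UID:GID. The `alpine` image is an arbitrary choice, and `chown_outputs` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch of the post-processing-container variant of the chown option:
# a throwaway container (running as root) chowns the task's working directory
# back to the UID:GID of the user running the Funnel worker.
# The function name and the alpine image are illustrative assumptions.
set -euo pipefail

chown_outputs() {
  local workdir="$1"   # should be an absolute host path for the bind mount
  local owner
  owner="$(id -u):$(id -g)"
  docker run --rm -v "$workdir:/work" alpine chown -R "$owner" /work
}
```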

Cons:

docker -u

This would run all containers with the -u flag, which sets the user inside the docker container.

https://github.com/ohsu-comp-bio/funnel/pull/46
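A sketch of the kind of invocation this option implies. It is not the code from the linked PR, and the function name and `/work` mount point are assumptions.

```shell
#!/usr/bin/env bash
# Sketch of the "docker -u" option: run the task container as the UID:GID of
# the user running the Funnel worker, so files written to the mounted working
# directory are owned by that user on the host.
# The function name and the /work mount point are illustrative assumptions.
set -euo pipefail

run_task_as_worker_user() {
  local workdir="$1" image="$2"
  shift 2
  docker run --rm \
    -u "$(id -u):$(id -g)" \
    -v "$workdir:/work" -w /work \
    "$image" "$@"
}
```

The image must then work when run as an arbitrary UID. That UID may have no entry in the image's `/etc/passwd` and no writable home directory.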

Cons:

setgid working directory on the host

This would use Linux's setgid behavior on the Funnel working directory, which ensures that files created in that directory and all subdirectories have a specific group (with exceptions).

CAP_CHOWN is the Linux capability Funnel would need to call chown, so it doesn't necessarily need root.

Cons:

https://superuser.com/questions/381416/forcing-group-and-permissions-for-created-file-inside-folder

FUSE filesystem

This would include a FUSE filesystem in Funnel, which would host the working directory (where task outputs are written). With low-level file system control, we might have more control over permissions mapping.

FUSE on macOS requires installing OSXFUSE, but Docker for Mac seems to handle users for you, so maybe FUSE wouldn't be needed there.

https://unix.stackexchange.com/questions/198590/what-is-a-bind-mount

Cons:

buchanae commented 7 years ago

Focus on userspace containers (Singularity, rootless runC)

This would provide a non-Docker alternative to running Funnel in an environment where root access is not appropriate.

Linux ACLs

This would leverage Linux ACLs to give Funnel full control over all files owned by a range of users. This would require Docker to run with user ID mapping (user namespace remapping), so that the range of users in the ACL could be specified.

Cons:

https://askubuntu.com/questions/705489/allow-a-lxc-container-user-to-write-as-an-external-user-to-a-mounted-directory

buchanae commented 7 years ago

Docker Engine plugin

There may be something possible with Docker plugins. No idea. https://docs.docker.com/engine/extend/#developing-a-plugin

adamstruck commented 6 years ago

Throwing this up here before I forget.

udocker

A basic user tool to execute simple docker containers in user space without requiring root privileges.

https://github.com/indigo-dc/udocker

buchanae commented 6 years ago

Here's another: dx-docker https://wiki.dnanexus.com/Developer-Tutorials/Using-Docker-Images

mr-c commented 6 years ago

docker -u gets my recommendation; requiring containers to work as a non-root user is in line with "Recommendations for the packaging and containerizing of bioinformatics software" (doi:10.12688/f1000research.15140.1).

denis-yuen commented 6 years ago

For clarity, it seems to me that there is a difference between writing a Docker container that uses the USER instruction to set a specific non-root user inside the container, and writing a Docker container that tolerates being run as a seemingly random user.

Is the proposal to use docker -u to override just the containers that would otherwise run as root or to use it to override the users inside containers across the board?

buchanae commented 6 years ago

Semi-related: we've discussed ways of moving the docker run command into the config and adding a template language. This would give people flexibility to use non-docker executors, and to tweak behavior like this.

Re: "Is the proposal to use docker -u to override just the containers that would otherwise run as root or to use it to override the users inside containers across the board?"

Seems like the answer could change per-user. For me, I'd want to always force the user and even remove the capability to switch users.

jvkersch commented 4 years ago

This is coming up for us in a different context, but the fix would probably be the same. We are using Funnel to run commands in BioContainers images. With BioContainers, the user running the command is usually the biodocker user, which corresponds to UID 10000 inside the container. On the host, UID 10000 may not exist, or may be a totally unrelated user with no permission to write to anything at all.

Are there any other workarounds beyond those mentioned above? Happy to discuss or to help contribute to a solution.

kellrott commented 4 years ago

@jvkersch I want to make sure I understand the use case. You are deploying Funnel with the shared file system storage driver. The BioContainers act as user 10000 inside the containers and produce a bunch of files. Those files end up owned by a user that doesn't exist on the host, so they are unusable.

If Funnel wasn't in the loop, what would your docker invocation look like?

One solution (and I'm kind of guessing here) might be to avoid the shared file system completely: set up an S3 server using Minio ( https://github.com/minio/minio ) and configure Funnel to talk to that.

jvkersch commented 4 years ago

@kellrott That pretty much sums it up. If I were running the BioContainers container manually, I would probably use the --user flag to override the user (or just the group, and then set the setgid bit on the shared folder). To be honest, I have not tried this, so there may be problems with this approach independent of Funnel.

In the end, we are looking into two solutions. One is to avoid the shared filesystem as you recommended (and Minio is indeed a very good way to do so). The other is to give Funnel a different docker executable, one that modifies the command-line arguments passed to it and then calls through to the "real" docker command (which may be docker, udocker, ...).

kellrott commented 4 years ago

The 'docker wrapper' solution is one we ended up using on our local cluster for some testing. Right now the docker worker code just invokes whatever docker it finds ( https://github.com/ohsu-comp-bio/funnel/blob/master/worker/docker.go ). We can try to expand the config system so that this can be changed manually. But a faster way to test whether altering the docker invocation would work is to write a docker wrapper and put it on the PATH ahead of the real one. So, before starting the Funnel worker, run:

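export PATH="$HOME/dockerhack:$PATH"

Then put the following in $HOME/dockerhack/docker and make it executable (chmod +x $HOME/dockerhack/docker):

#!/bin/bash
# -u is a "docker run" option, not a global one, so insert it after "run"; pass other subcommands through unchanged.
[ "$1" = run ] && { shift; exec /usr/bin/docker run -u "$(id -u)" "$@"; }
exec /usr/bin/docker "$@"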