buchanae opened this issue 7 years ago
Another issue here is that a file written by the container might not be writable by the group, which leads to errors when the owner is root and you/Funnel are trying to delete or modify the file on the host system. Is it possible to force the file to be writable by the group?
Another issue:
If a tarball is extracted in the container as the root user, the setgid behavior doesn't apply and the files end up with the user/group ID of the files in the tarball.
Switching the focus of this issue since it seems setgid isn't a complete solution.
These are the options that have been discussed.
This would run chown recursively on all the output files. This would either require elevated permissions for the Funnel worker, or a post-processing Docker container (a rough sketch follows the cons below).
Cons:
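For illustration, the post-processing variant might look roughly like this (the paths, image, and UID/GID here are placeholders):
# after the task finishes, reset ownership of everything it wrote
docker run --rm -v /var/funnel/workdir/task-123:/outputs alpine \
  chown -R 1000:1000 /outputs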
This would run all containers with the -u flag, which sets the user inside the Docker container (a minimal example follows the cons below).
https://github.com/ohsu-comp-bio/funnel/pull/46
Cons:
The overridden user generally doesn't exist inside the container, so there is no home directory (~/), which can break tools that expect one.
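Outside of Funnel, the -u approach is just the following (image, command, and mount path are placeholders; uid:gid is used so the group matches the host user as well):
# run the task as the invoking host user instead of the image's default (often root)
docker run --rm -u "$(id -u):$(id -g)" \
  -v /var/funnel/workdir/task-123:/outputs \
  mytool-image mytool --out /outputs/result.txt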
This would use Linux's setgid behavior on the Funnel working directory, which ensures that files created in that directory and all sub-directories have a specific group (with exceptions; a setup sketch follows below).
CAP_CHOWN is the Linux capability Funnel would need in order to call chown, so it doesn't necessarily need root.
Cons:
https://superuser.com/questions/381416/forcing-group-and-permissions-for-created-file-inside-folder
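For reference, the manual setgid setup on a host working directory looks something like this (the directory path and group name are hypothetical):
# hand the working directory to the group the Funnel server runs as, e.g. "funnel"
chgrp funnel /var/funnel/workdir
# g+s (setgid): new files and sub-directories inherit the "funnel" group
# g+rwx: the group can read, write, and traverse the directory
chmod g+rwxs /var/funnel/workdir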
This would include a FUSE filesystem in Funnel, which would host the working directory (where task outputs are written). With low-level file system control, we might have more control over permissions mapping (see the bindfs sketch below for a rough idea of the effect).
FUSE on macOS requires installing OSXFUSE, but Docker for Mac seems to handle users for you, so maybe FUSE wouldn't be needed there.
https://unix.stackexchange.com/questions/198590/what-is-a-bind-mount
Cons:
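As a rough illustration of the effect, an existing FUSE tool like bindfs can already remap ownership over a bind-mounted directory; a built-in FUSE layer would give Funnel this kind of control directly (paths and names are placeholders, and this isn't a proposal to depend on bindfs):
# present the working directory through a FUSE mount where everything appears
# owned by the funnel user/group, regardless of which UID created it
bindfs -u funnel -g funnel /var/funnel/workdir /var/funnel/workdir-view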
This would provide a non-Docker alternative to running Funnel in an environment where root access is not appropriate.
This would leverage Linux ACLs to give Funnel full control over all files owned by a range of users. This would require Docker to run with user-ID mapping, so that the range of users in the ACL could be specified (a rough sketch of the ACL side follows the cons below).
Cons:
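A rough sketch of the ACL side only (the user name and path are hypothetical; the Docker user-ID mapping setup isn't shown):
# grant the funnel user rwx on existing files under the working directory
setfacl -R -m u:funnel:rwx /var/funnel/workdir
# default ACL on the directory: files created later get the same access
setfacl -m d:u:funnel:rwx /var/funnel/workdir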
There may be something possible with Docker plugins. No idea. https://docs.docker.com/engine/extend/#developing-a-plugin
Throwing this up here before I forget.
udocker: "A basic user tool to execute simple docker containers in user space without requiring root privileges."
Here's another: dx-docker
https://wiki.dnanexus.com/Developer-Tutorials/Using-Docker-Images
docker -u gets my recommendation; the requirement to be okay with non-root is in line with "Recommendations for the packaging and containerizing of bioinformatics software" (10.12688/f1000research.15140.1).
For clarity, it seems to me that there is a bit of a difference between writing a Docker container that uses the USER instruction to set a specific non-root user inside the container, as opposed to writing a Docker container that is tolerant of being run as a seemingly random user.
Is the proposal to use docker -u to override just the containers that would otherwise run as root, or to use it to override the users inside containers across the board?
Semi-related: we've discussed ways of moving the docker run command into the config and adding a template language. This would give people flexibility to use non-docker executors, and to tweak behavior like this (a hypothetical sketch is below).
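Nothing like this exists yet; purely as a hypothetical sketch (the template syntax and variable names are invented here), the worker would render a configured command template instead of hard-coding docker run:
# hypothetical configurable command template, rendered and exec'd by the worker
docker run -i --rm -u {{uid}}:{{gid}} {{volume_flags}} --name {{container_name}} {{image}} {{command}}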
Is the proposal to use docker -u to override just the containers that would otherwise run as root or to use it to override the users inside containers across the board?
Seems like the answer could change per-user. For me, I'd want to always force the user and even remove the capability to switch users.
This is coming up for us in a different context, but the way to fix it would probably be the same. We are using Funnel to run commands in BioContainers images. With BioContainers, the user running the command is usually the biodocker user, which corresponds to UID 10000 inside the container. On the host, UID 10000 may not exist, or may belong to a totally unrelated user with no permission to write anything at all.
Are there any other workarounds beyond those mentioned above? Happy to discuss or to help contribute to a solution.
@jvkersch I want to make sure I understand the use case: you are deploying Funnel with the shared file system storage driver, the BioContainers images run as user 10000 inside the containers and produce a bunch of files, and those files end up belonging to a non-existent host user and are unusable.
If Funnel wasn't in the loop, what would your docker invocation look like?
One solution (and I'm kind of guessing here) might be to completely avoid the shared file system: set up an S3 server using Minio ( https://github.com/minio/minio ) and configure Funnel to talk to that.
@kellrott That pretty much sums it up. If I were running the biocontainers container manually, I would probably use the --user flag to override the user (or the group, and then set the setgid bit on the shared folder). To be honest, I have not tried this out, so there may be problems with this approach independent of Funnel.
In the end, we are looking into two solutions. One is to avoid the shared filesystem as you recommended (and Minio is indeed a very good way to do so), and the other is to provide a different docker executable to Funnel, one that modifies the command-line arguments passed to it and then calls through to the "real" docker command (which may be docker, udocker, ...).
The 'docker wrapper' solution is one we ended up using on our local cluster for some testing. Right now the docker worker code just invokes whatever docker it finds ( https://github.com/ohsu-comp-bio/funnel/blob/master/worker/docker.go ). We can try to expand the config system so that could be changed manually. But the faster way to test whether altering the docker invocation would work might be to write a docker wrapper and add it to the PATH ahead of the real one. So, before turning on the Funnel worker:
export PATH=$HOME/dockerhack:$PATH
And in $HOME/dockerhack/docker have:
#!/bin/bash
# -u belongs to "docker run", not top-level docker: insert it after "run", pass other subcommands through
if [ "$1" = "run" ]; then shift; exec /usr/bin/docker run -u "$(id -u)" "$@"; fi
exec /usr/bin/docker "$@"
Task outputs often become inputs to other tasks, such as when TES/Funnel is being used by a workflow engine. When containers write output files, those files are usually owned by root, which creates a tricky problem of file ownership on the host.
The best solution we've found is to use setgid on a directory to ensure that the group owning the files is the group the Funnel server runs as.
Funnel code should do the setgid automatically when creating a working directory, with logs explaining what is happening (roughly the shell equivalent sketched below).
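For concreteness, the automatic behavior would be roughly the shell equivalent of the following each time a task working directory is created (the path, group name, and log message are illustrative only):
# create the task working directory and apply the group + setgid bit
mkdir -p /var/funnel/workdir/task-123
chgrp funnel /var/funnel/workdir/task-123
chmod g+rwxs /var/funnel/workdir/task-123
# and log what happened, e.g.
echo "applied setgid (group 'funnel') to /var/funnel/workdir/task-123"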
What happens when the working directory already exists? Do we modify it to have these permissions? Do we log a warning without modifying anything?