open-mpi / ompi

Open MPI main development repository
https://www.open-mpi.org
2.14k stars · 858 forks

Environment Var For --allow-run-as-root #4451

Closed · ax3l closed this 6 years ago

ax3l commented 6 years ago

Background information

What version of Open MPI are you using?

openmpi 3.0.0

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

from source, via the spack package manager inside a Docker container.

Please describe the system on which you are running


Details of the problem

I would like to control the --allow-run-as-root flag via an environment variable (or config file) as well.

Running as root in Docker containers is fine, and provisioning controlled by an exported environment variable (the autotools equivalent is FORCE_UNSAFE_CONFIGURE=1) would make it much easier to interact with mpirun/mpiexec in deeply integrated workflows (where directly editing the final MPI call might not always be possible).
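For illustration, the workflow I have in mind would look something like this (the variable name is purely hypothetical; no such option exists today):

```shell
# Hypothetical: "OMPI_ALLOW_RUN_AS_ROOT" is a made-up variable name,
# shown only to illustrate the requested workflow.
export OMPI_ALLOW_RUN_AS_ROOT=1

# Deeply nested tooling (e.g. "make test") could then invoke
# mpirun/mpiexec unchanged, without editing the final MPI call:
mpirun -n 2 ./my_app
```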

jsquyres commented 6 years ago

I hear what you're saying, but I'm not sure it's the right approach. We put the --allow-run-as-root option there only with very, very deep reservations. We strongly advise against using it. HPC / MPI jobs should not need to be run as root.

Specifically: --allow-run-as-root is already bad enough. I really, really do not want to provide another trapdoor to already-unsafe functionality; that's just going in the wrong direction. Is there a better reason to do it other than "all the other kids in Dockerland do it"?

If Docker requires running MPI jobs as root, it seems like a better solution would be to fix that (i.e., enable Docker jobs to not run as root).

Just my $0.02; sorry to be a grumpy old man...

ax3l commented 6 years ago

Thank you for the detailed answer. I understand your concerns and I fully share your thoughts on regular systems.

Nevertheless, there is an important difference to be aware of when running something in a Docker container. Running containers are fully isolated (tm), which means the container is not started as the host-system root but by a user with regular user rights. Only internally, inside the container, is the (container-)root user active - which really is fine and safe. The worst that can happen is that the "active copy" of an image (which, while running, is called a container) self-destructs. But this neither affects the image it came from (which can simply be restarted from that state) nor harms the host system.

"all the other kids in Dockerland do it"

Yes, it's a fundamental craziness in Docker that you are inside as a new user (root), which assumes something like full isolation really exists. Other solutions such as Singularity handle this far better, transparently keeping the external user as the in-container process owner and adding a transparent layer on top of the FS. Of course, I could also add a local user just to satisfy MPI inside Docker (which is still another user), but this adds no safety net (since one is already there) and is just a hard-to-maintain workaround.

Long story short: for Docker, --allow-run-as-root is really needed, and it's very safe to use in Docker since it cannot destroy anything that is not "checkpointed" in images.

bwbarrett commented 6 years ago

@ax3l, when we've looked at best practices for running MPI in Docker containers (under Amazon ECS or AWS Batch), we didn't run into any real issues using a "normal" user to run MPI applications, and were much happier with the overall posture of the system. Yes, you're right that it's the container and not the host system running a root process, but there's still plenty of reason to be careful. For example, you do create some natural blast radius for filesystems mounted into your container (like an NFS/Lustre system with everyone's data on it).

ax3l commented 6 years ago

using a "normal" user to run MPI applications

I did that as well as a work-around in the past but this complicates, e.g. the import of such Docker images into other solutions such as Singularity. Also, see the argument below about "who creates images" for a HPC workflow (the user).

[we] were much happier with the overall posture of the system

Can you please elaborate a little on that? In what sense?

In an ideal world, a (containerized) process should never have more rights (including FS access) than the user that executed it, independent of what userid it presents internally. In that sense, one would not mount a HPC file system directly inside docker but on the host system and then forward it with the same rights a user has.

filesystems mounted into your container

I would not do that anyway (see above) but let FS mounts always be managed by the actual host system. Then only "docker mount" a user-visible view on a FS. One also needs to think about container workflows, where users should be allowed & flexible enough to adjust their containers and then could do anything to the FS if they control the direct mount / lustre clients / ... We would not want that.

jsquyres commented 6 years ago

I'm not sure I understand your reply -- you seem to be saying that the container should be running with limited rights on the host. I was not under the impression that that was (currently) possible. E.g., if you are root in the container, you can be root on the underlying filesystem.

Regardless, yes, we know that the --allow-run-as-root option is awkward. It's intended to be, if for no other reason than to emphasize that MPI jobs should never be run as root.

Meaning / to be blunt: "it's awkward" is not a strong enough reason for us to implement another trapdoor to already-unsafe behavior. Is there something you can't do - in any way - without the --allow-run-as-root functionality being invokable via an env variable and/or config file?

rhc54 commented 6 years ago

err... --allow-run-as-root is only available on the command line - there is no env var or config-file option for controlling it. I believe that was actually the request

jsquyres commented 6 years ago

I know. I'm asking if there's something that can't be done (via any mechanism) unless that functionality is invoked via an env var or config file.

ax3l commented 6 years ago

Yes: simple MPI programs and test suites that just call mpiexec will not run in containers out of the box without sed-ing through them.

Isn't that bad enough? I think a person who sets a central config file (or even an environment variable) to allow running as root should be aware of what he/she is doing; we don't need to make it awkward.

zach-nervana commented 6 years ago

I just ran into this issue as well. Requiring non-root in Docker can be very inconvenient, since many existing Dockerfiles and recipes assume a root user, and running as a non-root user often requires extra work. I understand that having some default safety features can be nice, but this particular way of implementing it forces everyone using Docker to either mix infrastructure details into their code, or modify the rest of their infrastructure that uses MPI (or shares Dockerfile images with code that runs MPI) to allow running as non-root.

jsquyres commented 6 years ago

@ax3l To be clear: that was not my question. I asked if there was something that can't be done (via any mechanism) unless that functionality is provided via an env var or config file.

sed-ing through a Dockerfile qualifies as "any mechanism".

I'm trying to ask if there is a technical reason to require an environment variable or config file. The "I don't want to modify my Dockerfile" motivation is already established - but that's not a technical reason. As I said above, we do provide an [intentionally] awkward way to get this functionality. Is there a technical reason to require that we provide this functionality via an env var and/or config-file option (other than "I don't want to modify my Dockerfile")?

Additionally: is anyone leaning on Docker to fix this scenario? Forcing everything to run as root is not good practice, and is the real root of the problem (see what I did there? 😉).

One thing I'm not clear on: even if we provide an env var or config file option for this functionality, you have to add that to the container somehow (i.e., modify/add this to the Dockerfile), right? Am I missing something?

zach-nervana commented 6 years ago

There isn't anything that can't currently be done inside of docker with the current code (by any mechanism).

To be more clear about why we see this as an issue: having to modify a Dockerfile at all is not the problem. The problem is that it requires making changes to components completely unrelated to MPI, since running as root is standard practice in Docker. For example, if you already have a recipe for installing or configuring a package as root that is used throughout your infrastructure, you will no longer be able to include that recipe in your Dockerfile without conditioning it on whether MPI will also be used. You would then need to maintain a root and a non-root recipe for every component. Alternatively, you could modify every component in your system to support non-root, just to use MPI. This additional coupling between the MPI 'config' and every other component of the system is the problem. Setting an environment variable on the other hand is not going to have any unintended consequences with the rest of the system.

jsquyres commented 6 years ago

@zach-nervana Just to be clear: you can run Open MPI jobs as root, you simply have to specify --allow-run-as-root on the mpirun command line. We strongly recommend not running as root, but you can certainly do so if you wish. So I'm not sure I follow the "Open MPI requires modifying every recipe in the system" argument...?

Setting an environment variable on the other hand is not going to have any unintended consequences with the rest of the system.

Isn't the same thing true with adding --allow-run-as-root?

zach-nervana commented 6 years ago

This is true, but as you've mentioned, you don't really want your code to always have that flag set, as it may be unsafe in many cases. The decision to run as root is a deployment-level concern and so would most naturally be set inside a Docker container / through an environment variable, instead of being baked into your code, which might be deployed by different people in different contexts.

jsquyres commented 6 years ago

You have to modify the Dockerfile to either set an environment variable or set this CLI flag, right? If so, I fail to see the difference.

Additionally, you basically said above "everyone runs as root in Docker", so you're somewhat contradicting yourself.

However, if that's not true (i.e., if you can set an environment variable inside a Docker container from outside the Dockerfile), then why can't you do something like:

  # Outside the container
  user$ export MY_MPIRUN_CLI_ARGS="--allow-run-as-root"
  user$ start_the_container my_dockerfile

  # Inside the container
  root$ mpirun $MY_MPIRUN_CLI_ARGS ...
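For what it's worth, Docker can forward host environment variables into a container at run time via the -e flag, so the idea above could look roughly like this (the image and application names are placeholders, not from this thread):

```shell
# On the host: define the extra mpirun args and forward them into the
# container at run time. "my_image" and "./my_app" are placeholders.
export MY_MPIRUN_CLI_ARGS="--allow-run-as-root"
docker run --rm -e MY_MPIRUN_CLI_ARGS my_image \
    sh -c 'mpirun $MY_MPIRUN_CLI_ARGS -n 2 ./my_app'
```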

Alternatively, one could avoid the environment variable issue altogether by just creating a user in the container:

  # Inside the container
  root$ adduser mpiuser
  root$ su -c "mpirun ..." mpiuser

Clearly I am missing something in your argument... 😦

jsquyres commented 6 years ago

I just realized I missed another point from your previous reply: using --allow-run-as-root does not mean that you have to run as root. As the CLI name directly states, it allows you to run as root -- but does not force you to do so.

Meaning: you can pass that CLI argument in all your Dockerfiles, regardless of whether they will be deployed as root- or non-root-runnable situations.

It seems like there are 3 scenarios:

  1. You always run commands in your containers as root. In such situations, --allow-run-as-root is appropriate.
  2. You never run commands in your containers as root. In such situations, mpirun works just fine without the need for --allow-run-as-root.
  3. You have a container that can be deployed as either running its commands as root or as non-root.
    1. If you're constructing / assembling that Dockerfile on the fly (e.g., based on the system, environment, or other external requirements), then the choice of adding --allow-run-as-root or not can simply be one of the factors considered while assembling that Dockerfile.
    2. If you have a fixed/static Dockerfile that can deploy to either run commands inside the container as root or run commands inside the container as non-root, then you can:
      1. Add --allow-run-as-root and it'll work in both cases, or
      2. Use some variant of the env variable method I showed in the previous reply (if that works -- I don't know if env variables propagate that way into containers), or
      3. Use the adduser mechanism I showed in the previous reply

Taking a completely simplistic view: if you want to run as root, then embrace it and always use --allow-run-as-root. Sure, we don't recommend this. But if you're in the case where you are possibly running as root at all, then you're already outside of our recommendations. Wearing my just-get-the-job-done hat, you might as well just hard-code the use of --allow-run-as-root.

Am I missing something?

ax3l commented 6 years ago

Btw, is --allow-run-as-root Open MPI-only? If so, it adds to the diversity of mpirun/mpiexec flags in the MPI ecosystem unless it can be controlled by an env var (which other implementations would just ignore).

This is also something which is not very desirable from an application perspective to be honest :-)

jsquyres commented 6 years ago

Yes, --allow-run-as-root is specific to the Open MPI family of implementations. There are lots of mpirun CLI options that are specific to the Open MPI family (just as there are mpirun-specific CLI options in other MPI implementation families). Hence, it is quite possible that you already have to customize your mpirun command line depending on which MPI implementation family you are running.

Sidenote / FWIW: the MPI Forum has extensively discussed standardizing the command line, but the conversations always run into deep problems between the different requirements of the different MPI implementations and sets of users. Hence, the best that the MPI Forum can do in terms of standardization is http://mpi-forum.org/docs/mpi-3.1/mpi31-report/node228.htm#Node228, which basically says "If an MPI implementation has a command line starter, please name it mpiexec and please try to support these CLI options. You don't have to, and you can also support other CLI options if you want to." FWIW, Open MPI has mpiexec (it's just a sym link to mpirun) and it supports most of those options.

As mentioned above, --allow-run-as-root is deliberately intrusive because running as root explicitly goes against our recommendations.

ax3l commented 6 years ago

Thank you for the feedback; I understand your argument. Nevertheless, "deliberately intrusive" ends up being intrusive in more places than you might think, including places where it does not need to be.

Take, for example, install routines and CMake scripts that include tests ("make test"), e.g. with CTest. In principle these work with small examples under any MPI implementation, as long as -n is supported.

What happens now when someone decides to perform this routine in an environment where the "root" user is OK and safe? It will fail with your clean error message, for good reason. All right, so they rethink their environment and come to the conclusion that it is safe to run the test as the root user, e.g. because they are provisioning a Docker container.

Now they could set an environment variable and re-try, if they are really sure it is safe to proceed as root. The alternative is project-specific options, e.g. in CMakeLists.txt or configure options. Those do not help anyone (packagers, sysadmins, users), because they differ for each piece of software doing the same task.

An environment variable would unify the workflow by default. Set it, re-run make test, done. For any MPI project! And without losing portability across MPI implementations.

But what if this env control does not exist? People like us will guess and add switch-cases in places where the MPI flavor should not need to be known - such as CMake scripts. And we hard-code workarounds like this:

https://github.com/ComputationalRadiationPhysics/libSplash/blob/2b5df2b72b30a58d02dbf0b8d69ea586531eb864/CMakeLists.txt#L391-L399

which makes the install + make test more robust, since the tests will actually run (as long as Open MPI is used).

Now you can say: "huh, that's dangerous, you should make this an option!" - and you are right. So we are back to project-specific options for tests and launches because we developers wrap MPI calls.

Instead, we could have a common way to control it across the whole ecosystem via an environment variable.

Open MPI would not be alone with such an env var: Autotools reacts to FORCE_UNSAFE_CONFIGURE=1, since it discourages running as root just as you do.

ax3l commented 6 years ago

Alternatively, one could avoid the environment variable issue altogether by just creating a user in the container

This is unnecessary. Some modern container solutions such as Singularity provision your container (configure - make - make test - make install) as the container root user, but run the container itself as the current user, with a lightweight overlay on top of the filesystem, exactly as you recommend. Adding new users and unused directories is superfluous.

(The point is: of course I do run full integration tests in the end in a container with reduced user rights. But I no longer have access to the intermediate "make test" stages of the individual components for component tests.)

Also, most of the time one will just import a Docker image - in both cases, provisioning the container is done as the "container root user".

thananon commented 6 years ago

Wow. This doesn't look healthy. I'm all for discussion, but bashing people in a commit message is just something else.

I'm closing this as it seems to be resolved.

ax3l commented 6 years ago

Hi, OP author here. I do not support the personal language in the linked commit, nor am I affiliated with it in any way.

I would like to continue the discussion based on objective feedback from the various applications that benefit from Open MPI. Many people are applying these workarounds in scripts, CIs, etc., where they could control the behavior better from the outside, e.g. when they test against various MPI implementations.

The issue is not resolved yet; please let us try to find a solution that does not make MPI implementations diverge for users, packagers and integrators.

thananon commented 6 years ago

Hi,

I will reopen this for discussion. I understand your frustration, and while I'm in no place to weigh in on this, it is an interesting topic.

At our face-to-face meeting in March we brought this issue up, and I remember everyone was against having an env var.

We had some catastrophic mistakes in the past when someone ran mpirun with root privileges, so we put some protection in place that is not really difficult, but (intentionally) awkward to use.

It is very easy to detect Open MPI: you can have a script locate the command ompi_info (which ships only with Open MPI). If it exists, you can set an alias of mpirun that includes --allow-run-as-root.

Your ompi solution seems to work as well, but I know you don't want to have your own custom ompi.

ivotron commented 6 years ago

My 2 cents: I'm testing (in a container) the code that someone else wrote but unit tests from the project are failing due to running mpirun as root. Since I didn't write the code, I had to invest a significant amount of time trying to find where mpirun is being invoked and then use sed to add the --allow-run-as-root flag. Having an environment variable would have saved time and effort.
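For reference, a hedged sketch of that sed step (the build directory and patterns are illustrative, not the actual project layout):

```shell
# Find generated scripts that invoke mpirun and inject the flag.
# "build/" is a placeholder for wherever the test scripts land.
grep -rl 'mpirun ' build/ | \
    xargs sed -i 's/mpirun /mpirun --allow-run-as-root /g'
```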

bangerth commented 6 years ago

The discussion is already long, but I'll add my $0.02 as well: We have the same issue in the deal.II and ASPECT projects (https://github.com/dealii/dealii, https://github.com/geodynamics/aspect): we have extensive test suites that call mpirun -n X for some tests. These test suites are also executed on regression testers and continuous-integration machines that set up the testing environment inside Docker containers. This now can't be done without sed-ing into the scripts these projects set up for testing.

What's particularly awkward: say you already have a script that does this:

  git clone abc
  mkdir build
  cd build
  cmake ....
  make
  make test

(In practice, this takes 500 lines and consequently is pre-packaged into a script that's handed around.) The problem here is that you need to (i) know which files cmake .... creates so that you can sed them as appropriate, but more importantly, (ii) the place where you have to call sed is in the middle of this script, which probably came from somewhere else - maybe checked out from some other git repo every time you start the Docker image. An even worse problem appears if the mpirun calls are only set up during the make test execution, i.e., you can't put the sed calls between make and make test - you have to adjust the cmake input files. That's way down in the hierarchy.

From the perspective of the person who sets up the Docker image, calling a script like the one above is atomic, or at the very least the individual commands inside it are. You can't slice into these commands, because the person who sets up the container (i) generally lacks the detailed knowledge of where the files that need to be edited are, and (ii) even if they knew, they'd make themselves dependent on the source structure of the project.

In the case of continuous integration services, you want to test patches. If a patch moves an automatically generated file somewhere else, you'd also have to adjust the docker image that's supposed to test the patch. That's not feasible in practice.

I do understand your reluctance to make it easier to run as root. That's good practice. But I fail to see how people would accidentally shoot themselves in the foot with an environment variable. --allow-run-as-root is not an option people just switch on without knowing what they do. It's people who face a very specific problem, go out on the web to figure out how to solve it, and then employ the solution. If you want to make it even more difficult for people to shoot themselves in the foot, do something like this:

  OMPI_ALLOW_RUN_AS_ROOT_X
        An environment variable that controls whether `mpiexec` is run as if `--allow-run-as-root` has been
        specified on the command line. Doing so is dangerous and not encouraged, but if you need
        to do so, you need to set two environment variables in the following way: a variable of the
        name above with X replaced by 1 needs to be set to 1, and one with X replaced by 2 needs to
        be set to 2.

We'd all be happy to jump through these hoops, and you make the procedure cumbersome enough that everyone who casually thinks about it has to at the very least read the documentation carefully.
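Usage under this (hypothetical, not-yet-implemented) scheme would then be as simple as:

```shell
# Hypothetical: both variables must be set, per the proposal above,
# before mpiexec behaves as if --allow-run-as-root were given.
export OMPI_ALLOW_RUN_AS_ROOT_1=1
export OMPI_ALLOW_RUN_AS_ROOT_2=2
mpiexec -n 4 ./my_app
```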

rhc54 commented 6 years ago

Hmmm...now that is actually a quite reasonable proposal. I have no objection to implementing that approach.

@jsquyres @bwbarrett ?

jsquyres commented 6 years ago

@bangerth That's actually a very interesting idea (2 env variables). I think it might satisfy our requirement of "if you do this, you should really know what you are doing and not do it by accident and you accept all responsibility for what happens." It's not going to prevent people from copy-n-pasting this without realizing what they're doing, but it will help -- i.e., some people might actually read what they're pasting.

ax3l commented 6 years ago

Thanks, I am glad we found a compromise :+1:

As a personal note, I think the solution is patronizing/nannying your users more than you might think. But I can live with that.

As constructive criticism: instead of stating twice that running as root is dangerous, rather document clearly which design choices of Open MPI make running as root insecure/potentially harmful. That is totally fine, and users can then judge whether those risks apply to their use case. Make it a chapter of your docs; in an option's doc string, say once that it is dangerous and refer to that chapter.

rhc54 commented 6 years ago

I think you perhaps don't realize the situation, or maybe are only seeing it from your own perspective. This wasn't us being arbitrary for the sake of it. We had actual cases where users turned their systems into useless chunks of metal because of this problem, and we absorbed their anger.

Documenting the risk made no difference to the situation - frankly, the people most at risk from making this mistake are also the ones who don't read the documentation. It is a rather common trait nowadays - I confess to rarely reading the manual myself. The more convinced someone is that they know what they are doing, the less likely they are to read the documentation or even the man page.

Having someone indicate they intend to override the protection, and then confirm their choice, is actually a rather common practice. It is now standard on most potentially destructive operations (e.g., erase a partition), so I don't think we are being patronizing here.

ax3l commented 6 years ago

Thank you for your thoughts. I think it would help if the actual issues that were seen were referenced (mailing-list threads? bug reports?) and the underlying architectural reasons were documented.

All a user will find in the official docs seems to be this: https://www.open-mpi.org/doc/v3.0/man1/mpirun.1.php#sect22

which is as brief as any answer in this thread :-)

Googling deeper (which users probably won't do) turns up (fixed) bugs like this one: https://svn.open-mpi.org/trac/ompi/ticket/4534

But can you maybe add more insight to the docs/man pages? If one has a section on why running as root is risky, why not document the underlying reason for users?

jsquyres commented 6 years ago

You cited the exact issue: https://svn.open-mpi.org/trac/ompi/ticket/4534

In brief: when MPI apps start up, they create a shared session directory tree. When they shut down, they remove the entire tree. There was a bug in Open MPI that caused the recursive directory removal to go higher than intended - i.e., it could ultimately end up invoking rm -rf /. This wasn't a problem if you weren't running as root - the removal simply failed as soon as it tried to remove something your user didn't own/couldn't remove. But if you were root, the results were typically fairly disastrous. Needless to say, it was definitely a bug, and one that we fixed as soon as we learned of it.

Specifically: it's not that Open MPI has a design that is risky -- it's just that bugs happen. Bugs that occur when running as a regular user can be bad enough; bugs that occur when running as root can have far more serious consequences -- even when running in a container. Our choice after https://svn.open-mpi.org/trac/ompi/ticket/4534 was to try to limit the scope of damage if/when future bugs occur.

But the issue underscored the axiom that userspace applications should be run by regular users. We shouldn't have to defend this -- the whole POSIX system was designed around this philosophy. Indeed, not running applications as local administrator is an industry-wide best practice (indeed, run all applications with the fewest possible permissions). Some -- not all -- containers communities have chosen to go a different way. You may view our warnings as patronizing/nannying; we view them as fair warnings.

Please -- let's end this discussion. Everyone has stated their opinion. No one is going to change their mind at this point. We have conceded and given you an option that we didn't want to. Please let that be enough.

ax3l commented 6 years ago

Thanks, that's the exact info I was looking for. And thanks again for the great support! :sparkles:

fhh2626 commented 5 years ago

> [quoting @ax3l's earlier comment above in full, arguing that Docker containers are isolated from the host and that --allow-run-as-root is therefore safe to use inside them]

Maybe you can do this (from a Chinese page): search for something like if (0 == geteuid in:

  orte/tools/orte-dvm/orte-dvm.c
  orte/tools/orte-submit/orte-submit.c
  orte/tools/orterun/orterun.c

and delete the corresponding source code. Then you can get rid of --allow-run-as-root forever.