parthenon-hpc-lab / parthenon

Parthenon AMR infrastructure
https://parthenon-hpc-lab.github.io/parthenon/

Running mpi as root in Docker Containers is problematic #128

Closed JoshuaSBrown closed 2 years ago

JoshuaSBrown commented 4 years ago

There is currently a problem with the CI: by default, the Docker images execute as root. This is not ideal, and MPI complains. It would be nice to pursue a solution that runs the Docker container as a non-root user.

['/opt/openmpi/bin/mpiexec', '-n', '8', '/builds/pgrete/parthenon/build-debug/example/calculate_pi/pi-example', '-i', '/builds/pgrete/parthenon/tst/regression/test_suites/calculate_pi/parthinput.regression']
[runner-3noJE-eM-project-17263873-concurrent-0:00910] Error: Unable to get the current working directory
--------------------------------------------------------------------------
mpiexec has detected an attempt to run as root.
Running as root is strongly discouraged as any mistake (e.g., in
defining TMPDIR) or bug can result in catastrophic damage to the OS
file system, leaving your system in an unusable state.
We strongly suggest that you run mpiexec as a non-root user.
You can override this protection by adding the --allow-run-as-root
option to your command line. However, we reiterate our strong advice
against doing so - please do so at your own risk.
--------------------------------------------------------------------------

JoshuaSBrown commented 4 years ago

@junghans Do you have any ideas for how to avoid running as root in a Docker image in the CI?

JoshuaSBrown commented 4 years ago

@pgrete I will probably need some help with this since you have access to the host machines. Where is the cuda10.0-mpi-hdf5 image coming from? Is it on Docker Hub, or is it a local image? If you built it locally, maybe there is a way we can change it so that it runs as the host user by default instead of root.

junghans commented 4 years ago

Just add --allow-run-as-root to the mpiexec options, or use a container that has a non-root user created.
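A minimal sketch of the first option, assuming the test driver assembles the launch command as a Python list (as the logged command above suggests). The helper name `build_mpi_command` and its parameters are hypothetical and not taken from the Parthenon scripts. It appends the override flag only when the process actually runs as root, so the same script still works unchanged in a container with a non-root user.

```python
import os


def build_mpi_command(mpiexec, nranks, driver, driver_args, euid=None):
    """Return the argv list for launching `driver` under mpiexec.

    `euid` can be passed explicitly for testing; by default the
    effective user id of the current process is used (POSIX only).
    """
    if euid is None:
        euid = os.geteuid()
    cmd = [mpiexec, "-n", str(nranks)]
    if euid == 0:
        # Open MPI refuses to start as root unless this flag is given.
        # Other MPI implementations may not recognize it.
        cmd.append("--allow-run-as-root")
    return cmd + [driver] + list(driver_args)


if __name__ == "__main__":
    print(build_mpi_command("mpiexec", 8, "./pi-example",
                            ["-i", "parthinput.regression"]))
```

Keeping the flag conditional avoids baking a root-only workaround into runs that do not need it.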

JoshuaSBrown commented 4 years ago

@junghans I already tried that, but it leads to other problems, particularly when I run the regression tests. The regression test script is unable to open any files written by the MPI runs executed as root, due to permission issues.

JoshuaSBrown commented 4 years ago

Analysing Driver Output
*****
Summary file not accessible
Traceback (most recent call last):
  File "/builds/pgrete/parthenon/tst/regression/run_test.py", line 122, in <module>
    main(**vars(args))
  File "/builds/pgrete/parthenon/tst/regression/run_test.py", line 69, in main
    test_result = test_manager.Analyse()
  File "/builds/pgrete/parthenon/tst/regression/utils/test_case.py", line 228, in Analyse
    test_pass = self.test_case.Analyse(self.parameters)
  File "/builds/pgrete/parthenon/tst/regression/test_suites/calculate_pi/calculate_pi.py", line 86, in Analyse
    pi_val = float(words[2])
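The output above prints "Summary file not accessible" and then still reaches `float(words[2])`. As a hedged illustration only (this is not the code in calculate_pi.py), an analysis step could stop at the first problem with a clear message. The function name `read_pi_value` and the assumed file layout (value in the third whitespace-separated token of the first line) are hypothetical.

```python
import os


def read_pi_value(summary_path):
    """Return the third whitespace-separated token of the first line as a float.

    Raises RuntimeError with a descriptive message if the file cannot be
    read or does not have the assumed layout.
    """
    # os.access returns False for missing files as well as unreadable ones.
    if not os.access(summary_path, os.R_OK):
        raise RuntimeError(f"Summary file not accessible: {summary_path}")
    with open(summary_path) as f:
        words = f.readline().split()
    if len(words) < 3:
        raise RuntimeError(
            f"Unexpected summary format in {summary_path}: {words!r}"
        )
    return float(words[2])
```

Failing early like this separates "the file is missing or unreadable" from "the file has unexpected contents", which are hard to tell apart from the traceback alone.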

JoshuaSBrown commented 4 years ago

I guess I could also try to escalate the permissions of the regression testing script, but I think this would be going in the wrong direction. --allow-run-as-root is not an optimal solution. After all, why does it really need to be run as root?

junghans commented 4 years ago

--allow-run-as-root changes neither the user nor the permissions, so the error above must have a different cause.
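One way to check this claim in a CI job is to print who the script runs as and who owns the output file. The sketch below uses only the Python standard library; the helper name `describe_access` is hypothetical and it assumes a POSIX system.

```python
import os
import stat


def describe_access(path):
    """Return the current user ids, the owner of `path`, and its readability."""
    st = os.stat(path)
    return {
        "real_uid": os.getuid(),
        "effective_uid": os.geteuid(),
        "owner_uid": st.st_uid,
        # Human-readable permission string, e.g. '-rw-r--r--'.
        "mode": stat.filemode(st.st_mode),
        # os.access checks permissions using the real uid/gid.
        "readable": os.access(path, os.R_OK),
    }
```

If the launcher and the analysis script both run as root inside the container, the owner and the current user should match, which would point to a cause other than permissions, such as a wrong working directory or a file that was never written.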

pgrete commented 4 years ago

We didn't encounter problems using --allow-run-as-root in the CI for K-Athena, see https://gitlab.msu.edu/gretephi/kathena/blob/master/.gitlab-ci.yml. I budgeted time to work on Parthenon tomorrow, so if you like, I can have a look at the issue (might be easier given that I have direct access to the CI machine).

JoshuaSBrown commented 4 years ago

@pgrete That would be helpful; it's a little hard for me to figure out what is going on. I don't know why the summary file is inaccessible.

pgrete commented 4 years ago

This is fixed in #124. Linking PR to close this issue automatically on merge.

JoshuaSBrown commented 4 years ago

@pgrete I think it might be worth keeping this issue open until we have a container that does not require the root user. There are also security reasons not to run containers as root. Also, do you know where the Docker image is coming from? I do not see it listed on Docker Hub.

pgrete commented 4 years ago

As far as I know, Docker runs as root by design. That is also why it's unpopular on supercomputers, both in terms of user rights on the host (not in the container) and of using shared filesystems in the container. Some of these issues are addressed by Singularity, which is now more often available on clusters. What kind of security concerns do you have in the context of CI? The key idea of using Docker in the first place is isolation (so increased security over running as a normal user on the host with direct filesystem access).

The Docker image we currently use has been created locally. https://gitlab.com/pgrete/kathena/-/wikis/Continuous-integration should be roughly the script that I wrote to generate the image.

JoshuaSBrown commented 4 years ago

This is not an immediate concern, as we are not using the image on our systems. But here is a link with a good discussion of the security issues that running as root in a Docker container poses: https://medium.com/@mccode/processes-in-containers-should-not-run-as-root-2feae3f0df3b. If we were ever to use this image on our systems, this would need to be addressed. There is a workaround though: one simply needs to declare a user in the Dockerfile with a known, acceptable user ID, etc.
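As a rough sketch of that workaround, an image build script could append a stanza like the one generated below to its Dockerfile. The helper name `render_user_stanza`, the user name `ci`, and the id 1000 are placeholders, not values from the Parthenon images. It also assumes a base image that ships `groupadd` and `useradd`, as Debian and Ubuntu based images typically do.

```python
def render_user_stanza(username="ci", uid=1000, gid=1000):
    """Return Dockerfile lines that create a non-root user and switch to it."""
    lines = [
        # Create a group and a user with fixed, known ids.
        f"RUN groupadd --gid {gid} {username} \\",
        f"    && useradd --uid {uid} --gid {gid} --create-home {username}",
        # Everything after USER (and the container itself) runs as this user.
        f"USER {username}",
        f"WORKDIR /home/{username}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_user_stanza(), end="")
```

With a `USER` instruction in the image, mpiexec no longer sees a root user, so the --allow-run-as-root override is not needed in the first place.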

There are other reasons not to run as root as well. It is ideal to mimic the behavior of someone installing the software, and such users are unlikely to be installing as root.

JoshuaSBrown commented 4 years ago

@pgrete I have created a separate repository for automating the creation of the images used by the CI: https://github.com/lanl/parthenon-buildenv/tree/master

junghans commented 4 years ago

@JoshuaSBrown have a look at https://github.com/votca/buildenv/blob/master/.github/workflows/continuous-integration-workflow.yml

Yurlungur commented 2 years ago

Appears resolved.