hpc / charliecloud

Lightweight user-defined software stacks for high-performance computing.
https://hpc.github.io/charliecloud
Apache License 2.0

add PID namespace #452

Open yuvipanda opened 5 years ago

yuvipanda commented 5 years ago

Heya! It would be great to add a command-line flag that sets up a PID namespace, so that users inside the container cannot see processes running elsewhere on the system.

Here are some of my use cases:

  1. Some applications (specifically, RStudio) try to have only one process running per system, with daemonizing and PID hacks. They behave much better inside a PID namespace than outside.
  2. When running interactive applications (like Jupyter notebooks), this gives users a more isolated, clearer view of what they are actually running.
  3. It would make it easier to have tini or similar run as PID 1. I don't actually know if this has any advantages, though.

In general, especially for interactive use cases, this gives you the option to provide a more isolated view of the system than otherwise.

reidpr commented 5 years ago

We can consider it. The downside is that container setup is considerably more complicated with a PID namespace. We'd need to add a second code path, basically.

Do you have links to relevant documentation from RStudio and tini?

What is tini? Google thinks you mean the Argentinian singer, which is probably not right. ;)

olifre commented 5 years ago

> What is tini? Google thinks you mean the Argentinian singer, which is probably not right. ;)

I think this project is meant: https://github.com/krallin/tini

In addition to a different setup code path for the PID namespace, an init-like process (such as tini) is needed for correct behaviour (signal handling, reaping children, etc.). This (very long) in-depth discussion explains in much more detail why it's needed: https://github.com/krallin/tini/issues/8
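To make the "init-like" duties concrete (signal handling and reaping children), here is a minimal Python sketch. It is not tini's actual implementation, and `run_as_init` is a name invented for this example. It starts one child, forwards SIGTERM and SIGINT to it, and keeps calling `wait()` until no children remain. Note the assumption: orphaned descendants are only re-parented to this process if it really is PID 1 of a PID namespace (or has been marked as a child subreaper); otherwise it only ever reaps its own direct children.

```python
import os
import signal


def run_as_init(argv):
    """Run argv as a child, forward SIGTERM/SIGINT to it, reap all children.

    Returns the main child's exit code (128 + signal number if it was killed).
    Hypothetical helper for illustration; not part of ch-run or tini.
    """
    child = os.fork()
    if child == 0:
        # Child: replace ourselves with the requested program.
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if exec failed

    def forward(signum, _frame):
        # Pass the signal on to the main child; ignore it if already gone.
        try:
            os.kill(child, signum)
        except ProcessLookupError:
            pass

    # Install forwarding handlers and remember the previous ones.
    previous = {s: signal.signal(s, forward)
                for s in (signal.SIGTERM, signal.SIGINT)}

    exit_code = None
    try:
        while True:
            try:
                pid, status = os.wait()  # reaps any child, including adopted ones
            except ChildProcessError:
                break  # no children left
            if pid == child:
                if os.WIFEXITED(status):
                    exit_code = os.WEXITSTATUS(status)
                else:
                    exit_code = 128 + os.WTERMSIG(status)
    finally:
        for s, handler in previous.items():
            if handler is not None:
                signal.signal(s, handler)
    return exit_code
```

For example, `run_as_init(["sh", "-c", "exit 3"])` returns the child's exit status once every child has been reaped.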

yuvipanda commented 5 years ago

Ah, I'm sorry. tini is https://github.com/krallin/tini - a small supervisor that can run as PID 1.

I don't have particular documentation, but here are some issues exploring that: https://github.com/jupyterhub/the-littlest-jupyterhub/issues/18, https://github.com/jupyterhub/jupyter-rsession-proxy/issues/44 and https://github.com/jupyterhub/jupyter-rsession-proxy/issues/33. Daemonizing seems to be reasonably common among old-style applications: they fork twice and let the intermediate parent exit, which makes them children of PID 1...
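For readers unfamiliar with the double-fork pattern described above, here is a minimal Python sketch of it. It is not taken from RStudio or from the linked issues, and `double_fork_ppid` is a name made up for this example. The process forks, the intermediate child forks again and exits immediately, and the grandchild then reports who its parent is. On Linux that new parent is normally PID 1 of the current PID namespace, or the nearest child subreaper if one is set.

```python
import os


def double_fork_ppid():
    """Double-fork and return (intermediate_pid, grandchild's new parent PID).

    Hypothetical helper for illustration only.
    """
    go_r, go_w = os.pipe()    # original -> grandchild: "intermediate is gone"
    res_r, res_w = os.pipe()  # grandchild -> original: its parent PID

    pid = os.fork()
    if pid == 0:
        # Intermediate child: start a new session, fork again, exit at once.
        os.setsid()
        if os.fork() == 0:
            # Grandchild: this is the process that would become the daemon.
            os.close(go_w)
            os.close(res_r)
            os.read(go_r, 1)  # block until the intermediate has been reaped
            os.write(res_w, str(os.getppid()).encode())
            os._exit(0)
        os._exit(0)

    # Original process.
    os.close(go_r)
    os.close(res_w)
    os.waitpid(pid, 0)        # the intermediate child has now fully exited
    os.write(go_w, b"x")      # let the grandchild report its parent
    data = os.read(res_r, 32)
    os.close(go_w)
    os.close(res_r)
    return pid, int(data)
```

Calling `double_fork_ppid()` returns the PID of the short-lived intermediate process together with the PID that adopted the grandchild; the two differ, because the intermediate process no longer exists when the grandchild looks up its parent.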

yuvipanda commented 5 years ago

Thanks for that specific link, @olifre! That is very useful.

This makes me wonder what currently happens with zombie processes and signal forwarding when using ch-run...

olifre commented 5 years ago

> Now this makes me wonder what happens now with zombie processes and signal forwarding when using ch-run...

Without having a separate PID namespace, the system's init / PID 1 takes care of it. You only need something like tini when creating your own PID namespace.

yuvipanda commented 5 years ago

Hmm, right. I just tested it out, and ch-run leaves zombie processes around when killed. I'm surprised by this, since it almost feels like a container escape of sorts - you have a process that continues to run even though the ch-run call that spawned it has been killed. Maybe I need a mindset switch...

So when running inside something like JupyterHub, that spawns and supervises multiple calls to ch-run, I'll have to make sure that my application reaps any processes that ch-run's exec leaves behind. I guess I can also possibly run tini as the process that ch-run starts, which would probably also solve this issue.

Either way, I'm currently surprised by the behavior of zombie processes from inside ch-run, regardless of PID namespace use.

reidpr commented 5 years ago

Hello @yuvipanda; sorry for dropping this.

Can you provide reproduction steps for the zombie processes?

reidpr commented 5 years ago

Hello @yuvipanda; friendly reminder that I'd love to see steps to reproduce. Thanks!