Open deitch opened 6 years ago
cc @justincormack
What is your config.json
? In particular, do you have terminal
enabled or disabled?
Unset, which, according to the docs, means false
. Not that I ever really understood the import of the setting. :-)
As far as I can tell, terminal: false
means not to create a new pty for the container, but rather to inherit from whatever was passed to runc
. When I run runc -d
(or runc create
and then runc start
), it runs in detached mode - runc
starts it and exits - and that causes something to hang. I am unsure what.
If I try to do it as runc run
(no -d
or create
+start
), then it appears to work.
stdin
/stdout
/stderr
but still have the lifecycle control of runc create
followed by runc start
?(We really need to document this somewhere other than issues, or I should come up with a canned reply to questions about terminal
.)
There are three main cases that can occur with pty handling:
runc create
or runc run -d
indicates that runc
will not stick around -- there will be no runc
code running after the container is set up. This means that runc
cannot do any IO forwarding -- it has to be set up by the caller. The way this restriction manifests depends on the terminal setting.
terminal: true
means that a PTY will be set up by runc
-- but runc
cannot exit(2)
without another process having the file descriptor for the master PTY. This is done by sending the fd through the --console-socket=
argument.terminal: false
just means that the stdio fds will be inherited by the container (whatever they are) as the PTY will not be set up. In other words, 0..2
are set as the containers stdio. Be aware that this can cause security issues if you are not very careful when you use it (assuming you're running untrusted code in the container).runc run
means that runc
will stick around doing IO forwarding. In this case, terminal: true
and terminal: false
will look similar (except for the fact that terminal: true
creates a PTY and terminal: false
doesn't).So, if you're piping runc run -d
to a file what's actually happening is that the container process's stdio FDs are the pipe FD (you should be very careful when doing runc run -d ctr >some_file
because it gives the container indirect access to the host filesystem). Hopefully that better explains what is happening when you use terminal: ...
.
(We really need to document this somewhere other than issues, or I should come up with a canned reply to questions about terminal.)
Very much would be appreciated. I struggled with isolating my problem for a while.
console-socket argument
I couldn’t figure that out either. What gets passed? The path to a Unix domain socket, ok, but is that socket itself intended to become one of stdout or similar? How does that align with “pty will be set up by runc”?
terminal: false just means that the stdio fds will be inherited by the container (whatever they are) as the PTY will not be set up. In other words, 0..2 are set as the containers stdio.
This is exactly what I was trying to do. Why in this case does setting stdout or stderr to existing stdout/stderr or to an actual file work, but to a pipe or a buffer hang?
Hopefully that better explains what is happening when you use terminal:
It does, thank you.
@cyphar re: documenting; should there be a “docs” or “examples” directory in this repo?
@deitch
I couldn’t figure that out either. What gets passed? The path to a Unix domain socket, ok, but is that socket itself intended to become one of stdout or similar? How does that align with “pty will be set up by runc”?
--console-socket /path/to/unix/socket
tells runc to send the file descriptor for the master end of the PTY device through the socket using SCM_CREDENTIALS (in case you're not aware, on Unix you can send file descriptors through sockets and the receiving end then has a dup
of that file descriptor). runc
creates the PTY, but it sends the master to another process since you asked runc
to exit (the master end needs to be open otherwise the container will get a SIGHUP
or something similar, and you won't be able to interact with the stdio -- and exit
will close the file descriptor that runc
has for the master).
--console-socket
and (while it's quite a long thread) that includes the discussion over the interface as well as usecases and so on (but yes -- that also means that I'm the one to blame for this one :wink:).Why in this case does setting stdout or stderr to existing stdout/stderr or to an actual file work, but to a pipe or a buffer hang?
I am not sure, and I will take a look at this when I get a chance.
@thaJeztah Yeah we probably should have a docs
(https://runc.io/ used to be our "soft" documentation, but it looks like that website has been taken down and now just points to this repository -- which is sort of useless). Especially since stuff like this isn't documented anywhere (because all of the magic crap in --console-socket
is completely out-of-spec for OCI -- because we had to punt on terminal
being meaningful in the specification).
@cyphar so it creates all three stdio descriptors, and then passes them via the socket? So I
runc create
to complete and then retrieve it from thereIs that it? Why didn’t it just let me pass to it —stdin dev —stdout dev —stderr dev
instead of the whole complex “have runc create it and I have to make a place to retrieve it”?
I am not sure, and I will take a look at this when I get a chance
Thanks. Much appreciated. I started to dig into the code myself, but not being familiar with it from the get go means it will take me a lot longer.
I’m having a hard time reasoning about what could hang with a pipe or byte buffer that wouldn’t with a file or stdout (which is a file handle, but then again, so is a pipe).
And if I ever actually fully get it, happy to open a PR for docs for this part.
👍 for documentation about terminal behavior with runc. I also have a relevant question in a related ticket #1716
@deitch
Is that it? Why didn’t it just let me pass to it —stdin dev —stdout dev —stderr dev instead of the whole complex “have runc create it and I have to make a place to retrieve it”?
--console-socket
only applies if you have terminal: true
-- this means "please create the pty for me, from inside the container's /dev/pts/...
mount, during the rest of the container setup". The "from inside the container" part is quite important, and there were many bugs (including sudo
not working inside containers) that were caused by the pty not being created inside the container.
Now, you could manually do the above instead of using --console-socket
(by doing a runc exec
and then emulating what I just listed by opening a socket and using --preserve-fds
for the container process to inherit it), but in order for Docker and other upstream users to be able to have a sane terminal setup we just added an option for it.
As for why we don't have --stdin
, --stdout
, --stderr
for terminal: false
? Though I don't like the flag names (and would prefer if --preserve-fds
was more general purpose and didn't have its current lets-just-copy-Go semantics), this is something that I've been pushing for ever since I worked on #1018 (and if you wanna go for a long read, you'll see the discussion we had about this at the time) -- because the current implementation of "just inherit the stdio fds" is:
Way too magical, for an already-pretty-magical part of the codebase. People get hit by this weirdness all the time, and it just confuses everyone. It also makes the code kinda complex (but the alternative is also complex).
Ripe for security bugs, as inheriting the stdio fds for something like your terminal can be quite dangerous (see ioctl_tty(4)
and TIOCSTI
). Same goes for inheriting file descriptors that refer to directories from within the host mount namespace (very unsafe).
The main arguments that are against having specific flags for stdio (or extending --preserve-fds
as I mentioned) are:
Adding more flags to stdio handling will make it more complicated, and it's already far too complicated. Not to mention that the flags will be useless in "non-detached" mode, making the handling even more complicated. This is quite a reasonable argument, and is the reason I suggest extending --preserve-fds
(though --preserve-fds
for stdio would still be more complicated in the non-detach case -- since in that case runc run
will use pipes for the stdio).
It would break the "out of the box" case for detached containers -- running it in your terminal. I used to think that this argument was okay, but I've recently changed my mind after actually trying to imagine what a new user would see if they tried this. In short, a new user would see that their terminal has been hijacked as both the container and their shell try to read from the same terminal. This is not an out-of-the-box case that makes sense.
It is not clear what the default stdio should be in the detached case. There are obviously incorrect answers (create a logfile for stdout and stderr -- like nohup
does), but the most obvious answer (just use /dev/null
) is also not without its downsides. In particular, this would mean that the "out of the box" case for detached containers that have a shell as their pid1
will exit prematurely (shells don't like their stdin
being /dev/null
). While this all makes sense if you understand what's going on -- most new users won't know why their container didn't survive for more than a split second. The other option would be to simply not let users start a detached container without specifying stdio, which would mirror the requirements for --console-socket
. I think this is nicer overall, but also makes onboarding more complicated than it was previously.
Our detached handling is effectively identical to &
in a shell -- so users should already know how things will look. While I agree runc run -d
might superficially look like runc run &
, I don't think it make sense to use this similarity as an argument for keeping it.
I kid you not, I had to reread this several times before I was sure I got it. :-)
I still have to read #1018 in depth when I have a chance (off for a few days now), and I will.
And if I didn’t say it yet, thank you for the great detail.
Basically, it sounds like the summary is:
terminal:false
, and it inherits all your stdio. You could set an alternate with terminal:true
but usually unnecessary unless you really don’t want your calling process’s stdio to be the container process’s stdio. terminal:true
. Passing the stdio from your live real terminal to the detached container is asking for trouble. Of course, you need some way to ask runc for the file descriptors for the created streams, hence —console-socket
. An alternate approach would be to create them first and pass them to runc, but that wasn’t done, and doesn’t have real advantages over this approach. terminal:true
and retrieve the fds, or terminal:false
and pass the streams in, should be fine. For reasons unknown, the second approach doesn’t work unless the passed stdio are real stdio or files (this issue), but in principle either should. That fair?
@deitch
I kid you not, I had to reread this several times before I was sure I got it. :-)
Damn, I was trying to make it simpler to understand. :/
Your summary is pretty much on the money aside from this point:
If you’re running in foreground, non detached, normally it’d be same to run terminal:false, and it inherits all your stdio. You could set an alternate with terminal:true but usually unnecessary unless you really don’t want your calling process’s stdio to be the container process’s stdio.
This isn't correct. There are many good reasons to use terminal: true
(in general I recommend using terminal: true
over terminal: false
), mainly related to ncurses programs requiring having a terminal so they can do their terminal-switching magic. Also programs like sudo
and gpg
can require a terminal depending on their configuration. However, creating a terminal with a detached container is more work (requiring --console-socket
usage), so that should be kept in mind (and are you going to try to use an ncurses program with a detached container?).
In addition, in non-detached mode, runc
will either create a pty and then forward it to your terminal (if you have terminal: true
) or it will create pipes for the containers' stdio and then forward it to your terminal (if you have terminal: false
). You cannot make the container inherit the stdio in non-detached mode (because then the top-level runc process wouldn't be able to do any IO forwarding -- which is the whole point of non-detached mode).
Damn, I was trying to make it simpler to understand. :/
Read it not as a criticism of your writing, but a view into my brain’s addled state. :-)
Your summary is pretty much on the money aside from this point:
Basically, you are saying, it almost always makes sense to do terminal:true
. We recognize that sometimes you want the existing stdio to be passed down, whether in detached or non detached, but as a general rule of thumb, true
makes for greater sanity.
I understand it, but it feels like it goes against the grain on Unix style streams and pipes. I start processes, they inherit stdio no matter how far down, and I always can pipe into their stdin or out of their stdout/stderr.
I understand it, but it feels like it goes against the grain on Unix style streams and pipes. I start processes, they inherit stdio no matter how far down, and I always can pipe into their stdin or out of their stdout/stderr.
Yeah, I agree. But unfortunately when you're basically booting a separate operating system, the purity of pipes is less usable -- though of course note that if you don't run in detached mode this shouldn't be an issue (since runc run
just forwards output in that mode). And ultimately if you want to use containers in a "batch mode" (where you actually don't care about whether you have a terminal) then that's why terminal: false
exists.
So basically, the message is, "use terminal:true
as much as you can and use console-socket
to interact with the stdio, but if you really really really don't want to, or if you are running in foreground runc run
where it probably doesn't matter, then use terminal:true
". :-)
I will put in a PR for docs later this week.
@cyphar read through this a number of times. As far as I can tell, if you set terminal: true
(which appears to be what you recommend), then runc
will create a new pty, and, if --console-socket
is provided, it will send the fd for that pty to the socket passed.
However, this does not change what the container's process has, only allows you to "tap into it". The net effect appears to be that I (outside process) can feed to its stdin by writing to that fd, and I can read the combined stdout/stderr but reading from that fd.
But I cannot, e.g., separate stderr from stdout (i.e. fds 1 and 2 are combined onto that passed fd). If I wanted to do that, I would need to run it foreground.
Is that correct? I feel like it may be inherent to the nature of setting up a new pty, i.e. (as you put it) "basically booting a separate operating system" means you expect a terminal as a console, and that terminal doesn't really have separate stderr/stdout or pipes to/from stdio, just a console.
If you want to do more "regular Unix-y stdio things" (yes, I really just wrote that.... :-) ), then you need to run foreground or terminal:false
. Which is fine, if not recommended, except for the oddness around it hanging, which is what started this issue.
Understanding correct?
I created a PR for documenting it. In doing so, I realized that having a simple library to interact with the console-socket
, as opposed to a CLI recvtty
, would be hugely helpful. Will try a PR for that.
Abstracted out https://github.com/opencontainers/runc/pull/1731
@deitch [I am currently on vacation, so sorry for the brief response.]
I believe what you wrote is effectively correct, but I'll read through it again when I have some more time.
Is that correct? I feel like it may be inherent to the nature of setting up a new pty, i.e. (as you put it) "basically booting a separate operating system" means you expect a terminal as a console, and that terminal doesn't really have separate stderr/stdout or pipes to/from stdio, just a console.
Effectively yes. When you open a new shell on your machine, you'll find that /proc/self/fd/1
and /proc/self/fd/2
are the same PTY. You describe this as a short-coming in #1730 -- I don't disagree (and I was thinking of making them different PTYs when I first wrote the code), but from memory there is some process management voodoo that breaks if you have a non-controlling terminal as your stdio (because terminals are super magical in Unix).
I am currently on vacation
Another one who does OSS on vacation? I know I am in good company, but what is it with us? @jmahowald said to me about this, "don't complain. We are blessed that we enjoy what we do."
I thank you. Looking forward to seeing those PRs (fixed if needed and then) merged in. Enjoy vacation!
Alright, we now have documentation. Time to fix this issue. I'll take a look at this again later this week.
@cyphar were you able to find something?
I am currently looking into runc+criu tests (i.e. https://github.com/opencontainers/runc/blob/5b38ef7173cfd50e8fb6ded787876b8406f29408/tests/integration/checkpoint.bats#L51..L235) and it is likely I am hitting the issue described. I guess this is also the reason for commit 5369f9ade by @filbranden
No, I stared at the stall for a while and couldn't figure it out. My line of thinking was that there's something odd going on with how Go is copying the input, but this bug is so old I've forgotten where I got with debugging it. I can give it another shot though.
it is likely I am hitting the issue described
(for the sake of anyone reading this) I was wrong about ^^^ that.
The issue I saw was the issue with the test case itself, it is documented and fixed in commit d5e68ceb0cca (which is part of PR #2332).
There are many good reasons to use terminal: true (in general I recommend using terminal: true over terminal: false), mainly related to ncurses programs requiring having a terminal so they can do their terminal-switching magic.
On the other hand using pipes would be the safest option when using detached-passhtrough mode. The documentation even warns about passing regular files or ttys in that mode. And yet the safest one is the one that doesn't work according to this issue. 😞
Are there any more details under which constellations pipes cause hangs? Is it std in/out/err? Is it combining out/err? Does it depend on the O_NONBLOCK fcntl?
We also ran into this with Go and crun
and found the following, which may be helpful here:
Cmd.Stdout
or Cmd.Stderr
is set to a non-*os.File
, Go will launch goroutines to transfer between those and the process and, crucially, waits for the pipes to be closed even after the process has exited.crun
, the create
command already starts the entrypoint (in our case sleep
), which inherits and never closes stdout
and stderr
.We worked around this by setting Cmd.WaitDelay
to time.Nanosecond
and ignoring ErrWaitDelay
.
I found this by playing with various options in golang
cmd := exec.Command("runc")
and changingcmd.Stdout
, etc. to try and capture output from a container run. It works if I make stdoutos.Stdout
or an actualfile
, but anything else causes it to hang.For example:
cmd.Stdout = io.MultiWriter(os.Stdout)
cmd.Stdout = io.MultiWriter(file)
pipe := cmd.StdoutPipe()
- hangs when you doio.Copy(os.Stdout, pipe)
A sample with this is here
But it turns out I can recreate the problem simply:
The above hangs.
Is this because of some strange pty/tty handling? How would I work around it if I need to capture stdout from a
runc create
and possibly send it to other areas? E.g.runc create echo | tree /tmp/foo