deitch commented 6 years ago

I found this by playing with various options in golang cmd := exec.Command("runc") and changing cmd.Stdout, etc. to try and capture output from a container run. It works if I make stdout os.Stdout or an actual file, but anything else causes it to hang.

For example:

cmd.Stdout = io.MultiWriter(os.Stdout)
cmd.Stdout = io.MultiWriter(file)
pipe := cmd.StdoutPipe() - hangs when you do io.Copy(os.Stdout, pipe)

A sample with this is here

But it turns out I can recreate the problem simply:

$ runc create echo | cat

The above hangs.

Is this because of some strange pty/tty handling? How would I work around it if I need to capture stdout from a runc create and possibly send it to other areas? E.g. runc create echo | tree /tmp/foo

deitch commented 6 years ago

cc @justincormack

cyphar commented 6 years ago

What is your config.json? In particular, do you have terminal enabled or disabled?

deitch commented 6 years ago

Unset, which, according to the docs, means false. Not that I ever really understood the import of the setting. :-)

deitch commented 6 years ago

As far as I can tell, terminal: false means not to create a new pty for the container, but rather to inherit from whatever was passed to runc. When I run runc -d (or runc create and then runc start), it runs in detached mode - runc starts it and exits - and that causes something to hang. I am unsure what.

If I try to do it as runc run (no -d or create+start), then it appears to work.

Did I understand correctly?
If so, what is the correct way to get the container to use an already existing stdin/stdout/stderr but still have the lifecycle control of runc create followed by runc start?

cyphar commented 6 years ago

(We really need to document this somewhere other than issues, or I should come up with a canned reply to questions about terminal.)

There are three main cases that can occur with pty handling:

runc create or runc run -d indicates that runc will not stick around -- there will be no runc code running after the container is set up. This means that runc cannot do any IO forwarding -- it has to be set up by the caller. The way this restriction manifests depends on the terminal setting.
- terminal: true means that a PTY will be set up by runc -- but runc cannot exit(2) without another process having the file descriptor for the master PTY. This is done by sending the fd through the --console-socket= argument.
- terminal: false just means that the stdio fds will be inherited by the container (whatever they are) as the PTY will not be set up. In other words, 0..2 are set as the containers stdio. Be aware that this can cause security issues if you are not very careful when you use it (assuming you're running untrusted code in the container).
runc run means that runc will stick around doing IO forwarding. In this case, terminal: true and terminal: false will look similar (except for the fact that terminal: true creates a PTY and terminal: false doesn't).

So, if you're piping runc run -d to a file what's actually happening is that the container process's stdio FDs are the pipe FD (you should be very careful when doing runc run -d ctr >some_file because it gives the container indirect access to the host filesystem). Hopefully that better explains what is happening when you use terminal: ....

deitch commented 6 years ago

(We really need to document this somewhere other than issues, or I should come up with a canned reply to questions about terminal.)

Very much would be appreciated. I struggled with isolating my problem for a while.

console-socket argument

I couldn’t figure that out either. What gets passed? The path to a Unix domain socket, ok, but is that socket itself intended to become one of stdout or similar? How does that align with “pty will be set up by runc”?

terminal: false just means that the stdio fds will be inherited by the container (whatever they are) as the PTY will not be set up. In other words, 0..2 are set as the containers stdio.

This is exactly what I was trying to do. Why in this case does setting stdout or stderr to existing stdout/stderr or to an actual file work, but to a pipe or a buffer hang?

deitch commented 6 years ago

Hopefully that better explains what is happening when you use terminal:

It does, thank you.

thaJeztah commented 6 years ago

@cyphar re: documenting; should there be a “docs” or “examples” directory in this repo?

cyphar commented 6 years ago

@deitch

I couldn’t figure that out either. What gets passed? The path to a Unix domain socket, ok, but is that socket itself intended to become one of stdout or similar? How does that align with “pty will be set up by runc”?

--console-socket /path/to/unix/socket tells runc to send the file descriptor for the master end of the PTY device through the socket using SCM_CREDENTIALS (in case you're not aware, on Unix you can send file descriptors through sockets and the receiving end then has a dup of that file descriptor). runc creates the PTY, but it sends the master to another process since you asked runc to exit (the master end needs to be open otherwise the container will get a SIGHUP or something similar, and you won't be able to interact with the stdio -- and exit will close the file descriptor that runc has for the master).

1018 implemented `--console-socket` and (while it's quite a long thread) that includes the discussion over the interface as well as usecases and so on (but yes -- that also means that I'm the one to blame for this one :wink:).

Why in this case does setting stdout or stderr to existing stdout/stderr or to an actual file work, but to a pipe or a buffer hang?

I am not sure, and I will take a look at this when I get a chance.

cyphar commented 6 years ago

@thaJeztah Yeah we probably should have a docs (https://runc.io/ used to be our "soft" documentation, but it looks like that website has been taken down and now just points to this repository -- which is sort of useless). Especially since stuff like this isn't documented anywhere (because all of the magic crap in --console-socket is completely out-of-spec for OCI -- because we had to punt on terminal being meaningful in the specification).

deitch commented 6 years ago

@cyphar so it creates all three stdio descriptors, and then passes them via the socket? So I

Create the socket
Pass it via console-socket
Wait for runc create to complete and then retrieve it from there

Is that it? Why didn’t it just let me pass to it —stdin dev —stdout dev —stderr dev instead of the whole complex “have runc create it and I have to make a place to retrieve it”?

I am not sure, and I will take a look at this when I get a chance

Thanks. Much appreciated. I started to dig into the code myself, but not being familiar with it from the get go means it will take me a lot longer.

I’m having a hard time reasoning about what could hang with a pipe or byte buffer that wouldn’t with a file or stdout (which is a file handle, but then again, so is a pipe).

deitch commented 6 years ago

And if I ever actually fully get it, happy to open a PR for docs for this part.

nqbao commented 6 years ago

👍 for documentation about terminal behavior with runc. I also have a relevant question in a related ticket #1716

cyphar commented 6 years ago

@deitch

Is that it? Why didn’t it just let me pass to it —stdin dev —stdout dev —stderr dev instead of the whole complex “have runc create it and I have to make a place to retrieve it”?

--console-socket only applies if you have terminal: true -- this means "please create the pty for me, from inside the container's /dev/pts/... mount, during the rest of the container setup". The "from inside the container" part is quite important, and there were many bugs (including sudo not working inside containers) that were caused by the pty not being created inside the container.

Now, you could manually do the above instead of using --console-socket (by doing a runc exec and then emulating what I just listed by opening a socket and using --preserve-fds for the container process to inherit it), but in order for Docker and other upstream users to be able to have a sane terminal setup we just added an option for it.

As for why we don't have --stdin, --stdout, --stderr for terminal: false? Though I don't like the flag names (and would prefer if --preserve-fds was more general purpose and didn't have its current lets-just-copy-Go semantics), this is something that I've been pushing for ever since I worked on #1018 (and if you wanna go for a long read, you'll see the discussion we had about this at the time) -- because the current implementation of "just inherit the stdio fds" is:

Way too magical, for an already-pretty-magical part of the codebase. People get hit by this weirdness all the time, and it just confuses everyone. It also makes the code kinda complex (but the alternative is also complex).
Ripe for security bugs, as inheriting the stdio fds for something like your terminal can be quite dangerous (see ioctl_tty(4) and TIOCSTI). Same goes for inheriting file descriptors that refer to directories from within the host mount namespace (very unsafe).

The main arguments that are against having specific flags for stdio (or extending --preserve-fds as I mentioned) are:

Adding more flags to stdio handling will make it more complicated, and it's already far too complicated. Not to mention that the flags will be useless in "non-detached" mode, making the handling even more complicated. This is quite a reasonable argument, and is the reason I suggest extending --preserve-fds (though --preserve-fds for stdio would still be more complicated in the non-detach case -- since in that case runc run will use pipes for the stdio).
It would break the "out of the box" case for detached containers -- running it in your terminal. I used to think that this argument was okay, but I've recently changed my mind after actually trying to imagine what a new user would see if they tried this. In short, a new user would see that their terminal has been hijacked as both the container and their shell try to read from the same terminal. This is not an out-of-the-box case that makes sense.
It is not clear what the default stdio should be in the detached case. There are obviously incorrect answers (create a logfile for stdout and stderr -- like nohup does), but the most obvious answer (just use /dev/null) is also not without its downsides. In particular, this would mean that the "out of the box" case for detached containers that have a shell as their pid1 will exit prematurely (shells don't like their stdin being /dev/null). While this all makes sense if you understand what's going on -- most new users won't know why their container didn't survive for more than a split second. The other option would be to simply not let users start a detached container without specifying stdio, which would mirror the requirements for --console-socket. I think this is nicer overall, but also makes onboarding more complicated than it was previously.
Our detached handling is effectively identical to & in a shell -- so users should already know how things will look. While I agree runc run -d might superficially look like runc run &, I don't think it make sense to use this similarity as an argument for keeping it.

deitch commented 6 years ago

I kid you not, I had to reread this several times before I was sure I got it. :-)

I still have to read #1018 in depth when I have a chance (off for a few days now), and I will.

And if I didn’t say it yet, thank you for the great detail.

Basically, it sounds like the summary is:

If you’re running in foreground, non detached, normally it’d be same to run terminal:false, and it inherits all your stdio. You could set an alternate with terminal:true but usually unnecessary unless you really don’t want your calling process’s stdio to be the container process’s stdio.
If you’re running detached from a real shell, it’d be most sane to run terminal:true. Passing the stdio from your live real terminal to the detached container is asking for trouble. Of course, you need some way to ask runc for the file descriptors for the created streams, hence —console-socket. An alternate approach would be to create them first and pass them to runc, but that wasn’t done, and doesn’t have real advantages over this approach.
If you’re running detached from a non interactive shell - which is how I came to it - then either terminal:true and retrieve the fds, or terminal:false and pass the streams in, should be fine. For reasons unknown, the second approach doesn’t work unless the passed stdio are real stdio or files (this issue), but in principle either should.

That fair?

cyphar commented 6 years ago

@deitch

I kid you not, I had to reread this several times before I was sure I got it. :-)

Damn, I was trying to make it simpler to understand. :/

Your summary is pretty much on the money aside from this point:

If you’re running in foreground, non detached, normally it’d be same to run terminal:false, and it inherits all your stdio. You could set an alternate with terminal:true but usually unnecessary unless you really don’t want your calling process’s stdio to be the container process’s stdio.

This isn't correct. There are many good reasons to use terminal: true (in general I recommend using terminal: true over terminal: false), mainly related to ncurses programs requiring having a terminal so they can do their terminal-switching magic. Also programs like sudo and gpg can require a terminal depending on their configuration. However, creating a terminal with a detached container is more work (requiring --console-socket usage), so that should be kept in mind (and are you going to try to use an ncurses program with a detached container?).

In addition, in non-detached mode, runc will either create a pty and then forward it to your terminal (if you have terminal: true) or it will create pipes for the containers' stdio and then forward it to your terminal (if you have terminal: false). You cannot make the container inherit the stdio in non-detached mode (because then the top-level runc process wouldn't be able to do any IO forwarding -- which is the whole point of non-detached mode).

deitch commented 6 years ago

Damn, I was trying to make it simpler to understand. :/

Read it not as a criticism of your writing, but a view into my brain’s addled state. :-)

Your summary is pretty much on the money aside from this point:

Basically, you are saying, it almost always makes sense to do terminal:true. We recognize that sometimes you want the existing stdio to be passed down, whether in detached or non detached, but as a general rule of thumb, true makes for greater sanity.

I understand it, but it feels like it goes against the grain on Unix style streams and pipes. I start processes, they inherit stdio no matter how far down, and I always can pipe into their stdin or out of their stdout/stderr.

cyphar commented 6 years ago

I understand it, but it feels like it goes against the grain on Unix style streams and pipes. I start processes, they inherit stdio no matter how far down, and I always can pipe into their stdin or out of their stdout/stderr.

Yeah, I agree. But unfortunately when you're basically booting a separate operating system, the purity of pipes is less usable -- though of course note that if you don't run in detached mode this shouldn't be an issue (since runc run just forwards output in that mode). And ultimately if you want to use containers in a "batch mode" (where you actually don't care about whether you have a terminal) then that's why terminal: false exists.

deitch commented 6 years ago

So basically, the message is, "use terminal:true as much as you can and use console-socket to interact with the stdio, but if you really really really don't want to, or if you are running in foreground runc run where it probably doesn't matter, then use terminal:true". :-)

I will put in a PR for docs later this week.

deitch commented 6 years ago

@cyphar read through this a number of times. As far as I can tell, if you set terminal: true (which appears to be what you recommend), then runc will create a new pty, and, if --console-socket is provided, it will send the fd for that pty to the socket passed.

However, this does not change what the container's process has, only allows you to "tap into it". The net effect appears to be that I (outside process) can feed to its stdin by writing to that fd, and I can read the combined stdout/stderr but reading from that fd.

But I cannot, e.g., separate stderr from stdout (i.e. fds 1 and 2 are combined onto that passed fd). If I wanted to do that, I would need to run it foreground.

Is that correct? I feel like it may be inherent to the nature of setting up a new pty, i.e. (as you put it) "basically booting a separate operating system" means you expect a terminal as a console, and that terminal doesn't really have separate stderr/stdout or pipes to/from stdio, just a console.

If you want to do more "regular Unix-y stdio things" (yes, I really just wrote that.... :-) ), then you need to run foreground or terminal:false. Which is fine, if not recommended, except for the oddness around it hanging, which is what started this issue.

Understanding correct?

deitch commented 6 years ago

I created a PR for documenting it. In doing so, I realized that having a simple library to interact with the console-socket, as opposed to a CLI recvtty, would be hugely helpful. Will try a PR for that.

deitch commented 6 years ago

Abstracted out https://github.com/opencontainers/runc/pull/1731

cyphar commented 6 years ago

@deitch [I am currently on vacation, so sorry for the brief response.]

I believe what you wrote is effectively correct, but I'll read through it again when I have some more time.

Is that correct? I feel like it may be inherent to the nature of setting up a new pty, i.e. (as you put it) "basically booting a separate operating system" means you expect a terminal as a console, and that terminal doesn't really have separate stderr/stdout or pipes to/from stdio, just a console.

Effectively yes. When you open a new shell on your machine, you'll find that /proc/self/fd/1 and /proc/self/fd/2 are the same PTY. You describe this as a short-coming in #1730 -- I don't disagree (and I was thinking of making them different PTYs when I first wrote the code), but from memory there is some process management voodoo that breaks if you have a non-controlling terminal as your stdio (because terminals are super magical in Unix).

deitch commented 6 years ago

I am currently on vacation

Another one who does OSS on vacation? I know I am in good company, but what is it with us? @jmahowald said to me about this, "don't complain. We are blessed that we enjoy what we do."

I thank you. Looking forward to seeing those PRs (fixed if needed and then) merged in. Enjoy vacation!

cyphar commented 6 years ago

Alright, we now have documentation. Time to fix this issue. I'll take a look at this again later this week.

kolyshkin commented 4 years ago

@cyphar were you able to find something?

I am currently looking into runc+criu tests (i.e. https://github.com/opencontainers/runc/blob/5b38ef7173cfd50e8fb6ded787876b8406f29408/tests/integration/checkpoint.bats#L51..L235) and it is likely I am hitting the issue described. I guess this is also the reason for commit 5369f9ade by @filbranden

cyphar commented 4 years ago

No, I stared at the stall for a while and couldn't figure it out. My line of thinking was that there's something odd going on with how Go is copying the input, but this bug is so old I've forgotten where I got with debugging it. I can give it another shot though.

kolyshkin commented 3 years ago

it is likely I am hitting the issue described

(for the sake of anyone reading this) I was wrong about ^^^ that.

The issue I saw was the issue with the test case itself, it is documented and fixed in commit d5e68ceb0cca (which is part of PR #2332).

the8472 commented 6 months ago

There are many good reasons to use terminal: true (in general I recommend using terminal: true over terminal: false), mainly related to ncurses programs requiring having a terminal so they can do their terminal-switching magic.

On the other hand using pipes would be the safest option when using detached-passhtrough mode. The documentation even warns about passing regular files or ttys in that mode. And yet the safest one is the one that doesn't work according to this issue. 😞

Are there any more details under which constellations pipes cause hangs? Is it std in/out/err? Is it combining out/err? Does it depend on the O_NONBLOCK fcntl?

fmeum commented 2 weeks ago

We also ran into this with Go and crun and found the following, which may be helpful here:

If Cmd.Stdout or Cmd.Stderr is set to a non-*os.File, Go will launch goroutines to transfer between those and the process and, crucially, waits for the pipes to be closed even after the process has exited.
At least with crun, the create command already starts the entrypoint (in our case sleep), which inherits and never closes stdout and stderr.

We worked around this by setting Cmd.WaitDelay to time.Nanosecond and ignoring ErrWaitDelay.

opencontainers / runc

runc create hangs when passed a pipe for stdout or stderr #1721

1018 implemented `--console-socket` and (while it's quite a long thread) that includes the discussion over the interface as well as usecases and so on (but yes -- that also means that I'm the one to blame for this one :wink:).

opencontainers / runc

runc create hangs when passed a pipe for stdout or stderr #1721

1018 implemented --console-socket and (while it's quite a long thread) that includes the discussion over the interface as well as usecases and so on (but yes -- that also means that I'm the one to blame for this one :wink:).

1018 implemented `--console-socket` and (while it's quite a long thread) that includes the discussion over the interface as well as usecases and so on (but yes -- that also means that I'm the one to blame for this one :wink:).