Add -preserve-environment option to runc exec

mtaylor91 commented 7 years ago

I'm looking to use runc to run container builds, however I've found it inconvenient and messy to have to explicitly configure each container to pass the appropriate environment variables. It would be great if runc exec supported a '-preserve-environment' flag to pass environment variables from the calling process into the process executed in the container.

I threw together a quick attempt at this change myself that seems to work fairly well; it can be found over at https://github.com/mtaylor91/runc. If someone can review it and give any feedback, I can make any requested changes and submit a pull request if this functionality might be more widely useful.

crosbymichael commented 7 years ago

@mtaylor91 thanks for opening the issue first.

I'm not sure if this would be the best way to handle env vars. We have a flag for fds because there is no other way to pass an fd other than at runtime. For env vars its different. We want to make sure that we follow the spec and ensure that the spec is our single source for the data required to run containers. This would break that assumption.

However, with that being said, I looked at your code and it seems to be only with runc exec. Exec is a grey area as we already have the --env flag along with others. If possible I would rather not inherit env vars and keep it explicit but we can see what others think on the change as well.

If you didn't know, you can provide a json blob via the --process flag so that you have a more programmatic way to pass the options to runc exec. Its the same format as the Process object in the runtime-spec.

mtaylor91 commented 7 years ago

I definitely agree with you about this not always being the best way to handle env vars, and indeed I do handle them much more explicitly on production systems. That being said, for use cases like the one I've hit, where the environment has already largely been sanitized and prepared by the build environment; it's a great deal of extra plumbing to massage the variables into json and prepare process objects, all just to recreate what the calling process had already finished setting up. That being said, I understand your hesitation since if it's there, it's liable to get used inappropriately. What about supporting it but hiding it from the help, or adding a note recommending against it's usage in most circumstances?

jvantuyl commented 7 years ago

The big drawback with the --env flag (and most other solutions to this), for me, is escaping. Environmental variables can contain things that are fiendishly difficult to escape correctly. Allowing someone to mark the environment as clean and just pass it through may seem rife for abuse--but I feel that pales in comparison to requiring every consumer of runc that needs to pass through a complicated environment to implement shell escaping. I don't know that it gets much better with layering it in JSON using the --process flag (given that much outside automation is in the form of shell scripts).

In my case, I have a number of applications that conform to some subset of the 12-Factor-App philosophy. One of my big wins has been the practice of storing config in the environment. I feel that they make a great case for it and I, personally, have felt the benefits. Starting such apps using runc can be a bit of a pain right now because the app configuration is usually done best inside of the container--which also interferes with using the security of containment to protect against malicious attacks on the configuration process.

I would greatly appreciate a flag to turn of environment sanitizing like the one @mtaylor91 proposes. If we can't turn it off wholesale, I would be happy with an option to at least whitelist a set of variables. That would meet my needs and still give you strong environment sanitizing.

jvantuyl commented 7 years ago

One other thing that escaped my attention the first time. It's very common these days to see things that pass in credentials in the environment. For example, Amazon's AWS CLI likes to you to pass in AWS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN (among other things). Requiring these sorts of things to go into the config file now makes it security sensitive even for the purposes of reading it (where as before reading was okay but writing to it was sensitive). Allowing the whitelisting of those keys or preserving the environment makes it easier for us to do things like checking those files into SCM.

So, again, there's the trade-off of someone misusing an optional feature versus requiring shell escaping in some places (which can have huge security impact if done wrong) and requiring any app that wants credentials in the environment to leak those credentials into the config files (which also has a huge security impact--especially if accidentally committed to SCM). If our concern is security, I think that an optional feature is the engineering trade-off with the least negatives.

amorphid commented 7 years ago

The --preserve-environment option is pretty handy for sudo --preserve-environment. I can see myself liking this in runc, too. Also, I like the idea of being able to pass sensitive data in a manner that isn't tightly coupled to command line arguments.

In the example, $FOO isn't exposed in the process list

# preserve environment
$ export FOO=bar
$ sudo ./runc exec --preserve-environment -c "bash"

# can't see $FOO
$ ps -ef | grep runc
root      7038  6917  0 02:11 pts/1    00:00:00 sudo ./runc exec --preserve-environment -c bash

But in this example, $FOO gets exposed when maybe that wasn't necessary

# using --env
$ export FOO=bar
$ sudo ./runc exec --env FOO="${FOO}" -c "bash"

# can't see $FOO
$ ps -ef | grep runc
root      7043  6917  0 02:13 pts/1    00:00:00 sudo ./runc exec --env FOO=bar -c bash

cyphar commented 7 years ago

I agree with @crosbymichael that this should be explicit and I also don't like --preserve-environment. Here are my reasons:

--preserve-environment now opens you up to accidentally including sensitive environment variables from your host. That seems like a foot-gun to me. Not to mention that we'd have to figure out whether the preserved environment variables should override variables set in the Process JSON.
In addition, there is no longer a single source of truth (the spec) when it comes to what the environment of exec is. To be fair, exec is not required by the runtime-spec but I don't see why we should add an out-of-band measure for something like this. It's incredibly trivial to generate a JSON blob for the evironment (and you get the benefit of being explicit about it).
Every single time we add an out-of-spec flag to runc exec (or runc create/start) I get more and more nervous. runC is an implementation of a spec, and we should be moving towards using the spec and only the spec wherever possible -- if a feature is so crucial it needs to be implemented in all runtimes in order for a vital usecase then it should go in the spec. IMO this is not an example of such a circumstance, which begs the question why does runC need to add additional flags to wrap the 5 lines of shell it takes to generate an environment JSON?

All-in-all, NACK. But thanks for opening an issue @mtaylor91 before creating a PR.

kstrauser commented 7 years ago

This would be incredibly handy in our CI pipeline. It feels more footshooty to make a script to encode the environment as JSON and then import then whole thing than to grant access to it directly.

I totally, totally get why this is something you don't want to do 90% of the time. But that other 10% of the time, it's exactly what you want to do and adding the extra wrapping can only make it more fragile.

jvantuyl commented 7 years ago

@kstrauser That's exactly where I'm using it. Anyone who's had to manage a Jenkins instance knows that the bane of your existence is a never ending parade of "special" jobs that are all magically tweaked in some subtle manner. This is exactly what I'm fighting with. Specifying the variables in the runc command line is the last thing sitting between me and having entirely generic jobs for my container builds. Tools are meant to be used. It's unclear how runc benefits from forcing me into a ton of extra complexity just to work around something I could ask it to turn off. And it borders on insulting to suggest that I should have to do so because I might misuse the optional feature I'm requesting.

Best as I can tell runc exec is entirely about one-off executions. That's why it's not in addressed in the spec (which has a tighter scope than the runc tool). If anything, it's a use-case outside of the spec, so it's rather torturous to suggest that it should somehow "implement the spec". If you want to have a concrete "container definition", that's what runc create is for and the spec is solid.

Outside of that, It's unclear to me what we're trying to protect me from. I'm already using runc exec. I'm out of the safe zone. There is not one JSON source of truth for my container definition. That ship has sailed. If you're not going to add the features, like this, that support the one-off use-case that it's designed for, why not just remove it entirely?

With that in mind, does anyone have any suggestion to solidly solve this problem? Our use case is passing through a dynamic list of environment variables. Ideally we don't want to write a helper script in a high-level language or generating JSON in shell script. The former is a pain because we're already running a CLI command, shipping another program along with our scripts is a pain, and it causes us to commit sensitive key material to disk. The latter is an escaping nightmare with potentially much deeper abuses than leaking variables ever did.

I'm particularly skeptical of the dismissive (and, honestly, condescending) suggestion to write "five lines of shell it takes to generate an environment JSON". JSON handling in Bash? Bash does not have any solid features for reliably generating JSON. It has no JSON serializer or deserializer. There's no guarantee that we don't generate a subtly invalid JSON file. In fact, I'd imagine it could be used to attack the entire JSON structure being passed in.

Moreover, variables containing sensitive characters could be used to terminate the environment section of the JSON and inject other much more nefarious options. Generating JSON in shell is fragile at best and downright dangerous at worst. I can't believe that's the recommendation to avoid "shooting ourselves in the foot".

In my case--even ignoring injection attacks against the JSON--we're still explicitly creating a significant security problem. I'm passing through credentials. Ideally, those don't show in the process list and they don't end up unencrypted in a file on disk. People use the environment for this because it's the least bad option. The two options for getting this into a container either expose it in the process list (--env) or commit it to disk (--process) which now must be securely erased. Why create a bona-fide security problem instead of adding the ability to (possibly selectively) whitelist environment variables?

jvantuyl commented 7 years ago

@crosbymichael @cyphar Would you be amenable if I specified a list of variables instead? Something like --env-import <variable> or something? Basically, something with the same power as --env but without putting us through escaping hell with the shell. If anything that's even safer than what we have now (and handles the use-case of credentials well).

amorphid commented 7 years ago

How about adding importable environment variables to the spec?

jvantuyl commented 7 years ago

@amorphid blinks That sounds (to me) like a fabulous idea.

cyphar commented 7 years ago

@amorphid Ignoring my concerns about how that would work with non-cli runtimes (the "host environment variables" is not very well specified in non-cli setups -- but that's not a problem we can't solve), if you'd like to write up a proposal for it the repo is https://github.com/opencontainers/runtime-spec. Even if you just open an issue we can discuss it there.

@jvantuyl --env-import ... sounds a bit better to me because it's less of a footgun but the whole out-of-spec thing is still a concern. If you'd like to avoid writing some dodgy shell script, I can make a new project that generates Process JSONs so you don't need to worry about it as much -- would that be helpful?

jvantuyl commented 7 years ago

@cyphar Well, that still puts my credentials on disk which is problematic for my particular use case. If I were to implement --env-import, do you think we could get it accepted? I'm all for adding environment import to the spec, but it doesn't seem too reasonable to add this to runc exec... I mean, we're already outside of the spec and it's no worse than --env (and, in some ways, much better).

cyphar commented 7 years ago

@jvantuyl Is it not possible to use process substitution in Jenkins scripts? Or even with using sh -c ...? Here is how I would anticipate this script would be used:

% runc exec --process <(gen-process --host-env)

While runc exec is "out of the spec", we use process.json internally (this used to be much more exposed previously, now it's "more hidden") and I'd like to make sure that there's no nice way to do what you'd want to do without adding more completely-out-of-spec fields. At least, that's my opinion on the topic.

opencontainers / runc

Add -preserve-environment option to runc exec #1404