opencontainers / runtime-spec

OCI Runtime Specification
http://www.opencontainers.org
Apache License 2.0
3.22k stars 548 forks source link

Allow to do some actions before container process stop #926

Open medzin opened 7 years ago

medzin commented 7 years ago

Background: The OCI hooks mechanism looks like a good place to register container in discovery service or add it to load balancer pool - as we have Prestart and Poststart hooks, but there is a problem when we want to remove old containers after new version deployment. We only have Poststop hook which is fired during container deletion (so process inside container is already dead), which in practice makes impossible to gracefully remove old containers (e.g. by using load balancer connection draining feature). When a single container handles high traffic (thousands of requests per seconds) it ends with a lot of 502 HTTP errors for the site users. Currently each popular container management software provides some way to achieve graceful container removal (Kubernetes has PostStart/PreStop container lifecycle hooks, on Mesos we can write custom executors with that logic), but OCI spec is missing that feature.

Question: Can the OCI spec be extended to allow to do some actions before container process stop in a standardised way?

wking commented 7 years ago

Kubernetes has PostStart/PreStop container lifecycle hooks...

The spec doesn't have pre-stop hooks, because it doesn't have a stop operation. It has a kill operation, which you can use for things like SIGTERM and SIGKILL. You'll probably want logic like:

  1. Take the container out of the load balancer.
  2. Use kill to send TERM.
  3. Wait for a gracefull shutdown, which for servers may include queue draining.
  4. Optionally use kill to send KILL, if gracefull shutdown takes longer then you want.
  5. delete once the process exits.

That's pretty easy to do if you can tell the controller to fire 1 (and 3, if any monitoring is necessary there), and thrre's no need to involve the runtime. But since the create / start split, there's no need for hooks at all (#483). So if we wanted new hooks for this case, I think we'd want prekill and postkill entries taking signal keys, and inside those per-signal entries the usual hook array:

{
  ...,
  "prekill": {
    "TERM": [
      ...remove from load balancer...
    ]
  }
}

That gets a bit wiggly, because the spec says nothing about supported kill signals (although the in-flight opencontainers/runtime-tools#321 would require support for at least TERM and KILL). And there's no guarantee that the runtime (or even the controller) would be sending the signal (the container process could crash on its own, among many other possibilities). So I'm not sure it's worth supporting this in hooks, but if we want to, that pre/postkill structure might be sufficient.

medzin commented 7 years ago

And there's no guarantee that the runtime (or even the controller) would be sending the signal (the container process could crash on its own, among many other possibilities).

In my opinion this is not a good argument for this use case. To be honest, as you go along this reasoning, no hook is guaranteed to be run because the runtime controller process can crash in any moment also (as any other component of your infrastructure can fail).

I understand that it is impossible to guarantee a 100% effectiveness of a graceful shutdown mechanism in all situations, but 99% effectiveness in every day use cases (new version deployments, host maintenance etc.) is a great business value, because it allows you to work almost freely (and safely) with the containers.

wking commented 7 years ago

On Mon, Oct 02, 2017 at 09:16:14AM +0000, Adam Medziński wrote:

To be honest, as you go along this reasoning, no hook is guaranteed to be run because the runtime controller process can crash in any moment also (as any other component of your infrastructure can fail).

Fair. And the fact that the signals supported by ‘kill’ are undocumented is clearly something that could be addressed by the spec if the maintainers agree that it's useful (I think it is, regardless of whether kill-related hooks are supported).

This is really a judgement call for maintainers about the scope of hooks in the spec. I don't know if we have maintainers speaking for themselves somewhere on that topic, but based on my paraphrasing in 1, it's possible that the current hooks were just grandfathered into the spec because they had existing users, and that the maintainers would rather not add additional hooks because folks can always address these issues in higher levels (e.g. in containerd or whatever controller you use). Or it's possible that they want pre- and post-hooks for all operations 2. Or it's possible that they want to debate the utility of pre- and post-hooks for each case, and only require runtime support for those which they deem sufficiently useful.

If we do add pre- and post-kill hooks, an alternative to my previous suggestion 3 might be to drop the signal keys and add a token (‘${SIGNAL}’?) to pass the chosen signal through to the hook:

$ cat config.json { …, "hooks": { "prekill": [ { "path": "/usr/bin/my-pre-kill-hook" "args": ["ab", "${SIGNAL}", "cd"], } ] } } $ cat /usr/bin/my-pre-kill-hook

!/bin/sh

AB="${1}" SIGNAL="${2}" CD="${3}" STATE="$(cat)" # read the state JSON from stdin case "${SIGNAL}" in TERM) …whatever… ;; KILL) …other stuff… ;; USR1|USR2) …more stuff… ;; esac …

That would also allow the spec to continue to dodge the issue of signal names, although I don't see a need to do so.

medzin commented 6 years ago

@wking Should I make a pull request with proposed changes or more opinions is needed?

wking commented 6 years ago

Should I make a pull request with proposed changes or more opinions is needed?

Working up a pull request is more work for you, and the maintainers may end up rejecting it if you don't happen to guess what they want. But it might increase the chances of a maintainer chiming in on this point, and that would help. The December meeting is just around the corner; I'll try to get some opinions then. If you can make the meeting, you may wan't to argue for your preferred approach. I don't really care if the spec gets kill hooks or not. They are convenient (good) but feel like scope-creep (bad) and those balance out for me. I'd be more enthusiastic if we had a separate runtime-controller layer with all our hooks, but that would be a bigger shift.

If we don't get maintainer opinions at the meeting and you're comfortable investing time in a PR that might be rejected, then you can work up a PR. Or if you don't mind investing time and want something more concrete to discuss at the meeting, you can work up a PR now.

wking commented 6 years ago

There was some discussion of this in today's meeting, although it looks like collabot decided to check out and miss the fun :p. Here's my local log:

14:07 @wking I'd like to talk about the plans for hooks in runtime spec (https://github.com/opencontainers/runtime-spec/issues/926), but we may not have enough runtime-spec maintainers present to make useful progress on that 14:08 @wking stevvooe: I thought the hooks were all but deprecated after the create/start split 14:08 @wking mikebrow: create/start covers most of it, but the GPU guys want a pre-start hook to modify the config 14:09 @wking stevvooe: but why would you do that at the runtime level? 14:09 @wking stevvooe: it seems like an implementation detail 14:09 @wking mikebrow: that's a good point 14:09 @wking #topic: hooks in runtime-spec 14:09 @wking stevvooe: Docker, containerd, and runc all have hooks. What's the right level? 14:10 @wking stevvooe: if you call each of those actions independently, it doesn't seem like you need hooks at every level 14:11 @wking my impressions is that there's nothing you need hooks for, but they're convenient for folks who are passing runtime configs to the orchestrator 14:11 @wking stevvooe: right now having each action driven explicitly lets the orchestrator decide exactly how hook-like things are executed 14:12 @wking mikebrow: I think part of the issue is that early in runc there wasn't an orchestrator API 14:14 @wking supporting hooks is not going to be hard. I think we just need a maintainer decision on whether we want to freeze with the existing (grandfathered-in from before create/start) hooks, or if we plan on adding new hooks 14:15 @wking vbatts|work: I don't like packing them all the way though. Hopefully these are coming from the orchestrator and not traversing the whole stack to inject a hook from higher up 14:15 @wking mikebrow: I've also seen folks proxy runc so they can inject hooks when they call the real runc 14:15 @wking mikebrow: there's definitely a need for this sort of thing when you can't get the orchestrator to pass these down 14:16 @wking mikebrow: I think we need some patterns/recommendations to give folks an approach to follow 14:16 @wking stevvooe: a statement about runtime-spec / orchestrator scoping would be useful 14:17 @wking +1 14:17 @wking mikebrow: that applies not just to hooks, but to quite a few elements of the spec 14:17 @wking stevvooe: I'd like to hear crosbymichael's opinion on this too 14:18 @wking mikebrow: and mrunalp can talk about some of the patterns he's seen 14:18 @wking stevvooe: with the GPUs, that can be a pre-create spec processor where the appropriate components get added. And that can happen outside runc 14:19 @wking stevvooe: let's say I have a spec-modification pipeline outside of runc. You can deserialize once, make a few modificiations, and then serialize once 14:19 @wking stevvooe: with hooks, you'll be deserializing several times

So we're waiting for @mrunalp and @crosbymichael to weigh in with their plans for hooks.

cyphar commented 6 years ago

FWIW I also think adding hooks (where they make sense) might be a good decision, specifically because it allows us to avoid having to have even more wrappers-of-wrappers when people decide they want to slightly alter how runc works without creating an entirely separate project. The GPU usecases in particular show how useful hooks are (they even modified the hook design in LXC so that the GPU usecase worked better).

wking commented 6 years ago

On Wed, Jan 10, 2018 at 11:23:04PM +0000, Aleksa Sarai wrote:

The GPU usecases in particular show how useful hooks are (they even modified the hook design in LXC so that the GPU usecase worked better).

Is this a reference to lxc/lxc#1826? That sounds like it could be handled in our existing pre-start hook 1 or by the orchestrator between the runtime ‘create’ and ‘start’ calls. Are there other links into the GPU-hook discussion?

cyphar commented 6 years ago

Yes, that's the issue I was referring to. Most of the discussion about hooks for GPU happened in-person at the time (this was at Linux Plumbers) so I can't really link to any of it.

sighoya commented 1 year ago

Anything new to report here?

Would like to use it as podman hook to retrieve ip information of the used interfaces in the container.

After the container stops, all that information is lost.