Open Enrico2 opened 4 years ago
Have you looked at #3798 yet? That solves this specific problem (though, obviously doesn't address the Job issues).
I did. This is why I mentioned the m << n part. Setting a constant time value is not as robust as it being dynamic.
Also fwiw, this suggestion would also help with #3751, which we have still not narrowed down to a root cause.
It sounds reasonable; there are some nasty edge cases, but they don't seem like a major blocker. This'll definitely require going through the RFC process given the potential edge cases. Interested in getting a proposal together there?
Some concerns that I have right now: `shareProcessNamespace` relies on `SYS_PTRACE`. That'll have some RBAC implications.

@alpeb @ihcsim where'd we get on making it possible for folks to change the injector configuration on a per-install basis? That'd definitely get rid of my concerns.
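For reference, both pieces in that concern are plain Kubernetes pod-spec fields. A minimal sketch of what an injected pod would need (the field names are standard Kubernetes API; the container name and image are illustrative, not what the injector actually emits):

```yaml
# Sketch only: the two Kubernetes fields the concern above refers to.
apiVersion: v1
kind: Pod
metadata:
  name: example   # illustrative
spec:
  # Lets containers in the pod see each other's processes, so the
  # proxy could observe the application process exiting.
  shareProcessNamespace: true
  containers:
    - name: linkerd-proxy
      image: cr.l5d.io/linkerd/proxy   # illustrative; tag omitted
      securityContext:
        capabilities:
          add: ["SYS_PTRACE"]   # the capability with the RBAC/PSP implications
```

Cluster policies (PodSecurityPolicy at the time, or admission policy today) that forbid added capabilities would block `SYS_PTRACE`, which is where the RBAC concern comes from.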
I tried this out in our environment and am facing some permission issues. See https://github.com/linkerd/linkerd2/issues/1869#issuecomment-596777911.
Any chance this could be revisited?
My main use-cases are where we have a long `terminationGracePeriodSeconds` to account for uncommon long shutdowns. Two cases for this in our environment are Celery workers for long tasks, where ideally we keep them alive while the current tasks are still being worked on, and RabbitMQ. On the RabbitMQ operator the default grace period is a week, and it's not too uncommon for a node to need tens of minutes to sync the messages it holds as master before being killed.
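To make the mismatch concrete, here is the kind of spec involved (only the week-long RabbitMQ operator default comes from the comment above; the worker name and image are made up):

```yaml
# Sketch: a worker pod with a very long grace period. A proxy that
# exits on a fixed timer either quits too early (breaking shutdown
# traffic) or lingers long after the app is gone.
spec:
  terminationGracePeriodSeconds: 604800   # one week, the RabbitMQ operator default
  containers:
    - name: worker
      image: my-celery-worker   # hypothetical image
```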
It would need some testing, but I suspect that this could be rigged up if the application container uses `linkerd-await --shutdown`. The proxy could be configured with a `waitBeforeExitSeconds` that matches the application's `terminationGracePeriodSeconds`, and `linkerd-await` would terminate the proxy once the application exits.
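A rough sketch of what that comment describes, assuming the application's entrypoint is wrapped in `linkerd-await` (the annotation and values below are assumptions based on the comment, not a verified configuration; the app image and path are hypothetical):

```yaml
# Sketch of the linkerd-await --shutdown approach described above.
metadata:
  annotations:
    # Upper bound: proxy waits up to the full grace period before exiting on its own...
    config.alpha.linkerd.io/proxy-wait-before-exit-seconds: "604800"
spec:
  terminationGracePeriodSeconds: 604800
  containers:
    - name: app
      image: my-app   # hypothetical
      # ...but linkerd-await --shutdown asks the proxy to exit as soon as
      # the wrapped application process actually finishes.
      command: ["/linkerd-await", "--shutdown", "--"]
      args: ["/usr/local/bin/my-app"]   # hypothetical entrypoint
```

This gives the fast-exit behavior dynamically: the fixed `waitBeforeExitSeconds` only acts as a ceiling, while the actual proxy shutdown is triggered by the app exiting.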
Feature Request
What problem are you trying to solve?
Our main app containers have various durations for their graceful shutdowns, sometimes even within the same application (e.g. n seconds in prod, m << n seconds in dev). We want the proxy process to stay alive until the main app container shuts down (graceful shutdown might still need the network). We also want the proxy container to shut down as quickly as possible after the app has shut down.
How should the problem be solved?
We can achieve this by applying the solution proposed here: https://github.com/linkerd/linkerd2/issues/1869#issuecomment-595456178
It would be fantastic if this solution were implemented by the proxy injector, instead letting users set something along the lines of a configuration annotation.
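The original snippet appears to have been elided from this thread; purely as an illustrative stand-in, the request is for a per-workload opt-in of roughly this shape (the annotation name below is hypothetical, not an existing Linkerd annotation):

```yaml
metadata:
  annotations:
    # Hypothetical annotation: ask the injector to keep the proxy alive
    # until the app container exits, rather than for a fixed duration.
    config.linkerd.io/await-app-shutdown: "enabled"
```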
Fwiw, this along with `linkerd-await` is essentially sidecar "support": ensuring the proxy starts before the app and stops after the app.
Any alternatives you've considered?
The approach in the link above, applied manually, but we'd like to avoid all the boilerplate involved in adding this to all our applications.
How would users interact with this feature?
Via a configuration annotation, as mentioned above.