Open Enrico2 opened 4 years ago
Have you looked at #3798 yet? That solves this specific problem (though, obviously doesn't address the Job issues).
I did. This is why I mentioned the m << n part. Setting a constant time value is not as robust as it being dynamic.
Also fwiw, this suggestion would also help with #3751, which we have still not narrowed down to a root cause.
It sounds reasonable; there are some nasty edge cases, but they don't seem like a major blocker. This'll definitely require going through the RFC process given the potential edge cases. Interested in getting a proposal together there?
Some concerns that I have right now: `shareProcessNamespace` relies on `SYS_PTRACE`. That'll have some RBAC implications.

@alpeb @ihcsim where'd we get on making it possible for folks to change the injector configuration on a per-install basis? That'd definitely get rid of my concerns.
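For reference, both pieces in that concern are plain Kubernetes pod-spec fields. A minimal sketch of what an injected pod would need (the field names are standard Kubernetes API; the container name and image are illustrative, not what the injector actually emits):

```yaml
# Sketch only: the two Kubernetes fields the concern above refers to.
apiVersion: v1
kind: Pod
metadata:
  name: example   # illustrative
spec:
  # Lets containers in the pod see each other's processes, so the
  # proxy could observe the application process exiting.
  shareProcessNamespace: true
  containers:
    - name: linkerd-proxy
      image: cr.l5d.io/linkerd/proxy   # illustrative; tag omitted
      securityContext:
        capabilities:
          add: ["SYS_PTRACE"]   # the capability with the RBAC/PSP implications
```

Cluster policies (PodSecurityPolicy at the time, or admission policy today) that forbid added capabilities would block `SYS_PTRACE`, which is where the RBAC concern comes from.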
I tried this out in our environment and am facing some permission issues. See https://github.com/linkerd/linkerd2/issues/1869#issuecomment-596777911.
Any chance this could be revisited?
My main use-cases are where we have a long `terminationGracePeriodSeconds` to account for uncommon long shutdowns. Two cases for this in our environment are Celery workers for long tasks, where ideally we keep them alive while the current tasks are still being worked on, and RabbitMQ. On the RabbitMQ operator the default grace period is a week, and it's not too uncommon for a node to need tens of minutes to sync the messages it holds as master before being killed.
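To make the mismatch concrete, here is the kind of spec involved (only the week-long RabbitMQ operator default comes from the comment above; the worker name and image are made up):

```yaml
# Sketch: a worker pod with a very long grace period. A proxy that
# exits on a fixed timer either quits too early (breaking shutdown
# traffic) or lingers long after the app is gone.
spec:
  terminationGracePeriodSeconds: 604800   # one week, the RabbitMQ operator default
  containers:
    - name: worker
      image: my-celery-worker   # hypothetical image
```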
It would need some testing, but I suspect that this could be rigged up if the application container uses `linkerd-await --shutdown`. The proxy could be configured with a `waitBeforeExitSeconds` that matches the application's `terminationGracePeriodSeconds`, and `linkerd-await` would terminate the proxy once the application exits.
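A rough sketch of what that comment describes, assuming the application's entrypoint is wrapped in `linkerd-await` (the annotation and values below are assumptions based on the comment, not a verified configuration; the app image and path are hypothetical):

```yaml
# Sketch of the linkerd-await --shutdown approach described above.
metadata:
  annotations:
    # Upper bound: proxy waits up to the full grace period before exiting on its own...
    config.alpha.linkerd.io/proxy-wait-before-exit-seconds: "604800"
spec:
  terminationGracePeriodSeconds: 604800
  containers:
    - name: app
      image: my-app   # hypothetical
      # ...but linkerd-await --shutdown asks the proxy to exit as soon as
      # the wrapped application process actually finishes.
      command: ["/linkerd-await", "--shutdown", "--"]
      args: ["/usr/local/bin/my-app"]   # hypothetical entrypoint
```

This gives the fast-exit behavior dynamically: the fixed `waitBeforeExitSeconds` only acts as a ceiling, while the actual proxy shutdown is triggered by the app exiting.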
Feature Request
What problem are you trying to solve?
Our main app containers have various durations for their graceful shutdowns, sometimes even within the same application (e.g. n seconds in prod, m << n seconds in dev). We want the proxy process to stay alive until the main app container shuts down (graceful shutdown might still need the network). We also want the proxy container to shut down as quickly as possible after the app has shut down.
How should the problem be solved?
We can achieve this by applying the solution proposed here: https://github.com/linkerd/linkerd2/issues/1869#issuecomment-595456178
It would be fantastic if this solution were implemented by the proxy injector, instead letting users set something along the lines of a configuration annotation.
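The original snippet appears to have been elided from this thread; purely as an illustrative stand-in, the request is for a per-workload opt-in of roughly this shape (the annotation name below is hypothetical, not an existing Linkerd annotation):

```yaml
metadata:
  annotations:
    # Hypothetical annotation: ask the injector to keep the proxy alive
    # until the app container exits, rather than for a fixed duration.
    config.linkerd.io/await-app-shutdown: "enabled"
```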
Fwiw, this along with `linkerd-await` is essentially sidecar "support": ensuring the proxy starts before the app and stops after the app.
Any alternatives you've considered?
The approach in the link above, applied manually, but we'd like to avoid all the boilerplate involved in adding this to all our applications.
How would users interact with this feature?
Via a configuration annotation, as mentioned above.