kubernetes / enhancements

Enhancements tracking repo for Kubernetes

Sidecar Containers #753

Open Joseph-Irving opened 5 years ago

Joseph-Irving commented 5 years ago

Enhancement Description

/sig node

Please keep this description up to date. This will help the Enhancement Team to track the evolution of the enhancement efficiently.

nfickas commented 4 years ago

Not sure if everyone else here saw this, but after doing some digging and watching a KubeCon video, I found that Lyft had done something similar to this. Here is the relevant commit from their fork of Kubernetes: https://github.com/lyft/kubernetes/commit/ba9e7975957d61a7b68adb75f007c410fc9c80cc

kfox1111 commented 4 years ago

As an Istio + Kubernetes shop, we have also been waiting anxiously for this feature, and growing increasingly frustrated as it slips from release to release.

I'm a potential Istio user but have kept to the sidelines a bit while waiting for a feature like this. From the discussions above, though, I keep seeing things that make me think the sidecar feature alone, as discussed here, will not fix all the problems the Istio sidecar has with the workflow. It may help though. Which I think is part of the reason this has stalled.

How does running Istio as a sidecar work when using the Istio CNI driver? I believe init containers trying to reach the network will still fail to function properly, as documented in the Istio documentation.

Hence my question above about whether network sidecars should be their own thing.

thockin commented 4 years ago

This issue seems to be a prime example of how difficult it can be to contribute if you're not an employee of e.g. Google, Red Hat, or another big player.

Hah! What you don't know is that those people get stuck sometimes too!

Seriously, I am sorry. I have excuses, but that sucks so I won't bother.

kfox1111 commented 4 years ago

For clarification: I'm not implying that we shouldn't merge this as alpha to get some feedback on the approach. In general I think it is sound. I think there are a few holes in the use cases, such as service meshes, that it doesn't quite cover. But that is not a reason to block getting this in ASAP; the sooner it's in, the sooner we can find all the other use cases it doesn't cover and make the beta version of the feature work well for everyone. That is precisely what an alpha is for, IMO.

I'm just mentioning what I did specifically for the folks hoping this will be a silver bullet for the existing service mesh issue. I don't think the alpha as proposed will fully fix that particular issue, so don't get your hopes up too high just yet. But please, let's not block this feature just because it doesn't support everybody yet.

dims commented 4 years ago

I've requested an exception; let's see if we can try to get this in: https://groups.google.com/d/msg/kubernetes-sig-release/RHbkIvAmIGM/nNUthrQsCQAJ

craigbox commented 4 years ago

Maybe it was you or somebody else in another [Kubernetes Podcast] episode who felt really bad that this sidecar KEP didn't make it into 1.16

Please see episodes 72, with Lachie Evenson, and 83, with Guinevere Saenger. I even called out this week that PR reviews are required to get this one issue over the line. We can do it!

Are there any Istio maintainers on here? A lot are Googlers, and might have some more sway with the K8s folks internally.

@duderino and @howardjohn have both commented on this thread already.

kikisdeliveryservice commented 4 years ago

To be clear, we need the following merged: kubernetes/kubernetes#79649 and kubernetes/kubernetes#80744

Are there any other PRs we should be tracking?

Thanks!

Joseph-Irving commented 4 years ago

Big thanks to everyone who posted messages of support (publicly or privately); it was very much appreciated ❤️

There was a valiant effort by members of the community to try to get this into 1.18, including the release team, who accepted an extension request, but alas, the decision has been made to defer this to 1.19. You can see the relevant conversation starting from this comment: https://github.com/kubernetes/kubernetes/pull/80744#issuecomment-595292034.

Despite it not getting into 1.18, this has had a lot more attention in the past few days than it has had in quite a while, so I'm hoping that this momentum will carry forward into 1.19.

cc @jeremyrickard, @kikisdeliveryservice

andrew-waters commented 4 years ago

Great stuff @Joseph-Irving, it sounds like your frustrations have been heard and your persistence has paid off. Thanks for persevering.

dims commented 4 years ago

/milestone v1.19

thockin commented 4 years ago

Hi all. A group of us have been discussing this topic over the last week.

First, we apologize for what has happened here. We are not happy about it.

This PR and associated KEP have brought to light a number of things that the project can do better. We would like to separate the social, procedural, and technical concerns.

Socially, this feature fell victim to our desire to please each other. Derek approved the KEP, despite reservations expressed within the SIG, because Clayton and Tim were pushing for it. We all trust each other, but apparently we don’t always feel like we’re able to say “no”. We know this because we have all done the exact same thing. None of us want to be the blocker for the next great idea.

Trusting each other has to include trusting that we can say “no” and trusting that when someone says “no”, they are doing so for good reasons. This technical area spans SIGs, and we should NOT pressure sig-node, who will ultimately be the ones to field problems, into accepting new features they are not yet comfortable supporting. This is not about Tim or Derek or Clayton in particular, but ALL of the high-level approvers and SIG leads and “senior” contributors.

This feature also fell victim to procedural uncertainty around KEPs. As a KEP reviewer, am I obligated to be a code reviewer? To delegate to a code reviewer? Or just to read the KEP? As KEPs span releases, how do we ensure a shepherd is available for the set of changes budgeted in a particular span of releases? If a KEP spans SIGs, how do we budget and allocate time across the SIGs? We need to clarify this. We’re going to work on some KEP change-proposals (KEP KEPs) to strengthen the definition of roles in the KEP process.

Technically, this feature fell victim to both time and attention. Reviewers didn’t make enough time to review it, or it simply was not high enough priority for them. Back-and-forth discussions take time. Circumstances and our understanding of the problem space change over time.

As more users adopt Kubernetes, we see an increasing number of weird edge-cases or flakes get reported to sig-node. Since the Pod lifecycle is core to Kubernetes, any change made to that subsystem MUST be undertaken carefully. Our ability to merge new features must be balanced with our desire to improve reliability. How we are thinking about the Pod lifecycle today is a bit different than how we thought of it when this feature was started. This does not diminish the use-cases leading up to this in any way, but it does suggest that long-running KEPs need to be periodically re-reviewed over time.

We think we need to do a bit of first-principles thinking around Pod lifecycle. What do we really want? We tried not to descend into too much complexity, but we fear we merely broke that complexity up into multiple phases, and the net result may be MORE complex than just tackling it head-on.

What does that mean for this PR and the associated KEP? We’re not 100% sure. It probably means we should NOT push this through yet, though.

Derek raised some concerns around the shutdown sequencing. The KEP called them out of scope for now, but there’s some hesitation. We already don’t respect graceful termination on node shutdown, and that has surprised many users. That’s not this KEP’s fault, but let’s call it “extenuating circumstances”. If anyone uses sidecars to “clean up” their pods (e.g. to drain cached logs into a logging service), they will expect (reasonably) some clear and useful semantics around shutdown, which this KEP doesn’t guarantee.

Tim has concerns that init-sidecars will need to become a thing, and that doesn’t feel right. He waived that concern in the past, but it still bothers him.

We need SIG Node to help define what the medium-term goal is for pod lifecycle, and what their appetite is for taking that on. If we can agree that this is an incremental step toward the goal, we can unblock it, but unless we know the goal, we’re probably over-driving our headlights.

Let us all be the first to say that this stinks. We have real problem statements, a passionate contributor, and a set of well-meaning maintainers, and we ended up ... here. Tim will volunteer his time to help brainstorm and design. Derek will push node-shutdown work for the current pod lifecycle to ensure we have a stable base to grow it further. We’ll need to spec very carefully what guarantees we can and cannot make in the face of unplanned machine failures.

Thanks, Clayton, David, Dawn, Derek, John, Tim

thockin commented 4 years ago

To try to spur some forward movement: Derek or Dawn - is there anyone in sig-node who can make time to do some brainstorming about a more holistic pod and container lifecycle?

derekwaynecarr commented 4 years ago

@thockin will add this to sig-node agenda.

naseemkullah commented 4 years ago

@thockin @derekwaynecarr what's the tl;dr as to why this could not go in?

One-line enhancement description: Containers can now be marked as sidecars so that they start up before normal containers and shut down after all other containers have terminated.
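
For illustration, my understanding of what the KEP proposed at this point is roughly the following (purely a sketch; the exact field name and shape were still under review and could change):

```yaml
# Illustrative sketch only: the sidecar marker shown here reflects the KEP draft being
# discussed at the time and is not a shipped API; field names may change.
apiVersion: v1
kind: Pod
metadata:
  name: app-with-sidecar
spec:
  containers:
  - name: envoy-proxy
    image: envoyproxy/envoy          # example image
    lifecycle:
      type: Sidecar                  # proposed marker: start before, and stop after, the other containers
  - name: app
    image: example.com/app:latest    # hypothetical application image
```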

Sounds like something that would make life easier in this new era of service mesh sidecars.

Furthermore, are there any recommendations for having sidecars start before the main app containers and shut down after main app container termination today?

itskingori commented 4 years ago

... what's the tl;dr as to why this could not go in?

@naseemkullah From https://github.com/kubernetes/enhancements/issues/753#issuecomment-597372056 ... 👇

What does that mean for this PR and the associated KEP? We’re not 100% sure. It probably means we should NOT push this through yet, though.

Derek raised some concerns around the shutdown sequencing. The KEP called them out of scope for now, but there’s some hesitation. We already don’t respect graceful termination on node shutdown, and that has surprised many users. That’s not this KEP’s fault, but let’s call it “extenuating circumstances”. If anyone uses sidecars to “clean up” their pods (e.g. to drain cached logs into a logging service), they will expect (reasonably) some clear and useful semantics around shutdown, which this KEP doesn’t guarantee.

[...]

We need SIG Node to help define what the medium-term goal is for pod lifecycle, and what their appetite is for taking that on. If we can agree that this is an incremental step toward the goal, we can unblock it, but unless we know the goal, we’re probably over-driving our headlights.

krancour commented 4 years ago

Respectfully, I am curious as to whether any leads plan to prioritize sorting this out. @Joseph-Irving put an enormous amount of work into this and a staggering number of people who would have been happy with his solution are anxious to hear some superior solution from those who nixed this.

kfox1111 commented 4 years ago

Minimally, even though there are qualms with a few aspects of it, I think it is still reasonable to get this in as an Alpha in order to find what issues will show up in practice. Can we get this merged? The issues can block it from going Beta, so I don't think it's critical to get it perfect before an initial Alpha is made.

howardjohn commented 4 years ago

will add this to sig-node agenda.

@thockin @derekwaynecarr is there any update on the current state of this? I looked through the sig-node meeting notes and don't see anything about this.

There are a large number of devs on this thread who would be more than happy to contribute time to getting this implemented, as it's critical for many use cases (the KEP itself has 2.5x as many :+1: as any other KEP). What can we do to make this happen? Having a list of prerequisites for stability of this area, even if it may span many releases to accomplish, that we could start actively working on would be a huge improvement over where we are today.

palnabarun commented 4 years ago

Hi @Joseph-Irving @thockin @khenidak @kow3ns -- 1.19 Enhancements Lead here. I wanted to check in to see if you think this enhancement will graduate in 1.19.

In order to have this part of the release:

  1. The KEP PR must be merged in an implementable state
  2. The KEP must have test plans
  3. The KEP must have graduation criteria.

The current release schedule is:

Joseph-Irving commented 4 years ago

@palnabarun, as per this comment https://github.com/kubernetes/enhancements/issues/753#issuecomment-597372056, this KEP has been put on indefinite hold, so no, it won't be graduating in 1.19.

palnabarun commented 4 years ago

Thank you @Joseph-Irving for clarifying the situation. :+1:

Appreciate your efforts!

thockin commented 4 years ago

To everyone who is eager to get this in, and again to @Joseph-Irving - I am personally very sorry for this situation. I want this (or something like it), too, but the fact of the matter is that sig-node has more work to do than people to do it right now, and they are not ready to consider this.

It sucks. I get it. I really really do.

The best way people could help is to jump into sig-node and help make more capacity by taking on code-reviews and issue triage, by fixing bugs and tests, and by building toward a place where the sig-node experts have more capacity and confidence in making such a change.

mikebrow commented 4 years ago

sig-node has more work to do than people to do it right now

Understood. We've been promoting, with emphasis, sig-node's capacity needs internally. We are bringing on and mentoring sig-node OSS volunteers, some experienced, some new, all with a desire to work in this space (four so far). I'll be citing your comment @thockin, thank you!

palnabarun commented 4 years ago

/milestone clear

tariq1890 commented 4 years ago

The best way people could help is to jump into sig-node and help make more capacity by taking on code-reviews and issue triage, by fixing bugs and tests, and by building toward a place where the sig-node experts have more capacity and confidence in making such a change.

@thockin Could you provide links to repositories, mailing lists, guides, etc.? That would help people get an idea of how to engage with sig-node effectively. This particular feature request is over two years old with no resolution in sight.

dims commented 4 years ago

@tariq1890 the folks writing this KEP have done everything right; they have left no stone unturned. The issue here is exactly what @thockin said: there's tech debt we need to fix first, and hands are needed for that before we can consider this one. So the ask is for folks to help out with what needs to be done.

Please see the latest update here : https://github.com/kubernetes/enhancements/pull/1874

tariq1890 commented 4 years ago

@dims I think I've been misunderstood. What I meant to say is that we need a list of actionable targets and goals. If there is tech debt to be dealt with, then we could maintain a GitHub Milestone or provide a bulleted list of pending action items in the OP's comment, so that people visiting this issue can know right away what needs to be addressed.

I am definitely willing to offer my help to sig-node with advancing this KEP, but I just don't know how.

dims commented 4 years ago

@tariq1890 the specific ask is here: "prerequisite on the (not yet submitted KEP) kubelet node graceful shutdown" https://github.com/kubernetes/enhancements/pull/1874/files#diff-c6212b56619f2b462935ad5f631d772fR94

We need to get that started. Someone has to take point and get that going.

-- Dims

Joseph-Irving commented 4 years ago

To summarise https://github.com/kubernetes/enhancements/pull/1874 for those following this issue: sig-node (and others) think it is unwise to introduce a new feature like this KEP, which adds more complex behaviour to pod termination, while there is still the more general problem of pod termination while a node is being shut down.
So it's been decided that this feature won't progress until the solution to node termination has been implemented. There's currently a Google Doc here: https://docs.google.com/document/d/1mPBLcNyrGzsLDA6unBn00mMwYzlP2tSct0n8lWfuRGE which contains a lot of the discussion around the issue, but the KEP for this is yet to be submitted. There are still open questions, so commenting there could be helpful. I believe @bobbypage and @mrunalp are leading this effort, so perhaps they can share any other ways people could assist with moving this forward.

dims commented 4 years ago

@Joseph-Irving thanks a ton for summarizing. I am hoping all the positive energy on this enhancement translates into more participation from everyone in sig-node on a regular basis, and not just a one-off for features. There's plenty of work to do and very few hands.

rata commented 4 years ago

Hi! One more comment regarding this KEP: I raised some edge cases about it in past SIG Node meetings (June 23, if you want to watch the recordings) and we decided that the proper way to continue that discussion is to open PRs about those issues so we can decide how best to proceed.

I'm currently working on a PR to state those issues and some alternatives I can think of.

Also, the KEP state is now provisional (instead of implementable), so it can be reviewed and only set back to implementable when everyone feels comfortable moving forward with the KEP.

I think this was the only missing bit of information in this issue. Thanks!

mattfarina commented 4 years ago

@rata Did you open issues/PRs on the proper way to handle the issues?

Joseph-Irving commented 4 years ago

@mattfarina This is the PR: https://github.com/kubernetes/enhancements/pull/1913. It contains a number of proposed solutions to current problems/edge cases in the KEP, and it also details a number of alternatives that were discussed and decided against, so that we have a better log of why certain decisions were made.

trondhindenes commented 4 years ago

I would very much like to see the sidecar functionality also cover scaling. Today, HPA scaling is based on a metric (such as CPU). If the pod contains more than one container, the average across all containers is used (as far as I know). For pods with sidecars (app + nginx, etc.) this makes it very hard to make scaling function correctly. I was hoping that the sidecar implementation in Kubernetes would include marking one container in the pod as "authoritative" in terms of the metrics used for HPA scaling.

howardjohn commented 4 years ago

I would very much like to see the sidecar functionality also cover scaling:

I agree this would be useful, but it's not necessarily "sidecar" specific, and since the implementation is uncoupled from this it may make sense to make it a separate issue - this one is already very complex. I am also not convinced you want to just ignore the sidecar. We may want per-container HPA scaling instead, for example. Not sure - it would need exploring as its own issue, I think.
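
As a rough illustration of the "per-container HPA scaling" idea, a spec could conceivably look something like the sketch below. This assumes a ContainerResource-style metric in the autoscaling API, which is a separate proposal from this KEP and not something the sidecar work itself provides:

```yaml
# Illustrative sketch only: scales on the "app" container's CPU and ignores the sidecar.
# Assumes a ContainerResource-style metric (a separate autoscaling proposal, not part of this KEP).
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app            # hypothetical Deployment name
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: ContainerResource
    containerResource:
      name: cpu
      container: app     # only this container's usage drives scaling
      target:
        type: Utilization
        averageUtilization: 60
```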

shaneqld commented 4 years ago

Does anyone have any reference to, or could be so kind to share, the current workaround for this issue, specifically for the case of the Istio Envoy sidecar?

I recall a possible workaround involving:

XSAM commented 4 years ago

Does anyone have any reference to, or could be so kind to share, the current workaround for this issue, specifically for the case of the Istio Envoy sidecar?

We use a custom daemon image, acting as a supervisor, to wrap the user's program. The daemon also listens on a particular port to convey the health status of the user's program (exited or not).

Here is the workaround:

As a result, the user's process will start only once Envoy is ready, and Envoy will stop after the user's process has exited.

It's a complicated workaround, but it works fine in our production environment.
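
Very roughly, the shape of the pattern is something like the sketch below. Note that the wrapper binary, ports, and endpoints here are hypothetical stand-ins to illustrate the idea, not our actual implementation:

```yaml
# Hypothetical sketch of the wrapper pattern described above; the "/wrapper" binary,
# ports, and endpoints are illustrative stand-ins, not a real implementation.
apiVersion: v1
kind: Pod
metadata:
  name: app-with-envoy
spec:
  containers:
  - name: app
    image: example.com/app:latest      # hypothetical application image
    # The wrapper blocks until Envoy reports ready, then exec's the real program,
    # and serves "has the program exited?" on a local status port.
    command: ["/wrapper",
              "--wait-for=http://127.0.0.1:15021/healthz/ready",
              "--status-port=8090",
              "--", "/app/server"]
  - name: istio-proxy
    image: istio/proxyv2               # example image; the real tag depends on the mesh version
    lifecycle:
      preStop:
        exec:
          # Keep Envoy around until the wrapper reports that the app has exited.
          command: ["sh", "-c",
                    "until wget -qO- http://127.0.0.1:8090/exited; do sleep 1; done"]
```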

q42jaap commented 4 years ago

The link in the description gives a 404

https://github.com/kubernetes/enhancements/blob/master/keps/sig-apps/0753-sidecarcontainers.md

I think it should be:

https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/0753-sidecarcontainers.md

Joseph-Irving commented 4 years ago

Yeah, it was moved in https://github.com/kubernetes/enhancements/pull/1913; I've updated the link.

cainelli commented 4 years ago

Does anyone have any reference to, or could be so kind to share, the current workaround for this issue, specifically for the case of the Istio Envoy sidecar?

@shaneqld for startup issues, the Istio community came up with a quite clever workaround which basically injects envoy as the first container in the container list and adds a postStart hook that checks and waits for envoy to be ready. This is blocking, so the other containers are not started, ensuring envoy is there and ready before the app container starts.

We had to port this to the version we're running, but it is quite straightforward and we are happy with the results so far.

For shutdown we are also 'solving' it with a preStop hook, adding an arbitrary sleep during which we hope the application will have gracefully shut down before continuing with SIGTERM.
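
For anyone wanting to picture it, the injected spec ends up looking roughly like the sketch below (illustrative only; the readiness port/path and the sleep duration depend on the Istio version and on how long your app needs to drain):

```yaml
# Illustrative sketch only: readiness port/path and sleep duration vary by Istio
# version and application; image names are examples.
apiVersion: v1
kind: Pod
metadata:
  name: app-with-istio-workaround
spec:
  containers:
  - name: istio-proxy                  # injected first so it starts before the app
    image: istio/proxyv2               # example image
    lifecycle:
      postStart:
        exec:
          # postStart blocks the kubelet from starting the next container,
          # so the app only starts once Envoy reports ready.
          command: ["sh", "-c",
                    "until wget -qO- http://127.0.0.1:15021/healthz/ready; do sleep 1; done"]
      preStop:
        exec:
          # Delay Envoy's SIGTERM so the app can finish in-flight work first.
          command: ["sh", "-c", "sleep 20"]
  - name: app
    image: example.com/app:latest      # hypothetical application image
```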

kikisdeliveryservice commented 4 years ago

Related PR: https://github.com/kubernetes/enhancements/pull/1980

kikisdeliveryservice commented 4 years ago

Hi @Joseph-Irving @thockin and everyone else :smile:

Enhancements Lead here. I see that there is still a ton of ongoing conversation, but as a reminder, please keep us updated if any plans to include this in 1.20 are decided, so we can track the progress.

Thanks! Kirsten

rata commented 4 years ago

@kikisdeliveryservice will keep you posted, thanks!

Zachery2008 commented 3 years ago

Does anyone have any reference to, or could be so kind to share, the current workaround for this issue, specifically for the case of the Istio Envoy sidecar?

@shaneqld for startup issues, the Istio community came up with a quite clever workaround which basically injects envoy as the first container in the container list and adds a postStart hook that checks and waits for envoy to be ready. This is blocking, so the other containers are not started, ensuring envoy is there and ready before the app container starts.

We had to port this to the version we're running, but it is quite straightforward and we are happy with the results so far.

For shutdown we are also 'solving' it with a preStop hook, adding an arbitrary sleep during which we hope the application will have gracefully shut down before continuing with SIGTERM.

Could you share some details on how to do this? How do you add a preStop hook to the istio-proxy sidecar? It seems to need some custom configuration or a custom sidecar. I face the same issue: when a pod scales down, the main container is still trying to finish its jobs but loses connectivity to the outside, probably because the istio-proxy sidecar closed immediately after SIGTERM. Right now I just use the default sidecar injection. Thank you!

tariq1890 commented 3 years ago

Ok this thread is getting hijacked. Let's stay on topic, please.

kikisdeliveryservice commented 3 years ago

Just a gentle reminder that Enhancements Freeze is next week, Tuesday, October 6th. By that time the KEP would need to be updated to be marked implementable.

Also the KEP is using an older format, so updating would be great (once you finish hammering out the details): https://github.com/kubernetes/enhancements/tree/master/keps/NNNN-kep-template

rata commented 3 years ago

@kikisdeliveryservice thanks for the reminder. Will do if it is decided to include this in 1.20. Thanks! :)

rata commented 3 years ago

This won't be part of 1.20. Thanks a lot for pinging! :)

Atreus-Technologies commented 3 years ago

I have an interest in this issue, and wanted to thank both @Joseph-Irving and @howardjohn for their insights on this, which helped resolve some of my questions.

I don't want to hijack this proposal, but based on the conversations above, I wonder if this is maybe a slightly broader/larger issue than has been recognised so far.

I can imagine the following solutions to this issue:

  1. Define a new container entity "sidecar container" which starts after initContainers, before "main containers", and terminates after the "main containers" terminate (per @Joseph-Irving's original proposal)
  2. Define an additional field on (1) which sets whether the "sidecar container" starts before the initContainer(s) (per @luksa's suggestion).
  3. Go broader.

Personally, option (2) solves my immediate problem.

But I'm wondering if these questions don't speak to a more strategic issue in K8s around scheduling and how we define a pod. In my specific (Istio-related) case, I suggested something like runlevels within pods.

Option (2) solves my problem too, but I can imagine even more complex dependency structures which might call for embedding a DAG of container dependencies within a pod/statefulSet/daemonSet/whatever - this is the option (3) I am thinking of.

Just wondering if this issue should really be re-focused on the pod definition itself, with a view to creating something more generic? I originally thought in terms of a runlevels analogy, but maybe an Airflow-like DAG structure would have the broadest applicability.

michalzxc commented 3 years ago

What about adding Envoy as an init container as well? That way it would provide network connectivity for the other init containers. When the init phase finished, it would 'exit 0' as well, and then the regular Envoy (not the init one) would take over.