kubernetes / kubernetes

Production-Grade Container Scheduling and Management
https://kubernetes.io
Apache License 2.0

Static pod Lifecycle Formalization (was: Decide fate of mirror pods) #16627

Open yujuhong opened 8 years ago

yujuhong commented 8 years ago

I would like to open a discussion on whether and how we should continue supporting mirror pods, given that 1) mirror pods need to be treated differently in multiple places in the kubelet, making them brittle, and 2) mirror pods have some limitations that may cause confusion (e.g., what should a user expect after modifying a mirror pod?).

Background

The kubelet watches a few sources for pods: the apiserver, HTTP endpoints, and files in a local directory. We support non-apiserver sources for several main use cases:

These pods are sometimes called static pods in the codebase.

For pods from non-apiserver sources, the kubelet creates a corresponding mirror pod in the apiserver and continuously updates its status. These mirror pods serve two main purposes:

  1. Allow users to inspect the pod status through kubectl.
  2. Allow the scheduler to account for resource usage.

Note that currently the mirror pods may not truthfully reflect what is actually running on the node, because the apiserver can apply default (or cluster-specific) values to certain fields of the mirror pod, while the kubelet adheres to the original pod spec obtained from the source.
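For context, a static pod is just a pod manifest the kubelet reads from disk. A minimal illustrative example (the file path and names below are hypothetical; the manifest directory is configurable on the kubelet):

```yaml
# e.g. /etc/kubernetes/manifests/static-web.yaml (illustrative path)
apiVersion: v1
kind: Pod
metadata:
  name: static-web
spec:
  containers:
  - name: web
    image: nginx
    ports:
    - containerPort: 80
```

The kubelet creates a mirror pod for such a manifest in the apiserver, conventionally named with the node name appended (e.g. static-web-&lt;node-name&gt;), which is what users see via kubectl.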

[Option 1]: Deprecate mirror pods completely.

[Option 2]: Only allow the use of static pods in standalone kubelets.

[Option 3]: Continue supporting mirror pods, but mark them read-only (non-updatable) in the apiserver

[Option 4]: Continue supporting mirror pods, and kubelet should sync to the mirror pods

[Option 5]: Option 4 + updatable mirror and static pods

[Option 6]: Maintain status quo

/cc a couple folks who have expressed opinions about mirror pods: @bgrant0607 @thockin @davidopp @kubernetes/goog-node

timstclair commented 8 years ago

In option 4, what happens if the static pod source is updated? Is there a way to delete a running static pod?

yujuhong commented 8 years ago

In option 4, what happens if the static pod source is updated? Is there a way to delete a running static pod?

If the static pod source is updated, the kubelet should detect that and recreate the mirror pod. This would destroy the containers, though. Another option is to allow only the kubelet to update the mirror pod.
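A minimal sketch of how the kubelet could detect that a mirror pod no longer matches its static pod, assuming the hash-annotation scheme (the annotation key strings match the upstream constants, but the function itself is illustrative, not the real kubelet code):

```go
package main

import "fmt"

// Annotation keys used for static/mirror pod bookkeeping. The key strings
// match the upstream constants, but this is a sketch of the mechanism,
// not the actual kubelet implementation.
const (
	configHashKey   = "kubernetes.io/config.hash"   // set on the static pod
	configMirrorKey = "kubernetes.io/config.mirror" // set on the mirror pod
)

// mirrorPodUpToDate reports whether a mirror pod still reflects the static
// pod it mirrors: the mirror's config.mirror annotation must equal the
// static pod's config.hash. When it does not (i.e., the manifest file
// changed), the kubelet would delete and recreate the mirror pod.
func mirrorPodUpToDate(staticAnn, mirrorAnn map[string]string) bool {
	hash, ok := staticAnn[configHashKey]
	if !ok {
		return false
	}
	return mirrorAnn[configMirrorKey] == hash
}

func main() {
	static := map[string]string{configHashKey: "abc123"}
	mirror := map[string]string{configMirrorKey: "abc123"}
	fmt.Println(mirrorPodUpToDate(static, mirror)) // prints true
}
```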

timstclair commented 8 years ago

Ok, thanks for the nice write-up. Do you want to add a note about what we do now? I believe from our discussion on Tuesday that we currently do something like option 5, but update inconsistently and don't address the two issues you raised.

dchen1107 commented 8 years ago

If we want to continue supporting static pods, we have to support mirror pods. But I would prefer to mark them read-only in the apiserver.

yujuhong commented 8 years ago

Ok, thanks for the nice write-up. Do you want to add a note about what we do now? I believe from our discussion on Tuesday that we currently do something like option 5, but update inconsistently and don't address the two issues you raised.

Added Option 6, which is what we use now.

yujuhong commented 8 years ago

If we want to continue supporting static pods, we have to support mirror pods. But I prefer to mark it read-only in APIserver.

If we go with option 2 and support static pods only on standalone kubelets, we wouldn't need mirror pods :-)

davidopp commented 8 years ago

Once we have DaemonSet, what are the use cases for static pods on non-standalone kubelets?

mikedanese commented 8 years ago

Static pods are required for cluster bootstrapping. This may fall under the purview of "standalone" mode though.

resouer commented 8 years ago

+1 for static pods' value in cluster bootstrapping.

I also know of use cases where companies integrate the standalone kubelet into their own platforms. I see this as a value of Kubernetes: you don't have to buy into the whole idea.

bgrant0607 commented 8 years ago

Quick comment for now: Not setting default values for mirror pods in apiserver is not an option, since that would break conversion.

yujuhong commented 8 years ago

Quick comment for now: Not setting default values for mirror pods in apiserver is not an option, since that would break conversion.

Noted, but none of the options involve not setting default values for mirror pods. However, making mirror pods read-only (non-updatable) does seem useful.

dchen1107 commented 8 years ago

Yes, cluster bootstrapping requires static pod support, but it could fall into standalone mode. But what are the procedures for handing over from standalone mode to cluster mode? How do we manage the first booted apiserver? Before we figure all of that out, we still need mirror pods. Again, I prefer to make them read-only in the apiserver.

pmorie commented 8 years ago

OT: the title of this issue made me lol

yujuhong commented 8 years ago

OT: the title of this issue made me lol

I'm glad that it provided some entertainment value, even though no definite conclusions could be drawn from the discussions yet ;)

pmorie commented 8 years ago

@yujuhong Entertainment value is always appreciated :)

I was talking to @spencerbrown today and he has a use-case for static pods. This is the first data point that I personally have of someone actually using the kubelet in standalone mode to launch static pods in production, but it seems like there is at least one person doing this. Let that be considered while we debate the fate of these features.

mikedanese commented 8 years ago

StaticPod != MirrorPod. IIUC we are not discussing the fate of static pods, we like those. We are discussing the fate of the practice of pushing a read-only representation of them into the API server.

pmorie commented 8 years ago

@mikedanese I agree -- the use case we were discussing had a touchpoint with mirror pods.

mikedanese commented 8 years ago

Ok I missed it sorry

pmorie commented 8 years ago

@mikedanese You didn't miss it -- I didn't write it :) Was more tagging spencer in so he could weigh in.

gmarek commented 8 years ago

In #19436 @yujuhong suggested that:

IIUC we want to treat mirror pods exactly as normal ones, but that idea would mean that we start treating them differently, and that we need a way to distinguish them from ordinary pods (the kubernetes.io/config.mirror label is good enough). We should decide the fate of mirror pods, so that we can solve the mirror pod deletion problem appropriately instead of duct-taping it. cc @davidopp @dchen1107 @bgrant0607

bgrant0607 commented 8 years ago

We're planning to give daemons "forgiveness" (#19567, #1574, #18263) so that they aren't evicted when nodes are not Ready. We could do similarly for mirror pods. At some point we'll have a general mechanism for expressing this.

cc @mml

bgrant0607 commented 8 years ago

How much resources are required by Kubelet+Docker for, say, 40 pods?

How about for a hyperkube-based single-node cluster without the scheduler, for the same number of pods?

mml commented 8 years ago

While I understand that static pods are useful, I don't understand the value of having mirror pods at all; the special cases will only continue. The kubelet can know about static pods (perhaps we should stop calling them "pods", and then the desire for a mirror will fade) and report them via a monitoring interface, but no other component should need to know about them, since they are by design not manageable. I'd like to see them dropped from the apiserver altogether.

yujuhong commented 8 years ago

How much resources are required by Kubelet+Docker for, say, 40 pods?

How about for a hyperkube-based single-node cluster without the scheduler, for the same number of pods?

@bgrant0607, how is the resource usage related to mirror pods? Wrong issue?

bgrant0607 commented 8 years ago

@mml The main value in mirror pods over just accounting for the resources is in being able to see their status (container health, restarts, etc.) and events via normal means. The mirror pods are actual pods that we'd otherwise like to behave normally.

If not for the case of debugging the cluster control-plane components, option 2 would be appealing. DaemonSet could be used for daemons on cluster nodes and the "master" node(s) could be treated as standalone Kubelets. Maybe that is worth the simplification.

If we only supported one source at a time, we could eliminate the distinction between the sources instead of treating them independently (#15195). That would facilitate tricks like starting kubelet in one mode (perhaps via --runonce) and restarting it in another (e.g., apiserver mode), adopting the running pods in the case that both sources specified the same pods.

@yujuhong Sorry for not explaining. I was thinking about alternative solutions to some of the use cases, but nevermind.

gmarek commented 8 years ago

Can we put making this decision on our 1.3 list?

@bgrant0607 - I'm not quite sure what you mean by treating "master" nodes as standalone kubelets, or how it relates to their self-registration. Do we want to drop the master self-registration effort?

bgrant0607 commented 8 years ago

@gmarek I was referring to the "standalone" terminology in this issue's description.

Yes, if treated as a standalone Kubelet, the master would not self-register. I was just pointing that out as a possibility. And hopefully only temporary, in any case. Could the apiserver proxy be used to access container status of the standalone kubelet for debugging purposes?

If we wanted to retain self-registration and mirror pods, then it would help if we implemented more self-hosting features (#246), such as checkpointing #489. And HA would be a necessity if we didn't want to risk bricking the cluster. That would permit "pivoting": create a standalone master node, launch self-hosted replicas of control-plane components, then nuke the standalone node, allowing control to fail over to the self-hosted replicas.

The update-from-static-manifest process could also be extracted into a separate container, if that would be easier. It seems like this could mostly work like kubectl apply, which has almost the same problem. The main difference would be the ability to act upon updates even if apiserver were down, which is only needed if this mechanism is used to run the control plane in the steady state.

yujuhong commented 8 years ago

IMO, there are two main pain points with mirror pods in today's implementation.

  1. Mirror pods behave very differently from other pods. This complicates the kubelet codebase, making it error-prone and/or increasing the difficulty of debugging. If we decide to keep mirror pods in the future, perhaps it's best to decouple mirror pods from the regular paths in the kubelet and move all operations into one component. This would make maintenance easier.
  2. The fact that users can modify mirror pods is confusing and misleading. Users are given little clue as to why these pods are different at the API level.

@bgrant0607, I don't quite understand why the self-hosting features will help with the pain points.

bgrant0607 commented 8 years ago

@yujuhong In what ways are mirror pods different than other pods?

Self-hosting would help because it would enable us to get the benefits of mirror pods without actual mirror pods.

yujuhong commented 8 years ago

@yujuhong In what ways are mirror pods different than other pods?

To name a few:

  1. The regular ADD/DELETE/UPDATE operations on a mirror pod need to be distinguished and handled differently.
  2. Extra logic to handle creation/deletion of mirror pods when syncing a static pod (manifest file).
  3. UID lookup to map a mirror pod's UID to a static pod's UID to support various user operations (exec, logs, etc.).
  4. Reverse UID lookup when updating the pod status.
  5. Logic to correct the mirror pod spec when it is modified by users or other components (e.g., the node controller), or the lack thereof.

There is also a race condition at cluster startup: the scheduler will not be aware of the resource consumption of the static pods until the mirror pods are created. During that window, the scheduler may overcommit the node. Of course, the kubelet will reject the pods, but this is slightly annoying.
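The bidirectional UID bookkeeping described in points 3 and 4 can be sketched as a pair of translation maps. All names here are illustrative; the kubelet's pod manager keeps equivalent maps internally:

```go
package main

import "fmt"

// mirrorUIDMap sketches the bidirectional static-pod <-> mirror-pod UID
// bookkeeping. Names are illustrative, not the kubelet's actual types.
type mirrorUIDMap struct {
	staticToMirror map[string]string
	mirrorToStatic map[string]string
}

func newMirrorUIDMap() *mirrorUIDMap {
	return &mirrorUIDMap{
		staticToMirror: make(map[string]string),
		mirrorToStatic: make(map[string]string),
	}
}

// add registers the mirror pod created for a static pod.
func (m *mirrorUIDMap) add(staticUID, mirrorUID string) {
	m.staticToMirror[staticUID] = mirrorUID
	m.mirrorToStatic[mirrorUID] = staticUID
}

// translate resolves a mirror pod UID (the one users see via the apiserver)
// to the static pod UID the kubelet actually runs, e.g. for exec or logs.
// UIDs of regular pods pass through unchanged.
func (m *mirrorUIDMap) translate(uid string) string {
	if staticUID, ok := m.mirrorToStatic[uid]; ok {
		return staticUID
	}
	return uid
}

func main() {
	m := newMirrorUIDMap()
	m.add("static-uid-1", "mirror-uid-1")
	fmt.Println(m.translate("mirror-uid-1"))  // prints static-uid-1
	fmt.Println(m.translate("regular-uid-9")) // prints regular-uid-9
}
```

Every user-facing operation that arrives with a mirror pod UID has to pass through a lookup like this, which is part of the special-casing being complained about.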

Self-hosting would help because it would enable us to get the benefits of mirror pods without actual mirror pods.

I see now that self-hosting would free the kubelet from the responsibility of maintaining the mirror pods. However, without the ability to stop users from modifying these pods via the apiserver, we still have the problem of reconciling two sources of truth and the resulting conflicts.

UPDATE:

saad-ali commented 8 years ago

I'm interested in the outcome of this for the design https://github.com/kubernetes/kubernetes/issues/20262.

No strong preference between the options.

xinxiaogang commented 8 years ago

If we continue to support static pods, I think mirror pods are useful for inspecting pod status from the apiserver. To keep things simple, though, I would support marking mirror pods as read-only in the apiserver and letting the kubelet sync the real status from the static pod. I don't think it is a good idea to accept changes to a mirror pod from the apiserver and propagate them to the manifest file; this would make cluster operation hard and introduce potential drift.

yujuhong commented 7 years ago

@bgrant0607 @dchen1107 is making mirror pods read-only still a viable option?

bgrant0607 commented 7 years ago

@yujuhong

I have no idea how we'd implement entirely "read-only" API resources. They'd at least have to bypass admission control, much of the registry logic (e.g., uid generation), the defaulting pass, resource-version substitution, self-link generation, ... Defaulting must run in order for validation and conversion to work, and we need to run conversion in order to serialize the state in etcd, in general. Simply blocking updates, however, would be easier, and we might want that for other purposes (related to #10179).
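The "simply blocking updates" idea could be sketched as an admission-style check that rejects updates to any pod carrying the mirror annotation unless the request comes from the node's own kubelet. This is a hypothetical sketch; the requesterIsKubelet flag stands in for real authentication/authorization context:

```go
package main

import "fmt"

// configMirrorAnnotation marks kubelet-created mirror pods.
const configMirrorAnnotation = "kubernetes.io/config.mirror"

// allowPodUpdate is an illustrative admission-style check: updates to a
// mirror pod are rejected unless they come from the node's own kubelet.
// Regular pods are unaffected.
func allowPodUpdate(annotations map[string]string, requesterIsKubelet bool) bool {
	if _, isMirror := annotations[configMirrorAnnotation]; isMirror {
		return requesterIsKubelet
	}
	return true
}

func main() {
	mirror := map[string]string{configMirrorAnnotation: "abc123"}
	fmt.Println(allowPodUpdate(mirror, false)) // prints false
	fmt.Println(allowPodUpdate(mirror, true))  // prints true
}
```

Unlike a fully read-only resource, a check like this leaves defaulting, conversion, and the rest of the registry machinery untouched, which is why blocking updates is the easier variant.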

As for static pods themselves, I'm fine with deprecating the cases covered by DaemonSet.

I assume that we (or someone) are using mirror pods for monitoring the master's containers. However, we could potentially create the mirror pods for the master components outside of Kubelet. For instance, we could potentially add a bootstrap dance to kubeadm, including creation of the mirror pods.

More generally, I would like to figure out how to better facilitate apiserver+etcd bootstrapping, whether via #28138 or some other approach. Bootkube proposal, for reference: https://docs.google.com/document/d/1VNp4CMjPPHevh2_JQGMl-hpz9JSLq3s7HlI87CTjl-8/edit

Or, if we were to keep kubelet-generated mirror pods, if we were to make more pod fields updatable (which I realize is non-trivial), presumably sync'ing to state in apiserver would require less special-case code. I'd consider moving the kubelet static-pod code to be based on kubectl apply logic. Perhaps it could even share code with the addon manager.

Elimination of mirror pods would be a breaking behavioral change, though, so we'd need to wait a while before actually removing them, even if we announced their deprecation today.

yujuhong commented 7 years ago

I assume that we (or someone) are using mirror pods for monitoring the master's containers. However, we could potentially create the mirror pods for the master components outside of Kubelet. For instance, we could potentially add a bootstrap dance to kubeadm, including creation of the mirror pods.

kubeadm is moving to the self-hosting model, where static pods would only be used for bootstrapping and regular pods would be created to replace them afterwards. The need for mirror pods for monitoring the master components may go away in the future.

As for static pods on the (non-master) nodes, many of the default addon pods have already been moved to DaemonSets (e.g., fluentd). The only pod left is kube-proxy, and issue #23225 is open to address this.

Elimination of mirror pods would be a breaking behavioral change, though, so we'd need to wait a while before actually removing them, even if we announced their deprecation today.

Yes, this would be a breaking change. Given that we are gradually phasing out the use of static pods in all cases, I think mirror pods would naturally go away. We can tolerate supporting this legacy feature until no one needs it.

fejta-bot commented 6 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta. /lifecycle stale

yujuhong commented 6 years ago

Still relevant. /lifecycle frozen

nikhita commented 6 years ago

/remove-lifecycle stale

cynepco3hahue commented 3 years ago

/kind feature

sftim commented 2 years ago

kubeadm is moving to the self-hosting model, where static pods would only be used for bootstrapping and regular pods would be created to replace them afterwards. The need for mirror pods for monitoring the master components may go away in the future.

Is that still true?

detiber commented 2 years ago

kubeadm is moving to the self-hosting model, where static pods would only be used for bootstrapping and regular pods would be created to replace them afterwards. The need for mirror pods for monitoring the master components may go away in the future.

Is that still true?

It is not; in fact, the self-hosted model was deprecated by kubeadm quite a while ago. @neolit123 could provide more information.

neolit123 commented 2 years ago

Yeah, kubeadm had an experimental implementation of a self-hosted control plane, but it did not survive a node restart; it needed proper "checkpointing" in the kubelet. The related implementations were removed a few releases back.

Static pods are currently used in a number of commercial distros, so we are somewhat stuck with their quirks and complexities. Ephemeral storage and admission come to mind.

I am +1 to close this ticket. If SIG Node wishes, we can create a new one (linking to the conversation here) to track resolving some of the related tech debt, but someone from SIG Node with sufficient knowledge would have to enumerate the current problem areas.

thockin commented 2 years ago

@derekwaynecarr @ehashman @endocrimes @bobbypage - should we close this?

sftim commented 1 year ago

I suggest lazy consensus and close after 2023-01-31 if nobody has objected.

smarterclayton commented 1 year ago

I view static pods as part of the 1.0 supported API (GA) for Kube and therefore thanks to Hyrum's Law they are supported. Multiple distributions leverage some form of static pods.

Our current stance is Option 4 (in practice), and I have not heard objections sufficient to change it and override our general API expectations and guarantees. We have invested in aligning and correcting the issues with static pods that surface outside of the kubelet, specifically termination, cleanup, etc.

At this point I believe the stance of the project should be:

  1. Static pods and mirror pods are GA
  2. Static pods are the source of truth at the kubelet (current state), and writes to mirror pods are ignored / overridden
  3. The kubelet must correctly handle the supported set of behavior from config barring a new KEP proposing a change (which would then have to work with the ecosystem safely to remove) - that includes graceful termination, updates, consistent behavior in the rest of the system, admission, etc
  4. Folks most dependent on using static pods should contribute effort - OpenShift has historically driven a lot of this, Google has handled a number of pain points, and a large set of individual contributors has added work over the years.
  5. We should complete documentation of static pods in their current state

I'm kind of semi-maintaining static pods in the Kubelet right now so I obviously have strong opinions. :)

I would propose that we adopt sftim's lazy-consensus proposal around the text above, but I will bring it up in sig-node and sig-arch if there are other perspectives.

smarterclayton commented 1 year ago

(we should also propose a change to the title of this issue)

smarterclayton commented 1 year ago

/assign

thockin commented 7 months ago

@tallclair