kubernetes / kubernetes

Production-Grade Container Scheduling and Management
https://kubernetes.io
Apache License 2.0

Refactor prober_manager to update the runtime state vs. API state #115937

Open SergeyKanzhelev opened 1 year ago

SergeyKanzhelev commented 1 year ago

In the sidecar containers KEP we plan to use the startup probe state to decide whether the next init container needs to start being initialized. Before this KEP, probe results were not used in any Pod lifecycle decisions. In order to implement this, we need to have the probe status available at the time we call findNextInitContainerToRun. This may require some refactoring of how prober_manager is initialized and used.

Today we calculate the changes that need to be performed on a Pod, including the next init container to run. Then we apply these actions and update the Status of the Pod on the API server using convertStatusToAPIStatus from generateAPIPodStatus. Once the runtime status has been converted to the API status, we also call kl.probeManager.UpdatePodStatus(pod.UID, s), passing the API status object to be updated with the probe results. Thus the probe results are not visible to the findNextInitContainerToRun function.
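A highly simplified sketch of that flow, using stand-in types rather than the real kubelet signatures (the function names mirror the ones above; everything else is illustrative): probe results are merged only into the API-facing status, so the runtime status that drives the init-container decision never carries them.

```go
package main

import "fmt"

// Stand-in for kubecontainer.PodStatus: pure runtime state, no probe results.
type runtimePodStatus struct {
	containerState map[string]string
}

// Stand-in for v1.PodStatus: the API-facing status.
type apiPodStatus struct {
	started map[string]bool
}

// findNextInitContainerToRun only looks at the runtime status, so it cannot
// take startup probe results into account.
func findNextInitContainerToRun(rs runtimePodStatus) string {
	for name, state := range rs.containerState {
		if state != "running" {
			return name
		}
	}
	return ""
}

// generateAPIPodStatus stands in for convertStatusToAPIStatus +
// probeManager.UpdatePodStatus: probe results land here, and only here.
func generateAPIPodStatus(rs runtimePodStatus, startupResults map[string]bool) apiPodStatus {
	s := apiPodStatus{started: map[string]bool{}}
	for name, ok := range startupResults {
		s.started[name] = ok
	}
	return s
}

func main() {
	rs := runtimePodStatus{containerState: map[string]string{"sidecar": "running", "init-1": "waiting"}}
	startupResults := map[string]bool{"sidecar": true}

	next := findNextInitContainerToRun(rs)          // decided without probe data
	api := generateAPIPodStatus(rs, startupResults) // probe data only reaches the API status
	fmt.Println("next init container:", next, "started:", api.started)
}
```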

The proposal is to refactor prober_manager.go to operate on kubecontainer.PodStatus instead of v1.PodStatus. This will require adding the probe status to kubecontainer.PodStatus. We will also move the call into prober_manager.go earlier in the syncPod execution, so that we know the startup probe results before we call into findNextInitContainerToRun.
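A rough sketch of what the proposed shape might look like, again with stand-in types; the startupProbeResults field and the exact call sites below are assumptions, not the final design.

```go
package main

import "fmt"

// Stand-in for kubecontainer.PodStatus with the proposed probe results added.
// The startupProbeResults field is hypothetical.
type runtimePodStatus struct {
	containerState      map[string]string
	startupProbeResults map[string]bool
}

// Stand-in prober manager that updates the runtime status instead of v1.PodStatus.
type proberManager struct {
	startupResults map[string]bool
}

func (m *proberManager) UpdatePodStatus(s *runtimePodStatus) {
	for name, ok := range m.startupResults {
		s.startupProbeResults[name] = ok
	}
}

// With probe results on the runtime status, the init-container decision can
// wait for a sidecar's startup probe before letting the next init container run.
func findNextInitContainerToRun(s *runtimePodStatus) string {
	if !s.startupProbeResults["sidecar"] {
		return "" // sidecar has not passed its startup probe yet
	}
	for name, state := range s.containerState {
		if state != "running" {
			return name
		}
	}
	return ""
}

func main() {
	s := &runtimePodStatus{
		containerState:      map[string]string{"sidecar": "running", "init-1": "waiting"},
		startupProbeResults: map[string]bool{},
	}
	pm := &proberManager{startupResults: map[string]bool{"sidecar": true}}

	pm.UpdatePodStatus(s) // moved early in syncPod, before computing pod actions
	fmt.Println("next init container:", findNextInitContainerToRun(s))
}
```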

cc @smarterclayton

/sig node /kind feature /priority important-soon

SergeyKanzhelev commented 1 year ago

/triage accepted /assign @matthyx

smarterclayton commented 1 year ago

On the surface this seems reasonable:

  1. PodStatus is purely synthetic - its source of truth is the runtime, it is an aggregate, and it is not durable
  2. v1.Pod has a mix of truth from different sources - it's easy for callers to get confused about which field in v1.Pod is authoritative at any time (i.e. the phase of the pod from the API can differ from the phase of the pod the kubelet is aware of, since both are sources of truth but the kubelet is the normal owner), so it's better not to mix more things in

A couple of points:

gjkim42 commented 1 year ago

Today we calculate the changes that need to be performed on a Pod, including the next init container to run. Then we apply these actions and update the Status of the Pod on the API server using convertStatusToAPIStatus from generateAPIPodStatus. Once the runtime status has been converted to the API status, we also call kl.probeManager.UpdatePodStatus(pod.UID, s), passing the API status object to be updated with the probe results. Thus the probe results are not visible to the findNextInitContainerToRun function.

After digging into this code area, I found that findNextInitContainersToRun is called by (*kubeGenericRuntimeManager).computePodActions, and *kubeGenericRuntimeManager has its own probe managers (it has already been using those probe managers in computePodActions).

https://github.com/kubernetes/kubernetes/blob/d48b8167f7094bfae89ab601d2356868ba107627/pkg/kubelet/kuberuntime/kuberuntime_manager.go#L115-L118
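For reference, the linked lines declare the runtime manager's own probe result caches; paraphrased below (not a verbatim excerpt, and most of the struct's fields are omitted):

```go
package kuberuntime

import (
	proberesults "k8s.io/kubernetes/pkg/kubelet/prober/results"
)

// kubeGenericRuntimeManager already carries its own probe result managers,
// which computePodActions consults (paraphrased subset of the real struct).
type kubeGenericRuntimeManager struct {
	// Health check results.
	livenessManager  proberesults.Manager
	readinessManager proberesults.Manager
	startupManager   proberesults.Manager
	// ... other fields omitted
}
```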


The proposal is to refactor prober_manager.go to operate on kubecontainer.PodStatus instead of v1.PodStatus. This will require adding the probe status to kubecontainer.PodStatus. We will also move the call into prober_manager.go earlier in the syncPod execution, so that we know the startup probe results before we call into findNextInitContainerToRun.

IMHO, one alternative could be to leave kubecontainer.PodStatus as it is and use those probe managers in findNextInitContainersToRun, as sketched below.
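A minimal sketch of that alternative, assuming findNextInitContainersToRun can reach the runtime manager's startupManager; the helper name and exactly where it would be called from are hypothetical:

```go
package kuberuntime

import (
	kubecontainer "k8s.io/kubernetes/pkg/kubelet/container"
	proberesults "k8s.io/kubernetes/pkg/kubelet/prober/results"
)

// startupProbeSucceeded (hypothetical helper) checks the cached startup probe
// result for a container; no cached result is treated as "not started yet".
func startupProbeSucceeded(startupManager proberesults.Manager, id kubecontainer.ContainerID) bool {
	result, found := startupManager.Get(id)
	return found && result == proberesults.Success
}

// findNextInitContainersToRun could then gate the next init container on this
// check for the preceding sidecar, instead of reading probe results from any
// PodStatus object at all.
```

Either way the decision would be made from the prober results cache directly, rather than round-tripping through v1.PodStatus.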

@matthyx @SergeyKanzhelev What do you think?

matthyx commented 1 year ago

What do you think?

It might work, but here I have implemented your other idea: https://github.com/SergeyKanzhelev/kubernetes/pull/3

k8s-triage-robot commented 1 year ago

This issue is labeled with priority/important-soon but has not been updated in over 90 days, and should be re-triaged. Important-soon issues must be staffed and worked on either currently, or very soon, ideally in time for the next release.

You can:

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

matthyx commented 1 year ago

This one will be worked on after Sidecars are in GA.

k8s-triage-robot commented 10 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

matthyx commented 10 months ago

/remove-lifecycle stale

k8s-triage-robot commented 7 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

vaibhav2107 commented 6 months ago

/remove-lifecycle stale

k8s-triage-robot commented 3 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

matthyx commented 3 months ago

Still waiting for sidecars GA (1.32). /remove-lifecycle stale

k8s-triage-robot commented 3 weeks ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale