kubernetes-sigs / kubespray

Deploy a Production Ready Kubernetes Cluster
Apache License 2.0
16.2k stars 6.48k forks source link

Upgrading to v2.26.0 requires setting kubelet_max_parallel_image_pulls explicitly #11613

Closed olevitt closed 1 month ago

olevitt commented 1 month ago

What happened?

Hi !

tl;dr : due to #11094, clusters that defined serializeImagePulls: false prior to v2.26.0 will have the concurrent pull feature stealthly limited to 1 concurrent pull unless they explictly set the new maxParallelImagePulls variable.

11094 introduced a new variable kubelet_max_parallel_image_pulls to control the newly available maxParallelImagePulls kubelet configuration.

The PR chose to set a single parameter kubelet_max_parallel_image_pulls that defines both maxParallelImagePulls and indirectly serialize-image-pulls with the idea that if kubelet_max_parallel_image_pulls is not explicitly defined it will default to 1 and serialize-image-pulls will not matter. The thing is that this is a breaking change for clusters upgrading to v2.26.0 while having already defined serialize-image-pulls to false as the new kubelet_max_parallel_image_pulls set to 1 will stealthly limit to 1 concurrent pull, effectivly disabling the parallel feature.

I think this is borderline close to a breaking change and could warrant a note in the release notes

What did you expect to happen?

I was not expecting the upgrade to v2.26.0 to disable the concurrent pull feature that I had enabled

How can we reproduce it (as minimally and precisely as possible)?

Have a cluster in a version earlier than 2.26.0 with

kubelet_config_extra_args:
  serializeImagePulls: false

in your vars.
Your cluster will have concurrent image pull with no limitation.

Upgrade to 2.26.0 and notice that maxParallelImagePulls is set to 1 in your kubelets configuration effectivly disabling the serializeImagePulls: false that you had active

OS

irrelevant

Version of Ansible

irrelevant

Version of Python

irrelevant

Version of Kubespray (commit)

v2.26.0

Network plugin used

calico

Full inventory with variables

irrelevant, see explanation

Command used to invoke ansible

irrelevant

Output of ansible run

irrelevant

Anything else we need to know

No response

KubeKyrie commented 1 month ago

I think the point is https://github.com/kubernetes-sigs/kubespray/pull/11094 missed setting serializeImagePulls, only set maxParallelImagePulls. Because maxParallelImagePulls cannot be set if SerializeImagePulls is true. and its default value is no-limit.

@olevitt Do u think it's ok to add new variable serializeImagePulls to work together with maxParallelImagePulls. So that we can be consistent with kubelet configuration. And if it's ok, i am glad to open a PR to address it.

KubeKyrie commented 1 month ago

/assign

olevitt commented 1 month ago

Hi !
My understanding of that PR is that the solution of using 2 variables was their first intention and they switched to using only a single variable as in most cases it's enough. See https://github.com/kubernetes-sigs/kubespray/pull/11094#pullrequestreview-2054851212

I'm okay with their reasoning (even if I think I personaly would prefer sticking with something as close as possible to the kubelet but I don't think changing it back now is worth it)

The only scenario that fails with this approach is the one I describe in this issue that is people that would have disabled serialization before that upgrade. Anyone else will have a good experience I think.

So my suggestion is simply adding a note in the release notes to warn upgrading users that would have disabled serialization that they now should specify the max number of concurrent pulls (and they can get rid of their serializeImagePulls: false) and not changing thé codebase with respect to the discussion that happened in the PR at the time

VannTen commented 1 month ago

It wasn't missed, it was explicitly designed out.

In hindsight, I think we should not have added that variable at all, and just continues to rely on kubelet_config_extra_args. The overhead of having specific variables for each setting is not really worth it ; but since we already have a bunch, it was added. I'll modify the release-note block of that PR and see if we can regenerate the release notes of v2.26.0

VannTen commented 1 month ago

Oh, I see, we missed the fact that with undefined maxImagePulls, serializeImagePulls: false means unlimited.

VannTen commented 1 month ago

I edited the PR description and the release notes for v2.26.0 @olevitt does that seems clear enough ?

olevitt commented 1 month ago

I edited the PR description and the release notes for v2.26.0 @olevitt does that seems clear enough ?

Yup, looks good to me, thanks 👍

VannTen commented 1 month ago

I've added it in the Deprecation/Removal at the top.

I'm not exactly sure how the release notes were generated for v2.26.0, it does not appear that the release-notes go tool was used (I get a completely different markdown if I try to use that).

KubeKyrie commented 1 month ago

Oh, I see, we missed the fact that with undefined maxImagePulls, serializeImagePulls: false means unlimited.

Yes, I think the default value should be consistent with kubelet to avoid misunderstanding.