kubernetes / kubernetes

Production-Grade Container Scheduling and Management
https://kubernetes.io
Apache License 2.0

Running 1.13.x kubelet against 1.12.x apiserver errors due to missing volumeMode #71783

Closed vdboor closed 5 years ago

vdboor commented 5 years ago

Upgrading a cluster from 1.12.3 to 1.13.0 causes pods to break because the "volumeMode" is missing.

What happened:

all pods with volumes became unavailable:

Dec 6 09:31:46 experience kubelet[24011]: E1206 09:31:46.522733 24011 desired_state_of_world_populator.go:296] Error processing volume "media" for pod "djangofluent-tst-test-6cfc6555-9bfm6_fluentdemo(0dd95f6c-ed7e-11e8-afe8-5254000919ee)": cannot get volumeMode for volume: djangofluent-tst-media

Dec 6 09:31:46 experience kubelet[24011]: E1206 09:31:46.919223 24011 desired_state_of_world_populator.go:296] Error processing volume "media" for pod "djangofluent-prd-production-7c765b5c58-6kprb_fluentdemo(0dd8554c-ed7e-11e8-afe8-5254000919ee)": cannot get volumeMode for volume: djangofluent-prd-media

Dec 6 09:47:38 experience kubelet[1926]: E1206 09:47:38.027881 1926 desired_state_of_world_populator.go:296] Error processing volume "redis-data" for pod "redis-master-0_infra(eb93df30-ed7d-11e8-afe8-5254000919ee)": cannot get volumeMode for volume: redis-data-redis-master-0

What you expected to happen:

The kubelet would default to Filesystem when volumeMode is not present.
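
As a quick check (using one of the PVCs from the log above), the field can be read directly off the stored objects; an empty result means it is not set:

kubectl get pvc djangofluent-tst-media -n fluentdemo -o jsonpath='{.spec.volumeMode}{"\n"}'
kubectl get pv -o custom-columns=NAME:.metadata.name,VOLUMEMODE:.spec.volumeMode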

How to reproduce it (as minimally and precisely as possible):

Run a kubernetes 1.12.3 cluster (installed bare metal with kubeadm).
apt-get dist-upgrade for kubelet, kubeadm, kubectl
systemctl restart kubelet

Anything else we need to know?:

Environment:

/kind documentation

vdboor commented 5 years ago

/sig node

liggitt commented 5 years ago

/sig storage

liggitt commented 5 years ago

> Run a kubernetes 1.12.3 cluster (installed bare metal with kubeadm).
> apt-get dist-upgrade for kubelet, kubeadm, kubectl
> systemctl restart kubelet

does this mean you are running a 1.13-level kubelet against a 1.12-level kube-apiserver? kubelets may not be newer than the apiserver they speak to.
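
To compare the two sides, assuming kubectl is pointed at the affected cluster, something like this will show the skew:

kubectl version            # client and server (apiserver) versions
kubectl get nodes          # the VERSION column is each node's kubelet version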

vdboor commented 5 years ago

@liggitt Ah, good to know! Yes, I always upgraded the kubelets first, and then performed kubeadm upgrade. 🤦‍♂️ So far it kinda worked (from 1.8 -> 1.12)

That's also because upgrading the master isn't possible until its kubelet is upgraded first.

liggitt commented 5 years ago

> upgrading the master isn't possible until its kubelet is upgraded first

that doesn't sound right. cc @kubernetes/sig-cluster-lifecycle

neolit123 commented 5 years ago

@vdboor what step of the upgrade guide is failing for you: https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade-1-13/

As you can see, the kubelet is upgraded last, and this process hasn't changed much since 1.11.

liggitt commented 5 years ago

Step 1 of https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade-1-13/#upgrade-the-control-plane does apt-get upgrade kubelet

neolit123 commented 5 years ago

that's a problem in the docs.

> [upgrade/kubelet] Now that your control plane is upgraded, please proceed with upgrading your kubelets if you haven't already done so.

The kubelet upgrade should be done after the control-plane upgrade, not before.
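
For reference, the intended order is roughly the following (sketched for a Debian/Ubuntu control-plane node; the package versions are only illustrative):

# upgrade kubeadm first, then the control plane
apt-get update && apt-get install -y kubeadm=1.13.0-00
kubeadm upgrade plan
kubeadm upgrade apply v1.13.0

# only after the control plane is upgraded: kubelet and kubectl, then restart
apt-get install -y kubelet=1.13.0-00 kubectl=1.13.0-00
systemctl restart kubelet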

neolit123 commented 5 years ago

i will log an issue. :\

kvaps commented 5 years ago

Same problem after upgrading to v1.13; adding volumeMode: Filesystem to the PVs does not make any difference.

# kubectl patch pv local-pv-b6fb5339 -p '{"spec":{"volumeMode":"Filesystem"}}'
persistentvolume/local-pv-b6fb5339 patched (no change)

liggitt commented 5 years ago

@kvaps what version is your apiserver at? Can you include the output of kubectl version?

kvaps commented 5 years ago

> @kvaps what version is your apiserver at? Can you include the output of kubectl version?

@liggitt, my bad, one of my apiservers wasn't upgraded.

Problem solved.
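
For anyone else hitting this: on a kubeadm cluster, the versions of all running apiservers can be listed from the static-pod images, e.g.:

kubectl get pods -n kube-system -l component=kube-apiserver -o custom-columns=NAME:.metadata.name,IMAGE:.spec.containers[0].image

(component=kube-apiserver is the label kubeadm puts on its control-plane static pods)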

liggitt commented 5 years ago

see also https://github.com/kubernetes/release/issues/3295

fejta-bot commented 5 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

fejta-bot commented 5 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle rotten

fejta-bot commented 5 years ago

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /close

k8s-ci-robot commented 5 years ago

@fejta-bot: Closing this issue.

In response to [this](https://github.com/kubernetes/kubernetes/issues/71783#issuecomment-501798451):

> Rotten issues close after 30d of inactivity.
> Reopen the issue with `/reopen`.
> Mark the issue as fresh with `/remove-lifecycle rotten`.
>
> Send feedback to sig-testing, kubernetes/test-infra and/or [fejta](https://github.com/fejta).
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.