kubernetes / kubernetes

Production-Grade Container Scheduling and Management
https://kubernetes.io
Apache License 2.0

Node does not become NotReady with read-only filesystem #99126

Open · fbsb opened 3 years ago

fbsb commented 3 years ago

What happened:

On our bare-metal clusters we noticed that nodes do not become NotReady after the kernel remounts a filesystem read-only, e.g. due to filesystem corruption. Pods continue to be scheduled on the node but fail to start, because the kubelet cannot create the pod data directories.

What you expected to happen:

The kubelet should notice that it cannot write to its filesystem and prevent further pods from being scheduled on the node, e.g. by marking the node NotReady.
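
The error the kubelet actually hits is EROFS from mkdir under /var/lib/kubelet (see the events below). As a minimal sketch of the kind of write probe this would take — hypothetical code, not anything in the real kubelet:

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"syscall"
)

// writable reports whether dir accepts writes by creating and removing a
// probe file. A read-only remount surfaces as EROFS, which is reported
// as "not writable" rather than as a probe error.
func writable(dir string) (bool, error) {
	probe := filepath.Join(dir, ".write-probe")
	f, err := os.Create(probe)
	if err != nil {
		if errors.Is(err, syscall.EROFS) {
			return false, nil // filesystem was remounted read-only
		}
		return false, err
	}
	f.Close()
	return true, os.Remove(probe)
}

func main() {
	ok, err := writable("/var/lib/kubelet")
	fmt.Printf("writable=%v err=%v\n", ok, err)
}
```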

How to reproduce it (as minimally and precisely as possible):

  1. Install minikube and VirtualBox
  2. minikube start --driver=virtualbox -n 2
    $ kubectl get nodes
    NAME           STATUS   ROLES                  AGE    VERSION
    minikube       Ready    control-plane,master   3m6s   v1.20.2
    minikube-m02   Ready    <none>                 119s   v1.20.2
  3. Start a pod:
    kubectl get pod
    NAME                       READY   STATUS    RESTARTS   AGE
    hello-1-657cb9b9f5-brbf4   1/1     Running   0          16s
  4. SSH into the worker node and trigger an emergency read-only remount (simulating a filesystem failure), then wait a few minutes
    minikube ssh --node minikube-m02
    echo u | sudo tee /proc/sysrq-trigger
  5. The node stays Ready and attracts new pods
    
    kubectl get node
    NAME           STATUS   ROLES                  AGE     VERSION
    minikube       Ready    control-plane,master   10m     v1.20.2
    minikube-m02   Ready    <none>                 9m12s   v1.20.2

    kubectl get pod
    NAME                       READY   STATUS              RESTARTS   AGE
    hello-1-657cb9b9f5-brbf4   1/1     Running             0          8m41s
    hello-2-7ddff58f66-6mgbm   0/1     ContainerCreating   0          16s

    kubectl describe pod hello-2-7ddff58f66-6mgbm
    ...
    Events:
      Type     Reason       Age               From               Message
      ----     ------       ----              ----               -------
      Normal   Scheduled    33s               default-scheduler  Successfully assigned default/hello-2-7ddff58f66-6mgbm to minikube-m02
      Warning  Failed       9s (x3 over 33s)  kubelet            error making pod data directories: mkdir /var/lib/kubelet/pods/b7d540b3-c949-4fad-becc-76743a654467: read-only file system
      Warning  FailedMount  1s (x7 over 33s)  kubelet            MountVolume.SetUp failed for volume "default-token-5fjs5" : mkdir /var/lib/kubelet/pods/b7d540b3-c949-4fad-becc-76743a654467: read-only file system



Anything else we need to know?:

Environment:
- Kubernetes version (use `kubectl version`): v1.20.2 (also reproducible in latest 1.18.x and 1.19.x)
- Cloud provider or hardware configuration: bare metal
- OS (e.g: `cat /etc/os-release`): Flatcar Container Linux by Kinvolk 2605.12.0 (Oklo) 
- Kernel (e.g. `uname -a`): 5.4.92-flatcar

sfudeus commented 3 years ago

/sig node

lyzs90 commented 3 years ago

We could act on the FailedToMakePodDataDirectories and FailedMountVolume events (https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/kubelet.go#L1696-L1710) and update the node ReadyCondition so that no further pods get scheduled. But this should only hold for a finite amount of time; otherwise the node will never be allowed to recover.

Not sure if my assessment is on the right track, but if any work needs to be done here, I'd be happy to take it up :)
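
A rough sketch of that idea, using the public API types (hypothetical helper; the real kubelet builds the Ready condition in its node status machinery, and this is not that code):

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"time"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// readyCondition derives the node Ready condition from a writability probe
// of the kubelet root directory, so a read-only remount flips the node to
// NotReady instead of leaving it schedulable.
func readyCondition(rootDir string) v1.NodeCondition {
	cond := v1.NodeCondition{
		Type:              v1.NodeReady,
		Status:            v1.ConditionTrue,
		Reason:            "KubeletReady",
		Message:           "kubelet is posting ready status",
		LastHeartbeatTime: metav1.NewTime(time.Now()),
	}
	probe := filepath.Join(rootDir, ".write-probe")
	if f, err := os.Create(probe); err != nil {
		// Any write failure (EROFS included) makes the node NotReady.
		cond.Status = v1.ConditionFalse
		cond.Reason = "KubeletRootDirNotWritable"
		cond.Message = err.Error()
	} else {
		f.Close()
		os.Remove(probe)
	}
	return cond
}

func main() {
	fmt.Printf("%+v\n", readyCondition("/var/lib/kubelet"))
}
```

Because the probe would run on every node status update, no explicit timeout is needed: the condition flips back to True on the first update after the filesystem becomes writable again, so the node can recover on its own.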

fejta-bot commented 3 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

sfudeus commented 3 years ago

/remove-lifecycle stale

swatisehgal commented 3 years ago

/triage accepted
/area kubelet
/priority important-longterm

swatisehgal commented 3 years ago

> We could act on the FailedToMakePodDataDirectories and FailedMountVolume events (https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/kubelet.go#L1696-L1710) and update the node ReadyCondition so that no further pods get scheduled. But this should only hold for a finite amount of time; otherwise the node will never be allowed to recover.
>
> Not sure if my assessment is on the right track, but if any work needs to be done here, I'd be happy to take it up :)

/assign @lyzs90

Your approach seems reasonable to me. Please go ahead with the implementation.

k8s-triage-robot commented 3 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

sfudeus commented 2 years ago

/remove-lifecycle rotten

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

vaibhav2107 commented 2 years ago

/remove-lifecycle stale

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

sfudeus commented 2 years ago

/remove-lifecycle stale

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

vaibhav2107 commented 2 years ago

/remove-lifecycle stale

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

christophergertz commented 1 year ago

/remove-lifecycle rotten

tzneal commented 1 year ago

I just noticed that node-problem-detector can detect read-only filesystems as well. It appears to just look for the message "Remounting filesystem read-only" in the kernel log, which isn't necessarily coming from a relevant filesystem. It seems more useful to me to detect it directly in kubelet and mark the node as NotReady there.
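
For reference, the rule in node-problem-detector's default kernel monitor looks roughly like this (an abbreviated excerpt; see config/kernel-monitor.json in the node-problem-detector repo for the authoritative version):

```json
{
  "plugin": "kmsg",
  "logPath": "/dev/kmsg",
  "lookback": "5m",
  "source": "kernel-monitor",
  "conditions": [
    {
      "type": "ReadonlyFilesystem",
      "reason": "FilesystemIsNotReadOnly",
      "message": "Filesystem is not read-only"
    }
  ],
  "rules": [
    {
      "type": "permanent",
      "condition": "ReadonlyFilesystem",
      "reason": "FilesystemIsReadOnly",
      "pattern": "Remounting filesystem read-only"
    }
  ]
}
```

As noted above, the pattern matches the kernel message for any filesystem, not just one the kubelet depends on, and the resulting ReadonlyFilesystem condition does not by itself make the node NotReady unless something else (e.g. a taint controller) acts on it.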

k8s-triage-robot commented 7 months ago

This issue has not been updated in over 1 year, and should be re-triaged.

You can:

- Confirm that this issue is still relevant with /triage accepted (org members only)
- Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

k8s-triage-robot commented 4 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 3 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

sfudeus commented 3 months ago

/remove-lifecycle rotten

I still think something like this is necessary. If the guidance is to use NPD for this, that is fine with me; then one could focus on improving the detection logic in NPD instead. Any opinions here?

k8s-triage-robot commented 1 week ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale