kubernetes / kubernetes

Production-Grade Container Scheduling and Management
https://kubernetes.io
Apache License 2.0

When a Pod with a PV is moved to another node, it is stuck in ContainerCreating for a long time #53059

Closed diogo-reis closed 5 years ago

diogo-reis commented 6 years ago

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened:

When I move a Pod with a "nodeSelector:" expression to another node of the Kubernetes cluster, the Pod waits about 8 minutes in the "ContainerCreating" status.

ERRORs:
Warning FailedAttachVolume Multi-Attach error for volume "pvc-7ec40eec-949e-11e7-b96d-fa163ef575ff" Volume is already exclusively attached to one node and can't be attached to another

Multi-Attach error for volume "pvc-7ec40eec-949e-11e7-b96d-fa163ef575ff" (UniqueName: "kubernetes.io/cinder/ab54e390-cace-466f-8624-bdb270fa49ff") from node "knode3" Volume is already exclusively attached to one node and can't be attached to another

After 6 minutes the OpenStack Cinder volume is attached to the selected node and the Pod is initialized. For an application, this delay is far too long.

What you expected to happen:

It is expected that after the Pod is asked to move to another node, the Cinder volume is moved to the selected node and the Pod starts quickly.

How to reproduce it (as minimally and precisely as possible):

Move a Pod with a Persistent Volume (OpenStack Cinder) to another node of the Kubernetes cluster.
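
For illustration, a manifest of the kind involved might look like the following. This is a minimal sketch; the claim name, pod name, image, and hostname value are placeholders, not taken from the report.

```yaml
# Hypothetical reproduction manifest: an RWO Cinder-backed claim plus a pod
# pinned to a node via nodeSelector. Deleting the pod and re-creating it with
# a different hostname value moves it, which triggers the detach/attach cycle.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cinder-data
spec:
  accessModes: ["ReadWriteOnce"]   # a Cinder volume attaches to one node at a time
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: web
spec:
  nodeSelector:
    kubernetes.io/hostname: knode3   # placeholder: the target node's hostname label
  containers:
    - name: app
      image: nginx
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: cinder-data
```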

Anything else we need to know?:

Log file kubelet: kubelet.txt

Log file kube-controller-manager: kube-controller-manager.txt

Environment:

Linux knode2 3.10.0-514.16.1.el7.x86_64 #1 SMP Wed Apr 12 15:04:24 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

diogo-reis commented 6 years ago

/sig storage cc @kubernetes/sig-storage-bugs

k8s-ci-robot commented 6 years ago

@diogo-reis: Reiterating the mentions to trigger a notification: @kubernetes/sig-storage-bugs

In response to [this](https://github.com/kubernetes/kubernetes/issues/53059#issuecomment-332257244):

> /sig storage
> cc @kubernetes/sig-storage-bugs

Instructions for interacting with me using PR comments are available [here](https://github.com/kubernetes/community/blob/master/contributors/devel/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

diogo-reis commented 6 years ago

/sig openstack

wenlxie commented 6 years ago

@diogo-reis What's your k8s version?

diogo-reis commented 6 years ago

My k8s version is v1.7.5.

airstand commented 6 years ago

k8s version 1.8, on AWS:

Events:
  Type     Reason                 Age              From                                                  Message
  ----     ------                 ----             ----                                                  -------
  Warning  FailedScheduling       3m (x2 over 3m)  default-scheduler                                     PersistentVolumeClaim is not bound: "www-web-0" (repeated 9 times)
  Normal   Scheduled              2m               default-scheduler                                     Successfully assigned web-0 to ip-10-222-38-161.eu-west-1.compute.internal
  Normal   SuccessfulMountVolume  2m               kubelet, ip-10-222-38-161.eu-west-1.compute.internal  MountVolume.SetUp succeeded for volume "default-token-td6qx"
  Warning  FailedMount            2m               attachdetach                                          AttachVolume.Attach failed for volume "pvc-37038357-ad9c-11e7-9798-0a5f89d13502" : Error attaching EBS volume "vol-0c159c62175f227b6" to instance "i-0af5130e8d06ffb5a": "IncorrectState: vol-0c159c62175f227b6 is not 'available'.\n\tstatus code: 400, request id: 53305145-566b-457e-b1d2-83272f8fd889"
  Warning  FailedMount            2m               attachdetach                                          AttachVolume.Attach failed for volume "pvc-37038357-ad9c-11e7-9798-0a5f89d13502" : Error attaching EBS volume "vol-0c159c62175f227b6" to instance "i-0af5130e8d06ffb5a": "IncorrectState: vol-0c159c62175f227b6 is not 'available'.\n\tstatus code: 400, request id: cd609dca-2e74-48bd-a309-c90cf98df2a7"
  Normal   SuccessfulMountVolume  2m               kubelet, ip-10-222-38-161.eu-west-1.compute.internal  MountVolume.SetUp succeeded for volume "pvc-37038357-ad9c-11e7-9798-0a5f89d13502"

zetaab commented 6 years ago

I have seen the same on OpenStack; it takes quite a few minutes to get it working. I have kube 1.8.

stamak commented 6 years ago

I have the same problem on k8s 1.8.2 (OpenStack)

how to reproduce is described here https://github.com/kubernetes/kubernetes/issues/50004

nexeck commented 6 years ago

Got the same error. We modified the resource request/limits for one statefulset with 3 replicas. K8s moved one of the replicas to another node, which has enough resources, but the volume was still attached to the old node.

K8s version: v1.8.1+coreos.0 Running on AWS

Warning FailedAttachVolume 7m (x2987 over 12m) attachdetach Multi-Attach error for volume "pvc-4fe430e8-db4d-11e7-9931-02138f142c30" Volume is already exclusively attached to one node and can't be attached to another

zetaab commented 6 years ago

Cinder detach is fixed in https://github.com/kubernetes/kubernetes/pull/56846. However, there is a new ticket, https://github.com/kubernetes/kubernetes/issues/58079, about pod failover times. Pod-with-PV failover time is the same problem in all cloud providers, not just OpenStack.

@nexeck The AWS detach issues are fixed in https://github.com/kubernetes/kubernetes/pull/55893; it looks like it is coming in kube 1.9.

jingxu97 commented 6 years ago

@diogo-reis, sorry for the late reply. You mentioned "When I move a Pod with a "nodeSelector:" expression to another node". Could you please confirm that the pod is first killed on node 1 and then started on node 2? Is node 1 still running?

andyzhangx commented 6 years ago

We got this "Multi-Attach error" on Azure as well, in v1.9.6. We found that the volume in node.volumesInUse is not removed even long after the pod with that volume has been moved off the node. I filed another issue here: https://github.com/kubernetes/kubernetes/issues/62282
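
For anyone checking the same thing, one way to see whether the old node still reports the volume is to read the node status fields directly (the node name below is a placeholder):

```sh
# volumesInUse is reported by kubelet, volumesAttached by the attach/detach controller;
# a stale entry on the old node keeps the volume from attaching elsewhere.
kubectl get node <old-node> -o jsonpath='{.status.volumesInUse}{"\n"}'
kubectl get node <old-node> -o jsonpath='{.status.volumesAttached}{"\n"}'
```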

SuperMarioo commented 6 years ago

I'm getting the same "Multi-Attach error" issue on Azure, in v1.9.6.

andyzhangx commented 6 years ago

Hi @SuperMarioo, this "Multi-Attach error" issue on Azure is fixed in v1.9.7, PR: https://github.com/kubernetes/kubernetes/pull/62467. Please follow this link to mitigate: https://github.com/andyzhangx/demo/blob/master/issues/azuredisk-issues.md#5-azure-disk-pvc-multi-attach-error-makes-disk-mount-very-slow-or-mount-failure-forever

bhack commented 6 years ago

What is the status of this on AWS/EBS? I have the same problem on AWS with v1.9.3.

zetaab commented 6 years ago

IMO this "bug" exists for all volume types. If you have a pod with a PVC (any type, RWX types excluded) running on node1 and you shut down node1, the pod will start again on another node, but failing over the volume (it will return that multi-attach error) takes 6-10 minutes because it waits for the force detach.

Options:
1) Could we make this force-detach time shorter? It is currently 6 minutes.
2) Allow force detach for a volume if the node has the shutdown taint (which was added in #60009; no cloud provider supports it yet). Using this, Cinder failover time, for instance, is something like 1 minute.

bhack commented 6 years ago

Yes, it seems that it is general.

andyzhangx commented 6 years ago

@zetaab That's correct. On Azure, the time cost of detaching a disk and attaching it to another node is around 1 minute, so a Multi-Attach error within that 1 minute is expected. However, we found an issue specific to containerized kubelet where the UnmountDevice process always fails, so the disk detach from the old node never succeeds; in that case, we hit the Multi-Attach error for hours...

dElogics commented 6 years ago

This is not a bug; it is expected behavior. You must not double-mount with the ReadWriteOnce policy; that is what Kubernetes is trying to avoid. However, if the node that is down does not respond within 6 minutes (the default), the volumes will be forcibly associated with the replacement pod.

Reference to this 'knowledge'.

Unfortunately, in the case of iSCSI, the replacement pod always remains in the ContainerCreating state, which is problematic.

bhack commented 6 years ago

@dElogics I think there is a sub-problem on AWS with provisioning EBS volumes and switching nodes. If the volume is provisioned as /dev/ and then reattached to a node that exposes it as /dev/nvme (such as M5 instances), I think it will fail even if it is attached correctly to the node.

AmazingTurtle commented 6 years ago

I'm experiencing the same issue on GKE 1.8.10-gke.0

msau42 commented 6 years ago

@AmazingTurtle can you describe in more detail what happened to the old node to cause your pod to be rescheduled? Did the node become NotReady, or was it upgraded, terminated or repaired?

adampl commented 6 years ago

In my case it became NotReady when I stopped the Docker service. I'm using Rancher 2.0 so everything (including kubelet) is containerized. It runs on 3 bare-metal Ubuntu 16.04 nodes, latest Docker CE and Kubernetes 1.11.1. I have a Deployment with 1 replica that mounts a Ceph RBD PVC in RWO mode.

As long as the node is NotReady, the volume is not detached from it, so it cannot be mounted by another pod on different node.

eBeyond commented 6 years ago

I've got the same issue with k8s 1.11.0 and Ceph using dynamic provisioning. This issue also occurs when I do a

kubectl apply -f deployment.yml

As such, it's not possible to modify something without redeploying via delete/apply... :( (For me it took much longer than 6 min.)

jackzzj commented 6 years ago

I think I just experienced the same issue on AWS ...

dakleine commented 5 years ago

We are facing the same issue with k8s 1.9.8 and rbd volumes. But in our case the pod was just redeployed on another node due to changes via kubectl edit deployment ...

jingxu97 commented 5 years ago

@dakleine could you please describe what changes you made when editing the deployment? What is the status of the old node? Thanks!

dakleine commented 5 years ago

@jingxu97 We only changed the image version of our PostgreSQL deployment. The old node is Ready.

christensen143 commented 5 years ago

This is happening to me on AWS as well. I have a pod right now that has been stuck in "ContainerCreating" for 15 minutes. Any ideas what to do?

christensen143 commented 5 years ago

Warning FailedMount 7s (x7 over 13m) kubelet, ip-x-x-x-x.x-x-x.compute.internal Unable to mount volumes for pod "gitlab02-runner-678ffc74f4-m2w8m_build(7a473b7f-d23e-11e8-8cfe-0688ae24c2fe)": timeout expired waiting for volumes to attach/mount for pod "build"/"gitlab02-runner-678ffc74f4-m2w8m". list of unattached/unmounted volumes=[data-volume]

jingxu97 commented 5 years ago

@christensen143, it looks like your data-volume is not attached. If you can access the kube-controller-manager log on the master, it should print out a detailed message about why it failed to attach the volume.
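
A rough way to pull that log, assuming kube-controller-manager runs as a static pod in kube-system (the pod name suffix depends on the master's node name):

```sh
# Find the controller-manager pod, then grep its log for attach/detach activity
kubectl -n kube-system get pods | grep kube-controller-manager
kubectl -n kube-system logs kube-controller-manager-<master-node> | grep -i "attach"

# If it runs as a systemd service instead, something like this on the master:
journalctl -u kube-controller-manager | grep -i "attach"
```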

ncri commented 5 years ago

Just experienced the same on Digital Ocean. The pod is still in ContainerCreating after 13 mins...

postgres-deployment-77c874df64-k4hn9 0/1 ContainerCreating 0 13m

pittar commented 5 years ago

Same issue on AWS. Is there a way to "fix" the volume?

zetaab commented 5 years ago

@pittar you can try restarting all master controllers (one by one, so that you do not have downtime).

Anyway, this is still happening quite often for us (OpenStack).

pittar commented 5 years ago

@zetaab , thanks for the suggestion. We are actually on OpenShift Dedicated (managed service), so I guess I'll put in a support ticket.

jingxu97 commented 5 years ago

@pittar could you please provide more details about your issue? In what situation your pod is stuck in "ContainerCreating"? Are you trying to delete your pod and then start it on another node? Thanks!

pittar commented 5 years ago

Hi @jingxu97 , since it's a managed service, I've submitted a support ticket, but I'll try to explain what happened here in case it benefits others.

Last night was a scheduled upgrade of our OpenShift cluster from 3.9 (k8s 1.9) to 3.11 (k8s 1.11). During the upgrade, pods would have been evacuated and re-created, so my guess is certain pods (like postgres) tried to re-attach to a PV (aws ebs) before the old pod had actually shut down. This seems to have left things in a strange state.

This morning, I tried killing the pods and restarting. The first error I would get from the pods was that the volumes were already attached to a different container (AWS EBS volumes are not RWX). After killing them again and trying to restart, I would get the timeout error similar to @christensen143's last comment.

One of the pods eventually re-attached (after I killed it and waited a good 10min before starting it again). Another that was in the same state still hasn't been able to start properly (still getting timeout trying to attach/mount the pv).

zetaab commented 5 years ago

@rootfs @jsafrane @thockin do you guys have an idea how we could improve this situation? This volume mount problem has been a problem for a long time. I have tried to solve it twice, but the storage or node SIGs always say that my solution is incorrect.

We have a customer who runs cronjobs every 5 minutes, and they have a volume in them as well. You can imagine what happens when volumes are asked to mount every 5 minutes while the force-detach time is 6 minutes. I think we can modify the force-detach time in the cluster, but that still does not remove the problem. It seems this volume mount problem exists in all cloud providers; sometimes it takes 5-20 minutes to get the volume in place. 20 minutes is a huge amount of time if your application is running in production.

edit: there is another issue for this #65392 (it might solve some of these issues)
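
For illustration, a CronJob of the kind described above might look roughly like this (all names are made up); the point is that every run mounts the same ReadWriteOnce claim, so each new pod can end up waiting out the 6-minute force-detach window:

```yaml
apiVersion: batch/v1beta1        # CronJob API version in this era of Kubernetes
kind: CronJob
metadata:
  name: report-job
spec:
  schedule: "*/5 * * * *"        # runs more often than the 6-minute force-detach timeout
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: report
              image: busybox
              command: ["sh", "-c", "date >> /data/runs.log"]
              volumeMounts:
                - name: data
                  mountPath: /data
          volumes:
            - name: data
              persistentVolumeClaim:
                claimName: report-data   # an RWO (e.g. Cinder-backed) claim
```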

zetaab commented 5 years ago

I compared Kubernetes 1.9 and 1.13 with this cronjob-with-volume scenario. In 1.13 volume mounts work as they should; 1.9 does not work correctly. So if you see problems, I would say please update the cluster first. That leaves the problem of a node being shut down or similar; hopefully that other ticket will solve it.

MichaelOrtho commented 5 years ago

I have a similar issue on DigitalOcean. If a pod is scheduled onto another node during a deployment, it will break, because the current node and pod are already linked and the old pod's volume will not detach before the new one tries to attach.

Fix attempt 1: add RollingUpdate maxUnavailable: 100% --> FAILED
Fix attempt 2: fix attempt 1 + add affinity to deploy the pod to only one node --> SUCCESS
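
A rough sketch of what those two changes look like combined, assuming a Deployment like the following (all names and the node label value are placeholders):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-service
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-service
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 100%   # fix 1: let the old pod terminate (and the volume detach) first
      maxSurge: 0
  template:
    metadata:
      labels:
        app: my-service
    spec:
      affinity:              # fix 2: pin the pod to a single node
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: kubernetes.io/hostname
                    operator: In
                    values: ["node-1"]
      containers:
        - name: app
          image: my-image
          volumeMounts:
            - name: data
              mountPath: /data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: my-pvc   # ReadWriteOnce volume
```

(Setting strategy.type to Recreate would have much the same effect as fix 1, since the old pod is removed before the new one is created.)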

This means that the service will be offline for a few seconds, and you can neither use the rest of the cluster nor scale the service across different nodes.

DigitalOcean volumes support only ReadWriteOnce, like many others. That means we need to find a better solution, because deploying to one node and accepting downtime is not what Kubernetes is about, and it heavily undermines the whole idea of persistent volumes.

Version: Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.1", GitCommit:"eec55b9ba98609a46fee712359c7b5b365bdd920", GitTreeState:"clean", BuildDate:"2018-12-13T10:31:33Z", GoVersion:"go1.11.2", Compiler:"gc", Platform:"linux/amd64"}

fejta-bot commented 5 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

fejta-bot commented 5 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle rotten

fejta-bot commented 5 years ago

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /close

k8s-ci-robot commented 5 years ago

@fejta-bot: Closing this issue.

In response to [this](https://github.com/kubernetes/kubernetes/issues/53059#issuecomment-501387122):

> Rotten issues close after 30d of inactivity.
> Reopen the issue with `/reopen`.
> Mark the issue as fresh with `/remove-lifecycle rotten`.
>
> Send feedback to sig-testing, kubernetes/test-infra and/or [fejta](https://github.com/fejta).
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

adipascu commented 5 years ago

I am still having this exact issue on v1.12.8 on Google Kubernetes Engine. It happens when I run kubectl apply -f app.yaml and a pod recreates itself. My current fix is to run k delete -f app.yaml first to release the disk, and to wait a bit before recreating the pod.

How is this still not fixed? Am I using Kubernetes incorrectly?

Edit: I think StatefulSet should solve this issue.
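
A rough sketch of that workaround as a command sequence (app.yaml is the manifest from the comment above; the VolumeAttachment check only applies on clusters that expose those objects):

```sh
kubectl delete -f app.yaml

# Wait until the disk has actually detached from the old node before recreating;
# where VolumeAttachment objects exist, watch for the old attachment to disappear:
kubectl get volumeattachments.storage.k8s.io -w

kubectl apply -f app.yaml
```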

jonstelly commented 5 years ago

/reopen /remove-lifecycle rotten

k8s-ci-robot commented 5 years ago

@jonstelly: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to [this](https://github.com/kubernetes/kubernetes/issues/53059#issuecomment-521725619):

> /reopen
> /remove-lifecycle rotten

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

kvaps commented 4 years ago

I'm also affected; I'm using the linstor-csi plugin.

kvaps commented 4 years ago

Steps to reproduce:

You can also check kubectl get volumeattachments.storage.k8s.io to track the detach/attach process.
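
For reference, a custom-columns view that makes the mapping from attachment to PV and node easier to read (field paths are from the VolumeAttachment API):

```sh
kubectl get volumeattachments.storage.k8s.io \
  -o custom-columns=NAME:.metadata.name,PV:.spec.source.persistentVolumeName,NODE:.spec.nodeName,ATTACHED:.status.attached
```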

kubernetes version:

Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.0", GitCommit:"70132b0f130acc0bed193d9ba59dd186f0e634cf", GitTreeState:"clean", BuildDate:"2019-12-07T21:12:17Z", GoVersion:"go1.13.4", Compiler:"gc", Platform:"linux/amd64"}

kubelet version: v1.16.4

csi version:

quay.io/k8scsi/csi-attacher:v1.1.1
quay.io/k8scsi/csi-cluster-driver-registrar:v1.0.1
quay.io/k8scsi/csi-node-driver-registrar:v1.2.0
quay.io/k8scsi/csi-provisioner:v1.3.0
quay.io/k8scsi/csi-snapshotter:v1.1.0

kvaps commented 4 years ago

This is a new bug, reported here: https://github.com/kubernetes/kubernetes/issues/86281