kubernetes / kubernetes


PVC/PV not unbound when statefulset is destroyed #50312

Closed Stono closed 6 years ago

Stono commented 7 years ago

Hi, I recently deleted a StatefulSet and then recreated it, but its pod wouldn't start: it timed out waiting for the volume to attach/mount (GKE, Kubernetes 1.7.2).

The StatefulSet's pod:

$ kg describe pod gocd-master-0
Name:       gocd-master-0
Namespace:  gocd
Node:       gke-peopledata-preprod-default-pool-4847aa8e-nfb2/10.34.96.2
Start Time: Tue, 08 Aug 2017 12:20:24 +0000
Labels:     app=master
        controller-revision-hash=gocd-master-32721748
        tier=gocd
Annotations:    kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"StatefulSet","namespace":"gocd","name":"gocd-master","uid":"7832de11-7c30-11e7-8837-42010a8400ee","apiVers...
Status:     Pending
IP:
Created By: StatefulSet/gocd-master
Controlled By:  StatefulSet/gocd-master
Containers:
  master:
    Container ID:
    Image:      eu.gcr.io/peopledata-product-team/kube-gocd-master:latest
    Image ID:
    Ports:      8153/TCP, 8154/TCP
    State:      Waiting
      Reason:       ContainerCreating
    Ready:      False
    Restart Count:  0
    Environment:
      AGENT_AUTO_REGISTER_KEY:  <set to the key 'agent_key' in secret 'kube-gocd'>  Optional: false
      GO_USERNAME:      <set to the key 'user' in secret 'kube-gocd'>       Optional: false
      GO_PASSWORD:      <set to the key 'pass' in secret 'kube-gocd'>       Optional: false
    Mounts:
      /godata from gocd-master-data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-9n0qw (ro)
Conditions:
  Type      Status
  Initialized   True
  Ready     False
  PodScheduled  True
Volumes:
  gocd-master-data:
    Type:   PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  gocd-master-data-gocd-master-0
    ReadOnly:   false
  default-token-9n0qw:
    Type:   Secret (a volume populated by a Secret)
    SecretName: default-token-9n0qw
    Optional:   false
QoS Class:  BestEffort
Node-Selectors: <none>
Tolerations:    node.alpha.kubernetes.io/notReady:NoExecute for 300s
        node.alpha.kubernetes.io/unreachable:NoExecute for 300s
Events:
  FirstSeen LastSeen    Count   From                                SubObjectPath   Type        Reason          Message
  --------- --------    -----   ----                                -------------   --------    ------          -------
  2m        2m      1   default-scheduler                               Normal      Scheduled       Successfully assigned gocd-master-0 to gke-peopledata-preprod-default-pool-4847aa8e-nfb2
  2m        2m      1   kubelet, gke-peopledata-preprod-default-pool-4847aa8e-nfb2          Normal      SuccessfulMountVolume   MountVolume.SetUp succeeded for volume "default-token-9n0qw"
  51s       51s     1   kubelet, gke-peopledata-preprod-default-pool-4847aa8e-nfb2          Warning     FailedMount     Unable to mount volumes for pod "gocd-master-0_gocd(f4a02b5d-7c33-11e7-8837-42010a8400ee)": timeout expired waiting for volumes to attach/mount for pod "gocd"/"gocd-master-0". list of unattached/unmounted volumes=[gocd-master-data]
  51s       51s     1   kubelet, gke-peopledata-preprod-default-pool-4847aa8e-nfb2          Warning     FailedSync      Error syncing pod

The PVC (showing as Bound; if I delete the StatefulSet, it still shows as Bound):

$ kg get pvc
NAME                                              STATUS    VOLUME                                     CAPACITY   ACCESSMODES   STORAGECLASS   AGE
gocd-master-data-gocd-master-0                    Bound     pvc-7837cc0e-7c30-11e7-8837-42010a8400ee   75Gi       RWO           standard       36m

The PV:

$ kg get pv
NAME                                       CAPACITY   ACCESSMODES   RECLAIMPOLICY   STATUS    CLAIM                                 STORAGECLASS   REASON    AGE
pvc-7837cc0e-7c30-11e7-8837-42010a8400ee   75Gi       RWO           Delete          Bound     gocd/gocd-master-data-gocd-master-0   standard                 36m

The disk:

$ gcloud compute disks describe gke-peopledata-preprod-pvc-7837cc0e-7c30-11e7-8837-42010a8400ee --zone=europe-west1-d
creationTimestamp: '2017-08-08T04:55:28.855-07:00'
description: '{"kubernetes.io/created-for/pv/name":"pvc-7837cc0e-7c30-11e7-8837-42010a8400ee","kubernetes.io/created-for/pvc/name":"gocd-master-data-gocd-master-0","kubernetes.io/created-for/pvc/namespace":"gocd"}'
id: '8460603997884258399'
kind: compute#disk
labelFingerprint: 42WmSpB8rSM=
lastAttachTimestamp: '2017-08-08T04:55:36.139-07:00'
lastDetachTimestamp: '2017-08-08T05:26:30.662-07:00'
name: gke-peopledata-preprod-pvc-7837cc0e-7c30-11e7-8837-42010a8400ee
selfLink: https://www.googleapis.com/compute/v1/projects/peopledata-product-team/zones/europe-west1-d/disks/gke-peopledata-preprod-pvc-7837cc0e-7c30-11e7-8837-42010a8400ee
sizeGb: '75'
status: READY
type: https://www.googleapis.com/compute/v1/projects/peopledata-product-team/zones/europe-west1-d/diskTypes/pd-standard
zone: https://www.googleapis.com/compute/v1/projects/peopledata-product-team/zones/europe-west1-d

I've left it a good 10 minutes now to see if it fixes itself, but it hasn't, and now I'm unable to start the StatefulSet.
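
The disk timestamps are reported at a -07:00 offset while the pod's start time is in UTC, so they are easier to compare on one clock. A minimal Python sketch, using only values copied from the outputs above (the variable and function names are illustrative, not part of any tool):

```python
from datetime import timezone, datetime
from email.utils import parsedate_to_datetime

# Values copied verbatim from the `describe pod` and `gcloud` outputs above.
pod_start = parsedate_to_datetime("Tue, 08 Aug 2017 12:20:24 +0000")
last_attach = datetime.fromisoformat("2017-08-08T04:55:36.139-07:00")
last_detach = datetime.fromisoformat("2017-08-08T05:26:30.662-07:00")


def to_utc(ts):
    """Convert an offset-aware timestamp to UTC for side-by-side comparison."""
    return ts.astimezone(timezone.utc)


print("last attach (UTC):", to_utc(last_attach).isoformat())
print("last detach (UTC):", to_utc(last_detach).isoformat())
print("pod start   (UTC):", to_utc(pod_start).isoformat())

# Aware datetimes can be subtracted directly, whatever their offsets.
print("detach minus pod start:", last_detach - pod_start)
```

On these values, the last recorded detach is later than the last recorded attach and falls roughly six minutes after the recreated pod's start time. This only lines up the timestamps; it does not by itself explain the delay.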

Stono commented 7 years ago

@kubernetes/sig-storage-bug

Stono commented 7 years ago

OK, it took about 20 minutes but it finally cleared the lock... is that normal?

xiangpengzhao commented 7 years ago

/sig storage

bobsongplus commented 7 years ago

Same issue, Kubernetes 1.6.4.

msau42 commented 7 years ago

@Stono, @TinySong do you still see this issue?

Can you check kubelet and controller manager logs for error messages related to these volumes?
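
One way to narrow such logs down is to keep only the lines that mention the PV or PVC by name. A minimal Python sketch, assuming the kubelet or controller manager log has already been saved to a text file (where those logs live depends on the cluster setup and is not stated in this thread); the function name `volume_lines` is hypothetical:

```python
import sys

# Identifiers copied from the PVC/PV output earlier in this thread.
VOLUME_IDS = (
    "pvc-7837cc0e-7c30-11e7-8837-42010a8400ee",
    "gocd-master-data-gocd-master-0",
)


def volume_lines(lines, ids=VOLUME_IDS):
    """Yield only the log lines that mention one of the volume identifiers."""
    for line in lines:
        if any(volume_id in line for volume_id in ids):
            yield line.rstrip("\n")


# Usage (assumed invocation): python filter_volume_logs.py <saved-log-file>
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log_file:
        for hit in volume_lines(log_file):
            print(hit)
```

Plain substring matching is used rather than a regular expression, since the PV and PVC names are fixed strings.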

msau42 commented 6 years ago

/assign

msau42 commented 6 years ago

Please reopen if you still see the issue.

/close