scylladb / scylla-operator

The Kubernetes Operator for ScyllaDB
https://operator.docs.scylladb.com/
Apache License 2.0

Recreation of a host instance for a scylla K8S node causes creation of duplicate PVs for a scylla pod #643

Open vponomaryov opened 3 years ago

vponomaryov commented 3 years ago

Describe the bug If we delete a K8S node and its host instance, and then mark the pod with the replacement label, the new Scylla pod does not find the existing PV and gets a newly created one. This works for the first several attempts, but after several such replacements we end up with the following list of PVs:

$ kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS     CLAIM                                                                 STORAGECLASS       REASON   AGE
local-pv-156ef1e5                          1124Gi     RWO            Delete           Released   scylla/data-sct-cluster-us-east1-b-us-east1-2                         local-raid-disks            5h45m
local-pv-1d1cc3d0                          1124Gi     RWO            Delete           Released   scylla/data-sct-cluster-us-east1-b-us-east1-2                         local-raid-disks            5h58m
local-pv-6d542dd6                          1124Gi     RWO            Delete           Bound      scylla/data-sct-cluster-us-east1-b-us-east1-0                         local-raid-disks            7h43m
local-pv-87cd61eb                          1124Gi     RWO            Delete           Released   scylla/data-sct-cluster-us-east1-b-us-east1-2                         local-raid-disks            6h55m
local-pv-9b2a1931                          1124Gi     RWO            Delete           Released   scylla/data-sct-cluster-us-east1-b-us-east1-2                         local-raid-disks            6h21m
local-pv-a21f039d                          1124Gi     RWO            Delete           Released   scylla/data-sct-cluster-us-east1-b-us-east1-2                         local-raid-disks            5h8m
local-pv-bae562fc                          1124Gi     RWO            Delete           Bound      scylla/data-sct-cluster-us-east1-b-us-east1-1                         local-raid-disks            7h43m
pvc-280b3443-3875-4f4e-9c69-ac44d4a51c21   10Gi       RWO            Delete           Bound      scylla-manager-system/data-scylla-manager-manager-dc-manager-rack-0   standard                    7h52m
pvc-c789e35e-e7b1-445f-9999-33528aa32bd4   10Gi       RWO            Delete           Bound      minio/minio                                                           standard                    7h53m

And the PVC for it:

$ kubectl describe pvc data-sct-cluster-us-east1-b-us-east1-2 -n scylla     
Name:          data-sct-cluster-us-east1-b-us-east1-2
Namespace:     scylla
StorageClass:  local-raid-disks
Status:        Pending
Volume:        
Labels:        app=scylla
               app.kubernetes.io/managed-by=scylla-operator
               app.kubernetes.io/name=scylla
               scylla/cluster=sct-cluster
               scylla/datacenter=us-east1-b
               scylla/rack=us-east1
Annotations:   <none>
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      
Access Modes:  
VolumeMode:    Filesystem
Used By:       sct-cluster-us-east1-b-us-east1-2
Events:
  Type    Reason               Age                       From                         Message
  ----    ------               ----                      ----                         -------
  Normal  WaitForPodScheduled  4m56s (x1181 over 4h59m)  persistentvolume-controller  waiting for pod sct-cluster-us-east1-b-us-east1-2 to be scheduled

And the failed pod:

$ kubectl describe pod sct-cluster-us-east1-b-us-east1-2 -n scylla    
Name:           sct-cluster-us-east1-b-us-east1-2
Namespace:      scylla
Priority:       0
Node:           <none>
Labels:         app=scylla
                app.kubernetes.io/managed-by=scylla-operator
                app.kubernetes.io/name=scylla
                controller-revision-hash=sct-cluster-us-east1-b-us-east1-849f494fcb
                scylla/cluster=sct-cluster
                scylla/datacenter=us-east1-b
                scylla/rack=us-east1
                statefulset.kubernetes.io/pod-name=sct-cluster-us-east1-b-us-east1-2
Annotations:    kubectl.kubernetes.io/restartedAt: 2021-06-16T13:50:37Z
                prometheus.io/port: 9180
                prometheus.io/scrape: true
Status:         Pending
IP:             
IPs:            <none>
Controlled By:  StatefulSet/sct-cluster-us-east1-b-us-east1

...

Conditions:
  Type           Status
  PodScheduled   False 
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-sct-cluster-us-east1-b-us-east1-2
    ReadOnly:   false
  shared:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  scylla-config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      scylla-confing
    Optional:  true
  scylla-agent-config-volume:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  scylla-agent-config
    Optional:    true
  scylla-client-config-volume:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  scylla-client-config-secret
    Optional:    true
  scylla-agent-auth-token-volume:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  sct-cluster-auth-token
    Optional:    false
  sct-cluster-member-token-nvppq:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  sct-cluster-member-token-nvppq
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age                      From               Message
  ----     ------            ----                     ----               -------
  Warning  FailedScheduling  5m17s (x200 over 4h58m)  default-scheduler  0/8 nodes are available: 2 node(s) didn't find available persistent volumes to bind, 6 Insufficient cpu, 6 Insufficient memory.

To Reproduce Steps to reproduce the behavior (a rough command-level sketch follows the list):

  1. Deploy the scylla operator
  2. Create a Scylla cluster with 3 members or more
  3. Delete one of the K8S nodes and then its host instance
  4. Label the orphaned Scylla member for replacement
  5. Wait until the new Scylla pod comes up
  6. See the list of PVs and how they are used
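
The sketch below shows those steps as commands; the node and instance names are placeholders, the scylla/replace label is taken from the operator's node-replacement procedure of that era (it may differ in your version), and the instance deletion depends on the cloud (shown here for GCE; use the EC2 equivalent on EKS):

$ kubectl delete node <k8s-node-hosting-the-member>
$ gcloud compute instances delete <backing-instance> --zone us-east1-b
$ kubectl -n scylla label svc sct-cluster-us-east1-b-us-east1-2 scylla/replace=""
$ kubectl -n scylla get pods -w     # wait for the replacement pod to come up
$ kubectl get pv                    # inspect how the PVs are (re)used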

Expected behavior The PV that was part of the deleted host instance must be reused. No new PVs must be created.


Logs kubernetes-c01abadb.tar.gz

Environment:


tnozicka commented 3 years ago

PV that was part of deleted host instance must be reused.

If you delete a node with local storage, then a new PV must be created on some other node.

I am not sure why there are 5 released PVs though.
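
For context, local PVs are pinned to the node they were created on via spec.nodeAffinity, so once that node is deleted a Released local PV can never satisfy a new claim. The pinning can be inspected with something like this (the PV name is taken from the listing above):

$ kubectl get pv local-pv-156ef1e5 -o jsonpath='{.spec.nodeAffinity}{"\n"}'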

tnozicka commented 3 years ago

we have landed big changes since this was reported (in #534), are you still experiencing the issue?

vponomaryov commented 3 years ago

we have landed big changes since this was reported (in #534), are you still experiencing the issue?

I need to re-verify it. The automation we use for this scenario is currently skipped because of the bug.

vponomaryov commented 3 years ago

While trying to verify it, I ran into another bug: https://github.com/scylladb/scylla-operator/issues/687

tnozicka commented 3 years ago

this should be unblocked now, please try again

tnozicka commented 3 years ago

The old PVs could be an issue with the provisioner. Moving this to 1.5; we can backport it if it proves to be an issue with the operator.
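
Until that is settled, the stale Released local PVs can be removed by hand. This is only a hedged cleanup sketch, not an operator feature; the column positions assume the default kubectl get pv output shown above, so review the selection before deleting anything:

$ kubectl get pv --no-headers | awk '$5 == "Released" {print $1}'
$ kubectl get pv --no-headers | awk '$5 == "Released" {print $1}' | xargs -r kubectl delete pv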

vponomaryov commented 3 years ago

this should be unblocked now, please try again

@tnozicka, I reproduced it using the latest operator as of now, v1.4.0-alpha.0-87-g4bc9b0c: kubernetes-45cabb2c.tar.gz In this case I used EKS. The number of redundant PVs equals the number of Scylla member recreations:

NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM                                                          STORAGECLASS       REASON   AGE     VOLUMEMODE
local-pv-1121913b                          3537Gi     RWO            Delete           Released    scylla/data-sct-cluster-us-east1-b-us-east1-1                  local-raid-disks            153m    Filesystem
local-pv-19b585e4                          3537Gi     RWO            Delete           Released    scylla/data-sct-cluster-us-east1-b-us-east1-0                  local-raid-disks            127m    Filesystem
local-pv-19c7224d                          3537Gi     RWO            Delete           Released    scylla/data-sct-cluster-us-east1-b-us-east1-0                  local-raid-disks            33m     Filesystem
local-pv-1cfddd87                          3537Gi     RWO            Delete           Released    scylla/data-sct-cluster-us-east1-b-us-east1-2                  local-raid-disks            101m    Filesystem
local-pv-30a26c32                          3537Gi     RWO            Delete           Bound       scylla/data-sct-cluster-us-east1-b-us-east1-0                  local-raid-disks            9m5s    Filesystem
local-pv-36bd02ed                          3537Gi     RWO            Delete           Released    scylla/data-sct-cluster-us-east1-b-us-east1-0                  local-raid-disks            76m     Filesystem
local-pv-4f046023                          3537Gi     RWO            Delete           Bound       scylla/data-sct-cluster-us-east1-b-us-east1-2                  local-raid-disks            16m     Filesystem
local-pv-6d7001ab                          3537Gi     RWO            Delete           Released    scylla/data-sct-cluster-us-east1-b-us-east1-2                  local-raid-disks            49m     Filesystem
local-pv-86081c4d                          3537Gi     RWO            Delete           Released    scylla/data-sct-cluster-us-east1-b-us-east1-2                  local-raid-disks            172m    Filesystem
local-pv-9106241e                          3537Gi     RWO            Delete           Released    scylla/data-sct-cluster-us-east1-b-us-east1-1                  local-raid-disks            165m    Filesystem
local-pv-9c0cd8b5                          3537Gi     RWO            Delete           Released    scylla/data-sct-cluster-us-east1-b-us-east1-2                  local-raid-disks            3h43m   Filesystem
local-pv-9df96263                          3537Gi     RWO            Delete           Bound       scylla/data-sct-cluster-us-east1-b-us-east1-1                  local-raid-disks            85m     Filesystem
local-pv-badf726e                          3537Gi     RWO            Delete           Released    scylla/data-sct-cluster-us-east1-b-us-east1-0                  local-raid-disks            59m     Filesystem
local-pv-c67780b                           3537Gi     RWO            Delete           Released    scylla/data-sct-cluster-us-east1-b-us-east1-1                  local-raid-disks            111m    Filesystem
local-pv-d3cda7a                           3537Gi     RWO            Delete           Released    scylla/data-sct-cluster-us-east1-b-us-east1-0                  local-raid-disks            137m    Filesystem
local-pv-ebb69c04                          3537Gi     RWO            Delete           Available                                                                  local-raid-disks            7m30s   Filesystem
local-pv-f545bc86                          3537Gi     RWO            Delete           Released    scylla/data-sct-cluster-us-east1-b-us-east1-0                  local-raid-disks            163m    Filesystem
local-pv-f72642c1                          3537Gi     RWO            Delete           Released    scylla/data-sct-cluster-us-east1-b-us-east1-1                  local-raid-disks            3h43m   Filesystem
pvc-70b73d04-27df-401d-a8f1-dac079290045   10Gi       RWO            Delete           Bound       scylla-manager/data-scylla-manager-manager-dc-manager-rack-0   gp2                         3h45m   Filesystem
pvc-7d3a34bc-78b1-432b-9980-511262cda52f   10Gi       RWO            Delete           Bound       minio/minio                                                    gp2                         3h45m   Filesystem
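
A quick way to tally the leftover volumes per member from a listing like the one above (column positions assume that output format):

$ kubectl get pv --no-headers | awk '$5 == "Released" {print $6}' | sort | uniq -c
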
scylla-operator-bot[bot] commented 4 weeks ago

The Scylla Operator project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

/lifecycle stale