openebs-archive / cstor-operators

Collection of OpenEBS cStor Data Engine Operators
https://openebs.io
Apache License 2.0

cStor using the removed APIs in k8s 1.25 requires changes #435

Open Ab-hishek opened 2 years ago

Ab-hishek commented 2 years ago

Problem Description

When creating an application with a cStor-provisioned volume (3 replicas), the application pod gets stuck in the ContainerCreating state.

Environment details: kubeadm-based 4-node cluster (1 master and 3 workers) running K8s 1.25:

[root@k8s-master-640 ~]# kubectl get nodes
NAME             STATUS   ROLES           AGE   VERSION
k8s-master-640   Ready    control-plane   20h   v1.25.0
k8s-node1-641    Ready    <none>          19h   v1.25.0
k8s-node2-642    Ready    <none>          19h   v1.25.0
k8s-node3-643    Ready    <none>          19h   v1.25.0

Each node has 3 disks attached to it.

Steps followed to create a cStor volume:

  1. Created a CSPC using the 3 disks on all 3 worker nodes.
  2. The CSPC was created successfully with provisioned == desired instances (CSPI), and the pool pods are in the Running state.
  3. Created a cStor volume with 3 replicas specified in the StorageClass.
  4. The PVC gets bound to its respective PV.
  5. The CVRs are created and all are in a healthy state.
  6. Deployed an application with the PVC created above.

Describe output of the application pod:

Events:
  Type     Reason            Age                    From               Message
  ----     ------            ----                   ----               -------
  Warning  FailedScheduling  7m54s                  default-scheduler  0/4 nodes are available: 4 pod has unbound immediate PersistentVolumeClaims. preemption: 0/4 nodes are available: 4 Preemption is not helpful for scheduling.
  Normal   Scheduled         7m52s                  default-scheduler  Successfully assigned default/wordpress-5fb7bff8dd-csqrb to k8s-node1-641
  Warning  FailedMount       2m3s (x10 over 7m43s)  kubelet            MountVolume.MountDevice failed for volume "pvc-14297415-5f2a-406f-bf8b-87a1a5006742" : rpc error: code = Internal desc = Waiting for pvc-14297415-5f2a-406f-bf8b-87a1a5006742's CVC to be bound
  Warning  FailedMount       77s (x3 over 5m50s)    kubelet            Unable to attach or mount volumes: unmounted volumes=[wordpress-persistent-storage], unattached volumes=[wordpress-persistent-storage kube-api-access-zwkx9]: timed out waiting for the condition'

Describe output of the CVC:

Events:
  Type     Reason        Age                     From                         Message
  ----     ------        ----                    ----                         -------
  Warning  Provisioning  8m22s (x4 over 8m40s)   cstorvolumeclaim-controller  failed to create PDB for volume: pvc-14297415-5f2a-406f-bf8b-87a1a5006742: failed to list PDB belongs to pools with selector openebs.io/cstor-disk-pool-ffvp=true,openebs.io/cstor-disk-pool-l2fb=true,openebs.io/cstor-disk-pool-54zn=true: the server could not find the requested resource
  Warning  Provisioning  4m47s (x4 over 8m36s)   cstorvolumeclaim-controller  failed to create PDB for volume: pvc-14297415-5f2a-406f-bf8b-87a1a5006742: failed to list PDB belongs to pools with selector openebs.io/cstor-disk-pool-l2fb=true,openebs.io/cstor-disk-pool-54zn=true,openebs.io/cstor-disk-pool-ffvp=true: the server could not find the requested resource
  Warning  Provisioning  3m17s (x18 over 8m42s)  cstorvolumeclaim-controller  failed to create PDB for volume: pvc-14297415-5f2a-406f-bf8b-87a1a5006742: failed to list PDB belongs to pools with selector openebs.io/cstor-disk-pool-54zn=true,openebs.io/cstor-disk-pool-ffvp=true,openebs.io/cstor-disk-pool-l2fb=true: the server could not find the requested resource

Logs from one of the pool pods:

I0907 06:52:21.373440       8 event.go:282] Event(v1.ObjectReference{Kind:"CStorVolumeReplica", Namespace:"openebs", Name:"pvc-5a9e63ce-1c6d-4c53-bb7f-dd4782360736-cstor-disk-pool-54zn", UID:"7f1d146f-4c2c-4a91-a3b0-9b0500867ce1", APIVersion:"cstor.openebs.io/v1", ResourceVersion:"138978", FieldPath:""}): type: 'Normal' reason: 'Synced' Received Resource create event
I0907 06:52:21.389429       8 handler.go:226] will process add event for cvr {pvc-5a9e63ce-1c6d-4c53-bb7f-dd4782360736-cstor-disk-pool-54zn} as volume {cstor-fb027a66-716a-4abf-b643-3a336cc3da6a/pvc-5a9e63ce-1c6d-4c53-bb7f-dd4782360736}
I0907 06:52:21.393542       8 handler.go:572] cVR 'pvc-5a9e63ce-1c6d-4c53-bb7f-dd4782360736-cstor-disk-pool-54zn': uid '7f1d146f-4c2c-4a91-a3b0-9b0500867ce1': phase 'Init': is_empty_status: false
I0907 06:52:21.393557       8 handler.go:584] cVR pending: 7f1d146f-4c2c-4a91-a3b0-9b0500867ce1
2022-09-07T06:52:21.527Z        INFO    volumereplica/volumereplica.go:308              {"eventcode": "cstor.volume.replica.create.success", "msg": "Successfully created CStor volume replica", "rname": "cstor-fb027a66-716a-4abf-b643-3a336cc3da6a/pvc-5a9e63ce-1c6d-4c53-bb7f-dd4782360736"}
I0907 06:52:21.527245       8 handler.go:469] cVR creation successful: pvc-5a9e63ce-1c6d-4c53-bb7f-dd4782360736-cstor-disk-pool-54zn, 7f1d146f-4c2c-4a91-a3b0-9b0500867ce1
I0907 06:52:21.527559       8 event.go:282] Event(v1.ObjectReference{Kind:"CStorVolumeReplica", Namespace:"openebs", Name:"pvc-5a9e63ce-1c6d-4c53-bb7f-dd4782360736-cstor-disk-pool-54zn", UID:"7f1d146f-4c2c-4a91-a3b0-9b0500867ce1", APIVersion:"cstor.openebs.io/v1", ResourceVersion:"138980", FieldPath:""}): type: 'Normal' reason: 'Created' Resource created successfully
I0907 06:52:21.538547       8 event.go:282] Event(v1.ObjectReference{Kind:"CStorVolumeReplica", Namespace:"openebs", Name:"pvc-5a9e63ce-1c6d-4c53-bb7f-dd4782360736-cstor-disk-pool-54zn", UID:"7f1d146f-4c2c-4a91-a3b0-9b0500867ce1", APIVersion:"cstor.openebs.io/v1", ResourceVersion:"138980", FieldPath:""}): type: 'Warning' reason: 'SyncFailed' failed to sync CVR error: unable to update snapshot list details in CVR: failed to get the list of snapshots: Output: failed listsnap command for cstor-fb027a66-716a-4abf-b643-3a336cc3da6a/pvc-5a9e63ce-1c6d-4c53-bb7f-dd4782360736 with err 11
Error: exit status 11
I0907 06:52:21.563031       8 event.go:282] Event(v1.ObjectReference{Kind:"CStorVolumeReplica", Namespace:"openebs", Name:"pvc-5a9e63ce-1c6d-4c53-bb7f-dd4782360736-cstor-disk-pool-54zn", UID:"7f1d146f-4c2c-4a91-a3b0-9b0500867ce1", APIVersion:"cstor.openebs.io/v1", ResourceVersion:"139013", FieldPath:""}): type: 'Warning' reason: 'SyncFailed' failed to sync CVR error: unable to update snapshot list details in CVR: failed to get the list of snapshots: Output: failed listsnap command for cstor-fb027a66-716a-4abf-b643-3a336cc3da6a/pvc-5a9e63ce-1c6d-4c53-bb7f-dd4782360736 with err 11
Error: exit status 11

How to solve

Upon debugging, I found that the cStor operators use the v1beta1 version of the PodDisruptionBudget object in their codebase. That version was deprecated in K8s 1.21 and completely removed in K8s 1.25, which is why listing PDBs fails with "the server could not find the requested resource".

We need to upgrade the PodDisruptionBudget version used in the codebase to v1 so that cStor works on K8s 1.25 and later versions.
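
To illustrate why the list call fails on 1.25, here is a minimal Go sketch (standard library only, not the actual cstor-operators code) that builds the REST path for listing PDBs by pool label. The helper names pdbGroupVersion and pdbListPath are hypothetical, and the "openebs" namespace is an assumption taken from the CVR logs above. The version cut-offs reflect that policy/v1 is served from K8s 1.21 onward, while policy/v1beta1 is no longer served from 1.25.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// pdbGroupVersion (hypothetical helper) picks the API group/version to use
// for PodDisruptionBudget on a Kubernetes 1.<minor> cluster.
// policy/v1 exists from 1.21 onward; policy/v1beta1 is gone in 1.25.
func pdbGroupVersion(minor int) string {
	if minor >= 21 {
		return "policy/v1"
	}
	return "policy/v1beta1"
}

// pdbListPath (hypothetical helper) builds the REST path that lists PDBs in
// a namespace, filtered by the given pool labels (each matched as "=true").
func pdbListPath(minor int, namespace string, poolLabels []string) string {
	selectors := make([]string, 0, len(poolLabels))
	for _, label := range poolLabels {
		selectors = append(selectors, label+"=true")
	}
	query := url.Values{}
	query.Set("labelSelector", strings.Join(selectors, ","))
	return fmt.Sprintf("/apis/%s/namespaces/%s/poddisruptionbudgets?%s",
		pdbGroupVersion(minor), namespace, query.Encode())
}

func main() {
	// Pool labels copied from the CVC events in this issue.
	labels := []string{
		"openebs.io/cstor-disk-pool-ffvp",
		"openebs.io/cstor-disk-pool-l2fb",
		"openebs.io/cstor-disk-pool-54zn",
	}
	// On 1.25 the path must use policy/v1; a hard-coded policy/v1beta1
	// path returns "the server could not find the requested resource".
	fmt.Println(pdbListPath(25, "openebs", labels))
}
```

In a real operator the same switch would normally be made by moving the client calls and imports from the policy/v1beta1 package to policy/v1, rather than by building paths by hand.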

ThomasBuchinger commented 2 years ago

In case someone finds this issue: the fix is already merged and will be released with 3.4.0 (?) https://github.com/openebs/cstor-operators/pull/436

Godfunc commented 2 years ago

I tested it. It is supported in v3.4.x.

jadsy2107 commented 2 years ago

I can confirm that the issue was resolved by using 3.4.0. 3.3.0 failed, and the minute I upgraded the deployments and all references to the 3.3.0 image to 3.4.0, everything worked.

willzhang commented 1 year ago

Waiting for the v3.4.0 Helm charts. Is there any way to deploy 3.4.0 now?

jadsy2107 commented 1 year ago

No need to wait, https://github.com/openebs/velero-plugin/issues/183

Read that through and see the last comments for a cstor-operator setup that works :)


jadsy2107 commented 1 year ago

https://github.com/openebs/velero-plugin/issues/183#issuecomment-1317675988

mmelyp commented 1 year ago

Is there any plan to release Helm chart 3.4.0?