openebs-archive / cstor-operators

Collection of OpenEBS cStor Data Engine Operators
https://openebs.io
Apache License 2.0

stuck at FailedMount : UnmountUnderProgress #283

Open survivant opened 3 years ago

survivant commented 3 years ago

This morning I got this error again:

 Warning  FailedMount  2m1s (x3 over 6m5s)  kubelet            MountVolume.MountDevice failed for volume "pvc-a8f02b71-52ae-4afb-816a-cfc8a5e4a1de" : rpc error: code = Internal desc = Volume pvc-a8f02b71-52ae-4afb-816a-cfc8a5e4a1de Busy, status: UnmountUnderProgress

It's the same scenario again. I ran helm install myapp, and it worked for 7 days. This morning the database crashed, so I ran helm delete myapp and waited until all the pods were completely removed. Then I ran helm install myapp again. But now the database pod is stuck at UnmountUnderProgress. We never found out how to fix this issue.

I am having trouble telling my dev team that I have no idea what is going on. I also tried what was suggested last time, but I still get an error when I run the zpool command:

root@cspc-iep-mirror-szdj-67659c8c66-jztlt:/# zpool status
  pool: cstor-e31e3b4a-78a9-4a92-b2a8-7fef79194a77
 state: ONLINE
  scan: none requested
config:
        NAME                                             STATE     READ WRITE CKSUM
        cstor-e31e3b4a-78a9-4a92-b2a8-7fef79194a77       ONLINE       0     0     0
          mirror-0                                       ONLINE       0     0     0
            scsi-0ATA_ST8000VE000-2P61_WKD3G868-part1    ONLINE       0     0     0
            scsi-1ATA_ST8000VE000-2P6101_WKD3EQ63-part1  ONLINE       0     0     0
errors: No known data errors
root@cspc-iep-mirror-szdj-67659c8c66-jztlt:/# zpool scrub cstor-e31e3b4a-78a9-4a92-b2a8-7fef79194a77
cannot scrub cstor-e31e3b4a-78a9-4a92-b2a8-7fef79194a77: operation not supported on this type of pool
root@cspc-iep-mirror-szdj-67659c8c66-jztlt:/#

What should I do next?

I am running OpenEBS 2.7.0.

survivant commented 3 years ago

root@test-pcl4004:~# kubectl version
Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.4", GitCommit:"e87da0bd6e03ec3fea7933c4b5263d151aafd07c", GitTreeState:"clean", BuildDate:"2021-02-18T16:12:00Z", GoVersion:"go1.15.8", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.5", GitCommit:"6b1d87acf3c8253c123756b9e61dac642678305f", GitTreeState:"clean", BuildDate:"2021-03-18T01:02:01Z", GoVersion:"go1.15.8", Compiler:"gc", Platform:"linux/amd64"}
root@test-pcl4004:~#

survivant commented 3 years ago

I deleted my application again and waited 10 minutes after all my resources were removed. Then I reinstalled it, and now I get this error:

Events:
  Type     Reason       Age                  From               Message
  ----     ------       ----                 ----               -------
  Normal   Scheduled    3m22s                default-scheduler  Successfully assigned default/production-manager-mariadb-0 to test-pcl4010
  Warning  FailedMount  80s                  kubelet            Unable to attach or mount volumes: unmounted volumes=[data], unattached volumes=[data config default-token-jgzp8]: timed out waiting for the condition
  Warning  FailedMount  74s (x9 over 3m22s)  kubelet            MountVolume.MountDevice failed for volume "pvc-a8f02b71-52ae-4afb-816a-cfc8a5e4a1de" : rpc error: code = Internal desc = rpc error: code = Internal desc = cstorvolumeconfigs.cstor.openebs.io "pvc-a8f02b71-52ae-4afb-816a-cfc8a5e4a1de" not found
root@test-pcl4004:~#

survivant commented 3 years ago

root@test-pcl4004:~# kubectl -n openebs get all | grep pvc-a8f02b71-52ae-4afb-816a-cfc8a5e4a1de
root@test-pcl4004:~#
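
Note that `kubectl get all` only lists resource types in the `all` category, which typically does not include custom resources, so an empty result here does not show whether any cStor objects for this volume still exist. A minimal sketch of a broader search, assuming the cStor CRDs are registered under the `cstor.openebs.io` API group (the group that appears in the error message) and live in the `openebs` namespace; `find_cstor_resources` is a hypothetical helper name, not part of OpenEBS:

```shell
#!/bin/sh
# Hypothetical helper (not part of OpenEBS or kubectl): search every
# namespaced resource type in the cstor.openebs.io API group for lines
# that mention the given PV name.
#   $1 = PV name, e.g. pvc-a8f02b71-52ae-4afb-816a-cfc8a5e4a1de
#   $2 = namespace to search (assumed default: openebs)
find_cstor_resources() {
    pv="$1"
    ns="${2:-openebs}"
    # Ask the API server which resource types exist in the group,
    # rather than hard-coding their names.
    for res in $(kubectl api-resources --api-group=cstor.openebs.io --namespaced=true -o name); do
        # Print matching lines prefixed with the resource type.
        kubectl -n "$ns" get "$res" 2>/dev/null | grep -- "$pv" | sed "s|^|$res: |"
    done
}
```

Usage would look like `find_cstor_resources pvc-a8f02b71-52ae-4afb-816a-cfc8a5e4a1de`. No output means that no object in that group and namespace mentions the volume name.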

survivant commented 3 years ago

I ran this:

kubectl -n openebs delete pods --all --wait=false --grace-period=0

and now I get this:

Events:
  Type     Reason       Age                 From               Message
  ----     ------       ----                ----               -------
  Normal   Scheduled    13m                 default-scheduler  Successfully assigned default/production-manager-mariadb-0 to test-pcl4010
  Warning  FailedMount  6m55s               kubelet            Unable to attach or mount volumes: unmounted volumes=[data], unattached volumes=[default-token-jgzp8 data config]: timed out waiting for the condition
  Warning  FailedMount  4m40s               kubelet            Unable to attach or mount volumes: unmounted volumes=[data], unattached volumes=[config default-token-jgzp8 data]: timed out waiting for the condition
  Warning  FailedMount  73s (x14 over 13m)  kubelet            MountVolume.MountDevice failed for volume "pvc-a8f02b71-52ae-4afb-816a-cfc8a5e4a1de" : rpc error: code = Internal desc = rpc error: code = Internal desc = cstorvolumeconfigs.cstor.openebs.io "pvc-a8f02b71-52ae-4afb-816a-cfc8a5e4a1de" not found
  Warning  FailedMount  6s (x4 over 11m)    kubelet            Unable to attach or mount volumes: unmounted volumes=[data], unattached volumes=[data config default-token-jgzp8]: timed out waiting for the condition
root@test-pcl4004:~#

survivant commented 3 years ago

I found a solution, but I don't like it.

Uninstall the application, DELETE THE PVC, and redeploy the application. But when we do that, we lose all the data :(
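
One general Kubernetes precaution before deleting a PVC is to switch the bound PV's reclaim policy to `Retain`, so that the PV object and its backing storage are not deleted along with the claim. The sketch below assumes the PVC is already bound; `retain_pv_of_pvc` is a hypothetical helper name. Whether a retained cStor volume can then be mounted again without hitting the same error is not established in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not from this thread or the OpenEBS docs): set the
# reclaim policy of the PV bound to a PVC to Retain, then print the PV name.
#   $1 = PVC name
#   $2 = namespace of the PVC (assumed default: default)
retain_pv_of_pvc() {
    pvc="$1"
    ns="${2:-default}"
    # Look up the PV that the claim is bound to.
    pv=$(kubectl -n "$ns" get pvc "$pvc" -o jsonpath='{.spec.volumeName}') || return 1
    if [ -z "$pv" ]; then
        echo "PVC $ns/$pvc is not bound to a PV" >&2
        return 1
    fi
    # PVs are cluster-scoped, so no namespace flag is needed here.
    kubectl patch pv "$pv" -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}' >/dev/null || return 1
    echo "$pv"
}
```

Usage would look like `retain_pv_of_pvc <pvc-name> default`, run before `helm delete` and before deleting the PVC. A retained PV moves to the Released state once its claim is deleted and does not rebind to a new claim on its own.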