storageos / storageos.github.io

Public documentation for StorageOS, persistent storage for Docker and Kubernetes
https://docs.storageos.com

missing secret #286

Closed laser-dev-support closed 5 years ago

laser-dev-support commented 5 years ago

Hi all,

We are having an issue on more than one installation of StorageOS.

Installation: We followed this installation guide: https://docs.storageos.com/docs/platforms/rancher/index

Infrastructure: Clean Rancher installation (2.2.7) on bare metal on both platforms. One platform has a single worker node, the other has 3 worker nodes. All the required network ports have been opened.

Configuration

Issue: After creating a PVC from the Rancher UI and using it for a while, we created a new one for a different namespace. Rancher could not finalise the deployment because the PVC was not ready. I checked the StorageOS UI and saw that the volume had been created with status healthy. I then removed the deployment from Rancher and deleted the volume from the StorageOS UI, after which its status turned to "decommissioned". As a last attempt, I rebooted the cluster, which made the first volume unavailable as well.

I tried to disable the encryption rule, but nothing changed. I checked the logs and they are flooded with these messages (at a rate of 1 per second):

time="2019-09-03T08:19:10.428363523Z" level=error msg="dataplane configuration failed, could not get volume crypto key" action=create error="secrets \"vol-key.d07c42aa-05e4-2a0a-1e59-6707614ef584\" not found" id=d07c42aa-05e4-2a0a-1e59-6707614ef584-162625 inode=162625 module=director-volume replicas="[]" revision=281506

time="2019-09-03T08:19:10.428469525Z" level=warning msg="partial dataplane configuration applied - will retry" error="diff process encoutered errors: secrets \"vol-key.d07c42aa-05e4-2a0a-1e59-6707614ef584\" not found" module=statesync
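When the logs are flooded like this, it can help to reduce them to the distinct secret names that are reported missing. The following is a minimal sketch, not an official StorageOS tool: the function name `missing_vol_keys` is hypothetical, and the pattern assumes the log format shown in the two messages above.

```shell
# Print each distinct volume key secret name that the logs report as missing.
# Reads StorageOS log text on stdin (format assumed to match the lines above).
missing_vol_keys() {
  grep 'could not get volume crypto key' \
    | grep -oE 'vol-key\.[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}' \
    | sort -u
}
```

It could be fed from the StorageOS pod, for example `kubectl logs <storageos-pod> --namespace <namespace> | missing_vol_keys`, where the pod name and namespace are placeholders for your own values.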

Do you have any idea how to troubleshoot or fix this?

Thanks a lot in advance, Fabio

avestuk commented 5 years ago

@laser-dev-support that error message is reporting that the secret that holds the encryption key for a volume was not found. You can create an empty secret with the name `vol-key.d07c42aa-05e4-2a0a-1e59-6707614ef584` to resolve the error temporarily so that you can delete the volume. As the encryption key has been deleted anyway, you can no longer read from that volume.
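A rough sketch of that workaround with `kubectl` might look like the following. The function name `create_placeholder_key` is hypothetical, and the `kubernetes.io/storageos` secret type is an assumption copied from the secret listing later in this thread, not something the advice above requires. The namespace must be the one StorageOS itself runs in.

```shell
# Create an empty placeholder Secret so the volume can be deleted afterwards.
# The volume's data stays unreadable, since the real key is gone.
# $1: namespace StorageOS is installed into
# $2: secret name exactly as reported in the logs
create_placeholder_key() {
  case "$2" in
    vol-key.*) ;;
    *) echo "refusing: '$2' does not look like a volume key secret" >&2
       return 1 ;;
  esac
  # --type is an assumption taken from the thread's secret listing.
  kubectl create secret generic "$2" --namespace "$1" \
    --type=kubernetes.io/storageos
}
```

For the volume in this report, the call would be `create_placeholder_key <namespace> vol-key.d07c42aa-05e4-2a0a-1e59-6707614ef584`.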

laser-dev-support commented 5 years ago

Thanks for the answer, do you have any idea why this happened in the first place? Anything I could do to avoid this happening again?

laser-dev-support commented 5 years ago

Adding a further, different log line:

time="2019-09-03T09:05:29.245382561Z" level=error msg="method failed" action=create category=volume endpoint="unix://var/lib/kubelet/plugins_registry/storageos/csi.sock" error="rpc error: code = Unknown desc = stat /var/lib/kubelet/plugins/kubernetes.io~storageos/devices/f4123256-eace-c585-2804-457140500116: no such file or directory" method=/csi.v1.Node/NodePublishVolume module=csi namespace=default size=6 volume=pvc-74fadae0-cb7b-11e9-8304-9600002c134d

avestuk commented 5 years ago

> Thanks for the answer, do you have any idea why this happened in the first place? Anything I could do to avoid this happening again?

It happens when the volume crypto key is deleted for a volume that still exists and StorageOS is then restarted. The volume crypto key is stored as a Kubernetes secret in the namespace that StorageOS is installed into.
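Since the keys live as ordinary Kubernetes secrets, one way to guard against this is to check which volume key secrets exist before cleaning up a namespace or restarting StorageOS. This is a sketch under the assumption that the keys follow the `vol-key.<volume id>` naming seen in the logs above; `list_vol_keys` is a hypothetical helper name.

```shell
# List the volume key Secrets present in a namespace, one name per line.
# $1: namespace StorageOS is installed into
list_vol_keys() {
  # "-o name" prints entries as "secret/<name>"; keep only the vol-key ones.
  kubectl get secrets --namespace "$1" -o name \
    | sed -n 's|^secret/\(vol-key\..*\)$|\1|p'
}
```

Comparing this list with the volumes that still exist shows whether any key has gone missing before a restart turns that into an unavailable volume.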

Can you provide some more context on the further line?

laser-dev-support commented 5 years ago

Hi, I created a secret named vol-key.d07c42aa-05e4-2a0a-1e59-6707614ef584:

`
kubectl get secrets --namespace=storageos-operator
NAME                                           TYPE                                  DATA   AGE
default-token-ddjgc                            kubernetes.io/service-account-token   3      7d21h
storageos-api                                  kubernetes.io/storageos               2      7d21h
storageos-operator-sa-token-86l5w              kubernetes.io/service-account-token   3      7d21h
vol-key.d07c42aa-05e4-2a0a-1e59-6707614ef584   kubernetes.io/storageos               0      57s
`

Is this correct? Do I have to restart anything?

Thanks a lot in advance, Fabio

avestuk commented 5 years ago

@laser-dev-support You'll want to create the secret in the namespace that StorageOS is installed into. Most likely that's `storageos`, but you can find out with `kubectl get pods --all-namespaces -l app=storageos`
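To turn that lookup into a value that can be reused in later commands, the namespace column can be extracted from the pod listing. A minimal sketch, assuming the pods carry the `app=storageos` label mentioned above; `storageos_namespaces` is a hypothetical helper name.

```shell
# Print the namespace(s) that contain pods labelled app=storageos.
storageos_namespaces() {
  # With --all-namespaces, the first column of the listing is the namespace.
  kubectl get pods --all-namespaces -l app=storageos --no-headers \
    | awk '{ print $1 }' \
    | sort -u
}
```

If this prints a single line, that is the namespace to create the secret in. No output means no pod matched the label.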

avestuk commented 5 years ago

I believe that creating the secret will solve the issue without requiring a restart, but check the logs of the StorageOS pod to be sure.

laser-dev-support commented 5 years ago

Hi there, unfortunately that didn't help. I got everything working again by installing the StorageOS CLI and forcing the removal of the volume with the missing key.

Thanks a lot again