Open minrk opened 3 years ago
Update: the resize failed in a way that doesn't look like it will recover, presumably due to an OVH cluster configuration issue out of our control (cc @mael-le-gal):
```
$ kubectl describe pvc ovh-prometheus-server
Warning  ExternalExpanding       27m                volume_expand     Ignoring the PVC: didn't find a plugin capable of expanding the volume; waiting for an external controller to process this PVC.
Warning  VolumeResizeFailed      25m (x9 over 27m)  external-resizer cinder.csi.openstack.org  resize volume ovh-managed-kubernetes-wnucc3-pvc-3743fa2e-6b64-4286-91de-5294284f0952 failed: rpc error: code = Internal desc = Could not resize volume "dcad7bcc-f918-4a48-8d6d-610e0fc4f485" to size 50: Expected HTTP response code [202] when accessing [POST https://volume.compute.gra5.cloud.ovh.net/v3/2bc16af8026e45c6a34cd6c9c4c1703a/volumes/dcad7bcc-f918-4a48-8d6d-610e0fc4f485/action], but got 406 instead: {"computeFault": {"message": "Version 3.42 is not supported by the API. Minimum is 3.0 and maximum is 3.15.", "code": 406}}
Normal   Resizing                21m (x10 over 27m) external-resizer cinder.csi.openstack.org  External resizer is resizing volume ovh-managed-kubernetes-wnucc3-pvc-3743fa2e-6b64-4286-91de-5294284f0952
Normal   FileSystemResizeRequired 21m               external-resizer cinder.csi.openstack.org  Require file system resize of volume on node
```
So I deleted the PVC (`kubectl delete pvc ovh-prometheus-server`), restarted the pod, and restarted the deployment action. This means a loss of Prometheus data on OVH, but that data is ephemeral anyway.
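The manual recovery above can be sketched as the following commands. This is a hedged runbook fragment, not the exact commands run: the namespace `support` and the deployment name `support-prometheus-server` are assumptions for illustration, and `kubectl rollout restart` is used here as one way to "restart the pod".

```shell
# Delete the stuck PVC so it can be recreated at the original size
# (this discards the Prometheus data on the volume).
kubectl delete pvc ovh-prometheus-server --namespace support

# Restart the deployment so its pod releases the old volume and a
# fresh PVC/PV pair is provisioned on reschedule.
kubectl rollout restart deployment/support-prometheus-server --namespace support

# Then re-run the deployment action (CI) to reapply the chart.
```

Deleting the PVC is only safe here because the Prometheus data is treated as ephemeral; for data that matters, the volume would need to be snapshotted or copied off first.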
With grant deadlines rapidly approaching, I don't have time to write a full incident report, so I'm putting down some notes here first: