phoenix-bjoern opened this issue 2 years ago
My question is why the storage pool should be deleted? The only way this is triggered is if operator.satelliteSet.storagePools
was modified (removed entries).
There are no additional checks happening in the operator, so it just tries to delete the storage pool, even if there are resources or snapshots still present. Note that this does include simple replicas, even if the drbd device is not actively mounted.
So my advice would be: check that the LinstorSatelliteSet resource has the expected storage pools.
Thanks @WanzenBug for the fast reply. Actually, neither the storage pool nor the list of nodes has changed, and these errors occur on almost all of our clusters that have been updated to Piraeus 1.7.
Here are screenshots of the linstor command output:
The nodes and storage pools look good, and the storage pool is also referenced correctly for the resources.
I've checked the output of `kubectl get LinstorSatelliteSet.piraeus.linbit.com piraeus-op-ns`. Is it maybe a problem that the storage pools haven't been declared in the values file for the new CRD configuration? In the output the storagePools are empty (because we haven't declared them on the helm upgrade), but the SatelliteStatus of course lists the storage pools that were created during the first Linstor deployment on the cluster:
```
sslSecret: null
storagePools:
lvmPools: []
lvmThinPools: []
zfsPools: []
tolerations: []
status:
SatelliteStatuses:
- connectionStatus: ONLINE
nodeName: de-fra-node10
registeredOnController: true
storagePoolStatus:
- freeCapacity: 9223372036854775807
name: DfltDisklessStorPool
nodeName: de-fra-node10
provider: DISKLESS
totalCapacity: 9223372036854775807
- freeCapacity: 1132126535
name: lvm-thin
nodeName: de-fra-node10
provider: LVM_THIN
totalCapacity: 1677721600
- connectionStatus: ONLINE
nodeName: de-fra-node8
registeredOnController: true
storagePoolStatus:
- freeCapacity: 9223372036854775807
name: DfltDisklessStorPool
nodeName: de-fra-node8
provider: DISKLESS
totalCapacity: 9223372036854775807
- freeCapacity: 1132126535
name: lvm-thin
nodeName: de-fra-node8
provider: LVM_THIN
totalCapacity: 1677721600
- connectionStatus: ONLINE
nodeName: de-fra-node9
registeredOnController: true
storagePoolStatus:
- freeCapacity: 9223372036854775807
name: DfltDisklessStorPool
nodeName: de-fra-node9
provider: DISKLESS
totalCapacity: 9223372036854775807
- freeCapacity: 1132126535
name: lvm-thin
nodeName: de-fra-node9
provider: LVM_THIN
totalCapacity: 1677721600
errors:
- "Message: 'The specified storage pool 'lvm-thin' on node 'de-fra-node9' can not
be deleted as volumes / snapshot-volumes are still using it.'; Details: 'Volumes
/ snapshot-volumes that are still using the storage pool: \n Node name: 'de-fra-node9',
resource name: 'pvc-1e59589f-e04e-4aee-a1c6-0561a764a7e8', volume number: 0\n
\ Node name: 'de-fra-node9', resource name: 'pvc-2c8ea040-9651-4501-a0d9-7b3920c82ec8',
volume number: 0\n Node name: 'de-fra-node9', resource name: 'pvc-4506c792-6fbb-43b6-a6ca-745084259e0d',
volume number: 0\n Node name: 'de-fra-node9', resource name: 'pvc-4e6cf7f8-2a42-4129-9a9b-cba310b3ed9e',
volume number: 0\n Node name: 'de-fra-node9', resource name: 'pvc-7afac331-4319-4ca7-b587-1ec267dc63b8',
volume number: 0\n Node name: 'de-fra-node9', resource name: 'pvc-7c07be83-43aa-44af-b9c6-c600339ad6a8',
volume number: 0\n Node name: 'de-fra-node9', resource name: 'pvc-8366a898-20ca-4f71-abee-cc3c40ff8bf1',
volume number: 0\n Node name: 'de-fra-node9', resource name: 'pvc-a2bff230-7ca6-4e93-a273-15d217967def',
volume number: 0\n Node name: 'de-fra-node9', resource name: 'pvc-b1cb2468-4c1b-4872-8040-9e9860c45c76',
volume number: 0\n Node name: 'de-fra-node9', resource name: 'pvc-cabb808d-d90b-4c59-831f-a3f08420effc',
volume number: 0\nNode: de-fra-node9, Storage pool name: lvm-thin'; Correction:
'Delete the listed volumes and snapshot-volumes first.'; Reports: '[61F976B7-00000-071670]'"
- "Message: 'The specified storage pool 'lvm-thin' on node 'de-fra-node8' can not
be deleted as volumes / snapshot-volumes are still using it.'; Details: 'Volumes
/ snapshot-volumes that are still using the storage pool: \n Node name: 'de-fra-node8',
resource name: 'pvc-1e59589f-e04e-4aee-a1c6-0561a764a7e8', volume number: 0\n
\ Node name: 'de-fra-node8', resource name: 'pvc-2c8ea040-9651-4501-a0d9-7b3920c82ec8',
volume number: 0\n Node name: 'de-fra-node8', resource name: 'pvc-4506c792-6fbb-43b6-a6ca-745084259e0d',
volume number: 0\n Node name: 'de-fra-node8', resource name: 'pvc-4e6cf7f8-2a42-4129-9a9b-cba310b3ed9e',
volume number: 0\n Node name: 'de-fra-node8', resource name: 'pvc-7afac331-4319-4ca7-b587-1ec267dc63b8',
volume number: 0\n Node name: 'de-fra-node8', resource name: 'pvc-7c07be83-43aa-44af-b9c6-c600339ad6a8',
volume number: 0\n Node name: 'de-fra-node8', resource name: 'pvc-8366a898-20ca-4f71-abee-cc3c40ff8bf1',
volume number: 0\n Node name: 'de-fra-node8', resource name: 'pvc-a2bff230-7ca6-4e93-a273-15d217967def',
volume number: 0\n Node name: 'de-fra-node8', resource name: 'pvc-b1cb2468-4c1b-4872-8040-9e9860c45c76',
volume number: 0\n Node name: 'de-fra-node8', resource name: 'pvc-cabb808d-d90b-4c59-831f-a3f08420effc',
volume number: 0\nNode: de-fra-node8, Storage pool name: lvm-thin'; Correction:
'Delete the listed volumes and snapshot-volumes first.'; Reports: '[61F976B7-00000-071671]'"
- "Message: 'The specified storage pool 'lvm-thin' on node 'de-fra-node10' can not
be deleted as volumes / snapshot-volumes are still using it.'; Details: 'Volumes
/ snapshot-volumes that are still using the storage pool: \n Node name: 'de-fra-node10',
resource name: 'pvc-1e59589f-e04e-4aee-a1c6-0561a764a7e8', volume number: 0\n
\ Node name: 'de-fra-node10', resource name: 'pvc-2c8ea040-9651-4501-a0d9-7b3920c82ec8',
volume number: 0\n Node name: 'de-fra-node10', resource name: 'pvc-4506c792-6fbb-43b6-a6ca-745084259e0d',
volume number: 0\n Node name: 'de-fra-node10', resource name: 'pvc-4e6cf7f8-2a42-4129-9a9b-cba310b3ed9e',
volume number: 0\n Node name: 'de-fra-node10', resource name: 'pvc-7afac331-4319-4ca7-b587-1ec267dc63b8',
volume number: 0\n Node name: 'de-fra-node10', resource name: 'pvc-7c07be83-43aa-44af-b9c6-c600339ad6a8',
volume number: 0\n Node name: 'de-fra-node10', resource name: 'pvc-8366a898-20ca-4f71-abee-cc3c40ff8bf1',
volume number: 0\n Node name: 'de-fra-node10', resource name: 'pvc-a2bff230-7ca6-4e93-a273-15d217967def',
volume number: 0\n Node name: 'de-fra-node10', resource name: 'pvc-b1cb2468-4c1b-4872-8040-9e9860c45c76',
volume number: 0\n Node name: 'de-fra-node10', resource name: 'pvc-cabb808d-d90b-4c59-831f-a3f08420effc',
volume number: 0\nNode: de-fra-node10, Storage pool name: lvm-thin'; Correction:
'Delete the listed volumes and snapshot-volumes first.'; Reports: '[61F976B7-00000-071672]'"
```
> Is it maybe a problem that the storage pools haven't been declared in the values file for the new CRD configuration?
Yes. If they are not set on helm upgrade, helm just removes them, and then the operator tries to delete the storage pool in a loop. It's a bit cumbersome, I know, but that's what helm does :shrug:
So you should edit the LinstorSatelliteSet to say:
```
spec:
  storagePools:
    lvmThinPools:
    - name: lvm-thin
      volumeGroup: vg-pool
      thinVolume: disk-redundant
```
And also save that to your helm overrides for the next upgrade.
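For completeness, a minimal sketch of what that could look like in the Helm values file, assuming the chart exposes the pools under the operator.satelliteSet.storagePools key mentioned above and reusing the same placeholder names (adjust volumeGroup and thinVolume to your actual LVM setup):

```
# Sketch of a Helm values override; assumes the chart's
# operator.satelliteSet.storagePools values mirror the
# spec.storagePools field of the LinstorSatelliteSet CR.
operator:
  satelliteSet:
    storagePools:
      lvmThinPools:
      - name: lvm-thin          # pool name from this thread
        volumeGroup: vg-pool    # placeholder from the snippet above
        thinVolume: disk-redundant
```

Passing this file with -f on the next helm upgrade keeps the pool declared, so helm no longer drops it and the operator stops trying to delete it.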
Awesome, thanks for your help @WanzenBug, that actually resolved the problem. Our expectation was that the new CRD would get merged with the existing information in etcd, so we skipped the setting in the values.yaml. Maybe this should be added to the upgrade documentation.
Since the update to 1.7.0, the linstor-controller throws the following error in its log every few seconds:
The specified storage pool 'lvm-thin' on node 'host9' can not be deleted as volumes / snapshot-volumes are still using it.
Even if all containers with DRBD resources are scaled down and no DRBD resource is mounted on the cluster nodes, the message doesn't disappear. Is this a bug during the upgrade routine or is it something we have to resolve manually?
Here is the error report: