piraeusdatastore / piraeus-operator

The Piraeus Operator manages LINSTOR clusters in Kubernetes.
https://piraeus.io/
Apache License 2.0

Upgrade to v2.2.0 (from v2.1.1) fails as controller backup secret cannot be created (data too long) #541

Closed RichardSufliarsky closed 1 year ago

RichardSufliarsky commented 1 year ago

I just applied the upgrade with kubectl apply --server-side -k "https://github.com/piraeusdatastore/piraeus-operator/config/default?ref=v2" and the linstor-controller pod is now in a crash loop because the run-migration init container can't create the backup secret (its data exceeds the 1 MiB limit).

Is there a way to fix this quickly?

time="2023-10-16T14:02:38Z" level=info msg="running k8s-await-election" version=refs/tags/v0.3.1
time="2023-10-16T14:02:38Z" level=info msg="no status endpoint specified, will not be created"
I1016 14:02:38.300681       1 leaderelection.go:248] attempting to acquire leader lease piraeus-datastore/linstor-controller...
I1016 14:02:38.312779       1 leaderelection.go:258] successfully acquired lease piraeus-datastore/linstor-controller
time="2023-10-16T14:02:38Z" level=info msg="long live our new leader: 'linstor-controller-7b94cbbc97-lsfp5'!"
time="2023-10-16T14:02:38Z" level=info msg="starting command '/usr/bin/piraeus-entry.sh' with arguments: '[runMigration]'"
Loading configuration file "/etc/linstor/linstor.toml"
INFO:    Attempting dynamic load of extension module "com.linbit.linstor.modularcrypto.FipsCryptoModule"
INFO:    Extension module "com.linbit.linstor.modularcrypto.FipsCryptoModule" is not installed
INFO:    Attempting dynamic load of extension module "com.linbit.linstor.modularcrypto.JclCryptoModule"
DEBUG:   Constructing instance of module "com.linbit.linstor.modularcrypto.JclCryptoModule" with default constructor
INFO:    Dynamic load of extension module "com.linbit.linstor.modularcrypto.JclCryptoModule" was successful
INFO:    Cryptography provider: Using default cryptography module
INFO:    Kubernetes-CRD connection URL is "k8s"
14:02:39.110 [main] DEBUG io.fabric8.kubernetes.client.Config -- Trying to configure client from Kubernetes config...
14:02:39.113 [main] DEBUG io.fabric8.kubernetes.client.Config -- Did not find Kubernetes config at: [/root/.kube/config]. Ignoring.
14:02:39.114 [main] DEBUG io.fabric8.kubernetes.client.Config -- Trying to configure client from service account...
14:02:39.114 [main] DEBUG io.fabric8.kubernetes.client.Config -- Found service account host and port: 10.96.0.1:443
14:02:39.114 [main] DEBUG io.fabric8.kubernetes.client.Config -- Found service account ca cert at: [/var/run/secrets/kubernetes.io/serviceaccount/ca.crt}].
14:02:39.115 [main] DEBUG io.fabric8.kubernetes.client.Config -- Found service account token at: [/var/run/secrets/kubernetes.io/serviceaccount/token].
14:02:39.115 [main] DEBUG io.fabric8.kubernetes.client.Config -- Trying to configure client namespace from Kubernetes service account namespace path...
14:02:39.115 [main] DEBUG io.fabric8.kubernetes.client.Config -- Found service account namespace at: [/var/run/secrets/kubernetes.io/serviceaccount/namespace].
14:02:39.122 [main] DEBUG io.fabric8.kubernetes.client.utils.HttpClientUtils -- Using httpclient io.fabric8.kubernetes.client.okhttp.OkHttpClientFactory factory
TRACE:   Found database version 11
needs migration
Error from server (NotFound): secrets "linstor-backup-for-linstor-controller-7b94cbbc97-lsfp5" not found
crds.yaml
ebsremotes.internal.linstor.linbit.com.yaml
files.internal.linstor.linbit.com.yaml
keyvaluestore.internal.linstor.linbit.com.yaml
layerbcachevolumes.internal.linstor.linbit.com.yaml
layercachevolumes.internal.linstor.linbit.com.yaml
layerdrbdresourcedefinitions.internal.linstor.linbit.com.yaml
layerdrbdresources.internal.linstor.linbit.com.yaml
layerdrbdvolumedefinitions.internal.linstor.linbit.com.yaml
layerdrbdvolumes.internal.linstor.linbit.com.yaml
layerluksvolumes.internal.linstor.linbit.com.yaml
layeropenflexresourcedefinitions.internal.linstor.linbit.com.yaml
layeropenflexvolumes.internal.linstor.linbit.com.yaml
layerresourceids.internal.linstor.linbit.com.yaml
layerstoragevolumes.internal.linstor.linbit.com.yaml
layerwritecachevolumes.internal.linstor.linbit.com.yaml
linstorremotes.internal.linstor.linbit.com.yaml
linstorversion.internal.linstor.linbit.com.yaml
nodeconnections.internal.linstor.linbit.com.yaml
nodenetinterfaces.internal.linstor.linbit.com.yaml
nodes.internal.linstor.linbit.com.yaml
nodestorpool.internal.linstor.linbit.com.yaml
propscontainers.internal.linstor.linbit.com.yaml
resourceconnections.internal.linstor.linbit.com.yaml
resourcedefinitions.internal.linstor.linbit.com.yaml
resourcegroups.internal.linstor.linbit.com.yaml
resources.internal.linstor.linbit.com.yaml
rollback.internal.linstor.linbit.com.yaml
s3remotes.internal.linstor.linbit.com.yaml
satellitescapacity.internal.linstor.linbit.com.yaml
schedules.internal.linstor.linbit.com.yaml
secaccesstypes.internal.linstor.linbit.com.yaml
secaclmap.internal.linstor.linbit.com.yaml
secconfiguration.internal.linstor.linbit.com.yaml
secdfltroles.internal.linstor.linbit.com.yaml
secidentities.internal.linstor.linbit.com.yaml
secidrolemap.internal.linstor.linbit.com.yaml
secobjectprotection.internal.linstor.linbit.com.yaml
secroles.internal.linstor.linbit.com.yaml
sectyperules.internal.linstor.linbit.com.yaml
sectypes.internal.linstor.linbit.com.yaml
spacehistory.internal.linstor.linbit.com.yaml
storpooldefinitions.internal.linstor.linbit.com.yaml
trackingdate.internal.linstor.linbit.com.yaml
volumeconnections.internal.linstor.linbit.com.yaml
volumedefinitions.internal.linstor.linbit.com.yaml
volumegroups.internal.linstor.linbit.com.yaml
volumes.internal.linstor.linbit.com.yaml
error: failed to create secret Secret "linstor-backup-for-linstor-controller-7b94cbbc97-lsfp5" is invalid: data: Too long: must have at most 1048576 bytes
time="2023-10-16T14:03:02Z" level=fatal msg="failed to run" err="exit status 1"

WanzenBug commented 1 year ago

You can create the expected backup locally:

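mkdir linstor-backup && cd linstor-backup
# Save the definitions of all LINSTOR internal CRDs into a single file.
kubectl get crds | grep -o '.*\.internal\.linstor\.linbit\.com' | xargs kubectl get crds -oyaml > crds.yaml
# Dump the resources of each CRD into its own file, matching the file names in the log above.
kubectl get crds | grep -o '.*\.internal\.linstor\.linbit\.com' | xargs -I{} sh -c "kubectl get {} -oyaml > {}.yaml"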

Afterwards, you can create a simple "empty" secret with the expected name linstor-backup-for-linstor-controller-7b94cbbc97-lsfp5. Then the migration should continue to run.
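
A minimal sketch of that last step, for illustration only. It assumes the expected name is always "linstor-backup-for-" followed by the controller pod name (as in the log above) and that the controller runs in the piraeus-datastore namespace (taken from the leader lease in the log). The function names are hypothetical helpers, not part of any tool.

```shell
# Assumption: the migration looks for a secret named
# "linstor-backup-for-<controller pod name>", as seen in the log above.
backup_secret_name() {
    printf 'linstor-backup-for-%s\n' "$1"
}

# Create a secret with no data under the expected name.
# Assumption: namespace defaults to piraeus-datastore (from the lease in the log).
create_placeholder_secret() {
    pod="$1"
    namespace="${2:-piraeus-datastore}"
    kubectl create secret generic "$(backup_secret_name "$pod")" --namespace "$namespace"
}
```

For the pod in this report, that would be called as create_placeholder_secret linstor-controller-7b94cbbc97-lsfp5. The pod name changes whenever the pod is recreated, so the name has to be taken from the current "not found" error.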

RichardSufliarsky commented 1 year ago

@WanzenBug thank you, saved my day again.

dimm0 commented 2 weeks ago

Can this be fixed permanently? I hit this every time I upgrade.

RichardSufliarsky commented 2 weeks ago

There seems to be no quick fix available, as the backup is already compressed: https://github.com/LINBIT/linstor-server/blob/4c1a5dd4e96fea6a27f8bb34e7c0a4c54133a8b7/scripts/entry.sh#L22. A fix would require logic that splits the backup across multiple secrets, which in turn means the check for an existing backup would also need to be more sophisticated.
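
To illustrate what such splitting might look like, here is a rough sketch. It is not taken from entry.sh: the function names, the chunk size, the "-aa"/"-ab" secret suffixes, and the piraeus-datastore default namespace are all assumptions made for this example.

```shell
# Hypothetical sketch; none of these names exist in entry.sh.
# Arbitrary chunk size below the 1048576-byte limit from the error message.
CHUNK_BYTES=900000

# Split FILE into FILE.part-aa, FILE.part-ab, ... of at most SIZE bytes each.
split_backup() {
    split -b "${2:-$CHUNK_BYTES}" "$1" "$1.part-"
}

# Store each chunk in its own secret, named BASE-aa, BASE-ab, ...
store_chunks() {
    file="$1"
    base="$2"
    namespace="${3:-piraeus-datastore}"
    for chunk in "$file".part-*; do
        suffix="${chunk##*.part-}"
        kubectl create secret generic "$base-$suffix" \
            --namespace "$namespace" --from-file=backup="$chunk"
    done
}
```

The existing-backup check would then have to confirm that every chunk is present, not just one secret, which is the extra sophistication mentioned above.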

dimm0 commented 2 weeks ago

Can we at least get an option to disable the backup?