Closed: perfectra1n closed this issue 9 months ago.
Hey there, have you tried upgrading to 0.23.1? I've fixed an issue related to the certificates.
Hi! Yeah, I just upgraded to 0.23.1 as well. Now I get the following output when viewing the application from ArgoCD:
Name: argocd/mariadb-op
Project: default
Server: https://kubernetes.default.svc
Namespace: databases
URL: https://argocd.domain.network/applications/mariadb-op
Repo: https://mariadb-operator.github.io/mariadb-operator
Target: 0.23.1
Path:
SyncWindow: Sync Allowed
Sync Policy: Automated (Prune)
Sync Status: Synced to 0.23.1
Health Status: Degraded
GROUP KIND NAMESPACE NAME STATUS HEALTH HOOK MESSAGE
Namespace databases Succeeded Synced namespace/databases serverside-applied
Secret databases mariadb-op-mariadb-operator-webhook-cert Succeeded Pruned pruned
Service databases mariadb-op-mariadb-operator-webhook Synced Healthy
ServiceAccount databases mariadb-op-mariadb-operator Synced
ServiceAccount databases mariadb-op-mariadb-operator-webhook Synced
admissionregistration.k8s.io MutatingWebhookConfiguration mariadb-op-mariadb-operator-webhook Synced
admissionregistration.k8s.io ValidatingWebhookConfiguration mariadb-op-mariadb-operator-webhook Synced
apiextensions.k8s.io CustomResourceDefinition backups.mariadb.mmontes.io Synced
apiextensions.k8s.io CustomResourceDefinition connections.mariadb.mmontes.io Synced
apiextensions.k8s.io CustomResourceDefinition databases.mariadb.mmontes.io Synced
apiextensions.k8s.io CustomResourceDefinition grants.mariadb.mmontes.io Synced
apiextensions.k8s.io CustomResourceDefinition mariadbs.mariadb.mmontes.io Synced
apiextensions.k8s.io CustomResourceDefinition restores.mariadb.mmontes.io Synced
apiextensions.k8s.io CustomResourceDefinition sqljobs.mariadb.mmontes.io Synced
apiextensions.k8s.io CustomResourceDefinition users.mariadb.mmontes.io Synced
apps Deployment databases mariadb-op-mariadb-operator Synced Healthy
apps Deployment databases mariadb-op-mariadb-operator-webhook Synced Healthy
cert-manager.io Certificate databases mariadb-op-mariadb-operator-webhook-cert Synced Degraded
cert-manager.io Issuer databases mariadb-op-mariadb-operator-selfsigned-issuer Synced Healthy
rbac.authorization.k8s.io ClusterRole mariadb-op-mariadb-operator Synced
rbac.authorization.k8s.io ClusterRoleBinding mariadb-op-mariadb-operator Synced
rbac.authorization.k8s.io ClusterRoleBinding mariadb-op-mariadb-operator:auth-delegator Synced
rbac.authorization.k8s.io Role databases mariadb-op-mariadb-operator Synced
rbac.authorization.k8s.io RoleBinding databases mariadb-op-mariadb-operator Synced
So it looks like the mariadb-op-mariadb-operator-webhook-cert is in some kind of weird state now; I'll keep trying to debug...
Could it possibly be because the same resource of the same name was pruned above? I'm assuming not, since it's a different resource type...
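For reference, this is roughly what I was running to poke at it (plain kubectl; the resource names come from the tree above):
# Inspect the Degraded Certificate and any events related to it
kubectl -n databases get certificate mariadb-op-mariadb-operator-webhook-cert
kubectl -n databases describe certificate mariadb-op-mariadb-operator-webhook-cert
kubectl -n databases get events --sort-by=.lastTimestamp | grep -i webhook-cert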
I had to delete the cert mariadb-op-mariadb-operator-webhook-cert, which was in the Degraded state, and it looks like it's recreated now:
Name: argocd/mariadb-op
Project: default
Server: https://kubernetes.default.svc
Namespace: databases
URL: https://argocd.domain.network/applications/mariadb-op
Repo: https://mariadb-operator.github.io/mariadb-operator
Target: 0.23.1
Path:
SyncWindow: Sync Allowed
Sync Policy: Automated (Prune)
Sync Status: OutOfSync from 0.23.1
Health Status: Healthy
GROUP KIND NAMESPACE NAME STATUS HEALTH HOOK MESSAGE
Namespace databases Succeeded Synced namespace/databases serverside-applied
Secret databases mariadb-op-mariadb-operator-webhook-cert OutOfSync pruned
Service databases mariadb-op-mariadb-operator-webhook Synced Healthy
ServiceAccount databases mariadb-op-mariadb-operator Synced
ServiceAccount databases mariadb-op-mariadb-operator-webhook Synced
admissionregistration.k8s.io MutatingWebhookConfiguration mariadb-op-mariadb-operator-webhook Synced
admissionregistration.k8s.io ValidatingWebhookConfiguration mariadb-op-mariadb-operator-webhook Synced
apiextensions.k8s.io CustomResourceDefinition backups.mariadb.mmontes.io Synced
apiextensions.k8s.io CustomResourceDefinition connections.mariadb.mmontes.io Synced
apiextensions.k8s.io CustomResourceDefinition databases.mariadb.mmontes.io Synced
apiextensions.k8s.io CustomResourceDefinition grants.mariadb.mmontes.io Synced
apiextensions.k8s.io CustomResourceDefinition mariadbs.mariadb.mmontes.io Synced
apiextensions.k8s.io CustomResourceDefinition restores.mariadb.mmontes.io Synced
apiextensions.k8s.io CustomResourceDefinition sqljobs.mariadb.mmontes.io Synced
apiextensions.k8s.io CustomResourceDefinition users.mariadb.mmontes.io Synced
apps Deployment databases mariadb-op-mariadb-operator Synced Healthy
apps Deployment databases mariadb-op-mariadb-operator-webhook Synced Healthy
cert-manager.io Certificate databases mariadb-op-mariadb-operator-webhook-cert Synced Healthy
cert-manager.io Issuer databases mariadb-op-mariadb-operator-selfsigned-issuer Synced Healthy
rbac.authorization.k8s.io ClusterRole mariadb-op-mariadb-operator Synced
rbac.authorization.k8s.io ClusterRoleBinding mariadb-op-mariadb-operator Synced
rbac.authorization.k8s.io ClusterRoleBinding mariadb-op-mariadb-operator:auth-delegator Synced
rbac.authorization.k8s.io Role databases mariadb-op-mariadb-operator Synced
rbac.authorization.k8s.io RoleBinding databases mariadb-op-mariadb-operator Synced
But then the same error reoccurs:
one or more objects failed to apply, reason: Internal error occurred: failed calling webhook "mmariadb.kb.io": failed to call webhook: Post "https://mariadb-op-mariadb-operator-webhook.databases.svc:443/mutate-mariadb-mmontes-io-v1alpha1-mariadb?timeout=10s": tls: failed to verify certificate: x509: certificate signed by unknown authority (possibly because of "x509: invalid signature: parent certificate cannot sign this kind of certificate" while trying to verify candidate authority certificate "mariadb-op-mariadb-operator-webhook.databases.svc").
Here are the values that I'm using for the operator's deployment:
clusterName: "newcluster.local"
ha:
enabled: true
webhook:
cert:
certManager:
enabled: true
And here's the MariaDB resource:
apiVersion: mariadb.mmontes.io/v1alpha1
kind: MariaDB
metadata:
  name: mariadb
  namespace: databases
  annotations:
    argocd.argoproj.io/compare-options: IgnoreExtraneous
    argocd.argoproj.io/sync-options: Prune=false
spec:
  # Error writing Galera config: open /etc/mysql/mariadb.conf.d/0-galera.cnf: permission denied
  podSecurityContext:
    runAsUser: 0
  rootPasswordSecretKeyRef:
    name: mariadb-creds
    key: root-password
  database: mariadb
  username: mainuser
  passwordSecretKeyRef:
    name: mariadb-creds
    key: password
  image: mariadb:11.0.3
  port: 3306
  replicas: 5
  galera:
    enabled: true
    primary:
      podIndex: 0
      automaticFailover: true
    sst: mariabackup
    replicaThreads: 1
    agent:
      image: ghcr.io/mariadb-operator/agent:v0.0.3
      port: 5555
      kubernetesAuth:
        enabled: true
      gracefulShutdownTimeout: 5s
    recovery:
      enabled: true
      clusterHealthyTimeout: 3m0s
      clusterBootstrapTimeout: 10m0s
      podRecoveryTimeout: 5m0s
      podSyncTimeout: 5m0s
    initContainer:
      image: ghcr.io/mariadb-operator/init:v0.0.6
    volumeClaimTemplate:
      resources:
        requests:
          storage: 300Mi
      accessModes:
        - ReadWriteOnce
  service:
    type: LoadBalancer
    annotations:
      metallb.universe.tf/ip-allocated-from-pool: first-pool
      metallb.universe.tf/loadBalancerIPs: 10.11.0.30
  connection:
    secretName: mariadb-galera-conn
    secretTemplate:
      key: dsn
I'm seeing that error when updating spec.image within the above MariaDB resource to a newer version.
Hey there! Thanks for reporting this with so many details.
I've managed to install the 0.23.1 chart with the same values as you and to successfully apply a MariaDB resource afterwards, which means that the webhook is responding correctly.
Judging by the x509 error you reported, it seems like the CA that signed the new certificates is unknown and not trusted when calling the webhook. This CA is injected by cert-manager into the ValidatingWebhookConfiguration and MutatingWebhookConfiguration objects, which might be in a weird intermediate state. Could you try deleting them and resyncing your ArgoCD app so we get some fresh new ones?
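Something like this should do it (a rough sketch using plain kubectl and the argocd CLI; the names come from the resource tree above, and the openssl check is optional and just compares the injected CA with the one in the serving Secret):
# Optional: compare the CA injected into the webhook config with the CA in the serving Secret
kubectl get validatingwebhookconfiguration mariadb-op-mariadb-operator-webhook \
  -o jsonpath='{.webhooks[0].clientConfig.caBundle}' | base64 -d | openssl x509 -noout -subject -issuer
kubectl -n databases get secret mariadb-op-mariadb-operator-webhook-cert \
  -o jsonpath='{.data.ca\.crt}' | base64 -d | openssl x509 -noout -subject -issuer
# Delete both webhook configurations and let the next sync recreate them
kubectl delete validatingwebhookconfiguration mariadb-op-mariadb-operator-webhook
kubectl delete mutatingwebhookconfiguration mariadb-op-mariadb-operator-webhook
argocd app sync mariadb-op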
Thanks for reviewing my deluge of information! I always try to report too much rather than too little.
I went ahead and deleted the ValidatingWebhookConfiguration and MutatingWebhookConfiguration resources managed by mariadb-operator.
Interestingly enough, it appears as though the Secret named mariadb-op-mariadb-operator-webhook-cert is just being created over and over again, even though I'm using the Certificate resource named mariadb-op-mariadb-operator-webhook-cert with the new chart.
I see that you have:
{{- if not .Values.webhook.cert.certManager.enabled }}
within webhook-secret.yaml, so I'm surprised it's still being recreated...
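One way to sanity-check that is to render the chart locally and see which manifests still reference that name (a rough sketch; assumes the Helm repo is added under the alias mariadb-operator and the release is called mariadb-op):
helm repo add mariadb-operator https://mariadb-operator.github.io/mariadb-operator
helm repo update
# With certManager enabled, the chart should render a Certificate but not the chart-managed webhook Secret
helm template mariadb-op mariadb-operator/mariadb-operator --version 0.23.1 \
  --set webhook.cert.certManager.enabled=true \
  | grep -B 2 -A 3 'mariadb-op-mariadb-operator-webhook-cert'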
It's just spamming CertificateRequests too 😬
I see the same logs as #267 too, from the webhook pod...
{"level":"info","ts":1701813745.8802757,"logger":"controller-runtime.certwatcher","msg":"Updated current TLS certificate"}
{"level":"info","ts":1701813808.9075487,"logger":"controller-runtime.certwatcher","msg":"Updated current TLS certificate"}
{"level":"info","ts":1701813808.9087875,"logger":"controller-runtime.certwatcher","msg":"Updated current TLS certificate"}
{"level":"info","ts":1701813897.8882475,"logger":"controller-runtime.certwatcher","msg":"Updated current TLS certificate"}
{"level":"info","ts":1701813897.8895173,"logger":"controller-runtime.certwatcher","msg":"Updated current TLS certificate"}
{"level":"info","ts":1701813982.8825915,"logger":"controller-runtime.certwatcher","msg":"Updated current TLS certificate"}
{"level":"info","ts":1701813982.8852727,"logger":"controller-runtime.certwatcher","msg":"Updated current TLS certificate"}
{"level":"info","ts":1701814061.2421184,"logger":"controller-runtime.certwatcher","msg":"Updated current TLS certificate"}
{"level":"info","ts":1701814061.2433603,"logger":"controller-runtime.certwatcher","msg":"Updated current TLS certificate"}
I'm seriously trash at Golang, so please do take what I say with a grain of salt...
Is it possible that during reconcile, the controller is "erroring" as it's waiting for a cert from certmanager (couldn't find the certmanager code, my fault lol) and then creating the key here?
I'm really not sure why it's just creating CertificateRequest after CertificateRequest, instead of waiting for certmanager to pick them up?
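This is roughly what I ran while it was looping (plain kubectl, nothing fancy):
# List the CertificateRequests cert-manager keeps creating, oldest first
kubectl -n databases get certificaterequests --sort-by=.metadata.creationTimestamp
# Events on the Certificate usually say why a re-issuance was triggered
kubectl -n databases get events --field-selector involvedObject.name=mariadb-op-mariadb-operator-webhook-cert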
Webhook logs:
{"level":"info","ts":1701814552.278027,"logger":"setup","msg":"Starting manager"}
{"level":"info","ts":1701814552.2785738,"logger":"controller-runtime.metrics","msg":"Starting metrics server"}
{"level":"info","ts":1701814552.2787123,"logger":"controller-runtime.metrics","msg":"Serving metrics server","bindAddress":":8080","secure":false}
{"level":"info","ts":1701814552.2788506,"msg":"starting server","kind":"health probe","addr":"[::]:8081"}
{"level":"info","ts":1701814552.2788734,"logger":"controller-runtime.webhook","msg":"Starting webhook server"}
{"level":"info","ts":1701814552.2794907,"logger":"controller-runtime.certwatcher","msg":"Updated current TLS certificate"}
{"level":"info","ts":1701814552.2795625,"logger":"controller-runtime.webhook","msg":"Serving webhook server","host":"","port":10250}
{"level":"info","ts":1701814552.2797909,"logger":"controller-runtime.certwatcher","msg":"Starting certificate watcher"}
{"level":"debug","ts":1701814554.1360002,"logger":"controller-runtime.certwatcher","msg":"certificate event","event":"REMOVE \"/tmp/k8s-webhook-server/serving-certs/tls.key\""}
{"level":"info","ts":1701814554.1377637,"logger":"controller-runtime.certwatcher","msg":"Updated current TLS certificate"}
{"level":"debug","ts":1701814554.1379616,"logger":"controller-runtime.certwatcher","msg":"certificate event","event":"REMOVE \"/tmp/k8s-webhook-server/serving-certs/tls.crt\""}
{"level":"info","ts":1701814554.1401744,"logger":"controller-runtime.certwatcher","msg":"Updated current TLS certificate"}
{"level":"debug","ts":1701814576.0212483,"logger":"controller-runtime.certwatcher","msg":"certificate event","event":"REMOVE \"/tmp/k8s-webhook-server/serving-certs/tls.key\""}
{"level":"info","ts":1701814576.023399,"logger":"controller-runtime.certwatcher","msg":"Updated current TLS certificate"}
{"level":"debug","ts":1701814576.0234637,"logger":"controller-runtime.certwatcher","msg":"certificate event","event":"REMOVE \"/tmp/k8s-webhook-server/serving-certs/tls.crt\""}
{"level":"info","ts":1701814576.0247002,"logger":"controller-runtime.certwatcher","msg":"Updated current TLS certificate"}
{"level":"debug","ts":1701814663.880187,"logger":"controller-runtime.certwatcher","msg":"certificate event","event":"REMOVE \"/tmp/k8s-webhook-server/serving-certs/tls.key\""}
Operator logs:
{"level":"info","ts":1701814553.3222492,"logger":"setup","msg":"Starting manager"}
{"level":"info","ts":1701814553.3223999,"logger":"controller-runtime.metrics","msg":"Starting metrics server"}
{"level":"info","ts":1701814553.32256,"msg":"starting server","kind":"health probe","addr":"[::]:8081"}
{"level":"info","ts":1701814553.3225813,"logger":"controller-runtime.metrics","msg":"Serving metrics server","bindAddress":":8080","secure":false}
I1205 22:15:53.424785 1 leaderelection.go:250] attempting to acquire leader lease databases/mariadb-operator.mmontes.io...
I1205 22:16:15.417191 1 leaderelection.go:260] successfully acquired lease databases/mariadb-operator.mmontes.io
{"level":"debug","ts":1701814575.417311,"logger":"events","msg":"mariadb-op-mariadb-operator-5867955d4b-gzqhm_82d870e6-e730-478c-80b5-23984d7c93fc became leader","type":"Normal","object":{"kind":"Lease","namespace":"databases","name":"mariadb-operator.mmontes.io","uid":"c1aeccc4-2f0c-45f9-85b2-9d223c665e00","apiVersion":"coordination.k8s.io/v1","resourceVersion":"255861041"},"reason":"LeaderElection"}
{"level":"info","ts":1701814575.4184875,"msg":"Starting EventSource","controller":"restore","controllerGroup":"mariadb.mmontes.io","controllerKind":"Restore","source":"kind source: *v1alpha1.Restore"}
{"level":"info","ts":1701814575.4196618,"msg":"Starting EventSource","controller":"restore","controllerGroup":"mariadb.mmontes.io","controllerKind":"Restore","source":"kind source: *v1.Job"}
{"level":"info","ts":1701814575.4197085,"msg":"Starting Controller","controller":"restore","controllerGroup":"mariadb.mmontes.io","controllerKind":"Restore"}
{"level":"info","ts":1701814575.4225392,"msg":"Starting EventSource","controller":"connection","controllerGroup":"mariadb.mmontes.io","controllerKind":"Connection","source":"kind source: *v1alpha1.Connection"}
{"level":"info","ts":1701814575.4229555,"msg":"Starting EventSource","controller":"mariadb","controllerGroup":"mariadb.mmontes.io","controllerKind":"MariaDB","source":"kind source: *v1alpha1.MariaDB"}
{"level":"info","ts":1701814575.4231176,"msg":"Starting EventSource","controller":"mariadb","controllerGroup":"mariadb.mmontes.io","controllerKind":"MariaDB","source":"kind source: *v1alpha1.Connection"}
{"level":"info","ts":1701814575.4232342,"msg":"Starting EventSource","controller":"mariadb","controllerGroup":"mariadb.mmontes.io","controllerKind":"MariaDB","source":"kind source: *v1alpha1.Restore"}
{"level":"info","ts":1701814575.4232583,"msg":"Starting EventSource","controller":"mariadb","controllerGroup":"mariadb.mmontes.io","controllerKind":"MariaDB","source":"kind source: *v1.ConfigMap"}
{"level":"info","ts":1701814575.423286,"msg":"Starting EventSource","controller":"mariadb","controllerGroup":"mariadb.mmontes.io","controllerKind":"MariaDB","source":"kind source: *v1.Service"}
{"level":"info","ts":1701814575.4233344,"msg":"Starting EventSource","controller":"mariadb","controllerGroup":"mariadb.mmontes.io","controllerKind":"MariaDB","source":"kind source: *v1.Secret"}
{"level":"info","ts":1701814575.4233549,"msg":"Starting EventSource","controller":"mariadb","controllerGroup":"mariadb.mmontes.io","controllerKind":"MariaDB","source":"kind source: *v1.Event"}
{"level":"info","ts":1701814575.4233735,"msg":"Starting EventSource","controller":"mariadb","controllerGroup":"mariadb.mmontes.io","controllerKind":"MariaDB","source":"kind source: *v1.ServiceAccount"}
{"level":"info","ts":1701814575.4233916,"msg":"Starting EventSource","controller":"mariadb","controllerGroup":"mariadb.mmontes.io","controllerKind":"MariaDB","source":"kind source: *v1.StatefulSet"}
{"level":"info","ts":1701814575.423419,"msg":"Starting EventSource","controller":"backup","controllerGroup":"mariadb.mmontes.io","controllerKind":"Backup","source":"kind source: *v1alpha1.Backup"}
{"level":"info","ts":1701814575.4235032,"msg":"Starting EventSource","controller":"backup","controllerGroup":"mariadb.mmontes.io","controllerKind":"Backup","source":"kind source: *v1.CronJob"}
{"level":"info","ts":1701814575.4235187,"msg":"Starting EventSource","controller":"mariadb","controllerGroup":"mariadb.mmontes.io","controllerKind":"MariaDB","source":"kind source: *v1.PodDisruptionBudget"}
{"level":"info","ts":1701814575.4235673,"msg":"Starting EventSource","controller":"backup","controllerGroup":"mariadb.mmontes.io","controllerKind":"Backup","source":"kind source: *v1.Job"}
{"level":"info","ts":1701814575.4242637,"msg":"Starting Controller","controller":"backup","controllerGroup":"mariadb.mmontes.io","controllerKind":"Backup"}
{"level":"info","ts":1701814575.425194,"msg":"Starting EventSource","controller":"statefulset","controllerGroup":"apps","controllerKind":"StatefulSet","source":"kind source: *v1.StatefulSet"}
{"level":"info","ts":1701814575.425274,"msg":"Starting Controller","controller":"statefulset","controllerGroup":"apps","controllerKind":"StatefulSet"}
{"level":"info","ts":1701814575.4252162,"msg":"Starting EventSource","controller":"sqljob","controllerGroup":"mariadb.mmontes.io","controllerKind":"SqlJob","source":"kind source: *v1alpha1.SqlJob"}
{"level":"info","ts":1701814575.4253242,"msg":"Starting EventSource","controller":"connection","controllerGroup":"mariadb.mmontes.io","controllerKind":"Connection","source":"kind source: *v1.Secret"}
{"level":"info","ts":1701814575.4235914,"msg":"Starting EventSource","controller":"mariadb","controllerGroup":"mariadb.mmontes.io","controllerKind":"MariaDB","source":"kind source: *v1.Role"}
{"level":"info","ts":1701814575.4254615,"msg":"Starting EventSource","controller":"sqljob","controllerGroup":"mariadb.mmontes.io","controllerKind":"SqlJob","source":"kind source: *v1.ConfigMap"}
{"level":"info","ts":1701814575.4263847,"msg":"Starting Controller","controller":"connection","controllerGroup":"mariadb.mmontes.io","controllerKind":"Connection"}
{"level":"info","ts":1701814575.4264429,"msg":"Starting EventSource","controller":"pod","controllerGroup":"","controllerKind":"Pod","source":"kind source: *v1.Pod"}
{"level":"info","ts":1701814575.4265463,"msg":"Starting EventSource","controller":"pod","controllerGroup":"","controllerKind":"Pod","source":"kind source: *v1.Pod"}
{"level":"info","ts":1701814575.4265692,"msg":"Starting EventSource","controller":"grant","controllerGroup":"mariadb.mmontes.io","controllerKind":"Grant","source":"kind source: *v1alpha1.Grant"}
{"level":"info","ts":1701814575.4278758,"msg":"Starting Controller","controller":"pod","controllerGroup":"","controllerKind":"Pod"}
{"level":"info","ts":1701814575.4279294,"msg":"Starting Controller","controller":"pod","controllerGroup":"","controllerKind":"Pod"}
{"level":"info","ts":1701814575.4235246,"msg":"Starting EventSource","controller":"user","controllerGroup":"mariadb.mmontes.io","controllerKind":"User","source":"kind source: *v1alpha1.User"}
{"level":"info","ts":1701814575.4254866,"msg":"Starting EventSource","controller":"mariadb","controllerGroup":"mariadb.mmontes.io","controllerKind":"MariaDB","source":"kind source: *v1.RoleBinding"}
{"level":"info","ts":1701814575.4281435,"msg":"Starting EventSource","controller":"grant","controllerGroup":"mariadb.mmontes.io","controllerKind":"Grant","source":"kind source: *v1alpha1.User"}
{"level":"info","ts":1701814575.428208,"msg":"Starting Controller","controller":"grant","controllerGroup":"mariadb.mmontes.io","controllerKind":"Grant"}
{"level":"info","ts":1701814575.4281247,"msg":"Starting Controller","controller":"user","controllerGroup":"mariadb.mmontes.io","controllerKind":"User"}
{"level":"info","ts":1701814575.4284348,"msg":"Starting EventSource","controller":"database","controllerGroup":"mariadb.mmontes.io","controllerKind":"Database","source":"kind source: *v1alpha1.Database"}
{"level":"info","ts":1701814575.4285102,"msg":"Starting EventSource","controller":"mariadb","controllerGroup":"mariadb.mmontes.io","controllerKind":"MariaDB","source":"kind source: *v1.ClusterRoleBinding"}
{"level":"info","ts":1701814575.4286468,"msg":"Starting Controller","controller":"database","controllerGroup":"mariadb.mmontes.io","controllerKind":"Database"}
{"level":"info","ts":1701814575.4286635,"msg":"Starting Controller","controller":"mariadb","controllerGroup":"mariadb.mmontes.io","controllerKind":"MariaDB"}
{"level":"info","ts":1701814575.4288566,"msg":"Starting EventSource","controller":"sqljob","controllerGroup":"mariadb.mmontes.io","controllerKind":"SqlJob","source":"kind source: *v1.CronJob"}
{"level":"info","ts":1701814575.429329,"msg":"Starting EventSource","controller":"sqljob","controllerGroup":"mariadb.mmontes.io","controllerKind":"SqlJob","source":"kind source: *v1.Job"}
{"level":"info","ts":1701814575.4293923,"msg":"Starting Controller","controller":"sqljob","controllerGroup":"mariadb.mmontes.io","controllerKind":"SqlJob"}
{"level":"info","ts":1701814575.9881175,"msg":"Starting workers","controller":"grant","controllerGroup":"mariadb.mmontes.io","controllerKind":"Grant","worker count":1}
{"level":"info","ts":1701814575.9881206,"msg":"Starting workers","controller":"statefulset","controllerGroup":"apps","controllerKind":"StatefulSet","worker count":1}
{"level":"info","ts":1701814575.9891937,"logger":"galera.health","msg":"Checking Galera cluster health","controller":"statefulset","controllerGroup":"apps","controllerKind":"StatefulSet","StatefulSet":{"name":"mariadb","namespace":"databases"},"namespace":"databases","name":"mariadb","reconcileID":"22c41f16-69c6-4b5a-ae2c-70561ca18d79"}
{"level":"debug","ts":1701814575.989259,"logger":"galera.health","msg":"StatefulSet ready replicas","controller":"statefulset","controllerGroup":"apps","controllerKind":"StatefulSet","StatefulSet":{"name":"mariadb","namespace":"databases"},"namespace":"databases","name":"mariadb","reconcileID":"22c41f16-69c6-4b5a-ae2c-70561ca18d79","replicas":5}
{"level":"info","ts":1701814576.0064301,"msg":"Starting workers","controller":"user","controllerGroup":"mariadb.mmontes.io","controllerKind":"User","worker count":1}
{"level":"info","ts":1701814576.0066135,"msg":"Starting workers","controller":"database","controllerGroup":"mariadb.mmontes.io","controllerKind":"Database","worker count":1}
{"level":"info","ts":1701814576.0069342,"msg":"Starting workers","controller":"connection","controllerGroup":"mariadb.mmontes.io","controllerKind":"Connection","worker count":1}
{"level":"info","ts":1701814576.0067024,"msg":"Starting workers","controller":"pod","controllerGroup":"","controllerKind":"Pod","worker count":1}
{"level":"info","ts":1701814576.0389016,"msg":"Starting workers","controller":"restore","controllerGroup":"mariadb.mmontes.io","controllerKind":"Restore","worker count":1}
{"level":"info","ts":1701814576.0390863,"msg":"Starting workers","controller":"sqljob","controllerGroup":"mariadb.mmontes.io","controllerKind":"SqlJob","worker count":1}
{"level":"info","ts":1701814576.0392168,"msg":"Starting workers","controller":"pod","controllerGroup":"","controllerKind":"Pod","worker count":1}
{"level":"debug","ts":1701814576.0398002,"msg":"Reconciling Pod in Ready state","controller":"pod","controllerGroup":"","controllerKind":"Pod","Pod":{"name":"mariadb-3","namespace":"databases"},"namespace":"databases","name":"mariadb-3","reconcileID":"f3ac37e3-4a1f-49f0-a255-93e1f8391ba9","pod":"mariadb-3"}
{"level":"debug","ts":1701814576.0401933,"msg":"Reconciling Pod in Ready state","controller":"pod","controllerGroup":"","controllerKind":"Pod","Pod":{"name":"mariadb-2","namespace":"databases"},"namespace":"databases","name":"mariadb-2","reconcileID":"4b3819b9-568a-44e1-b1d1-a8841d807906","pod":"mariadb-2"}
{"level":"debug","ts":1701814576.0404835,"msg":"Reconciling Pod in Ready state","controller":"pod","controllerGroup":"","controllerKind":"Pod","Pod":{"name":"mariadb-1","namespace":"databases"},"namespace":"databases","name":"mariadb-1","reconcileID":"9b560c72-fb59-4594-a7d8-0113f7d3e1af","pod":"mariadb-1"}
{"level":"debug","ts":1701814576.0407934,"msg":"Reconciling Pod in Ready state","controller":"pod","controllerGroup":"","controllerKind":"Pod","Pod":{"name":"mariadb-0","namespace":"databases"},"namespace":"databases","name":"mariadb-0","reconcileID":"6270af06-cc36-4112-806a-0230a4d6ded7","pod":"mariadb-0"}
{"level":"debug","ts":1701814576.0411425,"msg":"Reconciling Pod in Ready state","controller":"pod","controllerGroup":"","controllerKind":"Pod","Pod":{"name":"mariadb-4","namespace":"databases"},"namespace":"databases","name":"mariadb-4","reconcileID":"7d36688b-8e28-4497-8d1c-c845432336e3","pod":"mariadb-4"}
{"level":"info","ts":1701814576.05047,"msg":"Starting workers","controller":"backup","controllerGroup":"mariadb.mmontes.io","controllerKind":"Backup","worker count":1}
{"level":"info","ts":1701814576.0505714,"msg":"Starting workers","controller":"mariadb","controllerGroup":"mariadb.mmontes.io","controllerKind":"MariaDB","worker count":1}
{"level":"debug","ts":1701814576.1094224,"msg":"Checking connection health","controller":"connection","controllerGroup":"mariadb.mmontes.io","controllerKind":"Connection","Connection":{"name":"mariadb-primary","namespace":"databases"},"namespace":"databases","name":"mariadb-primary","reconcileID":"56d6a029-d83b-41e8-a5d1-0824b061f9d1"}
{"level":"debug","ts":1701814576.3680446,"msg":"Checking connection health","controller":"connection","controllerGroup":"mariadb.mmontes.io","controllerKind":"Connection","Connection":{"name":"mariadb-secondary","namespace":"databases"},"namespace":"databases","name":"mariadb-secondary","reconcileID":"31f41e37-6244-4822-a4d1-42c4e49ff995"}
{"level":"debug","ts":1701814576.453957,"msg":"Checking connection health","controller":"connection","controllerGroup":"mariadb.mmontes.io","controllerKind":"Connection","Connection":{"name":"mariadb","namespace":"databases"},"namespace":"databases","name":"mariadb","reconcileID":"e631959a-cc62-40fe-b73f-b4d7547b20c4"}
Well, I was able to stop it from erroring constantly by removing the certManager values, so it's just:
clusterName: "newcluster.local"
ha:
  enabled: true
However, after changing the spec.image from 11.0.3 to 11.2.2, I just get the following error over and over again as it tries to do a rolling update...
2023-12-05 23:53:10+00:00 [Note] [Entrypoint]: Entrypoint script for MariaDB Server 1:11.2.2+maria~ubu2204 started.
2023-12-05 23:53:11+00:00 [Note] [Entrypoint]: Switching to dedicated user 'mysql'
2023-12-05 23:53:11+00:00 [Note] [Entrypoint]: Entrypoint script for MariaDB Server 1:11.2.2+maria~ubu2204 started.
2023-12-05 23:53:11+00:00 [Note] [Entrypoint]: MariaDB upgrade information missing, assuming required
2023-12-05 23:53:11+00:00 [Note] [Entrypoint]: MariaDB upgrade (mariadb-upgrade) required, but skipped due to $MARIADB_AUTO_UPGRADE setting
2023-12-05 23:53:11 0 [Note] Starting MariaDB 11.2.2-MariaDB-1:11.2.2+maria~ubu2204 source revision 929532a9426d085111c24c63de9c23cc54382259 as process 1
2023-12-05 23:53:11 0 [Note] WSREP: Loading provider /usr/lib/galera/libgalera_smm.so initial position: 00000000-0000-0000-0000-000000000000:-1
2023-12-05 23:53:11 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera/libgalera_smm.so'
2023-12-05 23:53:11 0 [Note] WSREP: wsrep_load(): Galera 26.4.16(r7dce5149) by Codership Oy <info@codership.com> loaded successfully.
2023-12-05 23:53:11 0 [Note] WSREP: Initializing allowlist service v1
2023-12-05 23:53:11 0 [Note] WSREP: Initializing event service v1
2023-12-05 23:53:11 0 [Note] WSREP: CRC-32C: using 64-bit x86 acceleration.
2023-12-05 23:53:11 0 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1, safe_to_bootstrap: 0
2023-12-05 23:53:11 0 [Note] WSREP: GCache DEBUG: opened preamble:
Version: 2
UUID: 82507f11-767c-11ee-b214-532d45cfffd9
Seqno: -1 - -1
Offset: -1
Synced: 0
2023-12-05 23:53:11 0 [Note] WSREP: Recovering GCache ring buffer: version: 2, UUID: 82507f11-767c-11ee-b214-532d45cfffd9, offset: -1
2023-12-05 23:53:11 0 [Note] WSREP: GCache::RingBuffer initial scan... 0.0% ( 0/134217752 bytes) complete.
2023-12-05 23:53:11 0 [Note] WSREP: GCache::RingBuffer initial scan...100.0% (134217752/134217752 bytes) complete.
2023-12-05 23:53:11 0 [Note] WSREP: Recovering GCache ring buffer: Recovery failed, need to do full reset.
2023-12-05 23:53:11 0 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = mariadb-4.mariadb-internal.databases.svc.newcluster.local; base_port = 4567; cert.log_conflicts = no; cert.optimistic_pa = yes; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.keep_plaintext_size = 128M; gcache.mem_size = 0; gcache.name = galera.cache; gcache.page_size = 128M; gcache.recover = yes; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.fc_single_primary = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0
2023-12-05 23:53:11 0 [Note] WSREP: Start replication
2023-12-05 23:53:11 0 [Note] WSREP: Connecting with bootstrap option: 0
2023-12-05 23:53:11 0 [Note] WSREP: Setting GCS initial position to 00000000-0000-0000-0000-000000000000:-1
2023-12-05 23:53:11 0 [Note] WSREP: protonet asio version 0
2023-12-05 23:53:11 0 [Note] WSREP: Using CRC-32C for message checksums.
2023-12-05 23:53:11 0 [Note] WSREP: backend: asio
2023-12-05 23:53:11 0 [Note] WSREP: gcomm thread scheduling priority set to other:0
2023-12-05 23:53:11 0 [Note] WSREP: access file(/var/lib/mysql//gvwstate.dat) failed(No such file or directory)
2023-12-05 23:53:11 0 [Note] WSREP: restore pc from disk failed
2023-12-05 23:53:11 0 [Note] WSREP: GMCast version 0
2023-12-05 23:53:11 0 [Note] WSREP: (72a61da0-93c4, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2023-12-05 23:53:11 0 [Note] WSREP: (72a61da0-93c4, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2023-12-05 23:53:11 0 [Note] WSREP: EVS version 1
2023-12-05 23:53:11 0 [Note] WSREP: gcomm: connecting to group 'mariadb-operator', peer 'mariadb-0.mariadb-internal.databases.svc.newcluster.local:,mariadb-1.mariadb-internal.databases.svc.newcluster.local:,mariadb-2.mariadb-internal.databases.svc.newcluster.local:,mariadb-3.mariadb-internal.databases.svc.newcluster.local:,mariadb-4.mariadb-internal.databases.svc.newcluster.local:'
2023-12-05 23:53:11 0 [Note] WSREP: (72a61da0-93c4, 'tcp://0.0.0.0:4567') Found matching local endpoint for a connection, blacklisting address tcp://10.233.101.135:4567
2023-12-05 23:53:11 0 [Note] WSREP: (72a61da0-93c4, 'tcp://0.0.0.0:4567') connection established to c017ceee-af4a tcp://10.233.69.15:4567
2023-12-05 23:53:11 0 [Note] WSREP: (72a61da0-93c4, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
2023-12-05 23:53:11 0 [Note] WSREP: (72a61da0-93c4, 'tcp://0.0.0.0:4567') connection established to 37718ade-864d tcp://10.233.94.155:4567
2023-12-05 23:53:11 0 [Note] WSREP: (72a61da0-93c4, 'tcp://0.0.0.0:4567') connection established to ac774553-8db1 tcp://10.233.91.185:4567
2023-12-05 23:53:11 0 [Note] WSREP: (72a61da0-93c4, 'tcp://0.0.0.0:4567') connection established to a0738f2d-bcd9 tcp://10.233.91.190:4567
2023-12-05 23:53:13 0 [Note] WSREP: EVS version upgrade 0 -> 1
2023-12-05 23:53:13 0 [Note] WSREP: declaring 37718ade-864d at tcp://10.233.94.155:4567 stable
2023-12-05 23:53:13 0 [Note] WSREP: declaring a0738f2d-bcd9 at tcp://10.233.91.190:4567 stable
2023-12-05 23:53:13 0 [Note] WSREP: declaring ac774553-8db1 at tcp://10.233.91.185:4567 stable
2023-12-05 23:53:13 0 [Note] WSREP: declaring c017ceee-af4a at tcp://10.233.69.15:4567 stable
2023-12-05 23:53:13 0 [Note] WSREP: PC protocol upgrade 0 -> 1
2023-12-05 23:53:14 0 [Note] WSREP: Node 37718ade-864d state prim
2023-12-05 23:53:14 0 [Note] WSREP: view(view_id(PRIM,37718ade-864d,383) memb {
37718ade-864d,0
72a61da0-93c4,0
a0738f2d-bcd9,0
ac774553-8db1,0
c017ceee-af4a,0
} joined {
} left {
} partitioned {
})
2023-12-05 23:53:14 0 [Note] WSREP: save pc into disk
2023-12-05 23:53:14 0 [Note] WSREP: gcomm: connected
2023-12-05 23:53:14 0 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
2023-12-05 23:53:14 0 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
2023-12-05 23:53:14 0 [Note] WSREP: Opened channel 'mariadb-operator'
2023-12-05 23:53:14 0 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 5
2023-12-05 23:53:14 0 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
2023-12-05 23:53:14 0 [Note] WSREP: STATE EXCHANGE: sent state msg: 742e9eca-93c9-11ee-9732-16a8b18858ee
2023-12-05 23:53:14 0 [Note] WSREP: STATE EXCHANGE: got state msg: 742e9eca-93c9-11ee-9732-16a8b18858ee from 0 (mariadb-3)
2023-12-05 23:53:14 0 [Note] WSREP: Initializing config service v1
2023-12-05 23:53:14 0 [Note] WSREP: STATE EXCHANGE: got state msg: 742e9eca-93c9-11ee-9732-16a8b18858ee from 2 (mariadb-1)
2023-12-05 23:53:14 0 [Note] WSREP: STATE EXCHANGE: got state msg: 742e9eca-93c9-11ee-9732-16a8b18858ee from 3 (mariadb-0)
2023-12-05 23:53:14 0 [Note] WSREP: STATE EXCHANGE: got state msg: 742e9eca-93c9-11ee-9732-16a8b18858ee from 4 (mariadb-2)
2023-12-05 23:53:14 1 [Note] WSREP: Starting rollbacker thread 1
2023-12-05 23:53:14 2 [Note] WSREP: Starting applier thread 2
2023-12-05 23:53:14 0 [Note] WSREP: Deinitializing config service v1
2023-12-05 23:53:14 0 [Note] WSREP: STATE EXCHANGE: got state msg: 742e9eca-93c9-11ee-9732-16a8b18858ee from 1 (mariadb-4)
2023-12-05 23:53:14 0 [Note] WSREP: Quorum results:
version = 6,
component = PRIMARY,
conf_id = 358,
members = 4/5 (joined/total),
act_id = 13314,
last_appl. = 13202,
protocols = 2/10/4 (gcs/repl/appl),
vote policy= 0,
group UUID = 82507f11-767c-11ee-b214-532d45cfffd9
2023-12-05 23:53:14 0 [Note] WSREP: Flow-control interval: [36, 36]
2023-12-05 23:53:14 0 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 13315)
2023-12-05 23:53:14 2 [Note] WSREP: ####### processing CC 13315, local, ordered
2023-12-05 23:53:14 2 [Note] WSREP: Process first view: 82507f11-767c-11ee-b214-532d45cfffd9 my uuid: 72a61da0-93c9-11ee-93c4-479fd91a36f4
2023-12-05 23:53:14 2 [Note] WSREP: Server mariadb-4 connected to cluster at position 82507f11-767c-11ee-b214-532d45cfffd9:13315 with ID 72a61da0-93c9-11ee-93c4-479fd91a36f4
2023-12-05 23:53:14 2 [Note] WSREP: Server status change disconnected -> connected
2023-12-05 23:53:14 2 [Note] WSREP: ####### My UUID: 72a61da0-93c9-11ee-93c4-479fd91a36f4
2023-12-05 23:53:14 2 [Note] WSREP: Cert index reset to 00000000-0000-0000-0000-000000000000:-1 (proto: 10), state transfer needed: yes
2023-12-05 23:53:14 0 [Note] WSREP: Service thread queue flushed.
2023-12-05 23:53:14 2 [Note] WSREP: ####### Assign initial position for certification: 00000000-0000-0000-0000-000000000000:-1, protocol version: -1
2023-12-05 23:53:14 2 [Note] WSREP: State transfer required:
Group state: 82507f11-767c-11ee-b214-532d45cfffd9:13315
Local state: 00000000-0000-0000-0000-000000000000:-1
2023-12-05 23:53:14 2 [Note] WSREP: Server status change connected -> joiner
2023-12-05 23:53:14 0 [Note] WSREP: Joiner monitor thread started to monitor
2023-12-05 23:53:14 0 [Note] WSREP: Running: 'wsrep_sst_mariabackup --role 'joiner' --address 'mariadb-4.mariadb-internal.databases.svc.newcluster.local' --datadir '/var/lib/mysql/' --parent 1 --progress 0'
WSREP_SST: [INFO] mariabackup SST started on joiner (20231205 23:53:14.665)
WSREP_SST: [INFO] SSL configuration: CA='', CAPATH='', CERT='', KEY='', MODE='DISABLED', encrypt='0' (20231205 23:53:14.721)
WSREP_SST: [INFO] Progress reporting tool pv not found in path: /usr//bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/sbin:/usr/bin:/sbin:/bin (20231205 23:53:14.856)
WSREP_SST: [INFO] Disabling all progress/rate-limiting (20231205 23:53:14.859)
WSREP_SST: [INFO] Streaming with mbstream (20231205 23:53:14.886)
WSREP_SST: [INFO] Using socat as streamer (20231205 23:53:14.890)
WSREP_SST: [INFO] Stale sst_in_progress file: /var/lib/mysql/sst_in_progress (20231205 23:53:14.895)
WSREP_SST: [INFO] previous SST is not completed, waiting for it to exit (20231205 23:53:14.960)
2023-12-05 23:53:15 0 [Note] WSREP: (72a61da0-93c4, 'tcp://0.0.0.0:4567') turning message relay requesting off
WSREP_SST: [INFO] previous SST is not completed, waiting for it to exit (20231205 23:53:15.972)
WSREP_SST: [INFO] previous SST is not completed, waiting for it to exit (20231205 23:53:16.983)
WSREP_SST: [INFO] previous SST is not completed, waiting for it to exit (20231205 23:53:17.994)
WSREP_SST: [INFO] previous SST is not completed, waiting for it to exit (20231205 23:53:19.006)
WSREP_SST: [INFO] previous SST is not completed, waiting for it to exit (20231205 23:53:20.016)
WSREP_SST: [INFO] previous SST is not completed, waiting for it to exit (20231205 23:53:21.027)
WSREP_SST: [INFO] previous SST is not completed, waiting for it to exit (20231205 23:53:22.038)
WSREP_SST: [INFO] previous SST is not completed, waiting for it to exit (20231205 23:53:23.050)
WSREP_SST: [INFO] previous SST is not completed, waiting for it to exit (20231205 23:53:24.061)
WSREP_SST: [ERROR] previous SST script still running. (20231205 23:53:24.064)
2023-12-05 23:53:24 0 [ERROR] WSREP: Failed to read 'ready <addr>' from: wsrep_sst_mariabackup --role 'joiner' --address 'mariadb-4.mariadb-internal.databases.svc.newcluster.local' --datadir '/var/lib/mysql/' --parent 1 --progress 0
Read: '(null)'
2023-12-05 23:53:24 0 [ERROR] WSREP: Process completed with error: wsrep_sst_mariabackup --role 'joiner' --address 'mariadb-4.mariadb-internal.databases.svc.newcluster.local' --datadir '/var/lib/mysql/' --parent 1 --progress 0: 114 (Operation already in progress)
2023-12-05 23:53:24 2 [ERROR] WSREP: Failed to prepare for 'mariabackup' SST. Unrecoverable.
2023-12-05 23:53:24 2 [ERROR] WSREP: SST request callback failed. This is unrecoverable, restart required.
2023-12-05 23:53:24 2 [Note] WSREP: ReplicatorSMM::abort()
2023-12-05 23:53:24 2 [Note] WSREP: Closing send monitor...
2023-12-05 23:53:24 2 [Note] WSREP: Closed send monitor.
2023-12-05 23:53:24 2 [Note] WSREP: gcomm: terminating thread
2023-12-05 23:53:24 2 [Note] WSREP: gcomm: joining thread
2023-12-05 23:53:24 2 [Note] WSREP: gcomm: closing backend
2023-12-05 23:53:24 2 [Note] WSREP: view(view_id(NON_PRIM,37718ade-864d,383) memb {
72a61da0-93c4,0
} joined {
} left {
} partitioned {
37718ade-864d,0
a0738f2d-bcd9,0
ac774553-8db1,0
c017ceee-af4a,0
})
2023-12-05 23:53:24 2 [Note] WSREP: PC protocol downgrade 1 -> 0
2023-12-05 23:53:24 2 [Note] WSREP: view((empty))
2023-12-05 23:53:24 2 [Note] WSREP: gcomm: closed
2023-12-05 23:53:24 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2023-12-05 23:53:24 0 [Note] WSREP: Flow-control interval: [16, 16]
2023-12-05 23:53:24 0 [Note] WSREP: Received NON-PRIMARY.
2023-12-05 23:53:24 0 [Note] WSREP: Shifting PRIMARY -> OPEN (TO: 13315)
2023-12-05 23:53:24 0 [Note] WSREP: New SELF-LEAVE.
2023-12-05 23:53:24 0 [Note] WSREP: Flow-control interval: [0, 0]
2023-12-05 23:53:24 0 [Note] WSREP: Received SELF-LEAVE. Closing connection.
2023-12-05 23:53:24 0 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 13315)
2023-12-05 23:53:24 0 [Note] WSREP: RECV thread exiting 0: Success
2023-12-05 23:53:24 2 [Note] WSREP: recv_thread() joined.
2023-12-05 23:53:24 2 [Note] WSREP: Closing replication queue.
2023-12-05 23:53:24 2 [Note] WSREP: Closing slave action queue.
2023-12-05 23:53:24 2 [Note] WSREP: mariadbd: Terminated.
231205 23:53:24 [ERROR] mysqld got signal 11 ;
Sorry, we probably made a mistake, and this is a bug.
Your assistance in bug reporting will enable us to fix this for the next release.
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Server version: 11.2.2-MariaDB-1:11.2.2+maria~ubu2204 source revision: 929532a9426d085111c24c63de9c23cc54382259
key_buffer_size=0
read_buffer_size=131072
max_used_connections=0
max_threads=153
thread_count=3
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 337017 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
Thread pointer: 0x7f6394000c68
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f63b81f9c68 thread_stack 0x49000
Printing to addr2line failed
mariadbd(my_print_stacktrace+0x32)[0x55ffec5f2032]
mariadbd(handle_fatal_signal+0x478)[0x55ffec0c6158]
/lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f63b9740520]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x178)[0x7f63b9726898]
/usr/lib/galera/libgalera_smm.so(+0x157602)[0x7f63b91c2602]
/usr/lib/galera/libgalera_smm.so(+0x700e1)[0x7f63b90db0e1]
/usr/lib/galera/libgalera_smm.so(+0x6cc94)[0x7f63b90d7c94]
/usr/lib/galera/libgalera_smm.so(+0x8b311)[0x7f63b90f6311]
/usr/lib/galera/libgalera_smm.so(+0x604a0)[0x7f63b90cb4a0]
/usr/lib/galera/libgalera_smm.so(+0x48261)[0x7f63b90b3261]
mariadbd(_ZN5wsrep18wsrep_provider_v2611run_applierEPNS_21high_priority_serviceE+0x12)[0x55ffec6b0b02]
mariadbd(+0xd7ff31)[0x55ffec383f31]
mariadbd(_Z15start_wsrep_THDPv+0x26b)[0x55ffec371cfb]
mariadbd(+0xcf24c6)[0x55ffec2f64c6]
/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3)[0x7f63b9792ac3]
/lib/x86_64-linux-gnu/libc.so.6(+0x126a40)[0x7f63b9824a40]
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x0): (null)
Connection ID (thread ID): 2
Status: NOT_KILLED
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on,not_null_range_scan=off,hash_join_cardinality=on,cset_narrowing=off
The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mariadbd/ contains
information that should help you find out what is causing the crash.
We think the query pointer is invalid, but we will try to print it anyway.
Query:
Writing a core file...
Working directory at /var/lib/mysql
Resource Limits:
Limit Soft Limit Hard Limit Units
Max cpu time unlimited unlimited seconds
Max file size unlimited unlimited bytes
Max data size unlimited unlimited bytes
Max stack size 8388608 unlimited bytes
Max core file size 0 0 bytes
Max resident set unlimited unlimited bytes
Max processes unlimited unlimited processes
Max open files 65535 65535 files
Max locked memory unlimited unlimited bytes
Max address space unlimited unlimited bytes
Max file locks unlimited unlimited locks
Max pending signals 128442 128442 signals
Max msgqueue size 819200 819200 bytes
Max nice priority 0 0
Max realtime priority 0 0
Max realtime timeout unlimited unlimited us
Core pattern: core
Kernel version: Linux version 5.10.0-25-amd64 (debian-kernel@lists.debian.org) (gcc-10 (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP Debian 5.10.191-1 (2023-08-16)
Hey
Is it possible that during reconcile, the controller is "erroring" as it's waiting for a cert from certmanager (couldn't find the certmanager code, my fault lol) and then creating the key here?
Our cert-controller is not deployed if cert-manager is enabled, since they serve the same purpose, so there is no way they can clash.
I'm really not sure why it's just creating CertificateRequest after CertificateRequest, instead of waiting for certmanager to pick them up?
As far as I know, cert-manager tracks attempts to renew a Certificate object in CertificateRequests, so what may be happening is that cert-manager considers your certificate outdated or invalid. This is most likely a cert-manager issue with your installation, I would say. Maybe something to report upstream?
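If it helps for an upstream report, something like this (plain kubectl; assumes a default cert-manager install in the cert-manager namespace) captures what cert-manager thinks of the Certificate:
# Export the Certificate and its CertificateRequests for the report
kubectl -n databases get certificate mariadb-op-mariadb-operator-webhook-cert -o yaml > certificate.yaml
kubectl -n databases get certificaterequests -o yaml > certificaterequests.yaml
# The cert-manager controller logs usually state why a re-issuance was triggered
kubectl -n cert-manager logs deploy/cert-manager | grep -i webhook-cert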
However, after changing the spec.image from 11.0.3 to 11.2.2, I just get the following error over and over again as it tries to do a rolling update...
Can we handle the Galera issue separately in another issue? Also, did you have the chance to look at the troubleshooting guide?
Understood, I use cert-manager throughout my environment, so not sure why it doesn’t play well here.
The solution I had regarding the webhook was just to avoid using cert-manager within the Helm release, since it was constantly looping :) so I think we're good to close this issue now! Unless you want to debug the cert-manager issue?
Unless you want to debug the cert-manager issue?
Leave it open, I will try to reproduce it.
One possibility could be that you are in an intermediate state where the Secrets used by the cert-controller and managed by Helm are still in the cluster. By default they are empty, and they are named the same as the ones generated by cert-manager, which might be creating conflicts. Could you confirm whether you see them?
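For example, something like this (plain kubectl; names from the resource tree above) shows whether both sets of Secrets coexist and which of them are Helm-managed:
# List webhook-related Secrets and dump the data of the contested one (empty data would point to the chart-managed Secret)
kubectl -n databases get secrets | grep -i webhook
kubectl -n databases get secret mariadb-op-mariadb-operator-webhook-cert -o jsonpath='{.data}' ; echo
# Helm-managed objects typically carry app.kubernetes.io/managed-by=Helm and meta.helm.sh/* annotations
kubectl -n databases get secret mariadb-op-mariadb-operator-webhook-cert \
  -o jsonpath='{.metadata.labels}{"\n"}{.metadata.annotations}{"\n"}'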
Sure! So with the following values:
clusterName: "newcluster.local"
ha:
enabled: true
logLevel: DEBUG
webhook:
cert:
certManager:
enabled: true
I see that the deployment is in that "loop" of Secrets vs. Certificates:
I see the following resources:
With the mariadb-op-mariadb-operator-webhook-cert being recreated over and over again. It has the following values within the Secret:
With the mariadb-operator-webhook-ca having the following:
While this is going on, the Certificate resource named mariadb-op-mariadb-operator-webhook-cert also now exists:
Let me know if there's anything else you would like to see!
Interestingly enough, I'm also getting the following "errors" on my cluster:
This issue is stale because it has been open 30 days with no activity.
Up to you if you want to keep this open or not, @mmontes11. I had to just nuke this DB (backup, destroy, restore) and start from scratch. After doing so, this issue went away, more or less...
Same issue here. After rebooting some nodes, only one of three Galera nodes ends up running. This is highly concerning to me. It seems that if the webhook is not available during a reboot, things stop working even if the webhook becomes available again shortly after.
I ran across what I think is the same issue: when enabling webhook.cert.certManager.enabled, it repeatedly attempted to request certs, creating/deleting secrets. Brand new cluster, so I deleted both cert-manager and the mariadb operator and it kept happening.
Configs (using the app-of-apps model):
cert-manager.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: cert-manager
  namespace: argocd
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  project: default
  source:
    repoURL: {{ .Values.spec.source.repoURL }}
    path: src/cert-manager
    targetRevision: {{ .Values.spec.source.targetRevision }}
  destination:
    server: {{ .Values.spec.destination.server }}
    namespace: cert-manager
  syncPolicy:
    syncOptions:
      - CreateNamespace=true
    automated:
      selfHeal: true
      prune: true
mariadb-operator.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: mariadb-operator
  namespace: argocd
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  project: default
  source:
    chart: mariadb-operator
    repoURL: https://mariadb-operator.github.io/mariadb-operator
    targetRevision: ">=0"
    helm:
      releaseName: mariadb-operator
      parameters:
        - name: "webhook.cert.certManager.enabled"
          value: "true"
  destination:
    server: {{ .Values.spec.destination.server }}
    namespace: mariadb-operator
  syncPolicy:
    syncOptions:
      - CreateNamespace=true
      - ServerSideApply=true
    # https://github.com/argoproj/argo-cd/issues/820#issuecomment-1246960210
    # May also just need to force replace the failed-to-launch.
    automated:
      selfHeal: true
      prune: true
Since there doesn't seem to be a conclusive answer as to what was going on, I turned off the use of cert-manager.
Versions:
Hey @ShakataGaNai! Thanks for reporting. I still haven't had much time to reproduce this, sorry. So far I haven't managed to reproduce it with Flux.
I have 3 possible investigation paths:
- ca.crt in the secret. Therefore the webhook can't trust the connection.
Happy to hear your thoughts!
In regards to point 2 from my previous comment, this might be related:
Would be great to hear from an ArgoCD expert.
This is related to the Helm chart bug https://github.com/mariadb-operator/mariadb-operator/issues/375. @ShakataGaNai, you can remove the prune: true syncPolicy as a temporary workaround; the caveat is that ArgoCD will look as if it is out of sync, but the webhook will work.
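For example, one way to apply that temporarily without editing the Application manifest (assumes the Application is named mariadb-operator in the argocd namespace, as in the manifests above; setting prune to false has the same effect as removing it, since it defaults to false):
kubectl -n argocd patch applications.argoproj.io mariadb-operator --type merge \
  -p '{"spec":{"syncPolicy":{"automated":{"prune":false}}}}'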
We have just merged @jescarri's PR with a fix for this:
Closing! This will be released in v0.0.26 this week; feel free to reopen. Please consider @jescarri's advice about ArgoCD:
https://github.com/mariadb-operator/mariadb-operator/issues/285#issuecomment-1939536676
I also had to add the following to the values of my Helm deployment when using certManager:
webhook:
  cert:
    secretLabels:
      key1: value1
To have it stop complaining about a null value not being allowed for spec.SecretTemplate. This commit led me to that resolution.
Hi there,
Sorry to be a bother again. I know that #267 exists, and I believe this is related but not exactly the same issue. I updated to the latest operator (released today), but it appears as though I'm still having the same issue:
This occurs when I try to change spec.image for my active MariaDB resource. Is there a way to debug or work around this error? I don't believe the error is specific to Galera. I installed the operator via Helm quite a while ago... If there's any other information I can provide, please let me know :)