Open d0x2f opened 5 years ago
I've just run a test using a ReadWriteMany PVC backed by an nfs provisioner and there have been no crashes so far. I'd guess ReadWriteMany is indeed required then.
Throw out my ReadWriteMany theory, I'm still getting crashes, log attached.
Can you have a look at the events on the deployment/pod? Seems like you're getting a SIGABRT sent to the pod.
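Something along these lines should show them (standard kubectl; adjust the namespace and pod name to match yours):
# Namespace events, newest last, plus the full pod description
kubectl -n alchemy get events --sort-by=.metadata.creationTimestamp
kubectl -n alchemy describe pod alchemy-database-1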
The crash may be coinciding with a node scale-up event; here are the events from a pod that crashed.
7m41s Warning FailedScheduling Pod pod has unbound immediate PersistentVolumeClaims (repeated 4 times)
7m41s Normal Scheduled Pod Successfully assigned alchemy/alchemy-database-1 to gke-cluster-node-pool-e9917db4-c8r6
7m34s Normal TriggeredScaleUp Pod pod triggered scale-up: [{https://content.googleapis.com/compute/v1/projects/development-5893cdfe/zones/australia-southeast1-a/instanceGroups/gke-cluster-node-pool-e9917db4-grp 2->3 (max: 3)}]
7m33s Normal SuccessfulAttachVolume Pod AttachVolume.Attach succeeded for volume "pvc-21f9162d-43f1-11e9-8756-42010a00000a"
7m22s Normal Pulling Pod pulling image "mysql/mysql-server:8.0.12"
7m9s Normal Pulled Pod Successfully pulled image "mysql/mysql-server:8.0.12"
5m45s Normal Created Pod Created container
5m45s Normal Started Pod Started container
7m5s Normal Pulling Pod pulling image "iad.ocir.io/oracle/mysql-agent:0.3.0"
6m38s Normal Pulled Pod Successfully pulled image "iad.ocir.io/oracle/mysql-agent:0.3.0"
6m34s Normal Created Pod Created container
6m34s Normal Started Pod Started container
5m45s Normal Pulled Pod Container image "mysql/mysql-server:8.0.12" already present on machine
2m18s Warning Unhealthy Pod Readiness probe failed: HTTP probe failed with statuscode: 503
5m58s Warning BackOff Pod Back-off restarting failed container
I'm using the smallest instance type n1-standard-1 so scaling happens often.
More than likely the case.
@d0x2f have you been able to resolve?
Unfortunately not. I've switched to larger instances and a larger minimum node pool, but I'm still getting crashes even without a node scaling event.
I'd be curious if anyone else is able to reproduce this. I don't believe there's anything special about my gke setup.
I had a similar problem when I increased the number of members in the YAML below.
apiVersion: mysql.oracle.com/v1alpha1
kind: Cluster
metadata:
  name: mysql
spec:
  members: 5  # <--- Increased from 3 to 5
  config:
    name: mycnf
  rootPasswordSecret:
    name: mysql-root-user-secret
  volumeClaimTemplate:
    metadata:
      name: data
    spec:
      storageClassName: oci
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 50Gi
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 41m default-scheduler Successfully assigned mysql-operator/mysql-3 to 10.0.3.4
Normal SuccessfulAttachVolume 40m attachdetach-controller AttachVolume.Attach succeeded for volume "ocid1.volume.oc1.phx.abyhqljtirxeg574lohvtxzqxerupv7zc725huszogaie6kghydeiz5mqz4a"
Normal Pulled 40m kubelet, 10.0.3.4 Container image "iad.ocir.io/oracle/mysql-agent:0.3.0" already present on machine
Normal Created 40m kubelet, 10.0.3.4 Created container
Normal Started 40m kubelet, 10.0.3.4 Started container
Normal Pulled 39m (x4 over 40m) kubelet, 10.0.3.4 Container image "mysql/mysql-server:8.0.12" already present on machine
Normal Created 39m (x4 over 40m) kubelet, 10.0.3.4 Created container
Normal Started 39m (x4 over 40m) kubelet, 10.0.3.4 Started container
Warning Unhealthy 5m27s (x211 over 40m) kubelet, 10.0.3.4 Readiness probe failed: HTTP probe failed with statuscode: 503
Warning BackOff 28s (x182 over 40m) kubelet, 10.0.3.4 Back-off restarting failed container
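For what it's worth, the readiness probe keeps failing because the pod keeps restarting; the actual crash reason should be in the logs of the previous container run. A command along these lines pulls them (assuming the main container is named mysql; kubectl describe pod lists the container names):
# Logs from the crashed (previous) run of the mysql container
kubectl -n mysql-operator logs mysql-3 -c mysql --previous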
I have the same issue. Does anyone have a solution for this?
Is this a BUG REPORT or FEATURE REQUEST?
Choose one: BUG REPORT
Versions
MySQL Operator Version: helm chart master (c98210b2c7b176befa00aa0751db184088adfc39) Values.image.tag 0.3.0
Environment:
Kubernetes version (use kubectl version):
What happened?
When creating a cluster StatefulSet, often one of the pods enters a crash loop with the following logs:
error-log.txt
What you expected to happen?
All pods to start successfully
How to reproduce it (as minimally and precisely as possible)?
Here's my cluster.yaml:
Anything else we need to know?
mysql-operator is installed into the same namespace as the above yaml, "alchemy".
This yaml is based on some of the examples provided in this repo, however I've changed the access mode on the volume claims to ReadWriteOnce because ReadWriteMany isn't supported on GKE out of the box. Perhaps ReadWriteMany is required for mysql-operator?
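For context, the default GKE storage class is backed by the kubernetes.io/gce-pd provisioner, which only offers ReadWriteOnce (and ReadOnlyMany); listing the storage classes shows what the cluster actually provides:
# Lists storage classes and their provisioners on this cluster
kubectl get storageclass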
By following the link at the end of the crash log I found the line:
Also if one pod crashes it'll continue to crash every time it's restarted, but the others remain running with no issue.
All of this makes me think it might be something to do with the access mode, but I was under the impression that each pod mounts its own PV, and so ReadWriteOnce should be sufficient.
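One way to double-check that assumption is to list the claims and volumes; each member should have its own ReadWriteOnce PVC bound to a separate PV:
# Each cluster member should show its own Bound PVC with access mode RWO
kubectl -n alchemy get pvc
kubectl get pv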