@hjkkhj123 Thank you for the issue.
Can you describe the katib-mysql pod, please?
Which version of Katib are you using?
Which manifest did you use when installing Kubeflow?
@hjkkhj123 I think you may have mounted a hostPath in the MySQL instance, and the path contains files left over from an older MySQL version. Can you clean up the storage (PV) and try again?
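As a concrete starting point, a minimal cleanup sketch assuming the default resource names from the Katib manifests (whether the data on disk is actually removed depends on the PV's reclaim policy):

```
# Delete the Katib MySQL claim; the bound PV is then released or deleted
# according to its persistentVolumeReclaimPolicy.
kubectl delete pvc katib-mysql -n kubeflow

# If the PV uses the Retain policy, delete it explicitly as well.
kubectl get pv | grep katib-mysql
kubectl delete pv <pv-name>          # <pv-name> is a placeholder

# Recreate the pod so it binds a fresh volume (label from the pod spec).
kubectl delete pod -n kubeflow -l component=mysql
```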
First of all, sorry for my bad English.
@andreyvelich
Here is the output of `kubectl describe pods katib-mysql`:
```
Name:         katib-mysql-dcf7dcbd5-8cb96
Namespace:    kubeflow
Priority:     0
Node:         neptune/192.168.100.14
Start Time:   Mon, 20 Apr 2020 11:01:34 +0900
Labels:       app=katib
              app.kubernetes.io/component=katib
              app.kubernetes.io/instance=katib-controller-0.8.0
              app.kubernetes.io/managed-by=kfctl
              app.kubernetes.io/name=katib-controller
              app.kubernetes.io/part-of=kubeflow
              app.kubernetes.io/version=0.8.0
              component=mysql
              pod-template-hash=dcf7dcbd5
Annotations:  sidecar.istio.io/inject: false
Status:       Running
IP:           10.10.1.171
Controlled By:  ReplicaSet/katib-mysql-dcf7dcbd5
Containers:
  katib-mysql:
    Container ID:  docker://e579297bcac51d1dee57634d2ed8065c8e087cc18d1b5599ff97c2eb1744e30f
    Image:         mysql:8
    Image ID:      docker-pullable://mysql@sha256:b69d0b62d02ee1eba8c7aeb32eba1bb678b6cfa4ccfb211a5d7931c7755dc4a8
    Port:          3306/TCP
    Host Port:     0/TCP
    Args:
      --datadir
      /var/lib/mysql/datadir
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Mon, 20 Apr 2020 11:04:46 +0900
      Finished:     Mon, 20 Apr 2020 11:04:53 +0900
    Ready:          False
    Restart Count:  3
    Liveness:       exec [/bin/bash -c mysqladmin ping -u root -p${MYSQL_ROOT_PASSWORD}] delay=30s timeout=5s period=10s #success=1 #failure=3
    Readiness:      exec [/bin/bash -c mysql -D ${MYSQL_DATABASE} -u root -p${MYSQL_ROOT_PASSWORD} -e 'SELECT 1'] delay=5s timeout=1s period=10s #success=1 #failure=3
    Environment:
      MYSQL_ROOT_PASSWORD:         <set to the key 'MYSQL_ROOT_PASSWORD' in secret 'katib-mysql-secrets'>  Optional: false
      MYSQL_ALLOW_EMPTY_PASSWORD:  true
      MYSQL_DATABASE:              katib
    Mounts:
      /var/lib/mysql from katib-mysql (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-l4b9v (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  katib-mysql:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  katib-mysql
    ReadOnly:   false
  default-token-l4b9v:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-l4b9v
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                  Age                    From                     Message
  ----     ------                  ----                   ----                     -------
  Warning  FailedScheduling        3m41s (x2 over 3m41s)  default-scheduler        persistentvolumeclaim "katib-mysql" not found
  Warning  FailedScheduling        3m39s (x2 over 3m39s)  default-scheduler        pod has unbound immediate PersistentVolumeClaims (repeated 2 times)
  Normal   Scheduled               3m36s                  default-scheduler        Successfully assigned kubeflow/katib-mysql-dcf7dcbd5-8cb96 to neptune
  Normal   SuccessfulAttachVolume  3m36s                  attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-4481d2c9-4dae-4914-aaee-4397d6f02c4a"
  Normal   Killing                 2m31s                  kubelet, neptune         Container katib-mysql failed liveness probe, will be restarted
  Normal   Pulled                  2m1s (x2 over 3m23s)   kubelet, neptune         Container image "mysql:8" already present on machine
  Normal   Created                 2m (x2 over 3m22s)     kubelet, neptune         Created container katib-mysql
  Normal   Started                 119s (x2 over 3m21s)   kubelet, neptune         Started container katib-mysql
  Warning  Unhealthy               71s (x12 over 3m11s)   kubelet, neptune         Readiness probe failed: mysql: [Warning] Using a password on the command line interface can be insecure.
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)
  Warning  Unhealthy               71s (x5 over 2m51s)    kubelet, neptune         Liveness probe failed: mysqladmin: [Warning] Using a password on the command line interface can be insecure.
mysqladmin: connect to server at 'localhost' failed
error: 'Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)'
Check that mysqld is running and that the socket: '/var/run/mysqld/mysqld.sock' exists!
  Warning  Unhealthy               54s                    kubelet, neptune         Liveness probe failed: mysqladmin: [Warning] Using a password on the command line interface can be insecure.
mysqladmin: connect to server at 'localhost' failed
error: 'Lost connection to MySQL server at 'reading initial communication packet', system error: 104'
```
And I used this manifest: https://raw.githubusercontent.com/kubeflow/manifests/v1.0-branch/kfdef/kfctl_k8s_istio.v1.0.1.yaml
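For context, the typical install flow with that config looks roughly like the following (the directory name is illustrative):

```
# Illustrative kfctl v1.0.x flow; KF_DIR is an example path.
export KF_DIR=~/kubeflow-deploy
export CONFIG_URI="https://raw.githubusercontent.com/kubeflow/manifests/v1.0-branch/kfdef/kfctl_k8s_istio.v1.0.1.yaml"
mkdir -p ${KF_DIR} && cd ${KF_DIR}
kfctl apply -V -f ${CONFIG_URI}
```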
@gaocegege I use rook/ceph for the PV, and there is no old PV. I have reinstalled Kubeflow more than 3 times, and every time I checked that the old PVs and configs were deleted.
Thank you for helping me.
@andreyvelich @gaocegege I may have found the reason: kubeflow/kubeflow#4864. That issue looks really similar to mine. I tried using hostPath as the PV and it works, but I want to use my StorageClass via Ceph. How can I solve this?
Thank you
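In case it helps, a minimal PVC sketch that requests a Rook/Ceph StorageClass instead of hostPath; `rook-ceph-block` and the 10Gi size are assumptions here, so substitute your actual StorageClass name and the size from your manifests:

```
# Hypothetical PVC bound to a Rook/Ceph StorageClass. The claim must keep
# the name katib-mysql so the katib-mysql deployment's volume matches it.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: katib-mysql
  namespace: kubeflow
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: rook-ceph-block   # assumed name; use your Ceph class
  resources:
    requests:
      storage: 10Gi                   # assumed size
```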
@hjkkhj123 I am not sure that you can use rook/ceph as a PV for the MySQL image. Did you check the documentation to see whether MySQL works with this sort of volume?
@andreyvelich Finally, I found the solution: it is just an error with the probes. I deployed a simple MySQL pod with a rook/ceph PV and it works, so I changed the probe interval and threshold, and it works fine now. Thank you for helping.
@andreyvelich - having the same issue, how did you change the probe interval and threshold?
I think it depends on your server spec.
I just doubled the liveness probe and readiness probe settings.
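For anyone who wants the concrete change: one way (a sketch; the values below are simply double the probe settings shown in the `kubectl describe` output above, so tune them to your hardware) is to edit the deployment with `kubectl edit deploy katib-mysql -n kubeflow` and raise the probe timings on the katib-mysql container:

```
# Probe fragment for the katib-mysql container spec. The originals were
# liveness delay=30s timeout=5s, readiness delay=5s timeout=1s,
# period=10s, failure=3; each value below is doubled as an illustration.
livenessProbe:
  initialDelaySeconds: 60
  timeoutSeconds: 10
  periodSeconds: 20
  failureThreshold: 6
readinessProbe:
  initialDelaySeconds: 10
  timeoutSeconds: 2
  periodSeconds: 20
  failureThreshold: 6
```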
My situation was a bit different: I found two MySQL versions involved, so I downgraded the image from 8 to 5.7 directly with

```
k3s kubectl edit deploy katib-mysql -nkubeflow
```

changing `image: mysql:8` to `gcr.io/ml-pipeline/mysql`:

```
[root@dl01 manifests-1.3.0]# docker image ls | grep mysql
mysql                      8     76152be68449   10 days ago   524MB
gcr.io/ml-pipeline/mysql   5.7   f8fcde8
```

Then I deleted the data under the old PVC (katib-mysql), deleted the katib-mysql and katib-db-manager pods, the pods restarted automatically, and everything returned to normal.
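If you'd rather not open an editor, the same image swap can be done as a one-liner (a sketch equivalent to the `kubectl edit` above; the container name `katib-mysql` comes from the describe output earlier in the thread):

```
# Swap the katib-mysql container from mysql:8 to the 5.7-based image.
kubectl set image deployment/katib-mysql katib-mysql=gcr.io/ml-pipeline/mysql:5.7 -n kubeflow
```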
I think this is specific to MySQL 8.0: there is a plugin setting in the MySQL config file. We can remove the plugin config and start MySQL 8.0; after the first startup, we can add the plugin back and restart, and it should start normally the second time.
/kind bug
What steps did you take and what happened: [A clear and concise description of what the bug is.]
I just installed Kubeflow manually.
What did you expect to happen:
Anything else you would like to add: [Miscellaneous information that will assist in solving the issue.]
Environment:
- Kubernetes version (use `kubectl version`):
- OS (e.g. from `/etc/os-release`):