kubeflow / katib

Automated Machine Learning on Kubernetes
https://www.kubeflow.org/docs/components/katib
Apache License 2.0

katib-mysql does not work #1156

Closed hjkkhj123 closed 4 years ago

hjkkhj123 commented 4 years ago

/kind bug

What steps did you take and what happened:

I just installed Kubeflow manually.

What did you expect to happen:

2020-04-17 06:28:38+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 8.0.19-1debian10 started.
2020-04-17 06:28:38+00:00 [Note] [Entrypoint]: Switching to dedicated user 'mysql'
2020-04-17 06:28:38+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 8.0.19-1debian10 started.
2020-04-17T06:28:38.815249Z 0 [Warning] [MY-011070] [Server] 'Disabling symbolic links using --skip-symbolic-links (or equivalent) is the default. Consider not using this option as it' is deprecated and will be removed in a future release.
2020-04-17T06:28:38.815363Z 0 [System] [MY-010116] [Server] /usr/sbin/mysqld (mysqld 8.0.19) starting as process 1
mysqld: Table 'mysql.plugin' doesn't exist
2020-04-17T06:28:43.673845Z 0 [ERROR] [MY-010735] [Server] Could not open the mysql.plugin table. Please perform the MySQL upgrade procedure.
2020-04-17T06:28:44.007568Z 0 [Warning] [MY-010015] [Repl] Gtid table is not ready to be used. Table 'mysql.gtid_executed' cannot be opened.
2020-04-17T06:28:44.391089Z 0 [Warning] [MY-010015] [Repl] Gtid table is not ready to be used. Table 'mysql.gtid_executed' cannot be opened.
2020-04-17T06:28:44.509101Z 0 [Warning] [MY-010068] [Server] CA certificate ca.pem is self signed.
2020-04-17T06:28:44.542561Z 0 [Warning] [MY-011810] [Server] Insecure configuration for --pid-file: Location '/var/run/mysqld' in the path is accessible to all OS users. Consider choosing a different directory.
2020-04-17T06:28:44.543088Z 0 [Warning] [MY-010441] [Server] Failed to open optimizer cost constant tables
2020-04-17T06:28:44.543392Z 0 [ERROR] [MY-013129] [Server] A message intended for a client cannot be sent there as no client-session is attached. Therefore, we're sending the information to the error-log instead: MY-001146 - Table 'mysql.component' doesn't exist
2020-04-17T06:28:44.543560Z 0 [Warning] [MY-013129] [Server] A message intended for a client cannot be sent there as no client-session is attached. Therefore, we're sending the information to the error-log instead: MY-003543 - The mysql.component table is missing or has an incorrect definition.
2020-04-17T06:28:44.544225Z 0 [ERROR] [MY-010326] [Server] Fatal error: Can't open and lock privilege tables: Table 'mysql.user' doesn't exist
2020-04-17T06:28:44.544478Z 0 [ERROR] [MY-010952] [Server] The privilege system failed to initialize correctly. For complete instructions on how to upgrade MySQL to a new version please see the 'Upgrading MySQL' section from the MySQL manual.
2020-04-17T06:28:44.545053Z 0 [ERROR] [MY-010119] [Server] Aborting
2020-04-17T06:28:47.609375Z 0 [System] [MY-010910] [Server] /usr/sbin/mysqld: Shutdown complete (mysqld 8.0.19)  MySQL Community Server - GPL.

Anything else you would like to add:

Environment:

andreyvelich commented 4 years ago

@hjkkhj123 Thank you for the issue. Can you please share the output of kubectl describe for the katib-mysql pod? Which version of Katib are you using, and which manifest did you use to install Kubeflow?
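
For anyone gathering the same details, here is a minimal sketch using standard kubectl commands. It assumes the `kubeflow` namespace and a Deployment named `katib-mysql` whose pods carry the labels `app=katib,component=mysql`; adjust these to your install. `KUBECTL` is a hypothetical dry-run switch introduced only for this sketch: the commands are printed unless it is set to `kubectl`.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Run with KUBECTL=kubectl to execute them against the cluster.
KUBECTL="${KUBECTL:-echo kubectl}"
NS="kubeflow"                          # assumed namespace
SELECTOR="app=katib,component=mysql"   # assumed pod labels

# Pod status, probe settings, mounted volumes, and recent events.
$KUBECTL -n "$NS" describe pod -l "$SELECTOR"

# Logs of the previous (crashed) container instance.
$KUBECTL -n "$NS" logs deployment/katib-mysql --previous --tail=100

# Image actually deployed for the MySQL container.
$KUBECTL -n "$NS" get deployment katib-mysql \
  -o jsonpath='{.spec.template.spec.containers[0].image}'
```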

gaocegege commented 4 years ago

@hjkkhj123 I think you may have mounted a hostPath volume in the MySQL instance, and the path contains files left over from an older MySQL version. Can you clean up the storage (PV) and try again?
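
A rough sketch of such a cleanup, assuming the claim and the Deployment are both named `katib-mysql` in the `kubeflow` namespace (treat these as assumptions). Deleting the claim discards any Katib data stored in it, and whether the backing volume is removed as well depends on the PV reclaim policy. `KUBECTL` is a hypothetical dry-run switch for this sketch: the commands are only printed unless it is set to `kubectl`.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
KUBECTL="${KUBECTL:-echo kubectl}"
NS="kubeflow"   # assumed namespace

# See which PV and StorageClass back the claim.
$KUBECTL -n "$NS" get pvc katib-mysql -o wide

# Stop MySQL so the volume is no longer in use.
$KUBECTL -n "$NS" scale deployment katib-mysql --replicas=0

# Remove the claim. This is destructive.
$KUBECTL -n "$NS" delete pvc katib-mysql

# Re-create the PVC by re-applying the Katib manifests before this step
# (not shown, since it depends on how Kubeflow was installed).
$KUBECTL -n "$NS" scale deployment katib-mysql --replicas=1
```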

hjkkhj123 commented 4 years ago

First of all, sorry for my bad English.

@andreyvelich Here is the output of kubectl describe pods katib-mysql:

Name:           katib-mysql-dcf7dcbd5-8cb96
Namespace:      kubeflow
Priority:       0
Node:           neptune/192.168.100.14
Start Time:     Mon, 20 Apr 2020 11:01:34 +0900
Labels:         app=katib
                app.kubernetes.io/component=katib
                app.kubernetes.io/instance=katib-controller-0.8.0
                app.kubernetes.io/managed-by=kfctl
                app.kubernetes.io/name=katib-controller
                app.kubernetes.io/part-of=kubeflow
                app.kubernetes.io/version=0.8.0
                component=mysql
                pod-template-hash=dcf7dcbd5
Annotations:    sidecar.istio.io/inject: false
Status:         Running
IP:             10.10.1.171
Controlled By:  ReplicaSet/katib-mysql-dcf7dcbd5
Containers:
  katib-mysql:
    Container ID:  docker://e579297bcac51d1dee57634d2ed8065c8e087cc18d1b5599ff97c2eb1744e30f
    Image:         mysql:8
    Image ID:      docker-pullable://mysql@sha256:b69d0b62d02ee1eba8c7aeb32eba1bb678b6cfa4ccfb211a5d7931c7755dc4a8
    Port:          3306/TCP
    Host Port:     0/TCP
    Args:
      --datadir
      /var/lib/mysql/datadir
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Mon, 20 Apr 2020 11:04:46 +0900
      Finished:     Mon, 20 Apr 2020 11:04:53 +0900
    Ready:          False
    Restart Count:  3
    Liveness:       exec [/bin/bash -c mysqladmin ping -u root -p${MYSQL_ROOT_PASSWORD}] delay=30s timeout=5s period=10s #success=1 #failure=3
    Readiness:      exec [/bin/bash -c mysql -D ${MYSQL_DATABASE} -u root -p${MYSQL_ROOT_PASSWORD} -e 'SELECT 1'] delay=5s timeout=1s period=10s #success=1 #failure=3
    Environment:
      MYSQL_ROOT_PASSWORD:         <set to the key 'MYSQL_ROOT_PASSWORD' in secret 'katib-mysql-secrets'>  Optional: false
      MYSQL_ALLOW_EMPTY_PASSWORD:  true
      MYSQL_DATABASE:              katib
    Mounts:
      /var/lib/mysql from katib-mysql (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-l4b9v (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  katib-mysql:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  katib-mysql
    ReadOnly:   false
  default-token-l4b9v:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-l4b9v
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                  Age                    From                     Message
  ----     ------                  ----                   ----                     -------
  Warning  FailedScheduling        3m41s (x2 over 3m41s)  default-scheduler        persistentvolumeclaim "katib-mysql" not found
  Warning  FailedScheduling        3m39s (x2 over 3m39s)  default-scheduler        pod has unbound immediate PersistentVolumeClaims (repeated 2 times)
  Normal   Scheduled               3m36s                  default-scheduler        Successfully assigned kubeflow/katib-mysql-dcf7dcbd5-8cb96 to neptune
  Normal   SuccessfulAttachVolume  3m36s                  attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-4481d2c9-4dae-4914-aaee-4397d6f02c4a"
  Normal   Killing                 2m31s                  kubelet, neptune         Container katib-mysql failed liveness probe, will be restarted
  Normal   Pulled                  2m1s (x2 over 3m23s)   kubelet, neptune         Container image "mysql:8" already present on machine
  Normal   Created                 2m (x2 over 3m22s)     kubelet, neptune         Created container katib-mysql
  Normal   Started                 119s (x2 over 3m21s)   kubelet, neptune         Started container katib-mysql
  Warning  Unhealthy               71s (x12 over 3m11s)   kubelet, neptune         Readiness probe failed: mysql: [Warning] Using a password on the command line interface can be insecure.
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)
  Warning  Unhealthy  71s (x5 over 2m51s)  kubelet, neptune  Liveness probe failed: mysqladmin: [Warning] Using a password on the command line interface can be insecure.
mysqladmin: connect to server at 'localhost' failed
error: 'Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)'
Check that mysqld is running and that the socket: '/var/run/mysqld/mysqld.sock' exists!
  Warning  Unhealthy  54s  kubelet, neptune  Liveness probe failed: mysqladmin: [Warning] Using a password on the command line interface can be insecure.
mysqladmin: connect to server at 'localhost' failed
error: 'Lost connection to MySQL server at 'reading initial communication packet', system error: 104'

I used this manifest: https://raw.githubusercontent.com/kubeflow/manifests/v1.0-branch/kfdef/kfctl_k8s_istio.v1.0.1.yaml

@gaocegege I use rook/ceph for the PV, and there is no old PV. I have reinstalled Kubeflow more than three times, and each time I made sure to delete the old PV and configs.

Thank you for helping me.

hjkkhj123 commented 4 years ago

@andreyvelich @gaocegege I may have found the reason. kubeflow/kubeflow#4864 describes a problem very similar to mine. I tried using hostPath as the PV and it works, but I want to use my Ceph-backed StorageClass. How can I solve this?

Thank you.

andreyvelich commented 4 years ago

@hjkkhj123 I am not sure that you can use rook/ceph as a PV for the mysql image. Did you check the documentation to see whether MySQL works with this kind of volume?

hjkkhj123 commented 4 years ago

@andreyvelich I finally found the solution: it was just a problem with the probes. I deployed a simple MySQL pod with a rook/ceph PV and it worked, so I changed the probe interval and threshold, and katib-mysql works fine now. Thank you for helping.

mavencode01 commented 3 years ago

@andreyvelich - I'm having the same issue. How did you change the probe interval and threshold?

hjkkhj123 commented 3 years ago

> @andreyvelich - having same issue, how did you change the probe interval and threshold?

I think it depends on your server spec.

I simply doubled the liveness probe and readiness probe settings.
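
One way to apply such a change is a strategic merge patch on the Deployment. This is a sketch under assumptions, not the exact change made in this thread: the values are illustrative, each twice what the describe output above shows, and the container name `katib-mysql` and namespace `kubeflow` are taken from that same output. `KUBECTL` is a hypothetical dry-run switch: the commands are only printed unless it is set to `kubectl`.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
KUBECTL="${KUBECTL:-echo kubectl}"

# Illustrative probe values only (twice the settings in the describe output).
# The container entry is matched by its name, so other fields stay untouched.
PATCH='{"spec":{"template":{"spec":{"containers":[{"name":"katib-mysql",
"livenessProbe":{"initialDelaySeconds":60,"periodSeconds":20,"timeoutSeconds":10,"failureThreshold":6},
"readinessProbe":{"initialDelaySeconds":10,"periodSeconds":20,"timeoutSeconds":2,"failureThreshold":6}}]}}}}'

# Apply the patch; this triggers a new rollout of the pod.
$KUBECTL -n kubeflow patch deployment katib-mysql --type=strategic -p "$PATCH"

# Wait until the new pod reports ready.
$KUBECTL -n kubeflow rollout status deployment katib-mysql
```

If the Kubeflow manifests are re-applied later, the patched values will likely be overwritten, so the same change would also need to go into the manifest itself.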

xwyangjshb commented 2 years ago

My situation was a little different. I found that there were two MySQL versions, so I simply downgraded from 8 to 5.7:

k3s kubectl edit deploy katib-mysql -nkubeflow
Change image: mysql:8 ---> gcr.io/ml-pipeline/mysql

[root@dl01 manifests-1.3.0]# docker image ls |grep mysql
mysql                                                                           8                                                 76152be68449   10 days ago     524MB
gcr.io/ml-pipeline/mysql                                                        5.7                                               f8fcde8

Then I deleted the data under the old PVC (katib-mysql) and deleted the katib-mysql and katib-db-manager pods. The pods restarted automatically and everything returned to normal.

nkwangleiGIT commented 1 year ago

I think this issue is specific to MySQL 8.0: there is a plugin setting in the MySQL config file. We can remove the plugin config and start MySQL 8.0. After the first startup, we can add the plugin config back and restart, and it should start normally the second time.