shikanon / kubeflow-manifests

One-click Kubeflow installation files for users in China
GNU General Public License v3.0
338 stars 117 forks

Database-related pods fail to start with errors #61

Open WMeng1 opened 3 years ago

WMeng1 commented 3 years ago

All other pods start fine, but the database-related katib-db-manager and katib-mysql pods produce errors. The logs are as follows:

E0827 03:18:05.755835 1 mysql.go:78] Ping to Katib db failed: dial tcp 10.96.67.181:3306: connect: connection refused
E0827 03:18:10.758696 1 mysql.go:78] Ping to Katib db failed: dial tcp 10.96.67.181:3306: connect: connection refused
E0827 03:18:15.754750 1 mysql.go:78] Ping to Katib db failed: dial tcp 10.96.67.181:3306: connect: connection refused
E0827 03:18:20.756393 1 mysql.go:78] Ping to Katib db failed: dial tcp 10.96.67.181:3306: connect: connection refused
E0827 03:18:25.756346 1 mysql.go:78] Ping to Katib db failed: dial tcp 10.96.67.181:3306: connect: connection refused
E0827 03:18:30.758046 1 mysql.go:78] Ping to Katib db failed: dial tcp 10.96.67.181:3306: connect: connection refused
E0827 03:18:35.758436 1 mysql.go:78] Ping to Katib db failed: dial tcp 10.96.67.181:3306: connect: connection refused
E0827 03:18:40.756272 1 mysql.go:78] Ping to Katib db failed: dial tcp 10.96.67.181:3306: connect: connection refused
E0827 03:18:45.756977 1 mysql.go:78] Ping to Katib db failed: dial tcp 10.96.67.181:3306: connect: connection refused
E0827 03:18:50.754163 1 mysql.go:78] Ping to Katib db failed: dial tcp 10.96.67.181:3306: connect: connection refused
E0827 03:18:55.754928 1 mysql.go:78] Ping to Katib db failed: dial tcp 10.96.67.181:3306: connect: connection refused
E0827 03:19:00.755864 1 mysql.go:78] Ping to Katib db failed: dial tcp 10.96.67.181:3306: connect: connection refused
F0827 03:19:00.755932 1 main.go:99] Failed to open db connection: DB open failed: Timeout waiting for DB conn successfully opened.

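It looks like the mysql pod is the main cause: the DB manager pod cannot connect to mysql, but I don't know how to fix it. The above is what I see with kindest/node:v1.16.9. With v1.19.1, katib-mysql reports the following: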

2021-08-27 01:54:20+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 8.0.24-1debian10 started.
2021-08-27 01:54:20+00:00 [Note] [Entrypoint]: Switching to dedicated user 'mysql'
2021-08-27 01:54:20+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 8.0.24-1debian10 started.
2021-08-27T01:54:20.642846Z 0 [System] [MY-010116] [Server] /usr/sbin/mysqld (mysqld 8.0.24) starting as process 1
2021-08-27T01:54:20.670463Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started.
2021-08-27T01:54:23.849922Z 1 [System] [MY-013577] [InnoDB] InnoDB initialization has ended.
mysqld: Table 'mysql.plugin' doesn't exist
2021-08-27T01:54:24.146624Z 0 [ERROR] [MY-010735] [Server] Could not open the mysql.plugin table. Please perform the MySQL upgrade proced
2021-08-27T01:54:24.148013Z 0 [Warning] [MY-010441] [Server] Failed to open optimizer cost constant tables
2021-08-27T01:54:24.148942Z 0 [Warning] [MY-010441] [Server] Failed to open optimizer cost constant tables
2021-08-27T01:54:24.149946Z 0 [Warning] [MY-010441] [Server] Failed to open optimizer cost constant tables
2021-08-27T01:54:24.151631Z 0 [Warning] [MY-010441] [Server] Failed to open optimizer cost constant tables
2021-08-27T01:54:24.152681Z 0 [Warning] [MY-010441] [Server] Failed to open optimizer cost constant tables
2021-08-27T01:54:24.153661Z 0 [Warning] [MY-010441] [Server] Failed to open optimizer cost constant tables
2021-08-27T01:54:24.154611Z 0 [Warning] [MY-010441] [Server] Failed to open optimizer cost constant tables
2021-08-27T01:54:24.360102Z 0 [System] [MY-011323] [Server] X Plugin ready for connections. Bind-address: '::' port: 33060, socket: /var/k
2021-08-27T01:54:24.675026Z 0 [Warning] [MY-010015] [Repl] Gtid table is not ready to be used. Table 'mysql.gtid_executed' cannot be open
2021-08-27T01:54:26.359732Z 0 [Warning] [MY-010015] [Repl] Gtid table is not ready to be used. Table 'mysql.gtid_executed' cannot be open
2021-08-27T01:54:26.563015Z 0 [Warning] [MY-010068] [Server] CA certificate ca.pem is self signed.
2021-08-27T01:54:26.563668Z 0 [System] [MY-013602] [Server] Channel mysql_main configured to support TLS. Encrypted connections are now snnel.
2021-08-27T01:54:26.808613Z 0 [Warning] [MY-011810] [Server] Insecure configuration for --pid-file: Location '/var/run/mysqld' in the patl OS users. Consider choosing a different directory.
2021-08-27T01:54:26.809436Z 0 [Warning] [MY-010441] [Server] Failed to open optimizer cost constant tables
2021-08-27T01:54:26.810520Z 0 [ERROR] [MY-013129] [Server] A message intended for a client cannot be sent there as no client-session is ae're sending the information to the error-log instead: MY-001146 - Table 'mysql.component' doesn't exist
2021-08-27T01:54:26.810874Z 0 [Warning] [MY-013129] [Server] A message intended for a client cannot be sent there as no client-session is we're sending the information to the error-log instead: MY-003543 - The mysql.component table is missing or has an incorrect definition.
2021-08-27T01:54:26.811771Z 0 [ERROR] [MY-010326] [Server] Fatal error: Can't open and lock privilege tables: Table 'mysql.user' doesn't
2021-08-27T01:54:26.812089Z 0 [ERROR] [MY-010952] [Server] The privilege system failed to initialize correctly. For complete instructionsSQL to a new version please see the 'Upgrading MySQL' section from the MySQL manual.
2021-08-27T01:54:26.812705Z 0 [ERROR] [MY-010119] [Server] Aborting
2021-08-27T01:54:28.384254Z 0 [System] [MY-010910] [Server] /usr/sbin/mysqld: Shutdown complete (mysqld 8.0.24) MySQL Community Server -

I'd appreciate any pointers toward a solution. Thanks!
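The katib-mysql log in this report aborts because system tables (mysql.plugin, mysql.component, mysql.user) cannot be opened. One possible reading, offered as a hypothesis rather than a confirmed diagnosis, is that mysqld started on a data directory left incomplete by an earlier attempt. The shell sketch below tells that state apart from a server that is still initializing or already listening. The function name classify_mysql_log and its output labels are made up for illustration; the deployment name and namespace in the usage comment are inferred from the pod names in this thread.

```shell
# Hypothetical helper: reads a katib-mysql log on stdin and prints a coarse state.
classify_mysql_log() {
  log="$(cat)"
  case "$log" in
    *"Can't open and lock privilege tables"*)
      # System tables are missing, so the data directory is probably incomplete.
      echo "system-tables-missing" ;;
    *"mysqld: ready for connections"*)
      # The server itself is listening; the "X Plugin ready" line alone does not count.
      echo "ready" ;;
    *"Initializing database files"*)
      # The entrypoint is still creating a fresh data directory.
      echo "initializing" ;;
    *)
      echo "unknown" ;;
  esac
}

# Usage (assumed names):
#   kubectl logs deployment/katib-mysql -n kubeflow | classify_mysql_log
```

The fatal-error pattern is checked first because a log can contain several of these markers at once.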

xiashenzhen commented 3 years ago

I'm getting the same errors as above. Could you take a look?

katib-db-manager log:
E0827 04:22:23.644805 1 mysql.go:78] Ping to Katib db failed: dial tcp 10.96.249.11:3306: connect: connection refused
E0827 04:22:28.668739 1 mysql.go:78] Ping to Katib db failed: dial tcp 10.96.249.11:3306: connect: connection refused
E0827 04:22:33.664700 1 mysql.go:78] Ping to Katib db failed: dial tcp 10.96.249.11:3306: connect: connection refused
E0827 04:22:38.652756 1 mysql.go:78] Ping to Katib db failed: dial tcp 10.96.249.11:3306: connect: connection refused
E0827 04:22:43.644760 1 mysql.go:78] Ping to Katib db failed: dial tcp 10.96.249.11:3306: connect: connection refused
E0827 04:22:48.668705 1 mysql.go:78] Ping to Katib db failed: dial tcp 10.96.249.11:3306: connect: connection refused
E0827 04:22:53.660754 1 mysql.go:78] Ping to Katib db failed: dial tcp 10.96.249.11:3306: connect: connection refused
E0827 04:22:58.652762 1 mysql.go:78] Ping to Katib db failed: dial tcp 10.96.249.11:3306: connect: connection refused
E0827 04:23:03.644786 1 mysql.go:78] Ping to Katib db failed: dial tcp 10.96.249.11:3306: connect: connection refused
E0827 04:23:08.672724 1 mysql.go:78] Ping to Katib db failed: dial tcp 10.96.249.11:3306: connect: connection refused
E0827 04:23:13.660592 1 mysql.go:78] Ping to Katib db failed: dial tcp 10.96.249.11:3306: connect: connection refused
E0827 04:23:18.652703 1 mysql.go:78] Ping to Katib db failed: dial tcp 10.96.249.11:3306: connect: connection refused
F0827 04:23:18.652781 1 main.go:99] Failed to open db connection: DB open failed: Timeout waiting for DB conn successfully opened.

katib-mysql log:
2021-08-27 03:41:54+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 8.0.24-1debian10 started.
2021-08-27 03:41:54+00:00 [Note] [Entrypoint]: Switching to dedicated user 'mysql'
2021-08-27 03:41:54+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 8.0.24-1debian10 started.
2021-08-27T03:41:55.090175Z 0 [System] [MY-010116] [Server] /usr/sbin/mysqld (mysqld 8.0.24) starting as process 1
2021-08-27T03:41:55.126744Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started.
2021-08-27T03:42:24.589423Z 1 [System] [MY-013577] [InnoDB] InnoDB initialization has ended.
2021-08-27T03:42:24.910933Z 0 [System] [MY-011323] [Server] X Plugin ready for connections. Bind-address: '::' port: 33060, socket: /var/run/mysqld/mysqlx.sock
2021-08-27T03:42:25.157690Z 0 [ERROR] [MY-011947] [InnoDB] Cannot open '/var/lib/mysql/datadir/ib_buffer_pool' for reading: No such file or directory
2021-08-27T03:42:25.499502Z 0 [Warning] [MY-010068] [Server] CA certificate ca.pem is self signed.
2021-08-27T03:42:25.500065Z 0 [System] [MY-013602] [Server] Channel mysql_main configured to support TLS. Encrypted connections are now supported for this channel.
2021-08-27T03:42:25.563667Z 0 [Warning] [MY-011810] [Server] Insecure configuration for --pid-file: Location '/var/run/mysqld' in the path is accessible to all OS users. Consider choosing a different directory.

mysql log:

2021-08-27T03:38:55.396330Z 0 [Note] Event Scheduler: Loaded 0 events
2021-08-27T03:38:55.396640Z 0 [Note] mysqld: ready for connections.
Version: '5.7.33' socket: '/var/run/mysqld/mysqld.sock' port: 3306 MySQL Community Server (GPL)
2021-08-27T03:52:14.357722Z 4 [Note] Aborted connection 4 to db: 'mlpipeline' user: 'root' host: '127.0.0.1' (Got an error reading communication packets)
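A "connection refused" on a Service IP such as 10.96.249.11:3306 is commonly seen when the Service has no ready backend pod. That would fit the katib-mysql excerpt in this comment, which shows the X Plugin on port 33060 but no "ready for connections" line for port 3306. A minimal shell sketch for checking this, assuming the Service is named katib-mysql and lives in the kubeflow namespace (both are assumptions to verify against your cluster; the function names are illustrative):

```shell
NAMESPACE="${NAMESPACE:-kubeflow}"   # assumed namespace

mysql_endpoints() {
  # Prints the pod IPs currently backing the Service; empty output means none are Ready.
  kubectl get endpoints katib-mysql -n "$NAMESPACE" \
    -o jsonpath='{.subsets[*].addresses[*].ip}'
}

diagnose_katib_db() {
  ips="$(mysql_endpoints)"
  if [ -z "$ips" ]; then
    echo "no ready endpoints: katib-mysql is not Ready, so clients get connection refused"
  else
    echo "endpoints: $ips"
  fi
}
```

If endpoints are listed and katib-db-manager still cannot connect, the cause is probably elsewhere, for example credentials or a network policy.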

xiashenzhen commented 3 years ago

  • katib-mysql (log quoted from the original report):
2021-08-27 02:31:36+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 8.0.24-1debian10 started.
2021-08-27 02:31:36+00:00 [Note] [Entrypoint]: Switching to dedicated user 'mysql'
2021-08-27 02:31:36+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 8.0.24-1debian10 started.
2021-08-27 02:31:36+00:00 [Note] [Entrypoint]: Initializing database files
2021-08-27T02:31:36.865722Z 0 [System] [MY-013169] [Server] /usr/sbin/mysqld (mysqld 8.0.24) initializing of server in progress as process 44
2021-08-27T02:31:36.870024Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started.
2021-08-27T02:31:52.754440Z 1 [System] [MY-013577] [InnoDB] InnoDB initialization has ended.
2021-08-27T02:32:53.159102Z 6 [Warning] [MY-010453] [Server] root@localhost is created with an empty password ! Please consider switching off the --initialize-insecure option.

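Did your deployment hang at any point? When I deployed, it got stuck on the first delete step while running the patch, so I stopped it and ran the steps manually.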

WMeng1 commented 3 years ago

No, it didn't hang. It's just these two pods that never come up.

xiashenzhen commented 3 years ago

I now have the same problem as you. It feels like a database authentication issue.

shikanon commented 3 years ago

@xiashenzhen @WMeng1 Check whether something is wrong with your PVCs:

kubectl get pvc -A

This MySQL app is very simple. The problem may be caused by leftovers from an earlier failed install that were never deleted. For the MySQL definition, see https://github.com/shikanon/kubeflow-manifests/blob/50ee9f1e0aef5f69620db89c9ae2f81c9b2d96e3/manifest1.3/019-katib-installs-katib-with-kubeflow-cert-manager.yaml#L616
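To make the PVC check easier to read on a cluster with many claims, the output can be filtered down to claims that are not Bound. A small shell sketch (the function name unbound_pvcs is illustrative, not part of this repository):

```shell
unbound_pvcs() {
  # Prints "namespace/name phase" for every PVC whose phase is not Bound.
  kubectl get pvc -A --no-headers \
    -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name,PHASE:.status.phase |
    awk '$3 != "Bound" { print $1 "/" $2, $3 }'
}
```

Empty output means every claim is Bound. A claim stuck in Pending usually deserves a `kubectl describe pvc` to see why no volume was provisioned.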

WMeng1 commented 3 years ago

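The PVC is in the Pending state. I just redeployed this YAML file like this:

kubectl delete -f "/opt/wangm/kubeflow-manifests/manifest1.3/019-katib-installs-katib-with-kubeflow-cert-manager.yaml"
kubectl apply -f "/opt/wangm/kubeflow-manifests/manifest1.3/019-katib-installs-katib-with-kubeflow-cert-manager.yaml"

But these pods still won't start. I had also deleted and restarted the cluster many times before, and these two pods never came up. I don't know which of my steps went wrong, and I still haven't been able to solve the problem. Thanks for the reply.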

xiashenzhen commented 3 years ago

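Thanks for the reply. I checked, and the storage looks fine. I don't know why, but only these two pods have problems. Deleting the pods so that they restart doesn't help either.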

I'm not sure whether it's a version issue. I'm using kubectl 1.20.5.

xiashenzhen commented 3 years ago

kubeflow-manifests/manifest1.3/019-katib-installs-katib-with-kubeflow-cert-manager.yaml

Mine is fixed now. I simply deleted and recreated it:

kubectl delete -f kubeflow-manifests/manifest1.3/019-katib-installs-katib-with-kubeflow-cert-manager.yaml
kubectl apply -f kubeflow-manifests/manifest1.3/019-katib-installs-katib-with-kubeflow-cert-manager.yaml

WMeng1 commented 3 years ago

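After I deleted and recreated it here, the PVCs are all Bound. However, although the two pods that connect to the database are in the Running state, READY shows 0/1, and describe shows they still cannot reach the database.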
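A pod that is Running with READY 0/1 has a started container whose readiness check is not passing. A short shell sketch to list such pods, assuming the kubeflow namespace used elsewhere in this thread (the function name not_ready_pods is illustrative):

```shell
NAMESPACE="${NAMESPACE:-kubeflow}"   # assumed namespace

not_ready_pods() {
  # READY is the second column, formatted "ready/total"; print rows where they differ.
  kubectl get pods -n "$NAMESPACE" --no-headers |
    awk '{ split($2, r, "/"); if (r[1] != r[2]) print $1, $2, $3 }'
}
```

For each pod it prints, the Events section of `kubectl describe pod <name> -n kubeflow` normally shows which probe is failing and why.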

xiashenzhen commented 3 years ago

Run what's in the patch first: delete everything once, then apply, and finally delete and recreate.

xiashenzhen commented 3 years ago

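Run what's in the patch first: delete everything once, then apply, and finally delete and recreate. Once the pods are running, what's left is a configuration problem; find the cause and you can fix it.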

WMeng1 commented 3 years ago

If the pods are running, then it's a configuration problem and some setting is wrong. Check the logs.

The logs are still the same few lines as at the beginning: mysql shows one warning, and the db manager shows it cannot connect to the database.

shikanon commented 3 years ago

@WMeng1 Make sure your PVC has been deleted, then apply just this 019-katib-installs-katib-with-kubeflow-cert-manager.yaml file on its own. Or, more precisely, you can apply only the deployment I linked in my comment above.
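The delete, confirm, and re-apply sequence described here can be sketched as one shell function. The PVC name katib-mysql, the deployment name katib-mysql, and the kubeflow namespace are assumptions inferred from the pod names in this thread; check them with `kubectl get pvc -A` before relying on them. The function name redeploy_katib is illustrative.

```shell
NAMESPACE="${NAMESPACE:-kubeflow}"     # assumed namespace
PVC_NAME="${PVC_NAME:-katib-mysql}"    # assumed PVC name

redeploy_katib() {
  manifest="$1"
  # Remove everything the manifest created; tolerate objects that are already gone.
  kubectl delete -f "$manifest" --ignore-not-found || return 1
  # Block until the old claim has actually disappeared before recreating it.
  kubectl wait --for=delete "pvc/$PVC_NAME" -n "$NAMESPACE" --timeout=120s || return 1
  # Recreate the resources and wait for the MySQL deployment to become available.
  kubectl apply -f "$manifest" || return 1
  kubectl rollout status deployment/katib-mysql -n "$NAMESPACE" --timeout=300s
}

# Usage:
#   redeploy_katib kubeflow-manifests/manifest1.3/019-katib-installs-katib-with-kubeflow-cert-manager.yaml
```

Two caveats: some older kubectl versions report an error from `kubectl wait --for=delete` when the object is already gone, and whether the underlying volume data is erased along with the claim depends on the storage class reclaim policy.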

Perhurb commented 2 years ago

@shikanon I've hit the same problem: katib-mysql is failing, and the db-manager and katib-mysql pods are not coming up either:

[root@master kubeflow-manifests]# kubectl get pod -nkubeflow
NAME                                                        READY   STATUS             RESTARTS   AGE
admission-webhook-deployment-5f5cc7968b-ck6wk               1/1     Running            0          4d17h
cache-deployer-deployment-64598b6c87-rk4x6                  2/2     Running            1          32m
cache-server-59d67c7584-mt6xs                               2/2     Running            0          32m
centraldashboard-7b6b6cc7fc-7fg5s                           1/1     Running            0          4d17h
jupyter-web-app-deployment-7c6974bb88-cdc6w                 1/1     Running            0          4d17h
katib-controller-7b784c44dd-6r56w                           1/1     Running            0          29m
katib-db-manager-6c5757dc64-6f6v9                           0/1     CrashLoopBackOff   8          29m
katib-mysql-79d75c7444-g4zkv                                0/1     Running            1          29m
katib-ui-69f5b6795d-rpxtg                                   1/1     Running            0          29m
kfserving-controller-manager-0                              2/2     Running            0          4d17h
kubeflow-pipelines-profile-controller-76c45c8c6b-8b9gm      1/1     Running            0          32m
metacontroller-0                                            1/1     Running            0          32m
metadata-envoy-deployment-56f745f7fb-gwt8n                  1/1     Running            0          32m
metadata-grpc-deployment-6494577fdb-xm7qp                   2/2     Running            1          32m
metadata-writer-b7ff9787-jl6xq                              2/2     Running            1          32m
minio-57bcb749d5-7ph7n                                      2/2     Running            0          32m
ml-pipeline-66bcb9d79d-h5p72                                2/2     Running            0          32m
ml-pipeline-persistenceagent-7fb8f6dc68-mwkbg               2/2     Running            0          32m
ml-pipeline-scheduledworkflow-64bcfd6596-xtwlt              2/2     Running            0          32m
ml-pipeline-ui-8578f6685f-2ws4h                             2/2     Running            0          32m
ml-pipeline-viewer-crd-565fb9b5c5-qkzkx                     2/2     Running            1          32m
ml-pipeline-visualizationserver-b7c7d49fb-qbckt             2/2     Running            0          32m
mpi-operator-794849c566-xc7g4                               1/1     Running            2          4d17h
mxnet-operator-6668d797d4-s2pth                             1/1     Running            2          4d17h
mysql-9dfc684cd-lqjwr                                       2/2     Running            0          32m
notebook-controller-deployment-6795dd887b-gctm4             1/1     Running            0          4d17h
profiles-deployment-84bd4f9bc7-dj7d5                        2/2     Running            0          4d17h
pytorch-operator-6887749499-n59mc                           2/2     Running            5          4d17h
tensorboard-controller-controller-manager-dd896c8df-c8gns   3/3     Running            15         4d17h
tensorboards-web-app-deployment-5969cd5b68-mwcxn            1/1     Running            0          4d17h
tf-job-operator-ccb48b77b-2c9vm                             1/1     Running            2          4d17h
volumes-web-app-deployment-867dfb5b5c-vxbfh                 1/1     Running            0          4d17h
workflow-controller-74b88f9855-rvd2h                        2/2     Running            2          32m
xgboost-operator-deployment-665cf9bf8d-wz2vb                2/2     Running            1          4d17h

PVC status:

NAMESPACE      NAME              STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
istio-system   authservice-pvc   Bound    pvc-b18f3d6b-dc68-4ec0-b343-ba02cc8dac11   10Gi       RWO            rook-ceph-block   4d17h
kubeflow       katib-mysql       Bound    pvc-f2d493fd-7e18-4347-8573-a2ef8d97b466   10Gi       RWO            rook-ceph-block   31m
kubeflow       minio-pvc         Bound    pvc-62fa4d01-3bce-4e89-89e2-84cf2889e361   20Gi       RWO            rook-ceph-block   35m
kubeflow       mysql-pv-claim    Bound    pvc-f60480e7-f5a2-4991-8e92-40ea95f4d953   20Gi       RWO            rook-ceph-block   35m

The events from describe are as follows:

Events:
  Type     Reason                  Age                From                     Message
  ----     ------                  ----               ----                     -------
  Warning  FailedScheduling        22m                default-scheduler        0/3 nodes are available: 3 pod has unbound immediate PersistentVolumeClaims.
  Warning  FailedScheduling        22m                default-scheduler        0/3 nodes are available: 3 pod has unbound immediate PersistentVolumeClaims.
  Normal   Scheduled               22m                default-scheduler        Successfully assigned kubeflow/katib-mysql-79d75c7444-g4zkv to node2
  Normal   SuccessfulAttachVolume  22m                attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-f2d493fd-7e18-4347-8573-a2ef8d97b466"
  Warning  Unhealthy               20m (x3 over 21m)  kubelet                  Liveness probe failed: mysqladmin: [Warning] Using a password on the command line interface can be insecure.
mysqladmin: connect to server at 'localhost' failed
error: 'Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)'
Check that mysqld is running and that the socket: '/var/run/mysqld/mysqld.sock' exists!
  Normal   Killing    20m                kubelet  Container katib-mysql failed liveness probe, will be restarted
  Normal   Pulled     20m (x2 over 21m)  kubelet  Container image "registry.cn-shenzhen.aliyuncs.com/tensorbytes/mysql:8-0627e" already present on machine
  Normal   Created    20m (x2 over 21m)  kubelet  Created container katib-mysql
  Normal   Started    20m (x2 over 21m)  kubelet  Started container katib-mysql
  Warning  Unhealthy  20m                kubelet  Readiness probe failed: OCI runtime exec failed: exec failed: container_linux.go:349: starting container process caused "process_linux.go:101: executing setns process caused \"exit status 1\"": unknown
  Warning  Unhealthy  20m (x9 over 21m)  kubelet  Readiness probe failed: mysql: [Warning] Using a password on the command line interface can be insecure.
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)
  Warning  Unhealthy  98s (x111 over 19m)  kubelet  Readiness probe failed: mysql: [Warning] Using a password on the command line interface can be insecure.
ERROR 1045 (28000): Access denied for user 'root'@'localhost' (using password: YES)

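I don't know how to troubleshoot this error. It looks like a database authentication problem. I tried recreating all the MySQL-related PVCs and then redeploying, but it made no difference. I would appreciate any pointers. Here is the katib-mysql log: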

[root@master kubeflow-manifests]# kubectl -n kubeflow logs katib-mysql-79d75c7444-g4zkv
2021-11-24 01:37:12+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 8.0.24-1debian10 started.
2021-11-24 01:37:12+00:00 [Note] [Entrypoint]: Switching to dedicated user 'mysql'
2021-11-24 01:37:12+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 8.0.24-1debian10 started.
2021-11-24T01:37:13.250055Z 0 [System] [MY-010116] [Server] /usr/sbin/mysqld (mysqld 8.0.24) starting as process 1
2021-11-24T01:37:13.265138Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started.
2021-11-24T01:37:23.637424Z 1 [System] [MY-013577] [InnoDB] InnoDB initialization has ended.

InnoDB: Progress in percents: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84
2021-11-24T01:37:23.956283Z 0 [System] [MY-011323] [Server] X Plugin ready for connections. Bind-address: '::' port: 33060, socket: /var/run/mysqld/mysqlx.sock
2021-11-24T01:37:24.198622Z 0 [System] [MY-010229] [Server] Starting XA crash recovery...
2021-11-24T01:37:24.217075Z 0 [System] [MY-010232] [Server] XA crash recovery finished.
2021-11-24T01:37:24.553035Z 0 [Warning] [MY-010068] [Server] CA certificate ca.pem is self signed.
2021-11-24T01:37:24.553418Z 0 [System] [MY-013602] [Server] Channel mysql_main configured to support TLS. Encrypted connections are now supported for this channel.
2021-11-24T01:37:24.561314Z 0 [Warning] [MY-011810] [Server] Insecure configuration for --pid-file: Location '/var/run/mysqld' in the path is accessible to all OS users. Consider choosing a different directory.
2021-11-24T01:37:24.597879Z 0 [System] [MY-010931] [Server] /usr/sbin/mysqld: ready for connections. Version: '8.0.24'  socket: '/var/run/mysqld/mysqld.sock'  port: 3306  MySQL Community Server - GPL.

shikanon commented 2 years ago

@Perhurb You could try deleting all the PVCs in the kubeflow namespace and then applying again.
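One possible reading of the events in this thread: the early probe failures are ERROR 2002 (the server socket does not exist yet), while the later, repeating ones are ERROR 1045 (the server is up but rejects the root password the probe supplies). The official mysql image applies its root password only when it initializes an empty data directory, so, assuming the mirrored image used here behaves the same way, a volume that still holds data from an earlier install would produce exactly this mismatch, which is consistent with the advice to delete the PVCs. The helper below is a hypothetical sketch for telling the two cases apart in probe messages.

```shell
#!/bin/sh
# Sketch: classify a MySQL probe failure message from `kubectl describe pod`.
# The category names ("auth", "not-listening", "other") are made up for illustration.
classify_mysql_probe_error() {
    case "$1" in
        # Server reachable, credentials rejected.
        *"ERROR 1045"*) echo "auth" ;;
        # Server socket missing: mysqld not started yet or crashed.
        *"ERROR 2002"*|*"Can't connect to local MySQL server through socket"*) echo "not-listening" ;;
        *) echo "other" ;;
    esac
}

# Usage:
# classify_mysql_probe_error "ERROR 1045 (28000): Access denied for user 'root'@'localhost' (using password: YES)"
```

A "not-listening" result shortly after startup is usually just the server still initializing, whereas a persistent "auth" result points at the credentials or at the data already on the volume.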

jelly234 commented 2 years ago

Hi @Perhurb, I ran into the same problem you did. Did you manage to solve it in the end? If so, how?

MoeXiaoHei commented 1 year ago

Solved.