Open ChaoHsin-fang opened 3 weeks ago
All pod statuses are normal:
kubectl get pods -n onecloud
NAME                                                READY   STATUS             RESTARTS   AGE
default-ansibleserver-6fdb6458d4-wtsq6              1/1     Running            3          3d19h
default-apigateway-dbd8bf9fc-95hlh                  1/1     Running            4          3d19h
default-apimap-5656dff545-qqdrq                     1/1     Running            3          3d19h
default-baremetal-agent-6bc454f845-r97hr            1/1     Running            3          3d19h
default-climc-77645d6b5b-xt27f                      1/1     Running            0          3d19h
default-cloudmon-866977f748-959kq                   1/1     Running            4          3d19h
default-cloudproxy-59d598b59b-r76v9                 1/1     Running            3          3d19h
default-devtool-5d68ff7b9-s642w                     1/1     Running            4          3d19h
default-esxi-agent-6758cd5bf5-rskd9                 1/1     Running            0          3d19h
default-etcd-fls59n7pxh                             1/1     Running            0          3d19h
default-glance-5b6f7c5b96-frhht                     0/1     CrashLoopBackOff   829        3d19h
default-host-2lbsr                                  3/3     Running            9          3d19h
default-host-2tnwd                                  3/3     Running            6          3d19h
default-host-deployer-2thwq                         1/1     Running            4          3d19h
default-host-deployer-m5hwr                         1/1     Running            4          3d19h
default-host-deployer-q65gh                         1/1     Running            4          3d19h
default-host-deployer-xg84g                         1/1     Running            2          3d19h
default-host-fj765                                  3/3     Running            0          3d19h
default-host-health-j7ngf                           1/1     Running            5          3d19h
default-host-health-nkdzf                           1/1     Running            4          3d19h
default-host-health-phnld                           1/1     Running            5          3d19h
default-host-health-zbd7c                           1/1     Running            5          3d19h
default-host-image-2f5xm                            1/1     Running            3          3d19h
default-host-image-c6fq4                            1/1     Running            2          3d19h
default-host-image-rlvz6                            1/1     Running            4          3d19h
default-host-image-rnlls                            1/1     Running            4          3d19h
default-host-jhcz4                                  3/3     Running            7          3d19h
default-keystone-5b56544cc-tgh6d                    1/1     Running            0          3d19h
default-kubeserver-f6b8c7977-97zkj                  1/1     Running            0          3d19h
default-logger-5ffbd6597b-b4d5p                     1/1     Running            0          3d19h
default-monitor-cc44875c4-hhvw4                     1/1     Running            4          3d19h
default-notify-5f87d4c7cd-lkwp4                     1/1     Running            0          3d19h
default-onecloud-service-operator-7fbbcb7c8-wm8tj   1/1     Running            2          3d19h
default-ovn-north-6b67f766f8-njjwg                  1/1     Running            0          3d19h
default-region-5c97c994f6-h5ct5                     1/1     Running            0          3d19h
default-region-dns-f9jsj                            1/1     Running            0          3d19h
default-region-dns-lq4fg                            1/1     Running            2          3d19h
default-scheduledtask-79cb96d468-26mj8              1/1     Running            0          3d19h
default-scheduler-598578dcd5-lvmxf                  1/1     Running            3          3d19h
default-telegraf-jk8dh                              1/1     Running            0          3d19h
default-telegraf-jw9tw                              1/1     Running            0          3d19h
default-telegraf-l9nwn                              1/1     Running            0          3d19h
default-telegraf-t4rdq                              1/1     Running            0          3d19h
default-victoria-metrics-5dc6cb5c7c-znfwj           1/1     Running            0          3d19h
default-vpcagent-65c969d497-kt26s                   1/1     Running            3          3d19h
default-web-7d5494cdd5-6v2sz                        1/1     Running            0          3d19h
default-webconsole-8466b9b999-zc2n4                 2/2     Running            3          3d19h
default-yunionconf-648d5f9b9-nbb8l                  1/1     Running            2          3d19h
onecloud-operator-86d474747d-8ctph                  1/1     Running            0          3d19h
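Only the glance pod is abnormal. Deleting the glance pod so that it restarts does not bring it back either. The log is below; please help identify where the problem is.
kubectl logs default-glance-5b6f7c5b96-frhht -n onecloud -f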
[debug 241029 03:03:31 db.NewJointResourceBaseManager(jointbase.go:48)] Initialize guestimagejoints
[info 241029 03:03:31 options.parseOptions(options.go:334)] Use configuration file: /etc/yunion/glance.conf
[info 241029 03:03:31 options.parseOptions(options.go:357)] Set log level to "info"
[error 2024-10-29 03:03:31 auth.(*authManager).startRefreshRevokeTokens(auth.go:193)] refreshRevokeTokens: No valid admin token credential
[info 2024-10-29 03:03:34 service.StartService.func1(service.go:66)] Auth complete!!
[info 2024-10-29 03:03:34 policy.(*SPolicyManager).init(policy.go:160)] policy fetch worker count 1
[info 2024-10-29 03:03:34 consts.SetNonDefaultDomainProjects(consts.go:109)] set non_default_domain_projects to false
[info 2024-10-29 03:03:34 options.StartOptionManagerWithSessionDriver(manager.go:68)] OptionManager start to fetch service configs with interval 30m0s ...
[info 2024-10-29 03:03:34 watcher.(*SInformerSyncManager).startWatcher(watcher.go:83)]EndpointChangeManager: Start resource informer watcher for endpoint
[info 2024-10-29 03:03:34 informer.(*EtcdBackendForClient).StartClientWatch(etcd_client.go:84)] /onecloud/informer watched
[info 2024-10-29 03:03:34 informer.NewWatchManagerBySessionBg.func1(watcher.go:51)] callback with watchMan success.
[info 2024-10-29 03:03:34 options.optionsEquals(manager.go:116)] Options changed: {"storage_driver":["local","s3"]}
[info 2024-10-29 03:03:34 service.StartService(service.go:97)] exec socket path: /var/run/onecloud/exec.sock
[info 2024-10-29 03:03:34 service.StartService(service.go:104)] Target image formats []string{"qcow2"}
[info 2024-10-29 03:03:34 watcher.(*SInformerSyncManager).startWatcher(watcher.go:83)]ServiceConfigManager: Start resource informer watcher for service
[info 2024-10-29 03:03:34 informer.(*EtcdBackendForClient).StartClientWatch(etcd_client.go:84)] /onecloud/informer watched
[error 2024-10-29 03:03:34 torrent.GetTrackers(torrent.go:66)] fail to get torrent-tracker
[error 2024-10-29 03:03:34 service.StartService(service.go:116)] no valid torrent-tracker
[info 2024-10-29 03:03:34 app.InitApp(app.go:32)] RequestWorkerCount: 8
[info 2024-10-29 03:03:34 appsrv.NewApplication(appsrv.go:121)] App hostId: a1h56qoWy-U3xEzYfKWpwFiSq-I= (image,default-glance-5b6f7c5b96-frhht,10.40.73.97)
2024/10/29 03:03:34 Allow hosts []
[info 2024-10-29 03:03:34 appsrv.(*Application).SetDefaultTimeout(appsrv.go:137)] adjust application default timeout to 60.000000 seconds
[info 2024-10-29 03:03:34 cloudcommon.InitDB(database.go:48)] Registered SQL drivers: clickhouse, dm, mysql, sqlite3
[info 2024-10-29 03:03:34 cloudcommon.InitDB(database.go:88)] database dialect: mysql sqlStr: glance:vmeq537eHSfpjjxD@tcp(10.64.25.149:3306)/glance?charset=utf8&parseTime=True
[info 2024-10-29 03:03:34 cloudcommon.InitDB(database.go:122)] using inmemory lockman
[info 2024-10-29 03:03:34 db.CheckSync(models.go:116)] Start check database schema: autoSync(true), enableChecksumTables(false), skipInitChecksum(false)
[info 2024-10-29 03:03:34 informer.NewWatchManagerBySessionBg.func1(watcher.go:51)] callback with watchMan success.
[info 2024-10-29 03:03:34 db.setDbConnection(database.go:60)] Total 45 db workers, set db connection max
[info 2024-10-29 03:03:34 service.StartService(service.go:131)] deploy server socket path: /var/run/onecloud/deploy.sock
[info 2024-10-29 03:03:34 app.ServeForeverExtended(app.go:60)] Start listen on https://0.0.0.0:30292, isMaster: true
[info 2024-10-29 03:03:34 watcher.(*SInformerSyncManager).startWatcher(watcher.go:83)]ResourceChangeManager:project: Start resource informer watcher for project
[info 2024-10-29 03:03:34 watcher.(*SInformerSyncManager).startWatcher(watcher.go:83)]ResourceChangeManager:domain: Start resource informer watcher for domain
[info 2024-10-29 03:03:34 watcher.(*SInformerSyncManager).startWatcher(watcher.go:83)]ResourceChangeManager:user: Start resource informer watcher for user
[info 2024-10-29 03:03:34 informer.(*EtcdBackendForClient).StartClientWatch(etcd_client.go:84)] /onecloud/informer watched
[info 2024-10-29 03:03:34 informer.(*EtcdBackendForClient).StartClientWatch(etcd_client.go:84)] /onecloud/informer watched
[info 2024-10-29 03:03:34 informer.(*EtcdBackendForClient).StartClientWatch(etcd_client.go:84)] /onecloud/informer watched
[info 2024-10-29 03:03:34 informer.NewWatchManagerBySessionBg.func1(watcher.go:51)] callback with watchMan success.
[info 2024-10-29 03:03:34 informer.NewWatchManagerBySessionBg.func1(watcher.go:51)] callback with watchMan success.
[info 2024-10-29 03:03:34 informer.NewWatchManagerBySessionBg.func1(watcher.go:51)] callback with watchMan success.
[info 2024-10-29 03:05:01 appsrv.(*Application).ServeHTTP(appsrv.go:289)] a1h56qoWy-U3xEzYfKWpwFiSq-I= 404 2fe56e HEAD /v1/images/61dbe777-5227-4cf9-8ad7-06f5bf4e84cb (10.110.112.111:45356:compute_v2/cron-service) 1325.25ms
[info 2024-10-29 03:05:01 appsrv.(*Application).ServeHTTP(appsrv.go:289)] a1h56qoWy-U3xEzYfKWpwFiSq-I= 404 658fea HEAD /v1/images/61dbe777-5227-4cf9-8ad7-06f5bf4e84cb (10.110.112.111:45356:compute_v2/cron-service) 0.79ms
goroutine 276 [running]:
runtime/debug.Stack()
    /usr/lib/go/src/runtime/debug/stack.go:24 +0x65
runtime/debug.PrintStack()
    /usr/lib/go/src/runtime/debug/stack.go:16 +0x19
yunion.io/x/log.Fatalf({0x1e4139b, 0x18}, {0xc001583fa0, 0x1, 0x1})
    /root/go/src/yunion.io/x/onecloud/vendor/yunion.io/x/log/log.go:138 +0x32
yunion.io/x/onecloud/pkg/image/service.initS3()
    /root/go/src/yunion.io/x/onecloud/pkg/image/service/service.go:198 +0x15d
created by yunion.io/x/onecloud/pkg/image/service.StartService
    /root/go/src/yunion.io/x/onecloud/pkg/image/service/service.go:136 +0x865
[fatal 2024-10-29 03:05:19 service.initS3(service.go:198)] failed init s3 client new minio client: fetchBuckets: client.ListBuckets: Server not initialized, please try again.
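Most of the log above is routine startup output. The two early error entries (the admin token and the torrent-tracker) did not stop the service, since startup continued after both. The single fatal entry at the end is what terminates the process. As a rough aid for picking such entries out of a long log, here is a minimal Python sketch. It assumes only the bracketed `[level timestamp source] message` layout visible in this log; the function names `split_entries` and `problems` are made up for illustration and are not part of onecloud or any library.

```python
import re

# One log entry header looks like:
#   [fatal 2024-10-29 03:05:19 service.initS3(service.go:198)] message...
# The pattern is inferred from the log in this thread, not from a format spec.
ENTRY = re.compile(r"\[(\w+) (\S+ \S+) ([^\]]+)\]\s*")


def split_entries(text):
    """Split raw log text into (level, timestamp, source, message) tuples.

    Works whether entries are on separate lines or run together on one
    line, because it splits on the bracketed headers rather than newlines.
    """
    matches = list(ENTRY.finditer(text))
    entries = []
    for i, m in enumerate(matches):
        # The message runs until the next header (or the end of the text).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        level, timestamp, source = m.groups()
        entries.append((level, timestamp, source, text[m.end():end].strip()))
    return entries


def problems(text, levels=("error", "fatal")):
    """Return only the entries whose level is in `levels`."""
    return [entry for entry in split_entries(text) if entry[0] in levels]
```

To use it, save the pod log to a file (run `kubectl logs` without `-f` so the command terminates), read the file into a string, and pass it to `problems`. Passing `levels=("fatal",)` narrows the result to the entry that actually kills the container.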
service.initS3(service.go:198)] failed init s3 client new minio client: fetchBuckets: client.ListBuckets: Server not initialized, please try again.
Judging by the error, the problem is with the S3 service that glance connects to.
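One way to check the S3 side independently of glance is to probe the storage endpoint directly. The sketch below is a hedged example, not a confirmed procedure for this deployment. It assumes the backend is a MinIO server, which the "Server not initialized" wording suggests but the thread does not confirm, and it uses MinIO's health-check paths `/minio/health/live` and `/minio/health/cluster`. The endpoint URL is a placeholder, because the actual S3 address configured for glance is not shown in this issue. The helper names `health_url` and `probe` are made up for illustration.

```python
import urllib.error
import urllib.request


def health_url(base_url, path="/minio/health/live"):
    """Join the S3 endpoint base URL with a health-check path."""
    return base_url.rstrip("/") + path


def probe(base_url, path="/minio/health/live", timeout=5.0):
    """Return the HTTP status code of the health endpoint, or None if the
    endpoint cannot be reached at all (refused, timeout, DNS failure)."""
    # An empty ProxyHandler disables environment proxies, so the request
    # goes straight to the storage endpoint.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(health_url(base_url, path), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with a 4xx/5xx status such as 503.
        return err.code
    except OSError:
        # URLError and socket errors are both subclasses of OSError.
        return None
```

For example, `probe("http://s3.example.internal:9000", "/minio/health/cluster")` with the real endpoint substituted. A result of `None` points at a network or DNS problem. A 200 from the liveness path only shows that the process is answering, so the cluster path is the stricter of the two checks. A non-200 status suggests the storage service itself needs attention before glance can start.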