You can run kubectl describe to see why mysql didn't come up.
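For example (a minimal sketch; the pod name is the one from the listing later in this thread, so substitute your own, and the container name is an assumption since the istio sidecar is also injected):

# find the mysql pod in the kubeflow namespace
kubectl get pods -n kubeflow | grep mysql

# inspect events and container states (replace the pod name with yours)
kubectl describe pod mysql-56b554ff66-wvpg5 -n kubeflow

# check the logs; -c picks the mysql container rather than the istio sidecar
kubectl logs mysql-56b554ff66-wvpg5 -n kubeflow -c mysql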
After a few more days of digging, only two pods still have problems now: one is cache-server, the other is cache-deployer-deployment.
$ kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
auth dex-6686f66f9b-m9vds 1/1 Running 0 142m
cattle-system cattle-cluster-agent-69d44b7858-5ttvb 0/1 CrashLoopBackOff 36 160m
cattle-system cattle-cluster-agent-854998cfb7-gzmjn 0/1 ImagePullBackOff 0 160m
cattle-system cattle-node-agent-6rsqw 1/1 Running 0 157m
cattle-system cattle-node-agent-q2g67 1/1 Running 0 170m
cattle-system kube-api-auth-xww7h 1/1 Running 0 157m
cert-manager cert-manager-9d5774b59-8mjc5 1/1 Running 0 142m
cert-manager cert-manager-cainjector-67c8c5c665-dxhdh 1/1 Running 0 142m
cert-manager cert-manager-webhook-75dc9757bd-txnm2 1/1 Running 1 142m
ingress-nginx nginx-ingress-controller-9jzpx 1/1 Running 0 170m
ingress-nginx nginx-ingress-controller-bscxv 1/1 Running 0 157m
istio-system authservice-0 1/1 Running 0 142m
istio-system cluster-local-gateway-66bcf8bc5d-2nqsm 1/1 Running 0 141m
istio-system istio-ingressgateway-85b49c758f-gmzzs 1/1 Running 0 141m
istio-system istiod-5ff6cdbbcd-5x2dn 1/1 Running 0 141m
knative-eventing broker-controller-5c84984b97-gv4bb 1/1 Running 0 142m
knative-eventing eventing-controller-54bfbd5446-qnmdk 1/1 Running 0 142m
knative-eventing eventing-webhook-58f56d9cf4-2cz5v 1/1 Running 0 142m
knative-eventing imc-controller-769896c7db-m2hkp 1/1 Running 0 142m
knative-eventing imc-dispatcher-86954fb4cd-bpwsz 1/1 Running 0 142m
knative-serving activator-75696c8c9-67sj2 1/1 Running 0 142m
knative-serving autoscaler-6764f9b5c5-wmc98 1/1 Running 0 142m
knative-serving controller-598fd8bfd7-c4wrs 1/1 Running 0 142m
knative-serving istio-webhook-785bb58cc6-p7dcv 1/1 Running 0 142m
knative-serving networking-istio-77fbcfcf9b-l5gkv 1/1 Running 0 142m
knative-serving webhook-865f54cf5f-klcs4 1/1 Running 0 142m
kube-system coredns-6b84d75d99-2f5p4 1/1 Running 0 3h4m
kube-system coredns-6b84d75d99-rpvl7 1/1 Running 0 157m
kube-system coredns-autoscaler-5c4b6999d9-pp9xs 1/1 Running 0 3h4m
kube-system kube-flannel-kjmlb 2/2 Running 0 157m
kube-system kube-flannel-vnhm7 2/2 Running 0 170m
kube-system metrics-server-7579449c57-2jqld 1/1 Running 0 3h4m
kubeflow-user-example-com ml-pipeline-ui-artifact-6d7ffcc4b6-rcghq 2/2 Running 0 116m
kubeflow-user-example-com ml-pipeline-visualizationserver-84d577b989-t49gf 2/2 Running 0 116m
kubeflow admission-webhook-deployment-6fb9d65887-vsf8h 1/1 Running 0 138m
kubeflow cache-deployer-deployment-7558d65bf4-s7bwk 1/2 CrashLoopBackOff 19 138m
kubeflow cache-server-c64c68ddf-f7c9m 0/2 Init:0/1 0 138m
kubeflow centraldashboard-7b7676d8bd-qt6g5 1/1 Running 0 138m
kubeflow jupyter-web-app-deployment-66f74586d9-kts2c 1/1 Running 0 98m
kubeflow katib-controller-77675c88df-gqp5k 1/1 Running 0 138m
kubeflow katib-db-manager-646695754f-rwnpk 1/1 Running 3 138m
kubeflow katib-mysql-5bb5bd9957-9zh8x 1/1 Running 0 138m
kubeflow katib-ui-55fd4bd6f9-vcn6f 1/1 Running 0 138m
kubeflow kfserving-controller-manager-0 2/2 Running 0 139m
kubeflow kubeflow-pipelines-profile-controller-5698bf57cf-8cvbw 1/1 Running 0 138m
kubeflow kubeflow-pipelines-profile-controller-5698bf57cf-tdjjm 1/1 Running 0 98m
kubeflow metacontroller-0 1/1 Running 0 139m
kubeflow metadata-envoy-deployment-76d65977f7-kcq7g 1/1 Running 0 138m
kubeflow metadata-grpc-deployment-697d9c6c67-9t9zt 2/2 Running 6 138m
kubeflow metadata-writer-58cdd57678-24gqw 2/2 Running 2 138m
kubeflow minio-6d6784db95-lrr67 2/2 Running 0 98m
kubeflow ml-pipeline-85fc99f899-mwkrk 2/2 Running 5 138m
kubeflow ml-pipeline-persistenceagent-65cb9594c7-hbzcx 2/2 Running 1 138m
kubeflow ml-pipeline-scheduledworkflow-7f8d8dfc69-c6lhl 2/2 Running 0 138m
kubeflow ml-pipeline-ui-5c765cc7bd-p4lmb 2/2 Running 0 138m
kubeflow ml-pipeline-viewer-crd-5b8df7f458-tq2gp 2/2 Running 1 138m
kubeflow ml-pipeline-visualizationserver-56c5ff68d5-stgm5 2/2 Running 0 138m
kubeflow mpi-operator-789f88879-v2bh7 1/1 Running 0 138m
kubeflow mxnet-operator-7fff864957-5kc2w 1/1 Running 0 138m
kubeflow mysql-56b554ff66-wvpg5 2/2 Running 0 98m
kubeflow notebook-controller-deployment-74d9584477-x2tpk 1/1 Running 0 138m
kubeflow profiles-deployment-67b4666796-js8k7 2/2 Running 0 138m
kubeflow pytorch-operator-fd86f7694-8fcbc 2/2 Running 0 138m
kubeflow tensorboard-controller-controller-manager-fd6bcffb4-6clhz 3/3 Running 1 138m
kubeflow tensorboards-web-app-deployment-78d7b8b658-chccs 1/1 Running 0 138m
kubeflow tf-job-operator-7bc5cf4cc7-txfbc 1/1 Running 0 138m
kubeflow volumes-web-app-deployment-68fcfc9775-d9tdx 1/1 Running 0 138m
kubeflow workflow-controller-5449754fb4-czlsb 2/2 Running 2 137m
kubeflow xgboost-operator-deployment-5c7bfd57cc-2jrd7 2/2 Running 1 138m
local-path-storage local-path-provisioner-5bd6f65fdf-j575f 1/1 Running 0 147m
$ kubectl describe pod cache-deployer-deployment -n kubeflow
Name: cache-deployer-deployment-7558d65bf4-s7bwk
Namespace: kubeflow
Priority: 0
Node: node1/10.102.13.9
Start Time: Tue, 18 May 2021 15:12:33 +0800
Labels: app=cache-deployer
app.kubernetes.io/component=ml-pipeline
app.kubernetes.io/name=kubeflow-pipelines
application-crd-id=kubeflow-pipelines
istio.io/rev=default
pod-template-hash=7558d65bf4
security.istio.io/tlsMode=istio
service.istio.io/canonical-name=kubeflow-pipelines
service.istio.io/canonical-revision=latest
Annotations: kubectl.kubernetes.io/default-logs-container: main
prometheus.io/path: /stats/prometheus
prometheus.io/port: 15020
prometheus.io/scrape: true
sidecar.istio.io/status:
{"initContainers":["istio-init"],"containers":["istio-proxy"],"volumes":["istio-envoy","istio-data","istio-podinfo","istiod-ca-cert"],"ima...
Status: Running
IP: 10.42.0.20
IPs:
IP: 10.42.0.20
Controlled By: ReplicaSet/cache-deployer-deployment-7558d65bf4
Init Containers:
istio-init:
Container ID: docker://1ce4c15f6318caeb0fb9b258ef8bdc11e712f1d5be366584fc9805c9645a9f15
Image: docker.io/istio/proxyv2:1.9.0
Image ID: docker-pullable://istio/proxyv2@sha256:286b821197d7a9233d1d889119f090cd9a9394468d3a312f66ea24f6e16b2294
Port: <none>
Host Port: <none>
Args:
istio-iptables
-p
15001
-z
15006
-u
1337
-m
REDIRECT
-i
*
-x
-b
*
-d
15090,15021,15020
State: Terminated
Reason: Completed
Exit Code: 0
Started: Tue, 18 May 2021 15:13:40 +0800
Finished: Tue, 18 May 2021 15:13:40 +0800
Ready: True
Restart Count: 0
Limits:
cpu: 2
memory: 1Gi
Requests:
cpu: 10m
memory: 40Mi
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kubeflow-pipelines-cache-deployer-sa-token-9cbqj (ro)
Containers:
main:
Container ID: docker://6ce9304d9db0e6e3d2733b45cbe8539a920dad1ecf856d39e10ae87ff90d7b2d
Image: registry.cn-shenzhen.aliyuncs.com/tensorbytes/ml-pipeline-cache-deployer:1.5.0-rc.2-deb1e
Image ID: docker-pullable://registry.cn-shenzhen.aliyuncs.com/tensorbytes/ml-pipeline-cache-deployer@sha256:a13d49a4bee754f221697957d8491469bf2f958bbaac3d09707f053c8a4adf83
Port: <none>
Host Port: <none>
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Tue, 18 May 2021 17:30:07 +0800
Finished: Tue, 18 May 2021 17:31:00 +0800
Ready: False
Restart Count: 19
Environment:
NAMESPACE_TO_WATCH: kubeflow (v1:metadata.namespace)
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kubeflow-pipelines-cache-deployer-sa-token-9cbqj (ro)
istio-proxy:
Container ID: docker://9053d64fa0acc159584b39a78cf482a564fc467edad0ad893a82b34f147ce346
Image: docker.io/istio/proxyv2:1.9.0
Image ID: docker-pullable://istio/proxyv2@sha256:286b821197d7a9233d1d889119f090cd9a9394468d3a312f66ea24f6e16b2294
Port: 15090/TCP
Host Port: 0/TCP
Args:
proxy
sidecar
--domain
$(POD_NAMESPACE).svc.cluster.local
--serviceCluster
cache-deployer.$(POD_NAMESPACE)
--proxyLogLevel=warning
--proxyComponentLogLevel=misc:error
--log_output_level=default:info
--concurrency
2
State: Running
Started: Tue, 18 May 2021 15:59:50 +0800
Ready: True
Restart Count: 0
Limits:
cpu: 2
memory: 1Gi
Requests:
cpu: 10m
memory: 40Mi
Readiness: http-get http://:15021/healthz/ready delay=1s timeout=3s period=2s #success=1 #failure=30
Environment:
JWT_POLICY: first-party-jwt
PILOT_CERT_PROVIDER: istiod
CA_ADDR: istiod.istio-system.svc:15012
POD_NAME: cache-deployer-deployment-7558d65bf4-s7bwk (v1:metadata.name)
POD_NAMESPACE: kubeflow (v1:metadata.namespace)
INSTANCE_IP: (v1:status.podIP)
SERVICE_ACCOUNT: (v1:spec.serviceAccountName)
HOST_IP: (v1:status.hostIP)
CANONICAL_SERVICE: (v1:metadata.labels['service.istio.io/canonical-name'])
CANONICAL_REVISION: (v1:metadata.labels['service.istio.io/canonical-revision'])
PROXY_CONFIG: {}
ISTIO_META_POD_PORTS: [
]
ISTIO_META_APP_CONTAINERS: main
ISTIO_META_CLUSTER_ID: Kubernetes
ISTIO_META_INTERCEPTION_MODE: REDIRECT
ISTIO_META_WORKLOAD_NAME: cache-deployer-deployment
ISTIO_META_OWNER: kubernetes://apis/apps/v1/namespaces/kubeflow/deployments/cache-deployer-deployment
ISTIO_META_MESH_ID: cluster.local
TRUST_DOMAIN: cluster.local
Mounts:
/etc/istio/pod from istio-podinfo (rw)
/etc/istio/proxy from istio-envoy (rw)
/var/lib/istio/data from istio-data (rw)
/var/run/secrets/istio from istiod-ca-cert (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kubeflow-pipelines-cache-deployer-sa-token-9cbqj (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
istio-envoy:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium: Memory
SizeLimit: <unset>
istio-data:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
istio-podinfo:
Type: DownwardAPI (a volume populated by information about the pod)
Items:
metadata.labels -> labels
metadata.annotations -> annotations
limits.cpu -> cpu-limit
requests.cpu -> cpu-request
istiod-ca-cert:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: istio-ca-root-cert
Optional: false
kubeflow-pipelines-cache-deployer-sa-token-9cbqj:
Type: Secret (a volume populated by a Secret)
SecretName: kubeflow-pipelines-cache-deployer-sa-token-9cbqj
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Pulling 31m (x15 over 138m) kubelet, node1 Pulling image "registry.cn-shenzhen.aliyuncs.com/tensorbytes/ml-pipeline-cache-deployer:1.5.0-rc.2-deb1e"
Warning BackOff 2m3s (x328 over 89m) kubelet, node1 Back-off restarting failed container
$ kubectl describe pod cache-server -n kubeflow
Name: cache-server-c64c68ddf-f7c9m
Namespace: kubeflow
Priority: 0
Node: node1/10.102.13.9
Start Time: Tue, 18 May 2021 15:12:34 +0800
Labels: app=cache-server
app.kubernetes.io/component=ml-pipeline
app.kubernetes.io/name=kubeflow-pipelines
application-crd-id=kubeflow-pipelines
istio.io/rev=default
pod-template-hash=c64c68ddf
security.istio.io/tlsMode=istio
service.istio.io/canonical-name=kubeflow-pipelines
service.istio.io/canonical-revision=latest
Annotations: kubectl.kubernetes.io/default-logs-container: server
prometheus.io/path: /stats/prometheus
prometheus.io/port: 15020
prometheus.io/scrape: true
sidecar.istio.io/status:
{"initContainers":["istio-init"],"containers":["istio-proxy"],"volumes":["istio-envoy","istio-data","istio-podinfo","istiod-ca-cert"],"ima...
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/cache-server-c64c68ddf
Init Containers:
istio-init:
Container ID:
Image: docker.io/istio/proxyv2:1.9.0
Image ID:
Port: <none>
Host Port: <none>
Args:
istio-iptables
-p
15001
-z
15006
-u
1337
-m
REDIRECT
-i
*
-x
-b
*
-d
15090,15021,15020
State: Waiting
Reason: PodInitializing
Ready: False
Restart Count: 0
Limits:
cpu: 2
memory: 1Gi
Requests:
cpu: 10m
memory: 40Mi
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kubeflow-pipelines-cache-token-7pwl7 (ro)
Containers:
server:
Container ID:
Image: registry.cn-shenzhen.aliyuncs.com/tensorbytes/ml-pipeline-cache-server:1.5.0-rc.2-a44df
Image ID:
Port: 8443/TCP
Host Port: 0/TCP
Args:
--db_driver=$(DBCONFIG_DRIVER)
--db_host=$(DBCONFIG_HOST_NAME)
--db_port=$(DBCONFIG_PORT)
--db_name=$(DBCONFIG_DB_NAME)
--db_user=$(DBCONFIG_USER)
--db_password=$(DBCONFIG_PASSWORD)
--namespace_to_watch=$(NAMESPACE_TO_WATCH)
State: Waiting
Reason: PodInitializing
Ready: False
Restart Count: 0
Environment:
NAMESPACE_TO_WATCH:
CACHE_IMAGE: <set to the key 'cacheImage' of config map 'pipeline-install-config'> Optional: false
DBCONFIG_DRIVER: mysql
DBCONFIG_DB_NAME: <set to the key 'cacheDb' of config map 'pipeline-install-config'> Optional: false
DBCONFIG_HOST_NAME: <set to the key 'dbHost' of config map 'pipeline-install-config'> Optional: false
DBCONFIG_PORT: <set to the key 'dbPort' of config map 'pipeline-install-config'> Optional: false
DBCONFIG_USER: <set to the key 'username' in secret 'mysql-secret'> Optional: false
DBCONFIG_PASSWORD: <set to the key 'password' in secret 'mysql-secret'> Optional: false
Mounts:
/etc/webhook/certs from webhook-tls-certs (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kubeflow-pipelines-cache-token-7pwl7 (ro)
istio-proxy:
Container ID:
Image: docker.io/istio/proxyv2:1.9.0
Image ID:
Port: 15090/TCP
Host Port: 0/TCP
Args:
proxy
sidecar
--domain
$(POD_NAMESPACE).svc.cluster.local
--serviceCluster
cache-server.$(POD_NAMESPACE)
--proxyLogLevel=warning
--proxyComponentLogLevel=misc:error
--log_output_level=default:info
--concurrency
2
State: Waiting
Reason: PodInitializing
Ready: False
Restart Count: 0
Limits:
cpu: 2
memory: 1Gi
Requests:
cpu: 10m
memory: 40Mi
Readiness: http-get http://:15021/healthz/ready delay=1s timeout=3s period=2s #success=1 #failure=30
Environment:
JWT_POLICY: first-party-jwt
PILOT_CERT_PROVIDER: istiod
CA_ADDR: istiod.istio-system.svc:15012
POD_NAME: cache-server-c64c68ddf-f7c9m (v1:metadata.name)
POD_NAMESPACE: kubeflow (v1:metadata.namespace)
INSTANCE_IP: (v1:status.podIP)
SERVICE_ACCOUNT: (v1:spec.serviceAccountName)
HOST_IP: (v1:status.hostIP)
CANONICAL_SERVICE: (v1:metadata.labels['service.istio.io/canonical-name'])
CANONICAL_REVISION: (v1:metadata.labels['service.istio.io/canonical-revision'])
PROXY_CONFIG: {}
ISTIO_META_POD_PORTS: [
{"name":"webhook-api","containerPort":8443,"protocol":"TCP"}
]
ISTIO_META_APP_CONTAINERS: server
ISTIO_META_CLUSTER_ID: Kubernetes
ISTIO_META_INTERCEPTION_MODE: REDIRECT
ISTIO_META_WORKLOAD_NAME: cache-server
ISTIO_META_OWNER: kubernetes://apis/apps/v1/namespaces/kubeflow/deployments/cache-server
ISTIO_META_MESH_ID: cluster.local
TRUST_DOMAIN: cluster.local
Mounts:
/etc/istio/pod from istio-podinfo (rw)
/etc/istio/proxy from istio-envoy (rw)
/var/lib/istio/data from istio-data (rw)
/var/run/secrets/istio from istiod-ca-cert (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kubeflow-pipelines-cache-token-7pwl7 (ro)
Conditions:
Type Status
Initialized False
Ready False
ContainersReady False
PodScheduled True
Volumes:
istio-envoy:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium: Memory
SizeLimit: <unset>
istio-data:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
istio-podinfo:
Type: DownwardAPI (a volume populated by information about the pod)
Items:
metadata.labels -> labels
metadata.annotations -> annotations
limits.cpu -> cpu-limit
requests.cpu -> cpu-request
istiod-ca-cert:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: istio-ca-root-cert
Optional: false
webhook-tls-certs:
Type: Secret (a volume populated by a Secret)
SecretName: webhook-server-tls
Optional: false
kubeflow-pipelines-cache-token-7pwl7:
Type: Secret (a volume populated by a Secret)
SecretName: kubeflow-pipelines-cache-token-7pwl7
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedMount 49m (x5 over 124m) kubelet, node1 Unable to attach or mount volumes: unmounted volumes=[webhook-tls-certs], unattached volumes=[istio-data istio-envoy istio-podinfo kubeflow-pipelines-cache-token-7pwl7 webhook-tls-certs istiod-ca-cert]: timed out waiting for the condition
Warning FailedMount 24m (x17 over 133m) kubelet, node1 Unable to attach or mount volumes: unmounted volumes=[webhook-tls-certs], unattached volumes=[kubeflow-pipelines-cache-token-7pwl7 webhook-tls-certs istiod-ca-cert istio-data istio-envoy istio-podinfo]: timed out waiting for the condition
Warning FailedMount 19m (x8 over 121m) kubelet, node1 Unable to attach or mount volumes: unmounted volumes=[webhook-tls-certs], unattached volumes=[istio-envoy istio-podinfo kubeflow-pipelines-cache-token-7pwl7 webhook-tls-certs istiod-ca-cert istio-data]: timed out waiting for the condition
Warning FailedMount 9m38s (x72 over 139m) kubelet, node1 MountVolume.SetUp failed for volume "webhook-tls-certs" : secret "webhook-server-tls" not found
Warning FailedMount 3m58s (x10 over 130m) kubelet, node1 Unable to attach or mount volumes: unmounted volumes=[webhook-tls-certs], unattached volumes=[webhook-tls-certs istiod-ca-cert istio-data istio-envoy istio-podinfo kubeflow-pipelines-cache-token-7pwl7]: timed out waiting for the condition
kubernetes 1.17.17
kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-4ab26a02-4566-4574-bed6-499f922898a1 20Gi RWO Delete Bound kubeflow/minio-pvc local-path 96m
pvc-578440fd-f835-4820-bf8a-bd4f2ed836d3 10Gi RWO Delete Bound istio-system/authservice-pvc local-path 152m
pvc-5b3e06ac-8ba9-4af8-afae-2201fbb47b9a 10Gi RWO Delete Bound kubeflow/katib-mysql local-path 148m
pvc-816d2df6-835c-4ac7-a8b7-e246d68107dd 20Gi RWO Delete Bound kubeflow/mysql-pv-claim local-path 97m
kubectl get certs -A
NAMESPACE NAME READY SECRET AGE
kubeflow admission-webhook-cert True webhook-certs 110m
kubeflow katib-webhook-cert True katib-webhook-cert 110m
kubeflow serving-cert True kfserving-webhook-server-cert 110m
kubectl get secret -A
NAMESPACE NAME TYPE DATA AGE
auth default-token-dr667 kubernetes.io/service-account-token 3 155m
auth dex-oidc-client Opaque 2 155m
auth dex-token-xg5nb kubernetes.io/service-account-token 3 155m
cattle-system cattle-credentials-e9faa6f Opaque 3 5h27m
cattle-system cattle-token-tgmq6 kubernetes.io/service-account-token 3 5h27m
cattle-system default-token-6hp5w kubernetes.io/service-account-token 3 5h27m
cattle-system kontainer-engine-token-677kr kubernetes.io/service-account-token 3 5h27m
cert-manager cert-manager-cainjector-token-gbgnh kubernetes.io/service-account-token 3 155m
cert-manager cert-manager-token-mmttc kubernetes.io/service-account-token 3 155m
cert-manager cert-manager-webhook-ca kubernetes.io/tls 3 155m
cert-manager cert-manager-webhook-tls kubernetes.io/tls 3 155m
cert-manager cert-manager-webhook-token-cwn4z kubernetes.io/service-account-token 3 155m
cert-manager default-token-sf6gn kubernetes.io/service-account-token 3 155m
default default-token-bhp26 kubernetes.io/service-account-token 3 5h28m
ingress-nginx default-token-w62jx kubernetes.io/service-account-token 3 5h27m
ingress-nginx nginx-ingress-serviceaccount-token-56f7d kubernetes.io/service-account-token 3 5h27m
istio-system cluster-local-gateway-service-account-token-vwfcd kubernetes.io/service-account-token 3 154m
istio-system default-token-vn9cc kubernetes.io/service-account-token 3 155m
istio-system istio-ca-secret istio.io/ca-root 5 151m
istio-system istio-ingressgateway-service-account-token-txqwn kubernetes.io/service-account-token 3 155m
istio-system istio-reader-service-account-token-xnw4g kubernetes.io/service-account-token 3 155m
istio-system istiod-service-account-token-b6k42 kubernetes.io/service-account-token 3 155m
istio-system oidc-authservice-client Opaque 2 155m
knative-eventing default-token-v8jpw kubernetes.io/service-account-token 3 155m
knative-eventing eventing-controller-token-xwklj kubernetes.io/service-account-token 3 155m
knative-eventing eventing-webhook-certs Opaque 3 155m
knative-eventing eventing-webhook-token-nf9lq kubernetes.io/service-account-token 3 155m
knative-eventing imc-controller-token-lm8cc kubernetes.io/service-account-token 3 155m
knative-eventing imc-dispatcher-token-q2m8l kubernetes.io/service-account-token 3 155m
knative-eventing pingsource-jobrunner-token-v2mz9 kubernetes.io/service-account-token 3 155m
knative-serving controller-token-bpfct kubernetes.io/service-account-token 3 155m
knative-serving default-token-r74bf kubernetes.io/service-account-token 3 155m
knative-serving istio-webhook-certs Opaque 3 155m
knative-serving webhook-certs Opaque 3 155m
kube-node-lease default-token-xm4q5 kubernetes.io/service-account-token 3 5h28m
kube-public default-token-ds2jt kubernetes.io/service-account-token 3 5h28m
kube-system attachdetach-controller-token-2fgjg kubernetes.io/service-account-token 3 5h28m
kube-system certificate-controller-token-r2rvx kubernetes.io/service-account-token 3 5h28m
kube-system clusterrole-aggregation-controller-token-sxrfk kubernetes.io/service-account-token 3 5h28m
kube-system coredns-autoscaler-token-bsbrs kubernetes.io/service-account-token 3 5h27m
kube-system coredns-token-vxfqb kubernetes.io/service-account-token 3 5h27m
kube-system cronjob-controller-token-62lqw kubernetes.io/service-account-token 3 5h28m
kube-system daemon-set-controller-token-24t97 kubernetes.io/service-account-token 3 5h28m
kube-system default-token-56s4x kubernetes.io/service-account-token 3 5h28m
kube-system deployment-controller-token-6bjkh kubernetes.io/service-account-token 3 5h28m
kube-system disruption-controller-token-klcws kubernetes.io/service-account-token 3 5h28m
kube-system endpoint-controller-token-vxvjs kubernetes.io/service-account-token 3 5h28m
kube-system expand-controller-token-29hz5 kubernetes.io/service-account-token 3 5h28m
kube-system flannel-token-np54t kubernetes.io/service-account-token 3 5h27m
kube-system generic-garbage-collector-token-gpnr4 kubernetes.io/service-account-token 3 5h28m
kube-system horizontal-pod-autoscaler-token-9fpnh kubernetes.io/service-account-token 3 5h28m
kube-system job-controller-token-s96hf kubernetes.io/service-account-token 3 5h28m
kube-system metrics-server-token-wx5gf kubernetes.io/service-account-token 3 5h27m
kube-system namespace-controller-token-9gcx8 kubernetes.io/service-account-token 3 5h28m
kube-system node-controller-token-5tzf7 kubernetes.io/service-account-token 3 5h28m
kube-system persistent-volume-binder-token-559jd kubernetes.io/service-account-token 3 5h28m
kube-system pod-garbage-collector-token-268jt kubernetes.io/service-account-token 3 5h28m
kube-system pv-protection-controller-token-7zzxj kubernetes.io/service-account-token 3 5h28m
kube-system pvc-protection-controller-token-58h6j kubernetes.io/service-account-token 3 5h28m
kube-system replicaset-controller-token-8mw42 kubernetes.io/service-account-token 3 5h28m
kube-system replication-controller-token-4zcqw kubernetes.io/service-account-token 3 5h28m
kube-system resourcequota-controller-token-rrm7j kubernetes.io/service-account-token 3 5h28m
kube-system rke-job-deployer-token-l7llh kubernetes.io/service-account-token 3 5h28m
kube-system rke-job-deployer-token-zpcnl kubernetes.io/service-account-token 3 169m
kube-system service-account-controller-token-lcxkt kubernetes.io/service-account-token 3 5h28m
kube-system service-controller-token-qsdbl kubernetes.io/service-account-token 3 5h28m
kube-system statefulset-controller-token-8xt4g kubernetes.io/service-account-token 3 5h28m
kube-system ttl-controller-token-kps8h kubernetes.io/service-account-token 3 5h28m
kubeflow-user-example-com default-editor-token-2jn5x kubernetes.io/service-account-token 3 129m
kubeflow-user-example-com default-token-85ff8 kubernetes.io/service-account-token 3 129m
kubeflow-user-example-com default-viewer-token-pzhq8 kubernetes.io/service-account-token 3 129m
kubeflow-user-example-com mlpipeline-minio-artifact Opaque 2 129m
kubeflow admission-webhook-service-account-token-vcbnz kubernetes.io/service-account-token 3 154m
kubeflow argo-token-fvvmc kubernetes.io/service-account-token 3 154m
kubeflow centraldashboard-token-dvxd9 kubernetes.io/service-account-token 3 154m
kubeflow default-token-fcc5w kubernetes.io/service-account-token 3 154m
kubeflow jupyter-web-app-service-account-token-cgkpr kubernetes.io/service-account-token 3 154m
kubeflow katib-controller-token-rn42w kubernetes.io/service-account-token 3 154m
kubeflow katib-mysql-secrets Opaque 1 154m
kubeflow katib-ui-token-gf6qc kubernetes.io/service-account-token 3 154m
kubeflow katib-webhook-cert kubernetes.io/tls 3 111m
kubeflow kfserving-webhook-server-cert kubernetes.io/tls 3 111m
kubeflow kfserving-webhook-server-secret Opaque 0 154m
kubeflow kubeflow-pipelines-cache-deployer-sa-token-9cbqj kubernetes.io/service-account-token 3 154m
kubeflow kubeflow-pipelines-cache-token-7pwl7 kubernetes.io/service-account-token 3 154m
kubeflow kubeflow-pipelines-container-builder-token-nghbx kubernetes.io/service-account-token 3 154m
kubeflow kubeflow-pipelines-metadata-writer-token-bk84c kubernetes.io/service-account-token 3 154m
kubeflow kubeflow-pipelines-viewer-token-qhmst kubernetes.io/service-account-token 3 154m
kubeflow meta-controller-service-token-465dk kubernetes.io/service-account-token 3 154m
kubeflow metadata-grpc-server-token-wd2q5 kubernetes.io/service-account-token 3 154m
kubeflow ml-pipeline-persistenceagent-token-ppp6v kubernetes.io/service-account-token 3 154m
kubeflow ml-pipeline-scheduledworkflow-token-4tjvm kubernetes.io/service-account-token 3 154m
kubeflow ml-pipeline-token-xgnqr kubernetes.io/service-account-token 3 154m
kubeflow ml-pipeline-ui-token-4fbg6 kubernetes.io/service-account-token 3 154m
kubeflow ml-pipeline-viewer-crd-service-account-token-vq78r kubernetes.io/service-account-token 3 154m
kubeflow ml-pipeline-visualizationserver-token-9p5wh kubernetes.io/service-account-token 3 154m
kubeflow mlpipeline-minio-artifact Opaque 2 154m
kubeflow mpi-operator-token-t5pc2 kubernetes.io/service-account-token 3 154m
kubeflow mxnet-operator-token-m8b5k kubernetes.io/service-account-token 3 154m
kubeflow mysql-secret Opaque 2 154m
kubeflow mysql-token-dlmfc kubernetes.io/service-account-token 3 154m
kubeflow notebook-controller-service-account-token-wrs8f kubernetes.io/service-account-token 3 154m
kubeflow pipeline-runner-token-4w59r kubernetes.io/service-account-token 3 154m
kubeflow profiles-controller-service-account-token-j966m kubernetes.io/service-account-token 3 154m
kubeflow pytorch-operator-token-l2xk4 kubernetes.io/service-account-token 3 154m
kubeflow tensorboard-controller-token-ffwrg kubernetes.io/service-account-token 3 154m
kubeflow tensorboards-web-app-service-account-token-fmwbr kubernetes.io/service-account-token 3 154m
kubeflow tf-job-operator-token-wbwvx kubernetes.io/service-account-token 3 154m
kubeflow volumes-web-app-service-account-token-4khfs kubernetes.io/service-account-token 3 154m
kubeflow webhook-certs kubernetes.io/tls 3 111m
kubeflow xgboost-operator-service-account-token-gmj65 kubernetes.io/service-account-token 3 154m
local-path-storage default-token-8np4s kubernetes.io/service-account-token 3 160m
local-path-storage local-path-provisioner-service-account-token-5bl2n kubernetes.io/service-account-token 3 160m
security-scan default-token-p7mnr kubernetes.io/service-account-token 3 5h27m
@HarborZeng your problem looks like a mutatingwebhookconfigurations problem. You can check it with this command:
$ kubectl get mutatingwebhookconfigurations -A
NAME WEBHOOKS AGE
admission-webhook-mutating-webhook-configuration 1 23h
cache-webhook-kubeflow 1 23h
cert-manager-webhook 1 23h
inferenceservice.serving.kubeflow.org 3 23h
istio-sidecar-injector 1 30d
katib.kubeflow.org 2 23h
sinkbindings.webhook.sources.knative.dev 1 23h
webhook.eventing.knative.dev 1 23h
webhook.istio.networking.internal.knative.dev 1 23h
webhook.serving.knative.dev 1 23h
If you have cache-webhook-kubeflow, see this issue: https://github.com/kubeflow/pipelines/issues/3815#issuecomment-643651401
You can also read this patch: https://github.com/kubeflow/pipelines/pull/3992/commits/2789657496a296c3275f92a8492f50423d7ed13f
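To check whether you are hitting that case, you can look at the certificate signing request and the deployer's own log (a sketch; the CSR name cache-server.kubeflow is taken from the cache-deployer error quoted later in this thread, and the container name main comes from the describe output above):

# see whether the cache-server CSR was created and approved
kubectl get csr | grep cache-server

# check the cache deployer's log for certificate errors
kubectl logs deployment/cache-deployer-deployment -n kubeflow -c main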
If cache-webhook-kubeflow is present in the mutatingwebhookconfigurations but webhook-tls-certs is not in the secrets, you can delete cache-webhook-kubeflow and reinstall:
kubectl delete mutatingwebhookconfigurations cache-webhook-kubeflow
kubectl delete -f manifest1.3/
python install.py
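After the reinstall, the secret that was missing in the FailedMount events and the cache pods should recover; a quick check (a sketch based on the names shown above):

# the webhook-server-tls secret referenced by cache-server's volume should now exist
kubectl get secret webhook-server-tls -n kubeflow

# the webhook configuration and the cache pods should come back up
kubectl get mutatingwebhookconfigurations | grep cache-webhook-kubeflow
kubectl get pods -n kubeflow | grep cache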
@shikanon my mutatingwebhookconfigurations does not contain cache-webhook-kubeflow:
$ kubectl get mutatingwebhookconfigurations -A
NAME CREATED AT
admission-webhook-mutating-webhook-configuration 2021-05-20T03:03:36Z
cert-manager-webhook 2021-05-20T03:02:30Z
inferenceservice.serving.kubeflow.org 2021-05-20T03:03:29Z
istio-sidecar-injector 2021-05-20T03:02:36Z
katib.kubeflow.org 2021-05-20T03:03:32Z
sinkbindings.webhook.sources.knative.dev 2021-05-20T03:02:57Z
webhook.eventing.knative.dev 2021-05-20T03:02:57Z
webhook.istio.networking.internal.knative.dev 2021-05-20T03:02:52Z
webhook.serving.knative.dev 2021-05-20T03:02:52Z
I did some more research and finally found the cause here. After re-running python install.py, the pods finally started:
kubeflow admission-webhook-deployment-54cf94d964-8qsh2 1/1 Running 0 47m
kubeflow cache-deployer-deployment-65cd55d4d9-d6dzd 2/2 Running 11 47m
kubeflow cache-server-f85c69486-rgzq6 2/2 Running 0 47m
kubeflow centraldashboard-7b7676d8bd-w5jw6 1/1 Running 0 50m
I'd like to ask how you got the database-related pods to come up. My katib-db-manager and katib-mysql won't start; the error messages are as follows:
katib-db-manager: E0827 00:35:04.510080 1 mysql.go:78] Ping to Katib db failed: dial tcp 10.96.12.87:3306: connect: connection refused E0827 00:35:09.467185 1 mysql.go:78] Ping to Katib db failed: dial tcp 10.96.12.87:3306: connect: connection refused E0827 00:35:14.467236 1 mysql.go:78] Ping to Katib db failed: dial tcp 10.96.12.87:3306: connect: connection refused E0827 00:35:19.466911 1 mysql.go:78] Ping to Katib db failed: dial tcp 10.96.12.87:3306: connect: connection refused
katib-mysql:
mysqld: Table 'mysql.plugin' doesn't exist 2021-08-27T00:38:50.580357Z 0 [ERROR] [MY-010735] [Server] Could not open the mysql.plugin table. Please perform the MySQL upgrade procedure. 2021-08-27T00:38:50.581584Z 0 [Warning] [MY-010441] [Server] Failed to open optimizer cost constant tables 2021-08-27T00:38:50.582534Z 0 [Warning] [MY-010441] [Server] Failed to open optimizer cost constant tables 2021-08-27T00:38:50.583506Z 0 [Warning] [MY-010441] [Server] Failed to open optimizer cost constant tables 2021-08-27T00:38:50.584457Z 0 [Warning] [MY-010441] [Server] Failed to open optimizer cost constant tables 2021-08-27T00:38:50.585447Z 0 [Warning] [MY-010441] [Server] Failed to open optimizer cost constant tables 2021-08-27T00:38:50.588551Z 0 [Warning] [MY-010441] [Server] Failed to open optimizer cost constant tables 2021-08-27T00:38:50.589593Z 0 [Warning] [MY-010441] [Server] Failed to open optimizer cost constant tables
Honestly, it came down to luck: whenever it didn't work I just deleted everything and started over, and eventually one attempt succeeded... But I later noticed that even if the database pod never comes up, it doesn't affect using the notebook server.
@HarborZeng @WMeng1 The database pod is implemented very simply; see this yaml: https://github.com/shikanon/kubeflow-manifests/blob/50ee9f1e0aef5f69620db89c9ae2f81c9b2d96e3/manifest1.3/019-katib-installs-katib-with-kubeflow-cert-manager.yaml#L620 It just mounts a data volume and sets the account and password, so you could also write a deployment with the same name yourselves to replace it.
As long as the PVC gets deleted when you tear things down, there should be no problem. Run
kubectl get pvc -A
to check whether all the related PVCs have been removed.
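If a stale volume was left behind by a previous install (the "Table 'mysql.plugin' doesn't exist" error above usually means the data directory on the old PV is corrupt or from a different MySQL version), one option is to clear the old claim before reinstalling. A hedged sketch, assuming the claim name shown in the kubectl get pv output earlier and a deployment named katib-mysql; note this wipes the Katib database:

# scale the database down so the claim is released
kubectl scale deployment katib-mysql -n kubeflow --replicas=0

# delete the stale claim (this deletes the Katib experiment data)
kubectl delete pvc katib-mysql -n kubeflow

# re-run the installer so the PVC and deployment are recreated
python install.py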
I created a Kubernetes cluster (version 1.17) with Rancher, then configured this cluster's ~/.kube/config on the control plane node, so kubectl can be used there. I ran python3 install.py, and after waiting half an hour I found 7 pods in an error state. They are:
authservice
admission-webhook-deployment
cache-deployer-deployment
Log: ERROR: After approving csr cache-server.kubeflow, the signed certificate did not appear on the resource. Giving up after 10 attempts.
cache-server
katib-controller
katib-db-manager
Logs:
Ping to Katib db failed: dial tcp 10.43.240.163:3306: connect: no route to host
Failed to open db connection: DB open failed: Timeout waiting for DB conn successfully opened.
katib-mysql
kfserving-controller-manager
I don't know what is causing this; I'd appreciate your help.