Closed zrcx123 closed 1 year ago
The error suggests that the webhook component of chaosmeta has not been deployed. Have you deployed the operator component according to the document (https://chaosmeta.gitbook.io/chaosmeta-cn/an-zhuang-zhi-yin/jiao-ben-an-zhuang)? Please check with the command "kubectl get all -n chaosmeta".
Confirmed that the chaosmeta deployment is complete, but the error still occurs.

$kubectl get all -n chaosmeta
NAME                                                       READY   STATUS    RESTARTS   AGE
pod/chaosmeta-daemonset-sfk5p                              1/1     Running   0          3m24s
pod/chaosmeta-inject-controller-manager-85f9c94684-9prbg   1/1     Running   0          8m24s

NAME                                       TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
service/chaosmeta-inject-webhook-service   ClusterIP   10.96.180.73

NAME                                 DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                              AGE
daemonset.apps/chaosmeta-daemonset   1         1         1       1            1           chaos-role.chaosmeta.io=chaosmeta-daemon   8m17s

NAME                                                  READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/chaosmeta-inject-controller-manager   1/1     1            1           8m24s

NAME                                                             DESIRED   CURRENT   READY   AGE
replicaset.apps/chaosmeta-inject-controller-manager-85f9c94684   1         1         1       8m24s
$kubectl apply -f 111.yaml
Error from server (InternalError): error when creating "111.yaml": Internal error occurred: failed calling webhook "mexperiment.kb.io": failed to call webhook: Post "https://chaosmeta-inject-webhook-service.chaosmeta.svc:443/mutate-inject-chaosmeta-io-v1alpha1-experiment?timeout=10s": dial tcp 10.96.180.73:443: connect: connection timed out
Use the command "kubectl get MutatingWebhookConfiguration chaosmeta-inject-mutating-webhook-configuration -o yaml" to check whether the caBundle has been configured. You can also check the operator's logs after running it: kubectl logs chaosmeta-inject-controller-manager-85f9c94684-9prbg -n chaosmeta
Both configurations are in place.
$kubectl get MutatingWebhookConfiguration chaosmeta-inject-mutating-webhook-configuration -o yaml
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"admissionregistration.k8s.io/v1","kind":"MutatingWebhookConfiguration","metadata":{"annotations":{},"creationTimestamp":null,"name":"chaosmeta-inject-mutating-webhook-configuration"},"webhooks":[{"admissionReviewVersions":["v1"],"clientConfig":{"service":{"name":"chaosmeta-inject-webhook-service","namespace":"chaosmeta","path":"/mutate-inject-chaosmeta-io-v1alpha1-experiment"}},"failurePolicy":"Fail","name":"mexperiment.kb.io","rules":[{"apiGroups":["inject.chaosmeta.io"],"apiVersions":["v1alpha1"],"operations":["CREATE"],"resources":["experiments"]}],"sideEffects":"None"}]}
  creationTimestamp: "2023-05-31T03:19:35Z"
  generation: 2
  name: chaosmeta-inject-mutating-webhook-configuration
  resourceVersion: "5223838"
  uid: ff855944-a4cb-4b3b-b24f-60460b8ce21e
webhooks:
====================================================================
$kubectl logs chaosmeta-inject-controller-manager-85f9c94684-9prbg -n chaosmeta
2023-05-31T03:22:02Z INFO controller-runtime.metrics Metrics server is starting to listen {"addr": ":8080"}
2023-05-31T03:22:02Z INFO setup set main config success: &{{16} {2} {daemonset chaosmetad 0.1.1 {29595} {/tmp chaosmeta map[app.chaosmeta.io:chaosmeta-daemon] false map[]}}}
2023-05-31T03:22:02Z INFO setup set goroutine pool success: 16
2023-05-31T03:22:02Z INFO setup set APIServer for cloud object success: [pod deployment node namespace job]
2023-05-31T03:22:02Z INFO setup set remote executor success: daemonset
2023-05-31T03:22:02Z INFO controller-runtime.builder Registering a mutating webhook {"GVK": "inject.chaosmeta.io/v1alpha1, Kind=Experiment", "path": "/mutate-inject-chaosmeta-io-v1alpha1-experiment"}
2023-05-31T03:22:02Z INFO controller-runtime.webhook Registering webhook {"path": "/mutate-inject-chaosmeta-io-v1alpha1-experiment"}
2023-05-31T03:22:02Z INFO controller-runtime.builder Registering a validating webhook {"GVK": "inject.chaosmeta.io/v1alpha1, Kind=Experiment", "path": "/validate-inject-chaosmeta-io-v1alpha1-experiment"}
2023-05-31T03:22:02Z INFO controller-runtime.webhook Registering webhook {"path": "/validate-inject-chaosmeta-io-v1alpha1-experiment"}
2023-05-31T03:22:02Z INFO setup starting manager
2023-05-31T03:22:02Z INFO start auto recover checker success, ticker second: 2
2023-05-31T03:22:02Z INFO controller-runtime.webhook.webhooks Starting webhook server
2023-05-31T03:22:02Z INFO Starting server {"kind": "health probe", "addr": "[::]:8081"}
2023-05-31T03:22:02Z INFO Starting server {"path": "/metrics", "kind": "metrics", "addr": "[::]:8080"}
2023-05-31T03:22:02Z INFO controller-runtime.certwatcher Updated current TLS certificate
2023-05-31T03:22:02Z INFO controller-runtime.webhook Serving webhook server {"host": "", "port": 9443}
2023-05-31T03:22:02Z INFO controller-runtime.certwatcher Starting certificate watcher
I0531 03:22:02.767372       1 leaderelection.go:248] attempting to acquire leader lease chaosmeta/9cb44693.chaosmeta.io...
I0531 03:22:02.779240       1 leaderelection.go:258] successfully acquired lease chaosmeta/9cb44693.chaosmeta.io
2023-05-31T03:22:02Z INFO Starting EventSource {"controller": "experiment", "controllerGroup": "inject.chaosmeta.io", "controllerKind": "Experiment", "source": "kind source: *v1alpha1.Experiment"}
2023-05-31T03:22:02Z INFO Starting Controller {"controller": "experiment", "controllerGroup": "inject.chaosmeta.io", "controllerKind": "Experiment"}
2023-05-31T03:22:02Z DEBUG events chaosmeta-inject-controller-manager-85f9c94684-9prbg_f0ecd7f7-cdcf-45c5-8080-ccde2d783341 became leader {"type": "Normal", "object": {"kind":"Lease","namespace":"chaosmeta","name":"9cb44693.chaosmeta.io","uid":"90395e33-cac2-4d17-81cf-b3b419c7ff2b","apiVersion":"coordination.k8s.io/v1","resourceVersion":"5223935"}, "reason": "LeaderElection"}
2023-05-31T03:22:02Z INFO Starting workers {"controller": "experiment", "controllerGroup": "inject.chaosmeta.io", "controllerKind": "Experiment", "worker count": 1}
Can you telnet to the svc IP + port? Also check the operator's port 9443.
Cannot telnet to the svc IP + port; the connection times out.

$kubectl get all -n chaosmeta
NAME                                                       READY   STATUS    RESTARTS   AGE
pod/chaosmeta-daemonset-sfk5p                              1/1     Running   0          23h
pod/chaosmeta-inject-controller-manager-85f9c94684-9prbg   1/1     Running   0          23h

NAME                                       TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
service/chaosmeta-inject-webhook-service   ClusterIP   10.96.180.73

NAME                                 DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                              AGE
daemonset.apps/chaosmeta-daemonset   1         1         1       1            1           chaos-role.chaosmeta.io=chaosmeta-daemon   23h

NAME                                                  READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/chaosmeta-inject-controller-manager   1/1     1            1           23h

NAME                                                             DESIRED   CURRENT   READY   AGE
replicaset.apps/chaosmeta-inject-controller-manager-85f9c94684   1         1         1       23h
$telnet 10.96.180.73 443
Trying 10.96.180.73...
telnet: connect to address 10.96.180.73: Connection timed out
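As a side note, telnet can hang for a long time against a filtered port. A bounded TCP connect probe behaves more predictably; this is a minimal sketch assuming bash and the coreutils `timeout` utility are available on the host:

```shell
# Bounded TCP connect probe via bash's /dev/tcp pseudo-device.
# Assumptions: bash and coreutils `timeout` are installed.
probe() {
  if timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null; then
    echo "open $1:$2"
  else
    echo "closed-or-filtered $1:$2"
  fi
}

probe 127.0.0.1 1            # a port with no listener is reported quickly
# probe 10.96.180.73 443     # the webhook Service address from this thread
# probe 10.244.227.146 9443  # the operator pod from this thread
```

A refused connection and a silently dropped packet both land in the "closed-or-filtered" branch, but the probe returns within 3 seconds either way.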
======================================================
The operator's port can be reached via telnet.
$kubectl get po -n chaosmeta -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
chaosmeta-daemonset-sfk5p 1/1 Running 0 23h 11.162.217.126 sqaappecsv62s2011162217126.sa128
$telnet 10.244.227.146 9443
Trying 10.244.227.146...
Connected to 10.244.227.146.
Escape character is '^]'.
Then the problem is probably in the svc-to-pod routing rules. Check the svc configuration and the operator's labels:
kubectl get svc chaosmeta-inject-webhook-service -n chaosmeta -o yaml
kubectl get po chaosmeta-inject-controller-manager-85f9c94684-9prbg -n chaosmeta -o yaml
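For background on what that label check verifies: a Service routes traffic to exactly those pods whose labels contain every key=value pair in the Service's selector (a subset match). A minimal sketch of that rule in plain sh, using the selector and pod labels that appear in this thread's output as stand-in values:

```shell
# Sketch of the Service -> pod selection rule: every key=value pair in the
# Service selector must also appear among the pod's labels (subset match).
# The values below mirror the svc and operator pod shown in this thread.
selector="control-plane=controller-manager"
pod_labels="control-plane=controller-manager pod-template-hash=85f9c94684"

match=yes
for kv in $selector; do
  case " $pod_labels " in
    *" $kv "*) ;;   # this selector entry is present on the pod
    *) match=no ;;  # a missing entry means the pod is not an endpoint
  esac
done
echo "pod selected: $match"
```

Extra labels on the pod (like pod-template-hash here) are fine; only missing selector entries disqualify a pod.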
$kubectl get svc chaosmeta-inject-webhook-service -n chaosmeta -o yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"labels":{"app.kubernetes.io/component":"webhook","app.kubernetes.io/created-by":"chaosmeta-inject-operator","app.kubernetes.io/instance":"webhook-service","app.kubernetes.io/managed-by":"kustomize","app.kubernetes.io/name":"service","app.kubernetes.io/part-of":"chaosmeta-inject-operator"},"name":"chaosmeta-inject-webhook-service","namespace":"chaosmeta"},"spec":{"ports":[{"port":443,"protocol":"TCP","targetPort":9443}],"selector":{"control-plane":"controller-manager"}}}
  creationTimestamp: "2023-05-31T03:19:35Z"
  labels:
    app.kubernetes.io/component: webhook
    app.kubernetes.io/created-by: chaosmeta-inject-operator
    app.kubernetes.io/instance: webhook-service
    app.kubernetes.io/managed-by: kustomize
    app.kubernetes.io/name: service
    app.kubernetes.io/part-of: chaosmeta-inject-operator
  name: chaosmeta-inject-webhook-service
  namespace: chaosmeta
  resourceVersion: "5223635"
  uid: 09002b06-194e-47c6-ba2a-3a5645be1f8c
spec:
  clusterIP: 10.96.180.73
  clusterIPs:
=============================================================================
$kubectl get po chaosmeta-inject-controller-manager-85f9c94684-9prbg -n chaosmeta -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    cni.projectcalico.org/containerID: 17eb5d13ca7da00fbd679384e971d9e050209dc560ea53338c352f2de30c13b1
    cni.projectcalico.org/podIP: 10.244.227.146/32
    cni.projectcalico.org/podIPs: 10.244.227.146/32
    kubectl.kubernetes.io/default-container: manager
  creationTimestamp: "2023-05-31T03:19:35Z"
  generateName: chaosmeta-inject-controller-manager-85f9c94684-
  labels:
    control-plane: controller-manager
    pod-template-hash: 85f9c94684
  name: chaosmeta-inject-controller-manager-85f9c94684-9prbg
  namespace: chaosmeta
  ownerReferences:
The configuration looks fine: the selector and the namespace both match. Try the command below to check whether the selected backend is correct, i.e. whether it is your operator's pod IP:
kubectl get ep chaosmeta-inject-webhook-service -n chaosmeta
$kubectl get ep chaosmeta-inject-webhook-service -n chaosmeta
NAME                               ENDPOINTS             AGE
chaosmeta-inject-webhook-service   10.244.227.146:9443   24h
The Kubernetes objects and their attributes are indeed all correct, so my guess is that this is a node firewall configuration issue or something at the network level. Do other ordinary svcs in your cluster work normally? Ideally test one whose backend pods are also scheduled on this node.
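The translation from ClusterIP to pod IP happens on the node itself (kube-proxy programs it via iptables or IPVS rules), so once every Kubernetes object checks out, the next place to look is the node. A sketch of node-level checks, assuming an iptables-mode kube-proxy and reusing the IPs from this thread; the script only prints the commands, since they require root on the node:

```shell
# Node-level checks for a ClusterIP that times out (printed, not executed:
# they need root on the node; assumes iptables-mode kube-proxy).
checks='iptables-save | grep 10.96.180.73
conntrack -L -d 10.96.180.73
tcpdump -ni any host 10.244.227.146 and tcp port 9443'

printf '%s\n' "$checks" | while IFS= read -r c; do
  printf 'node# %s\n' "$c"
done
```

If `iptables-save` shows no rules for the ClusterIP, kube-proxy on that node never programmed the Service; if the rules exist but `tcpdump` sees no traffic reaching the pod IP, the drop is happening in between.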
Other ordinary svcs in the cluster all work fine, and the firewall has been turned off. I don't know what the cause is.
secret/webhook-server-cert created
mutatingwebhookconfiguration.admissionregistration.k8s.io/chaosmeta-inject-mutating-webhook-configuration patched
validatingwebhookconfiguration.admissionregistration.k8s.io/chaosmeta-inject-validating-webhook-configuration patched
That is not the cause. That message just means the image used by docker run is not present locally; it will be pulled from the registry automatically, and then the TLS secret for the webhook service will be generated.
Simulating the Kubernetes atomic fault-injection capability (pod deletion) reports an error:

$kubectl apply -f 111.yaml
Error from server (InternalError): error when creating "111.yaml": Internal error occurred: failed calling webhook "mexperiment.kb.io": failed to call webhook: Post "https://chaosmeta-inject-webhook-service.chaosmeta.svc:443/mutate-inject-chaosmeta-io-v1alpha1-experiment?timeout=10s": dial tcp 10.99.2.161:443: connect: connection timed out
$kubectl get pod -n obcluster
NAME                      READY   STATUS    RESTARTS   AGE
sapp-ob-test-cn-zone1-0   2/2     Running   0          25h
sapp-ob-test-cn-zone2-0   2/2     Running   0          25h
sapp-ob-test-cn-zone3-0   2/2     Running   0          25h
The content of the 111.yaml configuration file is as follows:

$cat 111.yaml
apiVersion: inject.chaosmeta.io/v1alpha1
kind: Experiment
metadata:
  name: kubernetes-pod-delete-experiment
  namespace: chaosmeta
spec:
  scope: kubernetes
  targetPhase: inject
  rangeMode:
    type: count
    value: 2
  experiment:
    target: pod
    fault: delete
    duration: 10m
  selector: