Closed XMingG99 closed 1 month ago
Hi @XMingG99, just want to be sure:
- Is the karmada-webhook pod running, or does it sometimes crash?
I checked the pods; the karmada-webhook pod is running and has never crashed. The restart count is zero.
- Can you run a curl pod in the host cluster, and try to curl https://karmada-webhook.karmada-system.svc:443?
How do I create this pod? Which image should I use?
Hi, @XMingG99 You can run the following command with the karmada-host context:
kubectl run mycurlpod --image=curlimages/curl -i --tty -- sh
Hi, @jwcesign
I ran the command:
kubectl --kubeconfig /etc/karmada/karmada-apiserver.config run mycurlpod --image=curlimages/curl -i --tty -- sh
and I got this:
mycurlpod 0/1 Pending 0 79s
...
describe pod:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning ApplyPolicyFailed 97s resource-detector No policy match for resource
Hi, @XMingG99 You should run it with the karmada-host context, not the karmada-apiserver context.
Hi, @jwcesign OK, that was my mistake. I get this now:
kubectl run mycurlpod --image=curlimages/curl -i --tty -- sh
If you don't see a command prompt, try pressing enter.
/ $ curl https://karmada-webhook.karmada-system.svc:443
curl: (60) SSL certificate problem: unable to get local issuer certificate
More details here: https://curl.se/docs/sslcerts.html
curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.
/ $ curl -k https://karmada-webhook.karmada-system.svc:443
404 page not found
/ $
The webhook looks fine. Can you send more requests and check whether any of them hit a connection timeout?
Hi @jwcesign Yes, I have tried several times; most of the time it works, but occasionally this error still happens.
Hi @XMingG99 From the test results, it appears the issue is related to the cluster network. Can you try increasing the timeout for the webhook? https://github.com/karmada-io/karmada/blob/b8abb446b8963b4e1602365daf82f6066083956d/artifacts/deploy/webhook-configuration.yaml#L21
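For reference, the knob being suggested is the timeoutSeconds field on the webhook entry. A minimal sketch of what the change could look like; everything except timeoutSeconds just mirrors the shape of the linked file, and the chosen value of 10 is illustrative:

```yaml
# Sketch: raise the admission webhook timeout (Kubernetes caps it at 30s).
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: mutating-config
webhooks:
  - name: propagationpolicy.karmada.io
    clientConfig:
      url: https://karmada-webhook.karmada-system.svc:443/mutate-propagationpolicy
    sideEffects: None
    admissionReviewVersions: ["v1"]
    timeoutSeconds: 10   # raised from the default 3s mentioned in the error
```

Note this only widens the window for a slow call; as discussed later in the thread, it does not fix the underlying network or DNS flakiness.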
@jwcesign
OK, I'll try that, thank you
@XMingG99 Hi, I guess there is something wrong with DNS. You can check whether DNS resolution is normal in the apiserver pod, like:
dig karmada-webhook.karmada-system.svc
Hi @whitewindmills I'll try that, but what's the shell path in the apiserver pod?
kubectl exec -ti karmada-apiserver-7bb9b7556f-dj8tt -n karmada-system /bin/sh
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
error: Internal error occurred: error executing command in container: failed to exec in container: failed to start exec "848298cd5e8f098578a85fc45604d0a70a3d34a4a7145858b97311edc5656343": OCI runtime exec failed: exec failed: unable to start container process: exec: "/bin/sh": stat /bin/sh: no such file or directory: unknown
> I'll try that, but what's the shell path of apiserver?
Kindly try /bin/bash. But it depends on your base OS image.
@whitewindmills I had tried /bin/bash before /bin/sh, which did not work either. I'll try another way. By the way, my OS is Ubuntu.
Do you know your kube-dns service address? You can also test it this way:
dig karmada-webhook.karmada-system.svc.cluster.local @10.10.2.1
Note: 10.10.2.1 is my DNS service address. You'd better test it multiple times and see whether any lookups fail to resolve.
The webhook works sometimes, so I think the DNS works fine
Yes, I'm trying to ping the DNS. I'll record the results for one night.
@whitewindmills
@jwcesign
Hi, guys. After a night of testing, DNS resolution was completely correct, without any errors.
So how about increasing the timeout? Would that solve the problem?
Hi, @jwcesign OK, I'll do that
When I encountered this problem, I found it was because the DNS service on one node was not working properly, which caused occasional DNS resolution failures. But the cause of this problem could be various things. If you are sure your DNS service is normal, I suggest you adjust the audit log level of karmada-apiserver to RequestResponse to see more detailed information. And if anything in the audit log confuses you, you can paste it here and let us help.
OK, thank you. I'll do more testing.
Seems it's not a bug. Kindly let us know if you have further questions.
/remove-kind bug
/kind question
> So how about increasing the timeout? Would that solve the problem?
That might not be a good idea; timeout=3s is already long enough. We should try to figure out the root cause.
Due to a lack of activity, let's close this for now; feel free to reopen it if you still need it.
/close
@XiShanYongYe-Chang: Closing this issue.
I use client-go to operate on a PropagationPolicy (pp) like this:
err := clientSet.PolicyV1alpha1().PropagationPolicies(oldPp.Namespace).Delete(context.Background(), oldPp.Name, metav1.DeleteOptions{})
...
result, err := clientSet.PolicyV1alpha1().PropagationPolicies(newPp.Namespace).Create(context.Background(), newPp, metav1.CreateOptions{})
The delete succeeded, but the create gets this error: Internal error occurred: failed calling webhook "propagationpolicy.karmada.io": failed to call webhook: Post "https://karmada-webhook.karmada-system.svc:443/mutate-propagationpolicy?timeout=3s": context deadline exceeded
Sometimes the operation succeeds, but sometimes it fails; the error rate is about 10%. Does anybody know why?
My env: Karmada v1.4.2, Ubuntu 20.04, K8s v1.24.10