karmada-io / karmada

Open, Multi-Cloud, Multi-Cluster Kubernetes Orchestration
https://karmada.io
Apache License 2.0

“Error: Internal error occurred: failed calling webhook ...” #3646

Closed XMingG99 closed 1 month ago

XMingG99 commented 10 months ago

I use client-go to operate on PropagationPolicies like this:

err := clientSet.PolicyV1alpha1().PropagationPolicies(oldPp.Namespace).Delete(context.Background(), oldPp.Name, metav1.DeleteOptions{})
...
result, err := clientSet.PolicyV1alpha1().PropagationPolicies(newPp.Namespace).Create(context.Background(), newPp, metav1.CreateOptions{})

The delete succeeds, but the create returns this error:

Internal error occurred: failed calling webhook "propagationpolicy.karmada.io": failed to call webhook: Post "https://karmada-webhook.karmada-system.svc:443/mutate-propagationpolicy?timeout=3s": context deadline exceeded

The operation usually works, but it fails about 10% of the time. Does anybody know why?

My environment: Karmada v1.4.2, Ubuntu 20.04, Kubernetes v1.24.10

jwcesign commented 10 months ago

Hi @XMingG99, just want to be sure:

  1. Is the karmada-webhook pod running, or does it crash sometimes?
  2. Can you run a curl pod in the host cluster and try to curl https://karmada-webhook.karmada-system.svc:443?
XMingG99 commented 10 months ago
  1. The karmada-webhook pod is running? or crashed sometimes?

I checked the pods: the karmada-webhook pod is running and has never crashed. The restart count is zero.

  2. Can you run a curl pod in host cluster, and try to curl the host https://karmada-webhook.karmada-system.svc:443?

How do I create this pod? Which image should I use?

jwcesign commented 10 months ago

Hi @XMingG99, you can run the following command with the karmada-host context:

kubectl run mycurlpod --image=curlimages/curl -i --tty -- sh
XMingG99 commented 10 months ago

Hi @jwcesign, I ran this command:

kubectl --kubeconfig /etc/karmada/karmada-apiserver.config run mycurlpod --image=curlimages/curl -i --tty -- sh

and I got this:

mycurlpod   0/1     Pending   0          79s

...

describe pod:
Events:
  Type     Reason             Age   From               Message
  ----     ------             ----  ----               -------
  Warning  ApplyPolicyFailed  97s   resource-detector  No policy match for resource
jwcesign commented 10 months ago

Hi @XMingG99, you should run it with the karmada-host context, not the karmada-apiserver context.

XMingG99 commented 10 months ago

Hi @jwcesign, OK, that was my mistake. This is what I get now:

kubectl run mycurlpod --image=curlimages/curl -i --tty -- sh
If you don't see a command prompt, try pressing enter.
/ $ curl https://karmada-webhook.karmada-system.svc:443
curl: (60) SSL certificate problem: unable to get local issuer certificate
More details here: https://curl.se/docs/sslcerts.html

curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.
/ $ curl -k https://karmada-webhook.karmada-system.svc:443
404 page not found
/ $ 
jwcesign commented 10 months ago

The webhook looks fine. Can you send more requests and check whether any of them fail with a connection timeout?

XMingG99 commented 10 months ago

Hi @jwcesign, yes, I have tried several times. Most requests succeed, but occasionally this error occurs.

jwcesign commented 10 months ago

Hi @XMingG99, from the test results, the issue appears to be related to the cluster network. Can you try increasing the webhook timeout? https://github.com/karmada-io/karmada/blob/b8abb446b8963b4e1602365daf82f6066083956d/artifacts/deploy/webhook-configuration.yaml#L21

XMingG99 commented 10 months ago

@jwcesign OK, I'll try that, thank you.

whitewindmills commented 10 months ago

@XMingG99 Hi, I suspect something is wrong with DNS. You can check whether DNS resolution works inside the apiserver pod, for example:

dig karmada-webhook.karmada-system.svc
XMingG99 commented 10 months ago

Hi @whitewindmills, I'll try that, but what is the shell path in the apiserver container?


kubectl exec -ti karmada-apiserver-7bb9b7556f-dj8tt -n karmada-system /bin/sh

kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
error: Internal error occurred: error executing command in container: failed to exec in container: failed to start exec "848298cd5e8f098578a85fc45604d0a70a3d34a4a7145858b97311edc5656343": OCI runtime exec failed: exec failed: unable to start container process: exec: "/bin/sh": stat /bin/sh: no such file or directory: unknown
whitewindmills commented 10 months ago

I'll try that, but what's the shell path of apiserver?

Kindly try /bin/bash. But it depends on your base OS image.

XMingG99 commented 10 months ago

@whitewindmills I had already tried /bin/bash before /bin/sh, and it did not work either. I'll try another way. By the way, my OS is Ubuntu.

whitewindmills commented 10 months ago

Do you know your kube-dns service address? You can also test it this way:

dig karmada-webhook.karmada-system.svc.cluster.local @10.10.2.1

Note: 10.10.2.1 is my DNS service address. You should run the test multiple times and check whether any lookups fail to resolve.

jwcesign commented 10 months ago

The webhook works sometimes, so I think the DNS works fine

XMingG99 commented 10 months ago

Yes, I'm testing DNS now. I'll record the results overnight.

XMingG99 commented 10 months ago

@whitewindmills @jwcesign
Hi, guys. After a night of testing, DNS resolved correctly every time, without any errors.

jwcesign commented 10 months ago

So how about increasing the timeout? Would that solve the problem?

XMingG99 commented 10 months ago

Hi @jwcesign, OK, I'll do that.

whitewindmills commented 10 months ago

When I encountered this problem, the cause was that the DNS service on one node was not working properly, which led to occasional DNS resolution failures. There are many possible causes, though. If you are sure your DNS service is healthy, I suggest setting the audit log level of karmada-apiserver to RequestResponse to see more detailed information. If anything in the audit log is unclear, paste it here and we can help.

XMingG99 commented 10 months ago

OK, thank you. I'll do more testing.

whitewindmills commented 10 months ago

This does not seem to be a bug. Kindly let us know if you have further questions.
/remove-kind bug
/kind question

RainbowMango commented 10 months ago

So how about increasing the timeout? Would that solve the problem

That might not be a good idea. timeout=3s is already long enough. We should try to figure out the root cause.

XiShanYongYe-Chang commented 1 month ago

Due to a lack of activity, let's close this for now. Feel free to reopen it if you still need it.
/close

karmada-bot commented 1 month ago

@XiShanYongYe-Chang: Closing this issue.
