k3s-io / k3s

Lightweight Kubernetes
https://k3s.io
Apache License 2.0

cURL to external internet API times out every few requests within Pods but works perfectly on node. #2928

Closed lvandyk closed 3 years ago

lvandyk commented 3 years ago

Environmental Info: K3s Version: v1.20.2+k3s1 (1d4adb03)

Node(s) CPU architecture, OS, and Version: Linux 5.4.0-47-generic #51-Ubuntu SMP Fri Sep 4 19:50:52 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration: 1 Master

Describe the bug: Every 3rd or 4th cURL request to an external API on the internet times out, as if the outgoing traffic is being dropped or blocked. This happens in different pods (e.g. one running PHP and one running dotnetcore). Running exactly the same requests in our live environment in GCP works perfectly.

There are no firewall rules enabled for outgoing connections. Running the cURLs manually on the node itself works perfectly. It only happens when running the cURLs inside the pod.

Steps To Reproduce:

Expected behavior: cURL should not time out if I POST to external internet APIs from within Pods.

Actual behavior: cURL times out every 3rd or 4th POST to an external API on the internet.

Additional context / logs: This happens with any external API, e.g. trying to POST to hooks.slack.com, or trying to send an SMS via Twilio. Even notify.bugsnag.com times out.

If I cancel the cURL and immediately run it again, there is a better chance of it working.

Here is an example of trying to cURL (SSL_VERIFYPEER and SSL_VERIFYHOST have even been switched off):

cURL error 7: Failed to connect to hooks.slack.com port 443: Connection timed out

brandond commented 3 years ago

Do you have any firewall enabled on the nodes, even if you don't believe it to be affecting outgoing traffic? If so, can you disable it? This sounds like something is dropping client traffic when the source port is outside of a defined range, which would be a firewall or security group rule misconfiguration.
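
One way to check the source-port idea is to record which local port each connection attempt uses and whether it succeeded. The following is only a minimal sketch, assuming Python 3 is available inside the pod; `probe` and `summarize` are hypothetical helper names, not part of k3s. Note that pod traffic is normally masqueraded on the node, so the port seen inside the pod may not be the port that actually leaves the node.

```python
import socket


def probe(host, port, attempts=20, timeout=5.0):
    """Open `attempts` TCP connections; return [(source_port, ok), ...]."""
    results = []
    for _ in range(attempts):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            # Bind first so the source port is known even if connect() fails.
            sock.bind(("", 0))
            source_port = sock.getsockname()[1]
            try:
                sock.connect((host, port))
                ok = True
            except OSError:
                # Covers timeouts, refused connections and DNS failures.
                ok = False
            results.append((source_port, ok))
    return results


def summarize(results):
    """Split the probed source ports into (succeeded, failed), both sorted."""
    ok_ports = sorted(p for p, ok in results if ok)
    failed_ports = sorted(p for p, ok in results if not ok)
    return ok_ports, failed_ports
```

Running `summarize(probe("hooks.slack.com", 443))` from inside a pod and again on the node would show whether the failures cluster in a particular port range or are spread evenly; an even spread would point away from a port-range rule.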

lvandyk commented 3 years ago

@brandond No firewalls at all, neither on the node itself nor via the hosting provider. Every few requests DO get through, so it can't be a firewall, can it?

lvandyk commented 3 years ago

Solved by using Cilium instead of Flannel, as mentioned in: https://github.com/k3s-io/k3s/issues/763