nforgeio / neonKUBE

Public NeonKUBE Kubernetes distribution related projects
https://neonkube.io
Apache License 2.0

HAProxy/pfSense client ephemeral port exhaustion #275

Closed jefflill closed 5 years ago

jefflill commented 6 years ago

The current neonHIVE configuration can run into Linux SNAT/DNAT port exhaustion issues when network traffic scales to medium or high loads. This problem can surface due to the Docker ingress/mesh network DNAT iptables rules, but it can also happen in other places, like the pfSense DMZ load balancer rules that direct external traffic to cluster nodes.

There appear to be two somewhat related problems:

  1. At high load, traffic being proxied by a load balancer or transformed by a DNAT rule will all have the same source IP, so only the source port can be varied when establishing a connection to the backend server. When the backend connection is closed, the source port goes into the TIME_WAIT state for 2 minutes (on Linux) and cannot be reused during this time. neonHIVE currently configures the kernel to allocate ephemeral ports in the range 9000-65535 (56535 ports), so assuming each backend connection is closed immediately and its source port goes into TIME_WAIT, the maximum connection rate is 56535/120 = 471/sec per hive host.

  2. There also appears to be a Linux kernel race condition that can cause two inbound connections to be assigned the same DNAT source port, resulting in SYN packets being dropped and then retransmission delays. This is discussed in detail here. (Note that this is not a Docker-specific issue; it happens in Kubernetes too.)
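The arithmetic in item 1 can be sketched as a small calculation. This is only an illustration: the function name `max_connections_per_sec` is made up for this example, the port count follows the comment's own figure (upper bound minus lower bound), and the 120-second TIME_WAIT value is the one assumed above, so adjust it if your kernel behaves differently.

```python
def max_connections_per_sec(port_low: int, port_high: int, time_wait_secs: float) -> float:
    """Rough upper bound on new backend connections per second from one source IP.

    Assumes every connection is closed immediately and its source port then
    sits in TIME_WAIT for time_wait_secs before it can be reused.
    """
    # Same port count as the comment above: 65535 - 9000 = 56535.
    usable_ports = port_high - port_low
    return usable_ports / time_wait_secs


rate = max_connections_per_sec(9000, 65535, 120)
print(f"{rate:.0f} connections/sec")  # prints "471 connections/sec"
```

Widening the port range or shortening the time a port stays unusable raises this ceiling proportionally, which is why both show up in the usual tuning advice.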

There are some possible mitigations:

Here are the links I found while researching this:

https://stackoverflow.com/questions/10085705/load-balancer-scalability-and-max-tcp-ports
https://tech.xing.com/a-reason-for-unexplained-connection-timeouts-on-kubernetes-docker-abd041cf7e02
https://github.com/tsenart/vegeta (vegeta load generator)

https://github.com/moby/moby/issues/35082
http://archive.linuxvirtualserver.org/html/lvs-devel/2015-10/msg00067.html
https://medium.freecodecamp.org/how-we-fine-tuned-haproxy-to-achieve-2-000-000-concurrent-ssl-connections-d017e61a4d27
https://www.linangran.com/?p=547

The first two links really describe the problem. The third link is to the vegeta load generator project that looks like it's better than the Apache load generator we've been using.

jefflill commented 6 years ago

I have confirmed that the neon-proxy based containers do not inherit the host machine's net.ipv4.ip_local_port_range setting. We'll need to try setting this in the Dockerfile by modifying /etc/sysctl.conf (or perhaps /etc/sysctl.d/00-alpine.conf).
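One way to check which range a container actually sees is to read the value from /proc inside it. The following is a minimal sketch, not the check that was actually run for this issue; the helper names `parse_port_range` and `read_port_range` are hypothetical, and it assumes a Linux environment with Python available in the container.

```python
from pathlib import Path
from typing import Tuple

# Standard Linux location of the ephemeral port range setting.
PROC_PATH = Path("/proc/sys/net/ipv4/ip_local_port_range")


def parse_port_range(text: str) -> Tuple[int, int]:
    """Parse the file contents, which are two whitespace-separated integers."""
    low, high = text.split()
    return int(low), int(high)


def read_port_range(path: Path = PROC_PATH) -> Tuple[int, int]:
    """Read the ephemeral port range visible to the current network namespace."""
    return parse_port_range(path.read_text())


if PROC_PATH.exists():
    low, high = read_port_range()
    print(f"ephemeral port range: {low}-{high}")
else:
    print(f"{PROC_PATH} not found (not Linux?)")
```

Running this on the host and again inside a neon-proxy container would show whether the two values differ.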

EDIT: You can set kernel parameters for containers using docker run --sysctl and there's a way to do this in a Docker stack, but there is no implementation for straight services. Here are the tracking issues:

https://github.com/moby/moby/issues/25303 <-- EPIC
https://github.com/moby/moby/issues/25209 <-- REQUEST