newsnowlabs / docker-ingress-routing-daemon

Docker swarm daemon that modifies ingress mesh routing to expose true client IPs to service containers
MIT License

Timeout issues when daemon is enabled? #26

Closed fernandomm closed 1 year ago

fernandomm commented 2 years ago

I'm running into a very specific issue but I'm posting here just in case someone else experienced a similar issue and could share some hints.

When using this daemon for an SMTP service, it works fine for most clients. But some clients get timeouts or disconnects during the DATA phase of the SMTP protocol.

The timeouts aren't random for these clients: they happen every time DIND is enabled, and always during the DATA phase of the SMTP protocol.

Uninstalling DIND fixes the issue right away, and re-enabling it brings the issue back.

Any ideas about what could cause this issue?

The environment is a Docker swarm cluster with 8 nodes. The service has 2 replicas, both running on the same node. The clients connect directly to the node IP.

I also have a few sysctl settings in /etc/sysctl.conf:

net.ipv4.tcp_timestamps = 0
fs.nr_open = 15000000
net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_intvl = 60
net.ipv4.tcp_keepalive_probes = 60
net.ipv4.ip_local_port_range = 1024 65535
net.core.rmem_max=16777216
net.core.wmem_max=16777216
net.ipv4.tcp_rmem=4096 87380 16777216
net.ipv4.tcp_wmem=4096 65536 16777216
net.netfilter.nf_conntrack_max=1310720
net.nf_conntrack_max=1310720
net.core.netdev_max_backlog = 300000
net.core.somaxconn = 65535
vm.swappiness = 1

struanb commented 2 years ago

Hi @fernandomm. Thank you for trying DIND. This sounds like a very interesting issue. I'm not sure what you could be experiencing here.

It does remind me of when we first migrated NewsNow to Docker swarm. Everything seemed fine for most users but some users complained of hanging connections. Eventually we were able to reproduce, and traced that to TCP failing to negotiate the correct MSS due to the docker ingress network having a smaller MTU than our physical network. Connections hung when clients sent segments/packets which were too large to traverse from physical network to ingress network. We solved the issue using MSS clamping.

As I recall, though, that issue arose only because of the use of the ingress network, not because of DIND itself. I will dig out our notes on it later, in case they provide any further insight.

struanb commented 2 years ago

Hi @fernandomm. I dug out our notes on the aforementioned issue. The symptoms do sound similar, but the only way I can think of that it could be relevant is if the use of DIND somehow inhibits normal path MTU discovery.

Anyway, what you could try is this firewall rule on your load balancers (or external firewalls, if they are separate - ours are the same):

iptables -t mangle -A FORWARD -p tcp -m multiport --sports <PORTS> -m tcp --tcp-flags SYN,RST SYN -j TCPMSS --set-mss 1410

(For <PORTS>, I'd assume you need the normal SMTP ports, e.g. 25,587.)
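
As a rough sketch of how the placeholder might be filled in, the helper below only prints the rule for a given port list and MSS so it can be reviewed before being run as root. The function name `print_clamp_rule` is hypothetical and not part of DIND; the rule text itself is the one quoted above.

```shell
#!/bin/sh
# Hypothetical helper (not part of DIND): print the MSS-clamping rule
# for a comma-separated list of source ports and a target MSS.
# It does not touch the firewall; it only echoes the command.
print_clamp_rule() {
    ports=$1
    mss=$2
    printf 'iptables -t mangle -A FORWARD -p tcp -m multiport --sports %s -m tcp --tcp-flags SYN,RST SYN -j TCPMSS --set-mss %s\n' "$ports" "$mss"
}

# Example: the usual SMTP ports, with the MSS suggested in this thread.
print_clamp_rule 25,587 1410
```

Printing rather than executing keeps the example safe to try on any machine; the printed line can then be pasted into a root shell on the load balancer once it looks right.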

Here are the notes we made regarding this firewall rule:

The Docker ingress network uses an MTU of 1450, whereas standard ethernet interfaces use an MTU of 1500. This means that full size IP packets received by the cluster director from clients cannot be transmitted to containers. When this happens, 'ICMP a.b.c.d unreachable - need to frag (mtu 1450)' will be sent to the client advising their IP stack to resend smaller packets. Some firewall admins prevent these packets from getting through, causing TCP connections to completely break.

This workaround fixes TCP (but not UDP or other IP traffic) by forcing the MSS (Maximum Segment Size) we request down from 1460 (1500-40) to 1410 (1450-40), where 40 is the IP header size plus the TCP header size.
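
The arithmetic in that note can be written out as a small sketch. It assumes a 20-byte IPv4 header and a 20-byte TCP header with no options (the "40" above); `mss_for_mtu` is an illustrative name, not something from DIND.

```shell
#!/bin/sh
# Illustrative helper: derive the TCP MSS for a given MTU, assuming
# 20 bytes of IPv4 header plus 20 bytes of TCP header (no options).
mss_for_mtu() {
    mtu=$1
    echo $((mtu - 40))
}

mss_for_mtu 1500   # standard Ethernet MTU -> prints 1460
mss_for_mtu 1450   # Docker ingress MTU from the notes -> prints 1410
```

If the ingress network in a given cluster uses a different MTU, the same subtraction gives the value to pass to `--set-mss`.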

Please note, the issue we experienced was intermittent because only some firewall admins incorrectly drop 'ICMP unreachable' packets, so the issue only affects users on certain networks/IP addresses.

Please let me know if you can try this and if it helps.

fernandomm commented 2 years ago

@struanb Thanks a lot for taking time to look at this issue and review your notes.

Next weekend I'm going to try to enable DIND again and perform some tests with the firewall rule that you mentioned.

I will also run tcpdump during the tests to check if I'm able to get more info about this issue.
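
A possible capture for that test, sketched under assumptions: the interface name `eth0` and port 25 are placeholders, and `print_capture_cmd` is a hypothetical helper that only prints the command. The filter keeps TCP packets with the SYN flag set on the given port (tcpdump normally prints their TCP options, including the negotiated mss) together with ICMP "fragmentation needed" messages (destination unreachable, code 4).

```shell
#!/bin/sh
# Hypothetical helper: print a tcpdump command that captures
#  - TCP packets on the given port with SYN set (MSS negotiation), and
#  - ICMP destination-unreachable, code 4 (fragmentation needed).
# It only echoes the command; tcpdump itself needs root to run.
print_capture_cmd() {
    iface=$1
    port=$2
    printf "tcpdump -ni %s '(tcp port %s and tcp[tcpflags] & tcp-syn != 0) or (icmp[icmptype] == icmp-unreach and icmp[icmpcode] == 4)'\n" "$iface" "$port"
}

# Example with placeholder interface and the SMTP port.
print_capture_cmd eth0 25
```

Comparing the mss values seen with and without DIND enabled, and checking whether any "need to frag" ICMP messages appear, should show whether the MTU explanation applies here.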

struanb commented 1 year ago

Hi @fernandomm. Have you been able to reproduce this issue again, or get more info about it? Assuming not, we will close this issue. If you would like to, please feel free to reopen it.