kubernetes-sigs / ip-masq-agent

Manage IP masquerade on nodes
Apache License 2.0

Error syncing masquerade rules #21

Closed by negz 6 years ago

negz commented 6 years ago

I've recently observed my ip-masq-agent containers restarting fairly frequently due to errors syncing masquerade rules. It's hard to determine exactly what is wrong with the rules given the limited log output. I've observed this in both v2.0.1 and v2.1.1 of ip-masq-agent.

I0914 22:40:52.590226       1 ip-masq-agent.go:180] config file found at "/etc/config/ip-masq-agent"
I0914 22:40:52.591164       1 ip-masq-agent.go:167] using config: {"nonMasqueradeCIDRs":["172.16.0.0/13","192.168.0.0/16"],"masqLinkLocal":false,"resyncInterval":60000000000}
I0914 22:40:52.591195       1 iptables.go:361] running iptables -N [IP-MASQ-AGENT -t nat]
I0914 22:40:52.599954       1 iptables.go:361] running iptables -C [POSTROUTING -t nat -m comment --comment ip-masq-agent: ensure nat POSTROUTING directs all non-LOCAL destination traffic to our custom IP-MASQ-AGENT chain -m addrtype ! --dst-type LOCAL -j IP-MASQ-AGENT]
I0914 22:40:52.608243       1 iptables.go:338] running iptables-restore [--noflush]
E0914 22:40:52.622691       1 ip-masq-agent.go:145] error syncing masquerade rules: exit status 1 (iptables-restore: line 7 failed
)
exit status 1 (iptables-restore: line 7 failed
)

Could I be hitting the iptables locking issue described in https://github.com/kubernetes/kubernetes/pull/44895? ip-masq-agent is pinned to a version of util/iptables from before that PR was merged, so (I believe) it doesn't take any iptables lock at all.

negz commented 6 years ago

We deployed a build of ip-masq-agent that includes #22 to a production cluster earlier today, and configured both ip-masq-agent and kube-proxy to mount the node's /run/xtables.lock. I've observed zero ip-masq-agent restarts since; on busy nodes we'd typically see a dozen or so in that time frame.