kubernetes-sigs / ip-masq-agent

Manage IP masquerade on nodes
Apache License 2.0

Does ip-masq-agent intentionally crash on errors? #23

Status: Closed (negz closed 6 years ago)

negz commented 6 years ago

https://github.com/kubernetes-incubator/ip-masq-agent/blob/480a627/cmd/ip-masq-agent/ip-masq-agent.go#L117

It seems like ip-masq-agent is designed to exit non-zero when any error occurs. For example, without #22, ip-masq-agent often crashes on busy nodes with an obscure iptables-restore error. With #22, it will crash if it can't get the iptables lock:

E0922 00:30:24.709933       1 ip-masq-agent.go:145] error syncing masquerade rules: failed to ensure that nat chain IP-MASQ-AGENT jumps to MASQUERADE: error checking rule: exit status 4: Another app is currently holding the xtables lock. Stopped waiting after 5s.
failed to ensure that nat chain IP-MASQ-AGENT jumps to MASQUERADE: error checking rule: exit status 4: Another app is currently holding the xtables lock. Stopped waiting after 5s.

Other tools such as kube-proxy seem to log about this and move on with their lives. Is ip-masq-agent designed to crash and restart; i.e. should we consider frequent ip-masq-agent container restarts normal behaviour? I'm wondering whether I should attempt to 'fix' ip-masq-agent or simply not alert when it restarts too frequently.
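For illustration, the "log and move on" behaviour could look roughly like the sketch below. This is not ip-masq-agent's or kube-proxy's actual code: `syncFunc`, `runSyncLoop`, and `errLockHeld` are hypothetical names, and the loop is bounded only so the example terminates.

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"time"
)

// errLockHeld is a hypothetical error standing in for the xtables lock failure.
var errLockHeld = errors.New("another app is currently holding the xtables lock")

// syncFunc is a hypothetical stand-in for the agent's rule-sync step.
type syncFunc func() error

// runSyncLoop calls sync once per interval for the given number of
// iterations. A failed sync is logged and retried on the next tick
// instead of terminating the process. It returns the number of failures.
func runSyncLoop(sync syncFunc, interval time.Duration, iterations int) int {
	failures := 0
	for i := 0; i < iterations; i++ {
		if err := sync(); err != nil {
			// Log and carry on; the next iteration retries the sync.
			log.Printf("error syncing masquerade rules (will retry): %v", err)
			failures++
		}
		time.Sleep(interval)
	}
	return failures
}

func main() {
	calls := 0
	// flaky fails on every odd-numbered call to simulate lock contention.
	flaky := func() error {
		calls++
		if calls%2 == 1 {
			return errLockHeld
		}
		return nil
	}
	failures := runSyncLoop(flaky, 10*time.Millisecond, 4)
	fmt.Printf("attempts=%d failures=%d\n", calls, failures)
}
```

With this shape, a transient lock failure costs one sync interval rather than a container restart, so restart-count alerts would only fire for failures the process cannot recover from.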

negz commented 6 years ago

Thanks for fixing this @MrHohn! Would you be able to cut a new release of ip-masq-agent with your recent fixes?

MrHohn commented 6 years ago

@negz Acked, I should be able to cut a new release in a couple of days.

MrHohn commented 6 years ago

@negz k8s.gcr.io/ip-masq-agent-amd64:v2.2.0 has been published. Can you check if that works for you? Thanks.