Closed aojea closed 6 months ago
/assign @thockin @danwinship @SergeyKanzhelev
hosts and containers using the host can have different iptables versions, these versions are incompatible and can cause problems if both are present in the kernel.
"incompatible" and "can cause problems" are overstating things. As long as your rules are self-contained and aren't trying to interact with or override another component's rules, it generally doesn't matter whether you're using the same iptables mode as anyone else. (Unless the system only has kernel support for one of the two modes, but this PR doesn't help people running into that.)
I think we got a little bit confused about this back when iptables-1.8 first came out, because (a) we were also dealing with various bugs in iptables-1.8.[0-2] at the same time, without fully understanding what was going on; (b) at the time, kube-proxy depended on iptables chains that were created by kubelet, so kube-proxy and kubelet specifically needed to be using the same iptables mode; (c) kube-proxy purports to poke holes in the system firewall, which can only work if kube-proxy and the firewall both use the same iptables mode.
The bugs eventually got fixed, KEP-3178 made kube-proxy no longer depend on kubelet's rules, and the "poking holes in the system firewall" use case turns out to be slightly dubious and probably unnecessary. (Some distros have moved to nft-based firewalls now, which kube-proxy's firewall-hole-poking code won't work with even if you're using iptables-nft, and no one has filed any bugs about this.)
So I'm not sure it really makes sense to call this out as a "problem"...
well, at least if there are two systems using different versions they can override each other's rules ... I do not know exactly what the problems are today, but I remember having segfaults because of mixed versions https://bugzilla.netfilter.org/show_bug.cgi?id=1476
We want this plugin to run just to prevent any possible problems, as it is impossible to say that there is zero risk
segfaults because of mixed versions https://bugzilla.netfilter.org/show_bug.cgi?id=1476
That's not "segfaults because of mixed versions", that's "segfaults because of a bug in the iptables binaries that you happened to run into on a node with mixed versions".
We want this plugin to run just to prevent any possible problems, as it is impossible to say that there is zero risk
It seems to me that this is more likely to generate false positives and warn people that there is a problem when everything is actually working fine.
It seems to me that this is more likely to generate false positives and warn people that there is a problem when everything is actually working fine.
interesting, should I add a threshold to the number of lines? At least for GKE COS clusters I don't expect to have rules from both modes in a node
Based on Dan's feedback and the npd docs, I'm going to update this to only report events and metrics to get signal. Based on the feedback, we can assess the criticality of the problem with real data and update to a Condition if necessary
node-problem-detector uses Event and NodeCondition to report problems to apiserver.
NodeCondition: Permanent problem that makes the node unavailable for pods should be reported as NodeCondition.
Event: Temporary problem that has limited impact on pod but is informative should be reported as Event.
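To make the distinction from the npd docs concrete, here is a minimal sketch of that decision rule. The `Problem` class and `report_channel` function are hypothetical names made up for illustration; they are not part of node-problem-detector.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Problem:
    """Hypothetical description of a detected problem (illustration only)."""
    permanent: bool         # persists until someone intervenes
    node_unavailable: bool  # makes the node unavailable for pods


def report_channel(problem: Problem) -> str:
    """Pick the reporting channel following the rule quoted from the npd docs."""
    if problem.permanent and problem.node_unavailable:
        return "NodeCondition"
    # Temporary or limited-impact problems are informative only.
    return "Event"
```

Under the assumption made in this thread (mixed iptables modes have limited, unproven impact), `report_channel(Problem(permanent=False, node_unavailable=False))` selects `Event`.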
6: Error fetching NPD metrics: error fetching NPD metrics: {prow 35.188.211.187 curl http://localhost:20257/metrics
curl: (7) Failed to connect to localhost port 20257 after 0 ms: Couldn't connect to server
/test pull-npd-e2e-test
at first sight it does not look related to my PR
Based on Dan's feedback and the npd docs, I'm going to update this to only report events and metrics to get signal. Based on the feedback, we can assess the criticality of the problem with real data and update to a Condition if necessary
OK. I don't really know much about node-problem-detector...
/lgtm /cc @vteratipally @mmiranda96 @MartinForReal
/lgtm /approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: aojea, hakman, vteratipally
iptables has two kernel backends, legacy and nft.
Quoting https://developers.redhat.com/blog/2020/08/18/iptables-the-two-variants-and-their-relationship-with-nftables
However, we don't want to block node operations for this reason, as there is not enough evidence that this is causing big issues in the wild, so we just signal and warn about this situation.
Once we have more information we can revisit this decision and keep it as is or move it to permanent.
The plugin runs every day to avoid causing problems on large systems.
https://github.com/kubernetes-sigs/iptables-wrappers/blob/97b01f43a8e8db07840fc4b95e833a37c0d36b12/iptables-wrapper-installer.sh
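For readers who want to see the shape of such a check, below is a rough sketch in Python, not the plugin added by this PR. It assumes the same line-counting heuristic as the iptables-wrappers script linked above (count the lines starting with `-` in the `*-save` output of each backend) and assumes the node-problem-detector custom plugin convention of exit status 0 for OK, 1 for a detected problem, and 2 for unknown. The function names and the `threshold` parameter are hypothetical; the threshold only illustrates the idea floated earlier in the thread.

```python
import subprocess

# Exit statuses assumed for an NPD custom plugin: OK, problem found, unknown.
OK, NON_OK, UNKNOWN = 0, 1, 2

LEGACY_COMMANDS = ("iptables-legacy-save", "ip6tables-legacy-save")
NFT_COMMANDS = ("iptables-nft-save", "ip6tables-nft-save")


def count_rule_lines(save_output: str) -> int:
    """Count rule lines (those starting with '-') in iptables-save style output."""
    return sum(1 for line in save_output.splitlines() if line.startswith("-"))


def classify(legacy_rules: int, nft_rules: int, threshold: int = 0) -> int:
    """Return NON_OK only when both backends hold more than `threshold` rules."""
    if legacy_rules > threshold and nft_rules > threshold:
        return NON_OK
    return OK


def run_save(command: str) -> str:
    """Run one *-save binary; treat a missing or failing binary as 'no rules'."""
    try:
        result = subprocess.run(
            [command], capture_output=True, text=True, timeout=30, check=False
        )
    except (OSError, subprocess.TimeoutExpired):
        return ""
    return result.stdout


def main() -> int:
    """Count rules per backend, print a one-line message, return the status."""
    legacy = sum(count_rule_lines(run_save(c)) for c in LEGACY_COMMANDS)
    nft = sum(count_rule_lines(run_save(c)) for c in NFT_COMMANDS)
    status = classify(legacy, nft)
    print(f"iptables rules: legacy={legacy} nft={nft}")
    return status
```

A plugin entry point would call `sys.exit(main())`. Keeping `classify` separate from the subprocess calls makes the heuristic easy to test and to tune if the events show too many false positives.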