The whole log is that one line?
Generally iptables commands in Weave Net are run with the --wait option, so you appear to have found an outlier.
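For background: without --wait, an iptables command that loses the race for the xtables lock fails immediately with exit status 4; with it, the command blocks until the lock is free. A minimal illustration (DEMO is a made-up chain name):

# Without -w: fails at once if another process holds the xtables lock
iptables -N DEMO || echo "failed with status $? (4 = lock held)"
# With -w: waits for the lock instead of failing
iptables -w -N DEMO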
Perhaps the auto-detection of nft went wrong - could you try setting IPTABLES_BACKEND and see if it improves?
Yes, that was the full log.
Indeed, the auto-detection seems to choose legacy:
# docker exec -it 54e01edb7c47 sh
/home/weave # ls -lah /sbin/iptables
lrwxrwxrwx 1 root root 20 May 13 13:15 /sbin/iptables -> xtables-legacy-multi
The detection commands return the following results:
/home/weave # (iptables-legacy-save || true) 2>/dev/null | grep '^-' | wc -l
7033
/home/weave # (iptables-nft-save || true) 2>/dev/null | grep '^-' | wc -l
0
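If I read the heuristic right, it boils down to comparing those two counts and picking the busier backend - something like this sketch (my guess at the logic, not the actual entrypoint code):

# Count the rules each backend reports and pick whichever already has rules
legacy_rules=$( (iptables-legacy-save || true) 2>/dev/null | grep -c '^-' )
nft_rules=$( (iptables-nft-save || true) 2>/dev/null | grep -c '^-' )
if [ "$legacy_rules" -ge "$nft_rules" ]; then
    mode=legacy
else
    mode=nft
fi
echo "detected iptables backend: $mode"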
Running the iptables-nft-save command returns the following:
/home/weave # iptables-nft-save
# Warning: iptables-legacy tables present, use iptables-legacy-save to see them
I use iptables-restore directly on startup to set up my firewall - could it be related?
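(For what it's worth, iptables-restore does accept a -w flag, so the restore itself can wait for the lock; the file path below is just a placeholder for my rules file:)

# Load saved firewall rules, waiting for the xtables lock if it is held
iptables-restore -w < /etc/iptables/rules.v4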
Could you try setting IPTABLES_BACKEND as an environment variable in the Weave Net DaemonSet and see if it improves?
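Something like this should do it (assuming the stock weave-net DaemonSet in kube-system):

# Set IPTABLES_BACKEND (legacy or nft, as appropriate) on both containers
kubectl set env daemonset/weave-net -n kube-system -c weave IPTABLES_BACKEND=nft
kubectl set env daemonset/weave-net -n kube-system -c weave-npc IPTABLES_BACKEND=nft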
I just tried setting IPTABLES_BACKEND to nft on both containers (weave + weave-npc) of the weave DaemonSet.
The weave-npc pod crash-loops with an error similar to https://github.com/weaveworks/weave/issues/3816:
root@master0:~# kubectl logs -n kube-system weave-net-jw6m5 weave
ipset v7.2: Set cannot be destroyed: it is in use by a kernel component
root@master0:~# kubectl logs -n kube-system weave-net-jw6m5 weave-npc
INFO: 2020/07/21 23:51:38.072617 Starting Weaveworks NPC 2.6.5; node name "master0"
INFO: 2020/07/21 23:51:38.072733 Serving /metrics on :6781
Tue Jul 21 23:51:38 2020 <5> ulogd.c:408 registering plugin `NFLOG'
Tue Jul 21 23:51:38 2020 <5> ulogd.c:408 registering plugin `BASE'
Tue Jul 21 23:51:38 2020 <5> ulogd.c:408 registering plugin `PCAP'
Tue Jul 21 23:51:38 2020 <5> ulogd.c:981 building new pluginstance stack: 'log1:NFLOG,base1:BASE,pcap1:PCAP'
WARNING: scheduler configuration failed: Function not implemented
DEBU: 2020/07/21 23:51:38.114634 Got list of ipsets: [weave-;rGqyMIl1HN^cfDki~Z$3]6!N weave-s_+ChJId4Uy_$}G;WdH|~TK)I weave-k?Z;25^M}|1s7P3|H9i;*;MhG weave-5p+tpDg;~4g#*~UIK3LE5B#E7 weave-k*OnRy)yTQMY|1/JzrwAobr/X weave-!)3z_o[q?8ChN0_o[@u^aN~|j weave-P.B|!ZhkAr5q=XZ?3}tMBA+0 weave-E1ney4o[ojNrLk.6rOHi;7MPE weave-iuZcey(5DeXbzgRFs8Szo]+@p weave-]B*(W?)t*z5O17G044[gUo#$l weave-sui%__gZ}{kX~oZgI_Ttqp=Dp weave-mF}1zpEo4W6iYroE^=:V3{S6W weave-Rzff}h:=]JaaJl/G;(XJpGjZ[ weave-41s)5vQ^o/xWGz6a20N:~?#|E weave-4vtqMI+kx/2]jD%_c0S%thO%V weave-kube-test]
DEBU: 2020/07/21 23:51:38.114649 Flushing ipset 'weave-;rGqyMIl1HN^cfDki~Z$3]6!N'
DEBU: 2020/07/21 23:51:38.115150 Flushing ipset 'weave-s_+ChJId4Uy_$}G;WdH|~TK)I'
DEBU: 2020/07/21 23:51:38.115442 Flushing ipset 'weave-k?Z;25^M}|1s7P3|H9i;*;MhG'
DEBU: 2020/07/21 23:51:38.115740 Flushing ipset 'weave-5p+tpDg;~4g#*~UIK3LE5B#E7'
DEBU: 2020/07/21 23:51:38.116180 Flushing ipset 'weave-k*OnRy)yTQMY|1/JzrwAobr/X'
DEBU: 2020/07/21 23:51:38.116462 Flushing ipset 'weave-!)3z_o[q?8ChN0_o[@u^aN~|j'
DEBU: 2020/07/21 23:51:38.116738 Flushing ipset 'weave-P.B|!ZhkAr5q=XZ?3}tMBA+0'
DEBU: 2020/07/21 23:51:38.117119 Flushing ipset 'weave-E1ney4o[ojNrLk.6rOHi;7MPE'
DEBU: 2020/07/21 23:51:38.117408 Flushing ipset 'weave-iuZcey(5DeXbzgRFs8Szo]+@p'
DEBU: 2020/07/21 23:51:38.117684 Flushing ipset 'weave-]B*(W?)t*z5O17G044[gUo#$l'
DEBU: 2020/07/21 23:51:38.117949 Flushing ipset 'weave-sui%__gZ}{kX~oZgI_Ttqp=Dp'
DEBU: 2020/07/21 23:51:38.118230 Flushing ipset 'weave-mF}1zpEo4W6iYroE^=:V3{S6W'
DEBU: 2020/07/21 23:51:38.118501 Flushing ipset 'weave-Rzff}h:=]JaaJl/G;(XJpGjZ['
DEBU: 2020/07/21 23:51:38.118899 Flushing ipset 'weave-41s)5vQ^o/xWGz6a20N:~?#|E'
DEBU: 2020/07/21 23:51:38.119180 Flushing ipset 'weave-4vtqMI+kx/2]jD%_c0S%thO%V'
DEBU: 2020/07/21 23:51:38.119537 Flushing ipset 'weave-kube-test'
DEBU: 2020/07/21 23:51:38.120067 Destroying ipset 'weave-;rGqyMIl1HN^cfDki~Z$3]6!N'
ERRO: 2020/07/21 23:51:38.135906 Failed to destroy ipset 'weave-;rGqyMIl1HN^cfDki~Z$3]6!N'
FATA: 2020/07/21 23:51:38.135932 ipset [destroy weave-;rGqyMIl1HN^cfDki~Z$3]6!N] failed: ipset v7.2: Set cannot be destroyed: it is in use by a kernel component
: exit status 1
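If it helps diagnose, that ipset error usually means some iptables rule still references the set. Checking from inside the weave container (weave-kube-test is one of the sets from the log above):

# A non-zero "References:" count means rules still point at the set
ipset -t list weave-kube-test | grep References
# Find the referencing rules in each backend
iptables-legacy-save | grep -- '--match-set weave-'
iptables-nft-save | grep -- '--match-set weave-'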
When I set the IPTABLES_BACKEND env back to legacy, weave starts. I tried rebooting the nodes, and the error persisted after reboot.
I have found some lines in launch.sh where no -w is added to the iptables commands.
PR: https://github.com/weaveworks/weave/pull/3835
Some iptables commands run outside of the weave application, within launch.sh. This could be the reason for the empty log file.
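Illustratively, the change amounts to this shape (WEAVE-DEMO is a made-up chain, not the actual launch.sh rules):

# Before: can fail immediately when kubelet/portmap hold the xtables lock
iptables -t nat -N WEAVE-DEMO
# After: wait for the lock instead of crashing on first contention
iptables -w -t nat -N WEAVE-DEMO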
Thanks @nesc58 - you are agreeing with what I said at https://github.com/weaveworks/weave/issues/3828#issuecomment-654154838
Sorry, my mistake - we were talking about different parts of the script.
Do you think that the iptables-save command should also get a -w?
@Congelli501 I meant to try setting it to the correct one for your machine. Sounds like "legacy" was correct.
> Do you think that the iptables-save command should also get a -w?
iptables-save does not have a -w option. Maybe the save command does not need a wait parameter because it is read-only and does not modify the rules. iptables-restore also has a -w command-line argument, but I have not found any calls to it.
I don't expect the parameter to be added. According to the man pages, iptables-nft and the nft executable no longer need a -w:
> Because the xtables-nft tools use the nf_tables kernel API, rule additions and deletions are always atomic. Unlike iptables-legacy, iptables-nft -A .. will NOT need to retrieve the current ruleset from the kernel, change it, and re-load the altered ruleset. Instead, iptables-nft will tell the kernel to add one rule. For this reason, the iptables-legacy --wait option is a no-op in iptables-nft.
A lot of the /test/*.sh files contain many iptables calls. I would suggest that they also need the -w parameter, no?
> iptables-save does not have a -w option.
Ah, thanks.
> A lot of the /test/*.sh files contain many iptables calls. I would suggest that they also need the -w parameter, no?
It wouldn't be wrong, but the test scripts are expected to be run one at a time, so they should not encounter conflicts.
I've never seen a CI failure for that reason.
Closing as fixed by #3835
What you expected to happen?
Weave starts normally on the first try.
What happened?
On a busy cluster (pods waiting to be created on a node), weave takes a lot of tries to start (it fails at start and goes into CrashLoopBackOff). The "weave" container of the node's weave pod shows the following log:
After killing the weave pod many times (to retry after a CrashLoopBackOff), weave ends up starting and then runs normally. On one of my nodes, it took me 20 minutes to get weave to start.
How to reproduce it?
Set up a busy Kubernetes node (100 pods) and reboot it. The node will start with a lot of assigned pods waiting to start and won't be able to start weave.
Anything else we need to know?
This bug occurred after a cluster upgrade (reinstall). I didn't have this issue on my previous cluster (Ubuntu 18.04 amd64, Kubernetes v1.13.11, Weave 2.5.2).
Cloud provider: bare metal
Versions:
Logs:
Network:
Seems ok, same network as with our previous cluster.
Troubleshooting?
It seems the problems come from iptables being locked (in use) by other Kubernetes processes. I saw a lot of portmap commands being run (100+ commands sometimes) by kubelet.
Instead of retrying on an iptables lock error or waiting its turn (with the --wait option), it crashes, which makes it hard to start (the iptables command needs to work on the first try). I can't explain why my previous cluster was working fine, though.
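A retry wrapper along these lines would have worked around it too (a hypothetical sketch, not Weave's actual code):

#!/bin/sh
# Hypothetical: retry an iptables invocation while the xtables lock is held.
# iptables-legacy exits with status 4 on lock contention.
tries=0
until iptables "$@"; do
    status=$?
    [ "$status" -eq 4 ] || exit "$status"   # only retry on the lock error
    tries=$((tries + 1))
    [ "$tries" -lt 10 ] || exit "$status"   # give up after 10 attempts
    sleep 1
done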