Open fgeorgatos opened 5 years ago
btw. the code paths vary somewhat from stack trace to stack trace, but the following calls seem to be the common ones (presumably skb-related):
8 ? __kmalloc_node_track_caller+0x190/0x280
8 ? __pskb_pull_tail+0x81/0x460
8 ? ovs_ct_update_key+0x9f/0xe0 [openvswitch]
8 __do_softirq+0xd1/0x287
8 __netif_receive_skb+0x18/0x60
8 __netif_receive_skb_core+0x211/0xb30
8 __netif_receive_skb_one_core+0x3b/0x80
8 dev_gro_receive+0x65f/0x670
8 gro_cell_poll+0x5c/0x90
8 napi_gro_complete+0x73/0x90
8 napi_gro_receive+0x38/0xf0
8 net_rx_action+0x289/0x3d0
8 netdev_frame_hook+0xd9/0x160 [openvswitch]
8 netif_receive_skb_internal+0x45/0xf0
thanks @fgeorgatos for reporting this issue
Regarding "if having enabled FASTDP service could be a factor, since I have noticed that if it gets disabled traffic throughput drops and kernel panic ceases":
fastdp should run fine with the 4.19 kernel; in fact https://github.com/weaveworks/weave/pull/3430 fixed an issue with 4.19 compatibility.
Either the specific combination (openstack/qemu) or the parallel network streams could be causing this issue.
From the stack trace, the panic is potentially due to the OVS data path that Weave's fastdp uses.
@murali-reddy thanks for the feedback.
fyi. the cause/fix must be hidden somewhere in this Linux kernel git diff (<1000 lines up to the known bugfix point): https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/diff/?id=v4.19.1&id2=v4.19&dt=2
However, I have run out of ideas about how to corner it rigorously; @brb, any suggestions? We can reasonably assume that the ipv6, mellanox & eth drivers, smc & sparc diffs are irrelevant; imho, it is possibly this one: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/diff/net/openvswitch/flow_netlink.c?id=v4.19.1&id2=v4.19
diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index a70097ecf33c2..865ecef681969 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -3030,7 +3030,7 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
 			 * is already present */
 			if (mac_proto != MAC_PROTO_NONE)
 				return -EINVAL;
-			mac_proto = MAC_PROTO_NONE;
+			mac_proto = MAC_PROTO_ETHERNET;
 			break;

 		case OVS_ACTION_ATTR_POP_ETH:
@@ -3038,7 +3038,7 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
 				return -EINVAL;
 			if (vlan_tci & htons(VLAN_TAG_PRESENT))
 				return -EINVAL;
-			mac_proto = MAC_PROTO_ETHERNET;
+			mac_proto = MAC_PROTO_NONE;
 			break;

 		case OVS_ACTION_ATTR_PUSH_NSH:
What you expected to happen?
No kernel crash for parallel net streams:
iperf3 -c <server_cni_ip> -P1,2,4,8,16,32,64,128
i.e. the receiving end should be able to tolerate multiple parallel network streams, for P >= ~8.
My request for comments is whether this is reproducible on any other installations, since the kernel 4.19.x series is very popular across a number of distributions (f.i. centos7+elrepo) and I've seen it in many other k8s deployments; testing it is cheap, since it is a one-liner run on 2 pods.
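Note: the -P1,2,4,...,128 notation above is shorthand for one iperf3 run per stream count, not a literal iperf3 flag; a minimal bash equivalent, with <server_cni_ip> as a placeholder for the server pod's IP, would be:
# one iperf3 run per parallel-stream count
for p in 1 2 4 8 16 32 64 128; do iperf3 -c <server_cni_ip> -P "$p"; done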
What happened?
kernel panic, reproducible with iperf for P=~16 or greater, sometimes also for P=8.
IMPORTANT: the bug is NOT reproducible without involving a CNI (precisely: over the openvswitch layer).
How to reproduce it?
You need to run
iperf3 -s
inside a test pod with kernel 4.19.0 or a "bug-compatible" one, then pick a client pod and simply try:
iperf3 -c <server_cni_ip> -P1,2,4,8,16,32,64,128
On a problematic kernel, the kernel panic will occur about midway through the above sequence. A convenient one-liner:
echo 1 2 4 8 16 32 64 128|xargs -n1 iperf3 -c <server_cni_ip> -P
N.B. the crashing system is always the traffic-receiving server that listens on that CNI IP.
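For completeness, a sketch of driving the whole reproduction from a Kubernetes host; the pod names iperf-server and iperf-client are placeholders for two already-running test pods, not part of the original report:
# hypothetical pod names; adjust to your deployment
SERVER_POD=iperf-server
CLIENT_POD=iperf-client
# start the iperf3 server as a daemon inside the server pod and grab its CNI IP
kubectl exec "$SERVER_POD" -- iperf3 -s -D
SERVER_IP=$(kubectl get pod "$SERVER_POD" -o jsonpath='{.status.podIP}')
# sweep the parallel-stream count from the client pod; on an affected kernel
# the node hosting the *server* pod panics around P=8..16
for p in 1 2 4 8 16 32 64 128; do
  kubectl exec "$CLIENT_POD" -- iperf3 -c "$SERVER_IP" -P "$p"
done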
Anything else we need to know?
The configuration tried here regards an openstack qemu back-end, deploying via rancher.
Mentioning it because it could be a factor and/or even the bug's cause in some conceivable way, although my bigger question is whether having the FASTDP service enabled could be a factor, since I have noticed that if it gets disabled, traffic throughput drops and the kernel panic ceases (i.e. there is a correlation, but not necessarily a causal relationship).
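For anyone wanting to test the fastdp correlation themselves: to my knowledge, Weave Net's fast datapath can be disabled via the WEAVE_NO_FASTDP environment variable, falling back to the slower sleeve path. A hedged example against a stock weave-net DaemonSet (namespace and container name assumed):
# assumes the standard weave-net DaemonSet in kube-system with a container named "weave"
kubectl -n kube-system set env daemonset/weave-net -c weave WEAVE_NO_FASTDP=1
# if the panics stop afterwards, that points at the fastdp/OVS datapath, matching the observation above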
Versions:
Logs:
the kernel ultimately dies with:
kernel panic - not syncing: fatal exception in interrupt
stacktrace:
Conclusion
If the above bug report is considered historic (since the said kernel is old), please consider this feature request instead:
provide a test/qualification step for a new weave CNI deployment, which proves that a couple dozen parallel network streams do not crash the platform. It could be as simple as:
iperf3 -s & echo 1 2 4 8 16 32 64 128|xargs -n1 iperf3 -c my_iperf_daemonset -P
why: verify that the CNI stack is functional, at least at some level of parallelism
what: add an extra QA checkpoint
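A rough sketch of what such a checkpoint could look like as a pass/fail shell gate, reusing the same hypothetical my_iperf_daemonset endpoint as above:
#!/bin/sh
# QA gate: the CNI must survive a parallel-stream sweep without losing the server
iperf3 -s -D
for p in 1 2 4 8 16 32 64 128; do
  iperf3 -c my_iperf_daemonset -P "$p" || { echo "CNI stress check FAILED at P=$p"; exit 1; }
done
echo "CNI stress check passed"
(On an affected kernel the receiving node panics before the loop completes, so in practice the failure shows up as the gate hanging or timing out rather than a clean non-zero exit.)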