openvswitch / ovs-issues

Issue tracker repo for Open vSwitch

linux kernel panic #314

Open monkey92t opened 7 months ago

monkey92t commented 7 months ago

My env:
OS: Ubuntu 20.04.3 LTS AMD x64
Kernel: 5.4.0-169-generic
OVS: ovs-vsctl (Open vSwitch) 3.2.1
DB Schema 8.4.0


create ovs bridge and port:

ovs-vsctl add-br br0
ovs-vsctl set bridge br0 protocols=OpenFlow10,OpenFlow13,OpenFlow14,OpenFlow15 -- add-port br0 sp -- set Interface sp type=internal ofport_request=1
ovs-vsctl set bridge br0 protocols=OpenFlow10,OpenFlow13,OpenFlow14,OpenFlow15 -- add-port br0 back -- set Interface back type=internal ofport_request=2
ovs-vsctl set bridge br0 protocols=OpenFlow10,OpenFlow13,OpenFlow14,OpenFlow15 -- add-port br0 svc1 -- set Interface svc1 type=internal ofport_request=10
ovs-vsctl set bridge br0 protocols=OpenFlow10,OpenFlow13,OpenFlow14,OpenFlow15 -- add-port br0 svc2 -- set Interface svc2 type=internal ofport_request=20
ifconfig sp up
ifconfig back up
ifconfig svc1 up
ifconfig svc2 up

ifconfig sp mtu 65535
ifconfig back mtu 65535
ifconfig svc1 mtu 65535
ifconfig svc2 mtu 65535

# add test eth
sudo ip link add src1 type dummy
ifconfig src1 up
ifconfig src1 mtu 65535

set ovs flows:

ovs-ofctl del-flows br0
ovs-ofctl -Oopenflow13 add-flow br0 "table=0,in_port=sp,actions=encap(nsh(md_type=1)),set_field:0x1234->nsh_spi,set_field:0xff->nsh_si,encap(ethernet),svc1"
ovs-ofctl -Oopenflow13 add-flow br0 "table=0,in_port=back,dl_type=0x894f,nsh_mdtype=1,nsh_spi=0x1234,actions=decap(),decap(),svc2"

graph: image

packet route: src1 -> sp -> ovs -> flows -> svc1 -> sf proxy1(nsh) -> container -> sf proxy1(nsh) -> back port -> ovs -> flows -> svc2

ovs flows:

root@monkey:~# ovs-ofctl -Oopenflow13 dump-flows br0
 cookie=0x0, duration=12.619s, table=0, n_packets=0, n_bytes=0, in_port=sp actions=encap(nsh(md_type=1)),set_field:0x1234->nsh_spi,set_field:255->nsh_si,encap(ethernet),output:svc1
 cookie=0x0, duration=7.747s, table=0, n_packets=0, n_bytes=0, in_port=back,dl_type=0x894f,nsh_mdtype=1,nsh_spi=0x1234 actions=decap(),decap(),output:svc2

I wrote test data from src1 to sp. The individual packet sizes tested were 64, 128, 256, 512, 1024, and 2048 bytes, and everything ran stably up to 2048.

However, when I set the packet size to 4096, a kernel crash occurred. I was only able to gather very limited information:

Message from syslogd@monkey at Dec 25 16:55:53 ...
kernel:[ 424.229272] Kernel panic - not syncing: Fatal exception in interrupt

I tried to capture a crash dump with kdump, but without success. I'm not familiar with kdump.
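For anyone trying to reproduce this: kdump can only capture a panic if the running kernel was booted with a `crashkernel=` memory reservation. The following is a minimal sketch of that one check, not a full kdump setup; the helper name `has_crashkernel` is made up for illustration.

```shell
#!/bin/sh
# has_crashkernel is a hypothetical helper name, not part of any tool.
# Returns 0 if the given kernel command line contains a crashkernel= parameter.
has_crashkernel() {
    case " $1 " in
        *" crashkernel="*) return 0 ;;
        *) return 1 ;;
    esac
}

# Check the running kernel's command line, if it is readable.
if [ -r /proc/cmdline ]; then
    if has_crashkernel "$(cat /proc/cmdline)"; then
        echo "crashkernel reservation present"
    else
        echo "no crashkernel= parameter: kdump cannot capture a panic"
    fi
fi
```

On Ubuntu the reservation is normally added when the `linux-crashdump` package is installed and the machine is rebooted; `kdump-config show` then reports whether the crash kernel is loaded.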

However, through my testing, the crash was due to this rule:

ovs-ofctl -Oopenflow13 add-flow br0 "table=0,in_port=back,dl_type=0x894f,nsh_mdtype=1,nsh_spi=0x1234,actions=decap(),decap(),svc2"

Without this rule, the test also runs stably at larger packet sizes. I'm not sure what's wrong with this rule, because it works with packet sizes of 2048 bytes or smaller.

I tried both OVS 2.13 and 3.2.1 versions, and they yielded the same results. Did I miss something?

igsilya commented 6 months ago

Hi, @monkey92t. Could you provide the output of ovs-appctl dpctl/dump-flows, executed while the traffic is running in a working case (small packets)?

monkey92t commented 6 months ago

> Hi, @monkey92t. Could you provide the output of ovs-appctl dpctl/dump-flows executed while the traffic is running in a working case (small packets)?

@igsilya Thank you for your reply. Below, I'll explain my testing process:

I simplified my testing process.

My OS: Ubuntu 20.04 LTS, AMD x64, Linux kernel 5.4.0-169-generic

# install version 2.13
apt install openvswitch-switch
root@monkey:~# ovs-vsctl --version
ovs-vsctl (Open vSwitch) 2.13.8
DB Schema 8.2.0

ovs-vsctl add-br br0
ovs-vsctl set bridge br0 protocols=OpenFlow10,OpenFlow13,OpenFlow14,OpenFlow15 -- add-port br0 sp -- set Interface sp type=internal ofport_request=1
ovs-vsctl set bridge br0 protocols=OpenFlow10,OpenFlow13,OpenFlow14,OpenFlow15 -- add-port br0 back -- set Interface back type=internal ofport_request=2
ovs-vsctl set bridge br0 protocols=OpenFlow10,OpenFlow13,OpenFlow14,OpenFlow15 -- add-port br0 svc1 -- set Interface svc1 type=internal ofport_request=10
ovs-vsctl set bridge br0 protocols=OpenFlow10,OpenFlow13,OpenFlow14,OpenFlow15 -- add-port br0 svc2 -- set Interface svc2 type=internal ofport_request=20

ifconfig sp up
ifconfig back up
ifconfig svc1 up
ifconfig svc2 up

ifconfig sp mtu 65535
ifconfig back mtu 65535
ifconfig svc1 mtu 65535
ifconfig svc2 mtu 65535

ovs-ofctl del-flows br0
ovs-ofctl -Oopenflow13 add-flow br0 "table=0,in_port=sp,actions=encap(nsh(md_type=1)),set_field:0x1234->nsh_spi,set_field:0xff->nsh_si,encap(ethernet),svc1"
ovs-ofctl -Oopenflow13 add-flow br0 "table=0,in_port=back,dl_type=0x894f,nsh_mdtype=1,nsh_spi=0x1234,actions=decap(),decap(),svc2"

At this point, my network interface list is:

root@monkey:~# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether fa:cb:36:f6:cd:00 brd ff:ff:ff:ff:ff:ff
    inet brd scope global ens3
       valid_lft forever preferred_lft forever
    inet6 fe80::f8cb:36ff:fef6:cd00/64 scope link 
       valid_lft forever preferred_lft forever
3: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 3e:b8:a6:5b:b5:60 brd ff:ff:ff:ff:ff:ff
4: svc1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65535 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 9e:61:5a:30:67:ea brd ff:ff:ff:ff:ff:ff
    inet6 fe80::9c61:5aff:fe30:67ea/64 scope link 
       valid_lft forever preferred_lft forever
5: svc2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65535 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 5e:8e:5f:81:1f:94 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::5c8e:5fff:fe81:1f94/64 scope link 
       valid_lft forever preferred_lft forever
6: back: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65535 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether da:c2:f2:65:d9:7e brd ff:ff:ff:ff:ff:ff
    inet6 fe80::d8c2:f2ff:fe65:d97e/64 scope link 
       valid_lft forever preferred_lft forever
7: sp: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65535 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether a2:e2:7f:c4:fe:27 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::a0e2:7fff:fec4:fe27/64 scope link 
       valid_lft forever preferred_lft forever
8: br0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether ee:35:dd:44:6c:41 brd ff:ff:ff:ff:ff:ff

OVS Port list:

root@monkey:~# ovs-vsctl show
    Bridge br0
        Port sp
            Interface sp
                type: internal
        Port svc2
            Interface svc2
                type: internal
        Port back
            Interface back
                type: internal
        Port svc1
            Interface svc1
                type: internal
        Port br0
            Interface br0
                type: internal
    ovs_version: "2.13.8"

OVS rule list:

root@monkey:~# ovs-ofctl -Oopenflow13 dump-flows br0
 cookie=0x0, duration=214.102s, table=0, n_packets=11, n_bytes=866, in_port=sp actions=encap(nsh(md_type=1)),set_field:0x1234->nsh_spi,set_field:255->nsh_si,encap(ethernet),output:svc1
 cookie=0x0, duration=213.691s, table=0, n_packets=0, n_bytes=0, in_port=back,dl_type=0x894f,nsh_mdtype=1,nsh_spi=0x1234 actions=decap(),decap(),output:svc2

The packet path for testing is: sp (source port) -> ovs (Open vSwitch) -> rule-1 -> svc1 -> svc (reads from svc1, writes to back) -> rule-2 -> svc2

I wrote test data to the sp interface with a single packet size of 2048 bytes (2KB), ten million packets in total. The output of the 'ovs-appctl dpctl/dump-flows' command, sampled several times during the run, is:

root@monkey:~# ovs-appctl dpctl/dump-flows 
recirc_id(0),in_port(4),eth(),eth_type(0x0800),ipv4(frag=later), packets:179345, bytes:369809390, used:0.000s, actions:push_nsh(flags=0,ttl=63,mdtype=1,np=3,spi=0x1234,si=255,c1=0x0,c2=0x0,c3=0x0,c4=0x0),push_eth(src=00:00:00:00:00:00,dst=00:00:00:00:00:00),1
recirc_id(0x5),in_port(3),eth(),eth_type(0x0800),ipv4(frag=later), packets:0, bytes:0, used:never, actions:2
recirc_id(0),in_port(3),eth(),eth_type(0x894f),nsh(mdtype=1,np=3,spi=0x1234), packets:179180, bytes:376278000, used:0.000s, actions:pop_eth,pop_nsh(),recirc(0x4)
recirc_id(0x4),in_port(3),eth(),eth_type(0x0800),ipv4(frag=later), packets:179047, bytes:369194914, used:0.000s, actions:2
root@monkey:~# ovs-appctl dpctl/dump-flows 
recirc_id(0),in_port(4),eth(),eth_type(0x0800),ipv4(frag=later), packets:242000, bytes:499004000, used:0.008s, actions:push_nsh(flags=0,ttl=63,mdtype=1,np=3,spi=0x1234,si=255,c1=0x0,c2=0x0,c3=0x0,c4=0x0),push_eth(src=00:00:00:00:00:00,dst=00:00:00:00:00:00),1
recirc_id(0x5),in_port(3),eth(),eth_type(0x0800),ipv4(frag=later), packets:0, bytes:0, used:never, actions:2
recirc_id(0),in_port(3),eth(),eth_type(0x894f),nsh(mdtype=1,np=3,spi=0x1234), packets:241877, bytes:507941700, used:0.004s, actions:pop_eth,pop_nsh(),recirc(0x4)
recirc_id(0x4),in_port(3),eth(),eth_type(0x0800),ipv4(frag=later), packets:241744, bytes:498476128, used:0.004s, actions:2
root@monkey:~# ovs-appctl dpctl/dump-flows 
recirc_id(0),in_port(4),eth(),eth_type(0x0800),ipv4(frag=later), packets:283785, bytes:585164670, used:0.000s, actions:push_nsh(flags=0,ttl=63,mdtype=1,np=3,spi=0x1234,si=255,c1=0x0,c2=0x0,c3=0x0,c4=0x0),push_eth(src=00:00:00:00:00:00,dst=00:00:00:00:00:00),1
recirc_id(0x5),in_port(3),eth(),eth_type(0x0800),ipv4(frag=later), packets:0, bytes:0, used:never, actions:2
recirc_id(0),in_port(3),eth(),eth_type(0x894f),nsh(mdtype=1,np=3,spi=0x1234), packets:282472, bytes:593191200, used:0.000s, actions:pop_eth,pop_nsh(),recirc(0x4)
recirc_id(0x4),in_port(3),eth(),eth_type(0x0800),ipv4(frag=later), packets:282340, bytes:582185080, used:0.000s, actions:2
root@monkey:~# ovs-appctl dpctl/dump-flows 
recirc_id(0),in_port(4),eth(),eth_type(0x0800),ipv4(frag=later), packets:319000, bytes:657778000, used:0.096s, actions:push_nsh(flags=0,ttl=63,mdtype=1,np=3,spi=0x1234,si=255,c1=0x0,c2=0x0,c3=0x0,c4=0x0),push_eth(src=00:00:00:00:00:00,dst=00:00:00:00:00:00),1
recirc_id(0x5),in_port(3),eth(),eth_type(0x0800),ipv4(frag=later), packets:0, bytes:0, used:never, actions:2
recirc_id(0),in_port(3),eth(),eth_type(0x894f),nsh(mdtype=1,np=3,spi=0x1234), packets:318870, bytes:669627000, used:0.084s, actions:pop_eth,pop_nsh(),recirc(0x4)
recirc_id(0x4),in_port(3),eth(),eth_type(0x0800),ipv4(frag=later), packets:318737, bytes:657235694, used:0.084s, actions:2
root@monkey:~# ovs-appctl dpctl/dump-flows 
recirc_id(0),in_port(4),eth(),eth_type(0x0800),ipv4(frag=later), packets:363000, bytes:748506000, used:0.024s, actions:push_nsh(flags=0,ttl=63,mdtype=1,np=3,spi=0x1234,si=255,c1=0x0,c2=0x0,c3=0x0,c4=0x0),push_eth(src=00:00:00:00:00:00,dst=00:00:00:00:00:00),1
recirc_id(0x5),in_port(3),eth(),eth_type(0x0800),ipv4(frag=later), packets:0, bytes:0, used:never, actions:2
recirc_id(0),in_port(3),eth(),eth_type(0x894f),nsh(mdtype=1,np=3,spi=0x1234), packets:362900, bytes:762090000, used:0.024s, actions:pop_eth,pop_nsh(),recirc(0x4)
recirc_id(0x4),in_port(3),eth(),eth_type(0x0800),ipv4(frag=later), packets:362767, bytes:748025554, used:0.024s, actions:2
root@monkey:~# ovs-appctl dpctl/dump-flows 
recirc_id(0),in_port(4),eth(),eth_type(0x0800),ipv4(frag=later), packets:396000, bytes:816552000, used:0.096s, actions:push_nsh(flags=0,ttl=63,mdtype=1,np=3,spi=0x1234,si=255,c1=0x0,c2=0x0,c3=0x0,c4=0x0),push_eth(src=00:00:00:00:00:00,dst=00:00:00:00:00:00),1
recirc_id(0x5),in_port(3),eth(),eth_type(0x0800),ipv4(frag=later), packets:0, bytes:0, used:never, actions:2
recirc_id(0),in_port(3),eth(),eth_type(0x894f),nsh(mdtype=1,np=3,spi=0x1234), packets:395863, bytes:831312300, used:0.092s, actions:pop_eth,pop_nsh(),recirc(0x4)
recirc_id(0x4),in_port(3),eth(),eth_type(0x0800),ipv4(frag=later), packets:395730, bytes:815995260, used:0.092s, actions:2
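As a rough cross-check of the counters above, dividing bytes by packets for each datapath flow gives the average frame size the kernel saw. The sketch below does this with a small helper (`avg_bytes_per_packet` is a name made up here) on two lines copied from the first dump, with the match fields shortened for readability.

```shell
#!/bin/sh
# avg_bytes_per_packet is a hypothetical helper name.
# Reads `ovs-appctl dpctl/dump-flows` lines on stdin and prints the integer
# average bytes per packet for each flow; flows with zero packets print 0.
avg_bytes_per_packet() {
    sed -n 's/.*packets:\([0-9][0-9]*\), bytes:\([0-9][0-9]*\),.*/\1 \2/p' |
    while read -r packets bytes; do
        if [ "$packets" -gt 0 ]; then
            echo $((bytes / packets))
        else
            echo 0
        fi
    done
}

# Counters from the first dump above; prints 2062, then 2100.
avg_bytes_per_packet <<'EOF'
in_port(4),eth_type(0x0800), packets:179345, bytes:369809390, used:0.000s, actions:push_nsh(...),push_eth(...),1
in_port(3),eth_type(0x894f), packets:179180, bytes:376278000, used:0.000s, actions:pop_eth,pop_nsh(),recirc(0x4)
EOF
```

The first figure, 2062, is consistent with a 2048-byte payload plus a 14-byte Ethernet header. The 38-byte difference between the two figures would match a 24-byte NSH MD type 1 header plus a 14-byte outer Ethernet header, which suggests the encapsulated frames in the working case are 2100 bytes each.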

At this point, it was functioning without issues, but when I set the individual packet size to 4096 bytes (4KB), the kernel crashed immediately. I couldn't retrieve any data or logs in time.

Thank you very much for your help.

monkey92t commented 6 months ago

@igsilya Hi, In my testing process, the first rule:

ovs-ofctl -Oopenflow13 add-flow br0 "table=0,in_port=sp,actions=encap(nsh(md_type=1)),set_field:0x1234->nsh_spi,set_field:0xff->nsh_si,encap(ethernet),svc1"

Changing it to:

ovs-ofctl -Oopenflow13 add-flow br0 "table=0,in_port=sp,actions=encap(nsh(md_type=1)),set_field:0x1234->nsh_spi,set_field:0xff->nsh_si,encap(ethernet),set_field:11:22:33:44:55:66->dl_dst,svc1"

Their difference lies solely in the addition of set_field:11:22:33:44:55:66->dl_dst.

But it seems to have affected the normal functioning of the second rule.

With this modification, it works properly with packets of 4096 bytes or larger. Do we need to set the MAC address for the Ethernet header? I'm puzzled, because without it everything works fine for packets of 2048 bytes or smaller; I'm not sure if there might be another reason.