ironcore-dev / dpservice

DPDK based fast Dataplane / L3 router / SDN enabler, installable on compute nodes / SmartNICs
Apache License 2.0

Crash at termination with offloading #413

Open PlagueCZ opened 11 months ago

PlagueCZ commented 11 months ago

Describe the bug

Doing a minimal startup and then terminating dpservice-bin via Ctrl+C leads to a SIGSEGV on termination (in DPDK cleanup).

To Reproduce

Running on a PC with a Mellanox ConnectX-6 and two VMs running (using a vfio NIC).

dpservice-bin -l0,1 -- --no-stats
dpservice-cli init
dpservice-cli add interface --id test10 --device 0000:03:00.0_representor_vf0 --vni 123 --ipv4 192.168.123.10 --ipv6 fe80::10
Ctrl+C on dpservice

Stacktrace

Thread 1 "dpservice-bin" received signal SIGSEGV, Segmentation fault.
0x00007ffff5e8bab1 in flow_dv_sample_clone_free_cb ()
   from /usr/local/lib/x86_64-linux-gnu/dpdk/pmds-23.0/librte_net_mlx5.so.23.0

(this is part of rte_eal_cleanup())

Additional information

This does not happen once --no-offload is added (given the stacktrace that's expected).

PlagueCZ commented 11 months ago

@byteocean not sure you can test this. I will hopefully get a separated lab setup soon to test on another machine.

byteocean commented 11 months ago

I have not encountered this so far. Could it have something to do with migrating to DPDK 22? I have not tested that on my side yet.

PlagueCZ commented 11 months ago

@byteocean thanks for pointing me in the right direction! The change that causes this is 61cf7a0f94fe65119525d110963e505396dea101 (which removed the DPDK warnings at the end about stopping ports). So I guess something more needs to be done before stopping ports, probably something about the flows?
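
The guess above is about ordering: flow rules (and any indirect actions they use) may need to be released before the port is stopped and closed. A minimal sketch of such an ordering follows. This thread does not confirm it as the fix. The names `port_teardown`, `teardown_fn`, `demo_steps` and the `demo_*` callbacks are hypothetical; the callbacks only record the call order so the sketch compiles without DPDK. In a real application they would wrap the DPDK calls named in the comments.

```c
#include <stddef.h>
#include <stdint.h>

/* Teardown steps, in the order this sketch assumes is safe. */
enum teardown_step {
    STEP_FLUSH_FLOWS,       /* e.g. rte_flow_flush() */
    STEP_DESTROY_INDIRECT,  /* e.g. rte_flow_action_handle_destroy() */
    STEP_STOP_PORT,         /* e.g. rte_eth_dev_stop() */
    STEP_CLOSE_PORT,        /* e.g. rte_eth_dev_close() */
    STEP_COUNT
};

/* Hypothetical callback type: one teardown action on one port, 0 on success. */
typedef int (*teardown_fn)(uint16_t port_id);

/* Run every step in order and stop at the first failure.
 * Returns 0 on success, otherwise the failing callback's return value. */
int port_teardown(const teardown_fn steps[STEP_COUNT], uint16_t port_id)
{
    for (size_t i = 0; i < STEP_COUNT; i++) {
        int ret = steps[i](port_id);
        if (ret != 0)
            return ret;
    }
    return 0;
}

/* Stand-in callbacks: they only record which step ran, in call order. */
enum teardown_step trace[STEP_COUNT];
size_t trace_len;

static int record(enum teardown_step step)
{
    if (trace_len >= STEP_COUNT)
        return -1;
    trace[trace_len++] = step;
    return 0;
}

static int demo_flush_flows(uint16_t port_id)
{
    (void)port_id;
    return record(STEP_FLUSH_FLOWS);
}

static int demo_destroy_indirect(uint16_t port_id)
{
    (void)port_id;
    return record(STEP_DESTROY_INDIRECT);
}

static int demo_stop_port(uint16_t port_id)
{
    (void)port_id;
    return record(STEP_STOP_PORT);
}

static int demo_close_port(uint16_t port_id)
{
    (void)port_id;
    return record(STEP_CLOSE_PORT);
}

/* Callback table indexed by step, so the array order is the teardown order. */
const teardown_fn demo_steps[STEP_COUNT] = {
    [STEP_FLUSH_FLOWS]      = demo_flush_flows,
    [STEP_DESTROY_INDIRECT] = demo_destroy_indirect,
    [STEP_STOP_PORT]        = demo_stop_port,
    [STEP_CLOSE_PORT]       = demo_close_port,
};
```

Calling `port_teardown(demo_steps, 0)` runs the four steps in enum order. Flows come before indirect actions in this sketch because rules that still reference an indirect action would keep it in use.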

byteocean commented 11 months ago

Maybe it is the handler thing that wraps the age action. You could try removing this indirect action part to see if the error still exists. By the way, I rebased onto main and upgraded DPDK to 22.11.3, and also got this error when I terminated it after running some ping tests.

byteocean commented 11 months ago

@PlagueCZ FYI, under 22.03 this error (segfault) does not appear.

PlagueCZ commented 11 months ago

@byteocean removing the call to dp_install_default_rule_in_monitoring_group() fixes the error. I have tested 21.11 right before updating to 22.11 and the error does occur there.