Open PlagueCZ opened 1 year ago
Once there is traffic, the graph gets corrupted. See rte_graph_walk() in DPDK's rte_graph_worker.h: in the main loop, the graph ends up in a state where a node is invalid (the fence check fails, etc.).
If I understand it correctly, the entry-level (source) graph nodes sit at a negative index relative to cir_start. Then, when those nodes send a packet to another node, that node gets added at a positive index.
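To make the indexing concrete, here is a minimal toy model of that walk. It is a sketch of my understanding, not the DPDK code: the toy_* names, the three-node rx -> cls -> drop pipeline and the ring size are all made up for illustration, and the real implementation stores node offsets rather than node ids.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical node ids for a toy rx -> cls -> drop pipeline. */
enum { NODE_RX, NODE_CLS, NODE_DROP };

#define NB_SOURCES 1               /* source nodes live below cir_start */
#define RING_SIZE  8               /* must be a power of two */
#define RING_MASK  (RING_SIZE - 1)
#define MAX_VISITS 16

struct toy_graph {
    int slots[NB_SOURCES + RING_SIZE]; /* sources first, then pending ring */
    int *cir_start;                    /* points at the first pending slot */
    int32_t head;                      /* starts at -NB_SOURCES */
    uint32_t tail;                     /* next free pending slot */
    int visited[MAX_VISITS];           /* processing order, for inspection */
    int nb_visited;
};

static void toy_graph_init(struct toy_graph *g)
{
    g->cir_start = &g->slots[NB_SOURCES];
    g->cir_start[-1] = NODE_RX;        /* source node at a negative index */
    g->head = -NB_SOURCES;
    g->tail = 0;
    g->nb_visited = 0;
}

/* A node that hands packets to a next node appends it at a positive index. */
static void toy_enqueue(struct toy_graph *g, int node)
{
    uint32_t tail = g->tail;

    g->cir_start[tail++] = node;
    g->tail = tail & RING_MASK;
}

static void toy_process(struct toy_graph *g, int node)
{
    assert(g->nb_visited < MAX_VISITS);
    g->visited[g->nb_visited++] = node;

    if (node == NODE_RX)
        toy_enqueue(g, NODE_CLS);
    else if (node == NODE_CLS)
        toy_enqueue(g, NODE_DROP);
    /* NODE_DROP frees the packet and enqueues nothing. */
}

/* Walk the sources (negative indices), then the pending ring until empty. */
static void toy_graph_walk(struct toy_graph *g)
{
    int32_t head = g->head;

    while (head != (int32_t)g->tail) {
        int node = g->cir_start[head++];

        toy_process(g, node);
        if (head > 0)
            head &= RING_MASK;         /* wrap only inside the pending ring */
    }
    g->tail = 0;                       /* the walk resets the tail at the end */
}
```

Calling toy_graph_init() and then toy_graph_walk() on one struct toy_graph visits rx, cls and drop in that order. Note that head is a local copy while tail lives in the shared structure and is reset at the end of the walk, so in this model nothing protects the ring if two walkers run over the same graph at once.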
Forgive the crude debug output, but I think it can explain a little (plain-text lines are from the node loops; address lines are from the rte_graph_walk() loop, with the node name at the end).
rx node burst 1
0x100c30dc0, head: fffffff7, tail: 1, start: 0x100c30140, mask: 3f, head: 1, cirstart: d80; 'cls' <- ''
cls node packet
rx node burst 1
0x100c32d00, head: fffffff7, tail: 2, start: 0x100c30140, mask: 3f, head: 2, cirstart: 2cc0; 'drop' <- ''
drop node packet
0x100c30dc0, head: fffffff7, tail: 3, start: 0x100c30140, mask: 3f, head: 3, cirstart: d80; 'cls' <- ''
cls node packet
0x100c32d00, head: fffffff7, tail: 4, start: 0x100c30140, mask: 3f, head: 4, cirstart: 2cc0; 'drop' <- ''
drop node packet
0x100c30dc0, head: fffffff7, tail: 3, start: 0x100c30140, mask: 3f, head: 1, cirstart: d80; 'cls' <- ''
0x100c32d00, head: fffffff7, tail: 0, start: 0x100c30140, mask: 3f, head: 2, cirstart: 2cc0; 'drop' <- ''
0x100c30dc0, head: fffffff7, tail: 1, start: 0x100c30140, mask: 3f, head: 1, cirstart: d80; 'cls' <- ''
0x100c32d00, head: fffffff7, tail: 0, start: 0x100c30140, mask: 3f, head: 2, cirstart: 2cc0; 'drop' <- ''
0x100c30dc0, head: fffffff7, tail: 0, start: 0x100c30140, mask: 3f, head: 3, cirstart: d80; 'cls' <- ''
0x100c32d00, head: fffffff7, tail: 0, start: 0x100c30140, mask: 3f, head: 4, cirstart: 2cc0; 'drop' <- ''
0x100c30040, head: fffffff7, tail: 0, start: 0x100c30140, mask: 3f, head: 5, cirstart: 0
The last line will cause a node with an invalid fence to be processed, leading to a SIGSEGV.
It seems that there are two packets (I had a VM connected that started DHCP). They go through the classify node and the drop node. But somehow they are processed in a non-thread-safe way, so an invalid record remains and then gets processed too.
As dpservice has introduced DPDK 23.11 support, we could evaluate the graph library's multicore support (mcore dispatch), which was introduced in the previous release, DPDK 23.07.
Summary
This is a future-improvement feature. dpservice does not work properly with multiple worker cores.
Currently this issue serves as a place to collect findings for later use. No bugfixing or investigation is needed at this time.