Open edgar-costa opened 10 months ago
Sounds very much like whatever code BMv2 is using to read packets from interfaces is blocking indefinitely during the sequence of steps you describe, until/unless a new packet is sent to the interface that was taken down (if it is first brought back up again). Unfortunately I do not know where in the BMv2 implementation that is. If you want to try to track it down, I would suggest starting from the call to input_buffer->pop_back(&packet); here: https://github.com/p4lang/behavioral-model/blob/main/targets/simple_switch/simple_switch.cpp#L483 and work your way back to wherever there is a call that actually gets packets from veth interfaces.
Hi @jafingerhut , thanks for the reply!
Thanks for the pointer. I do not know much about the BMV2 implementation but I can try to dig down a bit more for that call at the ingress_thread. However, given this only happens with Ubuntu 22.04, and that the issue gets triggered upon an interface down event, I am guessing this might be more of a kernel issue than a bug in bmv2. Or it might be a combination of both. I will try to investigate a bit more.
Ah, sorry, I missed the point about there being one Ubuntu version that exhibited the problem. Any chance you can try an Ubuntu 23.04 system to see if the problem also exists there, and perhaps record the Linux kernel versions of the systems you tested with?
I just tried with Ubuntu 22 and Kernel 6.2 (original was 5.15), and the problem persists.
It might be too early to say this is just an "only kernel" problem. It might be a combination of a change in the kernel /net/veth.c and something in bmv2 and its binding/interaction with the veth interfaces. But since I don't know the code very well, I have not found anything yet.
@edgar-costa this could be an issue with BMI, the libpcap wrapper for bmv2. There is a single thread that runs a loop and reads packets from all the interfaces, using select: https://github.com/p4lang/behavioral-model/blob/d56d5658e34ca68ae9efdd396f8eb54facc67a2a/src/BMI/bmi_port.c#L108
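To make the suspected failure mode concrete, here is a minimal sketch of a single-threaded select loop body. It is not BMI's code: poll_ports_once is a hypothetical helper, and the descriptors it receives stand in for whatever per-interface file descriptors the real loop watches. The point it illustrates is that one thread services every port, so if any single wait or read in that thread never returns, all ports stop being serviced, which would match the freeze described above. Whether that is what actually happens in bmv2 is an assumption until a backtrace confirms it.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper, not part of BMI.
 * Waits up to timeout_ms for any descriptor in fds to become readable,
 * then performs one read() on each ready descriptor (each read reuses buf).
 * Returns the number of descriptors that yielded data, 0 on timeout,
 * or -1 if select() failed. */
int poll_ports_once(const int *fds, size_t nfds, int timeout_ms,
                    char *buf, size_t buflen)
{
    fd_set readable;
    int max_fd = -1;

    FD_ZERO(&readable);
    for (size_t i = 0; i < nfds; i++) {
        /* FD_SET is undefined for descriptors outside [0, FD_SETSIZE). */
        assert(fds[i] >= 0 && fds[i] < FD_SETSIZE);
        FD_SET(fds[i], &readable);
        if (fds[i] > max_fd)
            max_fd = fds[i];
    }

    /* A finite timeout lets the caller regain control periodically
     * even when no descriptor ever becomes readable. */
    struct timeval tv = {
        .tv_sec = timeout_ms / 1000,
        .tv_usec = (timeout_ms % 1000) * 1000,
    };
    int n = select(max_fd + 1, &readable, NULL, NULL, &tv);
    if (n <= 0)
        return n;

    int serviced = 0;
    for (size_t i = 0; i < nfds; i++) {
        /* If this read() blocked, every other port would stall with it. */
        if (FD_ISSET(fds[i], &readable) && read(fds[i], buf, buflen) > 0)
            serviced++;
    }
    return serviced;
}
```

A caller would invoke this in a loop from its single reader thread; a return value of 0 means the timeout expired with nothing to read, which is the case to look for when checking whether the loop is still spinning or is stuck inside a call.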
What would be helpful if you have time is to run simple_switch using gdb. Of course bmv2 has to be compiled with the right flags and symbols enabled (-O0 -g should work). After you reproduce the "deadlock", you should dump a backtrace for all threads in gdb (thread apply all bt). I'm hoping that would help.
There seems to be some problem with bmv2 (tested with simple_switch) when you bring one of the switch interfaces down. I have done the same exact tests with three different VMs running Ubuntu 18.04, 20.04, and 22.04. The problem I will describe only happens in Ubuntu 22.04. Thus, I am assuming this might be some interaction between veth and bmv2 in Ubuntu 22.04.
These are my findings:
- [...] bmv2 switch is bound to, the switch program freezes and does not even process packets (I verified this by looking at the logs). This remains even after bringing the interface up.
- [...] veth, the problem does not happen; packets get processed, and when you bring the interface up, things work again.
How to replicate
- System settings: Ubuntu 22.04, latest version of p4c and bmv2.
- [...] mininet CLI (if you have one): link s1 s2 down, then bring it up.
- [...] s1 to s2 or whichever interface you brought down. And observe how the switch works again.