Closed: rohit-mp closed this issue 4 years ago.
Interesting. We haven't tested with flent before. We will talk about this in our meeting today and see if we can reproduce it.
@rohit-mp thanks for testing this. Flent seems like an interesting tool. I've seen this issue a few times before in Pktgen last year but wasn't able to replicate it for testing. When ping cannot connect, what do you need to do to bring up the interface again: is it as simple as unbinding the interface, or does it take an entire reboot? Was there any relevant debug output from the ONVM manager or the bridge NF when it ran into these issues?
> I've seen this issue a few times before in Pktgen last year but wasn't able to replicate it for testing.

I'm seeing the same issue now with Pktgen as well. Pktgen shows a tx rate of 1000 Mbps, but the manager shows 0 rx_pps (on a 1G NIC).
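For context, this was with a plain pktgen-dpdk run along these lines (the binary path, EAL core list, and port-to-core mapping below are placeholders for my setup, not exact values):

```shell
# Launch pktgen-dpdk; core list and "[rx:tx].port" mapping are illustrative
sudo ./pktgen -l 0-4 -n 4 -- -P -m "[1:2].0"

# Then, at the Pktgen> prompt:
#   set 0 size 64    # frame size in bytes
#   set 0 rate 100   # percent of line rate
#   start 0          # begin transmitting on port 0
#   stop 0
```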
> When ping cannot connect, what do you need to do to bring up the interface again, is it as simple as unbinding the interface, or an entire reboot?

Restarting the manager resolves the issue for me with both Pktgen and Flent.
> Was there any relevant debug output from ONVM manager or bridge when it ran into these issues?

There was no debug output from either when this happened.
I was trying to check why this happened, and realized that the dpdk plot above was with dpdk20.05. On trying onvm with dpdk20.05 from dpdk-vsn-update, I see that onvm performs as expected, and I no longer see the issue of the manager stopping processing either.
onvm with dpdk20.05, single TCP flow:
To confirm that the issue was with dpdk18.11 and not onvm, I tried l2fwd from dpdk18.11 and encountered the same issue as before. It would be good if someone else could verify the same.
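For anyone trying to reproduce the check, a sketch of the dpdk18.11 l2fwd invocation I'd expect (the core list and portmask here are specific to my setup and will likely differ on yours):

```shell
# DPDK l2fwd sample app: EAL args before "--", app args after
# -l: cores to run on; -n: memory channels; -p: hex portmask of ports to use
sudo ./examples/l2fwd/build/l2fwd -l 1-2 -n 4 -- -p 0x3
```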
I think we're good to close this issue?
Bug Report
Current Behavior
The onvm manager stops processing packets after some time while running flent tests, and performance with the bridge NF is low compared to l2fwd.
Steps to reproduce
The following script with lines 30-39 commented out was used to see this behavior: https://gist.github.com/archit-p/2ddfca67fb691140b02c87a5bb56fc04
The topology of the setup:
On SUT, command to run the manager:
./onvm/go.sh 0,1,2 3 0xf8 -s stdout
NF run (Bridge NF):
./go.sh 1
On Generator:
sudo ./setup-flent.sh
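The setup script drives flent for you; for reference, an individual run invoked by hand would look roughly like this (the test name, duration, and title below are illustrative, not the exact values from the script):

```shell
# Single TCP download test against the SUT, 60 s, plotting totals to a PNG
flent tcp_download -p totals -l 60 -H 10.0.0.2 -t onvm-bridge -o onvm-bridge.png
```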
Before running flent on the generator, I'm able to ping 10.0.0.2 from the client namespace. On running the script, I observe that after a few seconds the manager stops processing all packets, i.e., rx_pps drops to zero, and after the script finishes I'm no longer able to ping 10.0.0.2 from the client namespace either.
Further, on uncommenting lines 30-35, I see that some flent tests complete (the manager still stops processing packets, but now only occasionally rather than every time), yet the observed bandwidth is capped at around 350 Mbps, whereas running dpdk l2fwd on the SUT instead of onvm reaches up to 10 Gbps (as seen with a 10G NIC). [Refer results attached below]
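As a sanity check on these rates, the theoretical line-rate packet rate follows from the frame size plus the 20 bytes of per-frame overhead on the wire (8 B preamble/SFD + 12 B inter-frame gap); a quick back-of-the-envelope calculation:

```shell
# line-rate pps = link_bps / ((frame_bytes + 20) * 8)
# 64-byte frames on a 10G link (the classic ~14.88 Mpps figure):
awk 'BEGIN { printf "%d\n", 10e9 / ((64 + 20) * 8) }'    # 14880952
# 1500-byte frames at the ~350 Mbps cap observed here:
awk 'BEGIN { printf "%d\n", 350e6 / ((1500 + 20) * 8) }' # 28782
```

So the 350 Mbps cap is well below what either the NIC or the manager should sustain, which points at a bottleneck elsewhere in the path.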
Edit: tried out the l2 switch as well, and observed behavior similar to that described above.
Environment
Additional context/Screenshots
Results for performance issue:
Single TCP flow with openNetVM:
Single TCP flow with l2fwd:
@archit-p