Closed Yi-ran closed 3 years ago
Can you attach the debug output from the iokernel instances also?
I doubt the DPDK version (19.11) current Caladan/Breakwater supports might not be compatible with your Ubuntu 20.04 Note that Ubuntu 20.04 (Kernel 5.4) has not been tested in Intel platforms with Intel NICs combinations before DPDK 20.05 (https://doc.dpdk.org/guides-20.11/rel_notes/release_19_11.html) Can you check whether you see the same issues with an older Linux kernel? Ubuntu 18.04 with Linux 4.15 which is used for Breakwater evaluation or Ubuntu 18.04 with Linux 5.2 which is used for Caladan would be the best options to try.
Can you attach the debug output from the iokernel instances also?
~/caladan$ sudo ./iokerneld ias nobw no_hw_qdel CPU 07| <6> entering 'iokernel' init phase CPU 07| <6> init -> base CPU 07| <5> cpu: detected 8 cores, 1 nodes CPU 07| <5> time: detected 3407 ticks / us CPU 07| <6> init: entering 'early' init CPU 07| <6> init: entering 'normal' init CPU 07| <6> init: entering 'late' init [ 0.000339] CPU 07| <6> init -> ksched [ 0.000358] CPU 07| <6> init -> sched [ 0.000362] CPU 07| <5> sched: CPU configuration... node 0: [0,4][1,5][2,6][3,7] [ 0.000372] CPU 07| <5> sched: dataplane on 4, control on 0 [ 0.000375] CPU 07| <6> init -> simple [ 0.000381] CPU 07| <6> init -> numa [ 0.000383] CPU 07| <6> init -> ias [ 0.000387] CPU 07| <6> init -> control [ 0.026285] CPU 07| <5> control: spawning control thread [ 0.026364] CPU 07| <6> init -> dpdk EAL: Detected 8 lcore(s) EAL: Detected 1 NUMA nodes EAL: Multi-process socket /var/run/dpdk/rte/mp_socket EAL: Selected IOVA mode 'PA' EAL: No available hugepages reported in hugepages-1048576kB EAL: Probing VFIO support... EAL: VFIO support initialized EAL: PCI device 0000:00:1f.6 on NUMA socket -1 EAL: Invalid NUMA socket, default to 0 EAL: probe driver: 8086:15b7 net_e1000_em EAL: PCI device 0000:01:00.0 on NUMA socket -1 EAL: Invalid NUMA socket, default to 0 EAL: probe driver: 8086:10fb net_ixgbe EAL: PCI device 0000:01:00.1 on NUMA socket -1 EAL: Invalid NUMA socket, default to 0 EAL: probe driver: 8086:10fb net_ixgbe [ 1.482301] CPU 04| <6> init -> rx EAL: WARNING! Base virtual address hint (0x1100805000 != 0x7f683838a000) not respected! EAL: This may cause issues with mapping memory into secondary processes [ 1.589685] CPU 04| <6> init -> tx [ 1.592309] CPU 04| <6> init -> dp_clients [ 1.592322] CPU 04| <6> init -> dpdk_late [ 3.138781] CPU 04| <5> dpdk: driver: net_ixgbe port 0 MAC: 00 1b 21 88 d2 94 [ 3.138792] CPU 04| <6> init -> hw_timestamp [ 3.138797] CPU 04| <5> main: core 4 running dataplane. [Ctrl+C to quit] [ 3.735374] CPU 04| <6> rx: received 1 packets on port 0 [ 3.735384] CPU 04| <6> rx: rx packet with MAC 01 80 c2 00 00 00 [ 3.735388] CPU 04| <6> rx: unhandled packet with MAC 1 80 c2 0 0 0 [ 3.735446] CPU 04| <6> rx: received 1 packets on port 0 [ 3.735449] CPU 04| <6> rx: rx packet with MAC 01 80 c2 00 00 0e [ 3.735453] CPU 04| <6> rx: unhandled packet with MAC 1 80 c2 0 0 e [ 5.622595] CPU 04| <6> rx: received 1 packets on port 0 [ 5.622601] CPU 04| <6> rx: rx packet with MAC 01 80 c2 00 00 0e [ 5.622605] CPU 04| <6> rx: unhandled packet with MAC 1 80 c2 0 0 e [ 6.438784] CPU 04| <6> rx: received 1 packets on port 0 [ 6.438791] CPU 04| <6> rx: rx packet with MAC 01 80 c2 00 00 00 [ 6.438795] CPU 04| <6> rx: unhandled packet with MAC 1 80 c2 0 0 0 [ 6.622797] CPU 04| <6> rx: received 1 packets on port 0 [ 6.622801] CPU 04| <6> rx: rx packet with MAC 01 80 c2 00 00 0e [ 6.622804] CPU 04| <6> rx: unhandled packet with MAC 1 80 c2 0 0 e [ 7.623111] CPU 04| <6> rx: received 1 packets on port 0 [ 7.623116] CPU 04| <6> rx: rx packet with MAC 01 80 c2 00 00 0e [ 7.623121] CPU 04| <6> rx: unhandled packet with MAC 1 80 c2 0 0 e [ 8.439352] CPU 04| <6> rx: received 1 packets on port 0 [ 8.439357] CPU 04| <6> rx: rx packet with MAC 01 80 c2 00 00 00 [ 8.439360] CPU 04| <6> rx: unhandled packet with MAC 1 80 c2 0 0 0 .......... (repeat above)
~/caladan$ sudo ./iokerneld ias nobw no_hw_qdel CPU 00| <6> entering 'iokernel' init phase CPU 00| <6> init -> base CPU 00| <5> cpu: detected 8 cores, 1 nodes CPU 00| <5> time: detected 3408 ticks / us CPU 00| <6> init: entering 'early' init CPU 00| <6> init: entering 'normal' init CPU 00| <6> init: entering 'late' init [ 0.000287] CPU 00| <6> init -> ksched [ 0.000304] CPU 00| <6> init -> sched [ 0.000312] CPU 00| <5> sched: CPU configuration... node 0: [0,4][1,5][2,6][3,7] [ 0.000319] CPU 00| <5> sched: dataplane on 4, control on 0 [ 0.000322] CPU 00| <6> init -> simple [ 0.000327] CPU 00| <6> init -> numa [ 0.000330] CPU 00| <6> init -> ias [ 0.000334] CPU 00| <6> init -> control [ 0.025872] CPU 00| <5> control: spawning control thread [ 0.025942] CPU 00| <6> init -> dpdk EAL: Detected 8 lcore(s) EAL: Detected 1 NUMA nodes EAL: Multi-process socket /var/run/dpdk/rte/mp_socket EAL: Selected IOVA mode 'PA' EAL: No available hugepages reported in hugepages-1048576kB EAL: Probing VFIO support... EAL: VFIO support initialized EAL: PCI device 0000:00:1f.6 on NUMA socket -1 EAL: Invalid NUMA socket, default to 0 EAL: probe driver: 8086:15b7 net_e1000_em EAL: PCI device 0000:01:00.0 on NUMA socket -1 EAL: Invalid NUMA socket, default to 0 EAL: probe driver: 8086:10fb net_ixgbe EAL: PCI device 0000:01:00.1 on NUMA socket -1 EAL: Invalid NUMA socket, default to 0 EAL: probe driver: 8086:10fb net_ixgbe [ 1.487067] CPU 04| <6> init -> rx EAL: WARNING! Base virtual address hint (0x1100805000 != 0x7f8c3558d000) not respected! EAL: This may cause issues with mapping memory into secondary processes [ 1.519877] CPU 04| <6> init -> tx [ 1.522722] CPU 04| <6> init -> dp_clients [ 1.522735] CPU 04| <6> init -> dpdk_late [ 3.142901] CPU 04| <5> dpdk: driver: net_ixgbe port 0 MAC: 00 1b 21 89 6e ac [ 3.142910] CPU 04| <6> init -> hw_timestamp [ 3.142915] CPU 04| <5> main: core 4 running dataplane. [Ctrl+C to quit] [ 3.570855] CPU 04| <6> rx: received 1 packets on port 0 [ 3.570861] CPU 04| <6> rx: rx packet with MAC 01 80 c2 00 00 0e [ 3.570866] CPU 04| <6> rx: unhandled packet with MAC 1 80 c2 0 0 e [ 3.571357] CPU 04| <6> rx: received 1 packets on port 0 [ 3.571360] CPU 04| <6> rx: rx packet with MAC 01 80 c2 00 00 00 [ 3.571364] CPU 04| <6> rx: unhandled packet with MAC 1 80 c2 0 0 0 [ 4.905497] CPU 04| <6> rx: received 1 packets on port 0 [ 4.905501] CPU 04| <6> rx: rx packet with MAC 01 80 c2 00 00 0e [ 4.905505] CPU 04| <6> rx: unhandled packet with MAC 1 80 c2 0 0 e [ 5.716392] CPU 04| <6> rx: received 1 packets on port 0 [ 5.716397] CPU 04| <6> rx: rx packet with MAC 01 80 c2 00 00 00 [ 5.716401] CPU 04| <6> rx: unhandled packet with MAC 1 80 c2 0 0 0 [ 5.900380] CPU 04| <6> rx: received 1 packets on port 0 [ 5.900384] CPU 04| <6> rx: rx packet with MAC 01 80 c2 00 00 0e [ 5.900388] CPU 04| <6> rx: unhandled packet with MAC 1 80 c2 0 0 e [ 6.954245] CPU 04| <6> rx: received 1 packets on port 0 [ 6.954249] CPU 04| <6> rx: rx packet with MAC 01 80 c2 00 00 0e [ 6.954252] CPU 04| <6> rx: unhandled packet with MAC 1 80 c2 0 0 e [ 7.716403] CPU 04| <6> rx: received 1 packets on port 0 [ 7.716407] CPU 04| <6> rx: rx packet with MAC 01 80 c2 00 00 00 [ 7.716411] CPU 04| <6> rx: unhandled packet with MAC 1 80 c2 0 0 0 [ 9.716431] CPU 04| <6> rx: received 1 packets on port 0 [ 9.716436] CPU 04| <6> rx: rx packet with MAC 01 80 c2 00 00 00 [ 9.716441] CPU 04| <6> rx: unhandled packet with MAC 1 80 c2 0 0 0 [ 11.716449] CPU 04| <6> rx: received 1 packets on port 0 [ 11.716453] CPU 04| <6> rx: rx packet with MAC 01 80 c2 00 00 00 [ 11.716458] CPU 04| <6> rx: unhandled packet with MAC 1 80 c2 0 0 0 [ 13.716467] CPU 04| <6> rx: received 1 packets on port 0 [ 13.716471] CPU 04| <6> rx: rx packet with MAC 01 80 c2 00 00 00 [ 13.716475] CPU 04| <6> rx: unhandled packet with MAC 1 80 c2 0 0 0 [ 15.716480] CPU 04| <6> rx: received 1 packets on port 0 [ 15.716486] CPU 04| <6> rx: rx packet with MAC 01 80 c2 00 00 00 [ 15.716490] CPU 04| <6> rx: unhandled packet with MAC 1 80 c2 0 0 0 [ 17.716504] CPU 04| <6> rx: received 1 packets on port 0 [ 17.716509] CPU 04| <6> rx: rx packet with MAC 01 80 c2 00 00 00 [ 17.716513] CPU 04| <6> rx: unhandled packet with MAC 1 80 c2 0 0 0 [ 19.716519] CPU 04| <6> rx: received 1 packets on port 0 [ 19.716524] CPU 04| <6> rx: rx packet with MAC 01 80 c2 00 00 00 [ 19.716528] CPU 04| <6> rx: unhandled packet with MAC 1 80 c2 0 0 0 [ 21.716542] CPU 04| <6> rx: received 1 packets on port 0 [ 21.716546] CPU 04| <6> rx: rx packet with MAC 01 80 c2 00 00 00 [ 21.716550] CPU 04| <6> rx: unhandled packet with MAC 1 80 c2 0 0 0 [ 23.289722] CPU 04| <6> tx: transmitted 1 packets on port 0 [ 23.716562] CPU 04| <6> rx: received 1 packets on port 0 [ 23.716567] CPU 04| <6> rx: rx packet with MAC 01 80 c2 00 00 00 [ 23.716571] CPU 04| <6> rx: unhandled packet with MAC 1 80 c2 0 0 0 [ 25.289688] CPU 04| <6> tx: transmitted 1 packets on port 0 [ 25.716579] CPU 04| <6> rx: received 1 packets on port 0 [ 25.716584] CPU 04| <6> rx: rx packet with MAC 01 80 c2 00 00 00 [ 25.716589] CPU 04| <6> rx: unhandled packet with MAC 1 80 c2 0 0 0 [ 26.289717] CPU 04| <6> tx: transmitted 1 packets on port 0 [ 27.289746] CPU 04| <6> tx: transmitted 1 packets on port 0 [ 27.716597] CPU 04| <6> rx: received 1 packets on port 0 [ 27.716602] CPU 04| <6> rx: rx packet with MAC 01 80 c2 00 00 00 [ 27.716605] CPU 04| <6> rx: unhandled packet with MAC 1 80 c2 0 0 0 [ 29.716613] CPU 04| <6> rx: received 1 packets on port 0 [ 29.716618] CPU 04| <6> rx: rx packet with MAC 01 80 c2 00 00 00 [ 29.716622] CPU 04| <6> rx: unhandled packet with MAC 1 80 c2 0 0 0 [ 31.716632] CPU 04| <6> rx: received 1 packets on port 0 [ 31.716637] CPU 04| <6> rx: rx packet with MAC 01 80 c2 00 00 00 [ 31.716640] CPU 04| <6> rx: unhandled packet with MAC 1 80 c2 0 0 0 [ 33.716647] CPU 04| <6> rx: received 1 packets on port 0 [ 33.716652] CPU 04| <6> rx: rx packet with MAC 01 80 c2 00 00 00 [ 33.716656] CPU 04| <6> rx: unhandled packet with MAC 1 80 c2 0 0 0 [ 33.720804] CPU 04| <6> rx: received 1 packets on port 0 [ 33.720808] CPU 04| <6> rx: rx packet with MAC 01 80 c2 00 00 00 [ 33.720811] CPU 04| <6> rx: unhandled packet with MAC 1 80 c2 0 0 0 [ 36.717163] CPU 04| <6> rx: received 1 packets on port 0 [ 36.717167] CPU 04| <6> rx: rx packet with MAC 01 80 c2 00 00 00 [ 36.717171] CPU 04| <6> rx: unhandled packet with MAC 1 80 c2 0 0 0 [ 36.900714] CPU 04| <6> rx: received 1 packets on port 0 [ 36.900718] CPU 04| <6> rx: rx packet with MAC 01 80 c2 00 00 0e [ 36.900721] CPU 04| <6> rx: unhandled packet with MAC 1 80 c2 0 0 e ........
Thanks! Could you also check dmesg
and see if there are any relevant messages there (perhaps from igb_uio
)? Can you try unbinding and re-binding the NIC to the igb_uio
driver between experiments? You can use the dpdk-setup.sh
script in dpdk/usertools
Thanks! Could you also check
dmesg
and see if there are any relevant messages there (perhaps fromigb_uio
)? Can you try unbinding and re-binding the NIC to theigb_uio
driver between experiments? You can use thedpdk-setup.sh
script indpdk/usertools
Thanks. Unfortunately, I do not notice any special warning messages in dmesg. I also try unbinding and re-binding the NIC at both client and server, but the problem remains. I'll try to run under the Ubuntu version and kernel version as suggested by @inhocho89
@Yi-ran You mentioned that it works fine after the machine reboot. Did you have to reboot the server, the client, or both?
@Yi-ran You mentioned that it works fine after the machine reboot. Did you have to reboot the server, the client, or both?
Only rebooting the client is needed. I guess that the client encounters problems after sending requests? However, no other useful error output under the debug mode and in dmesg. By the way, I also test the Breakwater example application under Ubuntu 18.04 with Linux 4.15 (both client and server), but the problem still exists.
Thanks for trying with an older Linux kernel. Based on the information you provided, the client cannot establish a TCP connection with the server even before sending requests. It looks like the TCP SYN packets from the client are dropped somewhere, but I'm not sure where and what causes this for now.
Caladan's synthetic application runs fine because it uses UDP by default. As Caladan and Breakwater share the same runtime API for TCP connection, I think you will encounter the same problem if you run Caladan's synthetic application with TCP. To run the synthetic application with TCP, you need to add a command-line argument --transport tcp
.
Do you have any information which can be useful to determine where the TCP SYN packets are dropped (e.g. counters at client, server, switch)? I think DPDK (igb_uio driver), NIC, or switch may drop the packets.
+) Check firewalls too. They might block the packets.
@joshuafried Hi, I recheck the synthetic application example in Calanda (Ubuntu 18.04 with Linux 4.15), it turns out that after the first successful run, the following runs also have packet drops. Because it uses UDP, it does not have the connection error like TCP uses in Breakwater. Also, no error messages in dmesg and rebinding the NIC does not help. Packets are not dropped by the switch, since no packets tx info in the debug output of iokerneld.
The output:
Distribution, Target, Actual, Dropped, Never Sent, Median, 90th, 99th, 99.9tht
zero, 669, 441, 2047, 2913, 2053062.0, inf, inf, inf, inf, 1618448074, 1323146
zero, 1999, 1999, 0, 0, 27.0, 29.0, 35.0, 40.0, 43.0, 1618448098, 118741284479
zero, 2628, 2628, 0, 3354, 28.0, 1179937.0, 1587158.0, 1657009.0, 1662081.0, 5
[ 82.380460] CPU 06| <6> arp: responding to arp request from IP 192.168.2.2
[ 94.226195] CPU 06| <6> arp: responding to arp request from IP 192.168.2.2
[ 94.226199] CPU 02| <6> arp: responding to arp request from IP 192.168.2.2
zero, 2647, 2550, 867, 12137, 103622.0, 1214256.0, inf, inf, inf, 1618448147,3
[115.408214] CPU 06| <6> arp: responding to arp request from IP 192.168.2.2
zero, 3261, 2810, 2968, 23485, 450190.0, inf, inf, inf, inf, 1618448173, 86963
zero, 2852, 2836, 127, 30697, 791102.0, 2049957.0, 2318952.0, inf, inf, 161840
[166.703801] CPU 06| <6> arp: responding to arp request from IP 192.168.2.2
zero, 2539, 2539, 0, 41767, 102222.0, 2383708.0, 2658246.0, 2685632.0, 268902
I guess the problem is not limited to TCP or Breakwater, maybe dropped by DPDK? But now I have no idea how to obtain the information of drop packets.
Thanks! Are there other processes/interrupts that are competing for CPU time on the cores used by the iokernel and the runtime? It seems from this output that one or both of iokernel and synthetic are being interrupted for long periods of time. Caladan generally assumes exclusive use of the cores that it is managing, and you may need to configure your OS to ensure that this is the case (using isolcpus, interrupt pinning/disabling irqbalance, etc).
@joshuafried @inhocho89 The connection issue disappears after I remove the switch and directly connect the client and server.
I opened the DPDK log and no error TX stats. After that, I notice that each run, the real source MAC addresses in the log are different. At runtime/ioqueues.c, the generate_random_mac() function generates source MAC addresses randomly. As the switch maintains the MAC table, I guess it will drop packets if learned MAC address changes frequently. At iokernel/dpdk.c, in dpdk_port_init(), the original MAC address of the NIC is obtained by rte_eth_macaddr_get(port, &addr). As a result, there are two MAC addresses : the real MAC address (but is generated randomly) showed in output of ./apps/netbench/netbench and an original MAC address showed in output of ./iokerneld.
I wonder why a random generated MAC address is needed, why do not use the original MAC address of NIC? As for the problem I encountered, one direct way seems to use a fixed MAC address instead of a randomly generated MAC address.
Sounds good. I'm surprised the switch is not able to learn the MAC addresses quickly. We have never seen this problem with any switch that we've used. What is the model of this switch?
Each runtime instance on the machine has its own MAC so the iokernel (or hardware) can route inbound packets to the correct runtime. In theory we could have chosen to route packets to runtimes by IP address instead.
The host_mac option may help as well, as it would allow you to hardcode a MAC for each runtime.
The real reason is that STP is default enabled in our switch after I check the log in the switch... (not due to different mac, I was wrong before) Now the problem solved. Thanks a lot.
Hi Calanda, Breakwater developers, I try to run the Breakwater example application in readme but encounter problems. (Before that I do not encounter the same problem when I run the synthetic application in Calanda readme). The Breakwater example application does run successfully when I run it the first time. But when I want to run it for many times, (each time, I resetup the machine using sudo ./scripts/setup_machine.sh, then start iokerneld and the application at the client. so does at the server) I can never run it successfully again unless I reboot the machine. The IP-related configuration is right since I can run the synthetic application in Calanda successfully.
Environment
server.config file
Server output:
I omit the output of sudo ./iokerneld ias nobw no_hw_qdel. so does client.
client.config file
Client output: