shenango / caladan

Interference-aware CPU scheduling that enables performance isolation and high CPU utilization for datacenter servers
Apache License 2.0
117 stars 50 forks source link

Breakwater running error: couldn't connect to raddr #5

Closed Yi-ran closed 3 years ago

Yi-ran commented 3 years ago

Hi Calanda, Breakwater developers, I try to run the Breakwater example application in readme but encounter problems. (Before that I do not encounter the same problem when I run the synthetic application in Calanda readme). The Breakwater example application does run successfully when I run it the first time. But when I want to run it for many times, (each time, I resetup the machine using sudo ./scripts/setup_machine.sh, then start iokerneld and the application at the client. so does at the server) I can never run it successfully again unless I reboot the machine. The IP-related configuration is right since I can run the synthetic application in Calanda successfully.

Environment

server.config file

host_addr 192.168.2.2
host_netmask 255.255.255.0
host_gateway 192.168.2.1
runtime_kthreads 4
runtime_guaranteed_kthreads 4

Server output:

I omit the output of sudo ./iokerneld ias nobw no_hw_qdel. so does client.

~/caladan/breakwater: sudo ./apps/netbench/netbench breakwater ../server.config server
CPU 02| <5> cpu: detected 8 cores, 1 nodes
CPU 02| <5> time: detected 3407 ticks / us
CPU 02| <6> init: entering 'early' init
CPU 02| <6> init: entering 'normal' init
CPU 02| <6> init: entering 'late' init
[  0.000296] CPU 02| <5> loading configuration from '../server.config'
[  0.006781] CPU 02| <5> process pid: 151042
[  0.006786] CPU 02| <6> entering 'global' init phase
[  0.006788] CPU 02| <6> init -> kthread
[  0.006798] CPU 02| <6> init -> ioqueues
[  0.019990] CPU 02| <6> init -> stack
[  0.020006] CPU 02| <6> init -> sched
[  0.020011] CPU 02| <6> init -> preempt
[  0.020017] CPU 02| <6> init -> smalloc
[  0.020025] CPU 02| <6> init -> net
[  0.020699] CPU 02| <5> net: started network stack
[  0.020702] CPU 02| <5> net: using the following configuration:
[  0.020706] CPU 02| <5>   addr:    192.168.2.2
[  0.020710] CPU 02| <5>   netmask: 255.255.255.0
[  0.020713] CPU 02| <5>   gateway: 192.168.2.1
[  0.020716] CPU 02| <5>   mac: EA:82:2E:DD:AE:1E
[  0.020720] CPU 02| <6> init -> directpath
[  0.020723] CPU 02| <6> init -> arp
[  0.020734] CPU 02| <6> init -> trans
[  0.020825] CPU 02| <6> init -> storage
[  0.020828] CPU 02| <6> init -> gc
[  0.020854] CPU 02| <5> thread: created thread 0
[  0.020857] CPU 02| <6> init: entering 'thread' init
[  0.020860] CPU 02| <6> entering 'per-thread' init phase
[  0.020863] CPU 02| <6> init -> kthread
[  0.020867] CPU 02| <6> init -> ioqueues
[  0.020870] CPU 02| <6> init -> stack
[  0.020873] CPU 02| <6> init -> timer
[  0.020882] CPU 02| <6> init -> sched
[  0.020922] CPU 02| <6> init -> smalloc
[  0.020925] CPU 02| <6> init -> net
[  0.020928] CPU 02| <6> init -> directpath
[  0.020931] CPU 02| <6> init -> storage
[  0.020933] CPU 02| <5> spawning 4 kthreads
[  0.021004] CPU 01| <5> thread: created thread 1
[  0.021014] CPU 01| <6> init: entering 'thread' init
[  0.021018] CPU 01| <6> entering 'per-thread' init phase
[  0.021030] CPU 01| <6> init -> kthread
[  0.021022] CPU 03| <5> thread: created thread 2
[  0.021042] CPU 03| <6> init: entering 'thread' init
[  0.021046] CPU 03| <6> entering 'per-thread' init phase
[  0.021050] CPU 03| <6> init -> kthread
[  0.021052] CPU 02| <5> thread: created thread 3
[  0.021061] CPU 02| <6> init: entering 'thread' init
[  0.021065] CPU 03| <6> init -> ioqueues
[  0.021054] CPU 01| <6> init -> ioqueues
[  0.021067] CPU 02| <6> entering 'per-thread' init phase
[  0.021075] CPU 01| <6> init -> stack
[  0.021082] CPU 01| <6> init -> timer
[  0.021072] CPU 03| <6> init -> stack
[  0.021092] CPU 01| <6> init -> sched
[  0.021094] CPU 03| <6> init -> timer
[  0.021077] CPU 02| <6> init -> kthread
[  0.021107] CPU 03| <6> init -> sched
[  0.021121] CPU 02| <6> init -> ioqueues
[  0.021126] CPU 02| <6> init -> stack
[  0.021129] CPU 02| <6> init -> timer
[  0.021134] CPU 02| <6> init -> sched
[  0.021161] CPU 01| <6> init -> smalloc
[  0.021165] CPU 01| <6> init -> net
[  0.021167] CPU 01| <6> init -> directpath
[  0.021171] CPU 01| <6> init -> storage
[  0.021181] CPU 03| <6> init -> smalloc
[  0.021184] CPU 03| <6> init -> net
[  0.021187] CPU 03| <6> init -> directpath
[  0.021190] CPU 03| <6> init -> storage
[  0.021192] CPU 02| <6> init -> smalloc
[  0.021200] CPU 02| <6> init -> net
[  0.021203] CPU 02| <6> init -> directpath
[  0.021206] CPU 02| <6> init -> storage
[  0.021237] CPU 03| <6> entering 'late' init phase
[  0.021241] CPU 03| <6> init -> arp
[  0.021246] CPU 03| <6> init -> stat
[  0.021253] CPU 03| <6> init -> tcp
[  0.021257] CPU 03| <6> init -> rcu
[  0.021261] CPU 03| <6> init -> directpath

client.config file

host_addr 192.168.2.3
host_netmask 255.255.255.0
host_gateway 192.168.2.1
runtime_kthreads 4
runtime_spinning_kthreads 4
runtime_guaranteed_kthreads 4
runtime_priority lc

Client output:

/caladan/breakwater$ ./apps/netbench/netbench breakwater ../client.config client 100 192.168.2.2 10 exp 100 0 100000
CPU 06| <5> cpu: detected 8 cores, 1 nodes
CPU 06| <5> time: detected 3408 ticks / us
CPU 06| <6> init: entering 'early' init
CPU 06| <6> init: entering 'normal' init
CPU 06| <6> init: entering 'late' init
[  0.000290] CPU 06| <5> loading configuration from '../client.config'
[  0.000309] CPU 06| <5> process pid: 859656
[  0.000313] CPU 06| <6> entering 'global' init phase
[  0.000316] CPU 06| <6> init -> kthread
[  0.000323] CPU 06| <6> init -> ioqueues
[  0.013486] CPU 06| <6> init -> stack
[  0.013502] CPU 06| <6> init -> sched
[  0.013508] CPU 06| <6> init -> preempt
[  0.013512] CPU 06| <6> init -> smalloc
[  0.013520] CPU 06| <6> init -> net
[  0.014196] CPU 06| <5> net: started network stack
[  0.014200] CPU 06| <5> net: using the following configuration:
[  0.014204] CPU 06| <5>   addr:    192.168.2.3
[  0.014207] CPU 06| <5>   netmask: 255.255.255.0
[  0.014210] CPU 06| <5>   gateway: 192.168.2.1
[  0.014212] CPU 06| <5>   mac: 1E:C3:48:5A:08:5B
[  0.014216] CPU 06| <6> init -> directpath
[  0.014650] CPU 06| <6> init -> arp
[  0.014661] CPU 06| <6> init -> trans
[  0.014745] CPU 06| <6> init -> storage
[  0.014752] CPU 06| <6> init -> gc
[  0.014777] CPU 06| <5> thread: created thread 0
[  0.014780] CPU 06| <6> init: entering 'thread' init
[  0.014782] CPU 06| <6> entering 'per-thread' init phase
[  0.014788] CPU 06| <6> init -> kthread
[  0.014792] CPU 06| <6> init -> ioqueues
[  0.014796] CPU 06| <6> init -> stack
[  0.014798] CPU 06| <6> init -> timer
[  0.014804] CPU 06| <6> init -> sched
[  0.014844] CPU 06| <6> init -> smalloc
[  0.014847] CPU 06| <6> init -> net
[  0.014849] CPU 06| <6> init -> directpath
[  0.014850] CPU 06| <6> init -> storage
[  0.014852] CPU 06| <5> spawning 4 kthreads
[  0.014912] CPU 01| <5> thread: created thread 1
[  0.014921] CPU 01| <6> init: entering 'thread' init
[  0.014925] CPU 01| <6> entering 'per-thread' init phase
[  0.014930] CPU 01| <6> init -> kthread
[  0.014939] CPU 07| <5> thread: created thread 2
[  0.014947] CPU 07| <6> init: entering 'thread' init
[  0.014951] CPU 07| <6> entering 'per-thread' init phase
[  0.014955] CPU 01| <6> init -> ioqueues
[  0.014963] CPU 01| <6> init -> stack
[  0.014967] CPU 01| <6> init -> timer
[  0.014955] CPU 07| <6> init -> kthread
[  0.014957] CPU 06| <5> thread: created thread 3
[  0.014986] CPU 06| <6> init: entering 'thread' init
[  0.014990] CPU 06| <6> entering 'per-thread' init phase
[  0.014994] CPU 06| <6> init -> kthread
[  0.014973] CPU 01| <6> init -> sched
[  0.014994] CPU 07| <6> init -> ioqueues
[  0.015005] CPU 06| <6> init -> ioqueues
[  0.015019] CPU 06| <6> init -> stack
[  0.015024] CPU 06| <6> init -> timer
[  0.015013] CPU 07| <6> init -> stack
[  0.015035] CPU 01| <6> init -> smalloc
[  0.015037] CPU 07| <6> init -> timer
[  0.015040] CPU 01| <6> init -> net
[  0.015048] CPU 07| <6> init -> sched
[  0.015050] CPU 01| <6> init -> directpath
[  0.015059] CPU 01| <6> init -> storage
[  0.015031] CPU 06| <6> init -> sched
[  0.015116] CPU 06| <6> init -> smalloc
[  0.015119] CPU 06| <6> init -> net
[  0.015122] CPU 06| <6> init -> directpath
[  0.015124] CPU 06| <6> init -> storage
[  0.015124] CPU 07| <6> init -> smalloc
[  0.015135] CPU 07| <6> init -> net
[  0.015139] CPU 07| <6> init -> directpath
[  0.015143] CPU 07| <6> init -> storage
[  0.015177] CPU 06| <6> entering 'late' init phase
[  0.015180] CPU 06| <6> init -> arp
[  0.015185] CPU 06| <6> init -> stat
[  0.015191] CPU 06| <6> init -> tcp
[  0.015195] CPU 06| <6> init -> rcu
[  0.015199] CPU 06| <6> init -> directpath
num_threads,offered_load,throughput,goodput,cpu,min,mean,p50,p90,p99,p999,p9999,max,p1_win,mean_win,p99_win,p1_q,mean_q,p99_q,mean_stime,p99_stime,server:rx_pps,server:tx_pps,server:rx_bps,server:tx_bps,server:rx_drops_pps,server:rx_ooo_pps,server:winu_rx_pps,server:winu_tx_pps,server:win_tx_wps,server:req_rx_pps,server:req_drop_rate,server:resp_tx_pps,client:min_tput,client:max_tput,client:winu_rx_pps,client:winu_tx_pps,client:resp_rx_pps,client:req_tx_pps,client:win_expired_wps,client:req_dropped_rps
[  0.015883] CPU 01| <6> tcp: 0x100000019000 192.168.2.3:56120 -> 192.168.2.2:8123 FLAGS=SYN SEQ=ISS+0 ACK=IRS+0 WND=64240 LEN=0
[  0.015895] CPU 01| <6> tcp: 0x100000019000 CREATE -> SYN-SENT
[  0.325878] CPU 05| <6> tcp: 0x100000019000 retransmission timeout
[  0.325890] CPU 01| <6> tcp: 0x100000019000 192.168.2.3:56120 -> 192.168.2.2:8123 FLAGS=SYN SEQ=ISS+0 ACK=IRS+0 WND=64240 LEN=0
[  0.625914] CPU 05| <6> tcp: 0x100000019000 retransmission timeout
[  0.625924] CPU 01| <6> tcp: 0x100000019000 192.168.2.3:56120 -> 192.168.2.2:8123 FLAGS=SYN SEQ=ISS+0 ACK=IRS+0 WND=64240 LEN=0
[  0.925954] CPU 01| <6> tcp: 0x100000019000 retransmission timeout
[  0.925964] CPU 05| <6> tcp: 0x100000019000 192.168.2.3:56120 -> 192.168.2.2:8123 FLAGS=SYN SEQ=ISS+0 ACK=IRS+0 WND=64240 LEN=0
[  1.225988] CPU 02| <6> tcp: 0x100000019000 retransmission timeout
[  1.226002] CPU 06| <6> tcp: 0x100000019000 192.168.2.3:56120 -> 192.168.2.2:8123 FLAGS=SYN SEQ=ISS+0 ACK=IRS+0 WND=64240 LEN=0
[  1.526025] CPU 01| <6> tcp: 0x100000019000 retransmission timeout
[  1.526033] CPU 06| <6> tcp: 0x100000019000 192.168.2.3:56120 -> 192.168.2.2:8123 FLAGS=SYN SEQ=ISS+0 ACK=IRS+0 WND=64240 LEN=0
[  1.826060] CPU 01| <6> tcp: 0x100000019000 retransmission timeout
[  1.826074] CPU 02| <6> tcp: 0x100000019000 192.168.2.3:56120 -> 192.168.2.2:8123 FLAGS=SYN SEQ=ISS+0 ACK=IRS+0 WND=64240 LEN=0
[  2.126100] CPU 05| <6> tcp: 0x100000019000 retransmission timeout
[  2.126108] CPU 01| <6> tcp: 0x100000019000 192.168.2.3:56120 -> 192.168.2.2:8123 FLAGS=SYN SEQ=ISS+0 ACK=IRS+0 WND=64240 LEN=0
[  2.426134] CPU 01| <6> tcp: 0x100000019000 retransmission timeout
[  2.426141] CPU 05| <6> tcp: 0x100000019000 192.168.2.3:56120 -> 192.168.2.2:8123 FLAGS=SYN SEQ=ISS+0 ACK=IRS+0 WND=64240 LEN=0
[  2.726169] CPU 02| <6> tcp: 0x100000019000 retransmission timeout
[  2.726177] CPU 05| <6> tcp: 0x100000019000 192.168.2.3:56120 -> 192.168.2.2:8123 FLAGS=SYN SEQ=ISS+0 ACK=IRS+0 WND=64240 LEN=0
[  3.026204] CPU 06| <6> tcp: 0x100000019000 retransmission timeout
[  3.026212] CPU 05| <6> tcp: 0x100000019000 192.168.2.3:56120 -> 192.168.2.2:8123 FLAGS=SYN SEQ=ISS+0 ACK=IRS+0 WND=64240 LEN=0
[  3.326239] CPU 02| <6> tcp: 0x100000019000 retransmission timeout
[  3.326248] CPU 05| <6> tcp: 0x100000019000 192.168.2.3:56120 -> 192.168.2.2:8123 FLAGS=SYN SEQ=ISS+0 ACK=IRS+0 WND=64240 LEN=0
[  3.626277] CPU 05| <6> tcp: 0x100000019000 retransmission timeout
[  3.626284] CPU 01| <6> tcp: 0x100000019000 192.168.2.3:56120 -> 192.168.2.2:8123 FLAGS=SYN SEQ=ISS+0 ACK=IRS+0 WND=64240 LEN=0
[  3.926313] CPU 02| <6> tcp: 0x100000019000 retransmission timeout
[  3.926322] CPU 06| <6> tcp: 0x100000019000 192.168.2.3:56120 -> 192.168.2.2:8123 FLAGS=SYN SEQ=ISS+0 ACK=IRS+0 WND=64240 LEN=0
[  4.226350] CPU 02| <6> tcp: 0x100000019000 retransmission timeout
[  4.226359] CPU 06| <6> tcp: 0x100000019000 192.168.2.3:56120 -> 192.168.2.2:8123 FLAGS=SYN SEQ=ISS+0 ACK=IRS+0 WND=64240 LEN=0
[  4.526388] CPU 06| <6> tcp: 0x100000019000 retransmission timeout
[  4.526395] CPU 01| <6> tcp: 0x100000019000 192.168.2.3:56120 -> 192.168.2.2:8123 FLAGS=SYN SEQ=ISS+0 ACK=IRS+0 WND=64240 LEN=0
[  4.826423] CPU 01| <6> tcp: 0x100000019000 retransmission timeout
[  4.826430] CPU 06| <6> tcp: 0x100000019000 192.168.2.3:56120 -> 192.168.2.2:8123 FLAGS=SYN SEQ=ISS+0 ACK=IRS+0 WND=64240 LEN=0
[  5.016447] CPU 02| <6> tcp: 0x100000019000 SYN-SENT -> CLOSED
[  5.016453] CPU 06| <0> couldn't connect to raddr.
[  5.016458] CPU 06| <5> init: shutting down -> FAILURE
joshuafried commented 3 years ago

Can you attach the debug output from the iokernel instances also?

inhocho89 commented 3 years ago

I doubt the DPDK version (19.11) current Caladan/Breakwater supports might not be compatible with your Ubuntu 20.04 Note that Ubuntu 20.04 (Kernel 5.4) has not been tested in Intel platforms with Intel NICs combinations before DPDK 20.05 (https://doc.dpdk.org/guides-20.11/rel_notes/release_19_11.html) Can you check whether you see the same issues with an older Linux kernel? Ubuntu 18.04 with Linux 4.15 which is used for Breakwater evaluation or Ubuntu 18.04 with Linux 5.2 which is used for Caladan would be the best options to try.

Yi-ran commented 3 years ago

Can you attach the debug output from the iokernel instances also?

Server:

~/caladan$ sudo ./iokerneld ias nobw no_hw_qdel CPU 07| <6> entering 'iokernel' init phase CPU 07| <6> init -> base CPU 07| <5> cpu: detected 8 cores, 1 nodes CPU 07| <5> time: detected 3407 ticks / us CPU 07| <6> init: entering 'early' init CPU 07| <6> init: entering 'normal' init CPU 07| <6> init: entering 'late' init [ 0.000339] CPU 07| <6> init -> ksched [ 0.000358] CPU 07| <6> init -> sched [ 0.000362] CPU 07| <5> sched: CPU configuration... node 0: [0,4][1,5][2,6][3,7] [ 0.000372] CPU 07| <5> sched: dataplane on 4, control on 0 [ 0.000375] CPU 07| <6> init -> simple [ 0.000381] CPU 07| <6> init -> numa [ 0.000383] CPU 07| <6> init -> ias [ 0.000387] CPU 07| <6> init -> control [ 0.026285] CPU 07| <5> control: spawning control thread [ 0.026364] CPU 07| <6> init -> dpdk EAL: Detected 8 lcore(s) EAL: Detected 1 NUMA nodes EAL: Multi-process socket /var/run/dpdk/rte/mp_socket EAL: Selected IOVA mode 'PA' EAL: No available hugepages reported in hugepages-1048576kB EAL: Probing VFIO support... EAL: VFIO support initialized EAL: PCI device 0000:00:1f.6 on NUMA socket -1 EAL: Invalid NUMA socket, default to 0 EAL: probe driver: 8086:15b7 net_e1000_em EAL: PCI device 0000:01:00.0 on NUMA socket -1 EAL: Invalid NUMA socket, default to 0 EAL: probe driver: 8086:10fb net_ixgbe EAL: PCI device 0000:01:00.1 on NUMA socket -1 EAL: Invalid NUMA socket, default to 0 EAL: probe driver: 8086:10fb net_ixgbe [ 1.482301] CPU 04| <6> init -> rx EAL: WARNING! Base virtual address hint (0x1100805000 != 0x7f683838a000) not respected! EAL: This may cause issues with mapping memory into secondary processes [ 1.589685] CPU 04| <6> init -> tx [ 1.592309] CPU 04| <6> init -> dp_clients [ 1.592322] CPU 04| <6> init -> dpdk_late [ 3.138781] CPU 04| <5> dpdk: driver: net_ixgbe port 0 MAC: 00 1b 21 88 d2 94 [ 3.138792] CPU 04| <6> init -> hw_timestamp [ 3.138797] CPU 04| <5> main: core 4 running dataplane. [Ctrl+C to quit] [ 3.735374] CPU 04| <6> rx: received 1 packets on port 0 [ 3.735384] CPU 04| <6> rx: rx packet with MAC 01 80 c2 00 00 00 [ 3.735388] CPU 04| <6> rx: unhandled packet with MAC 1 80 c2 0 0 0 [ 3.735446] CPU 04| <6> rx: received 1 packets on port 0 [ 3.735449] CPU 04| <6> rx: rx packet with MAC 01 80 c2 00 00 0e [ 3.735453] CPU 04| <6> rx: unhandled packet with MAC 1 80 c2 0 0 e [ 5.622595] CPU 04| <6> rx: received 1 packets on port 0 [ 5.622601] CPU 04| <6> rx: rx packet with MAC 01 80 c2 00 00 0e [ 5.622605] CPU 04| <6> rx: unhandled packet with MAC 1 80 c2 0 0 e [ 6.438784] CPU 04| <6> rx: received 1 packets on port 0 [ 6.438791] CPU 04| <6> rx: rx packet with MAC 01 80 c2 00 00 00 [ 6.438795] CPU 04| <6> rx: unhandled packet with MAC 1 80 c2 0 0 0 [ 6.622797] CPU 04| <6> rx: received 1 packets on port 0 [ 6.622801] CPU 04| <6> rx: rx packet with MAC 01 80 c2 00 00 0e [ 6.622804] CPU 04| <6> rx: unhandled packet with MAC 1 80 c2 0 0 e [ 7.623111] CPU 04| <6> rx: received 1 packets on port 0 [ 7.623116] CPU 04| <6> rx: rx packet with MAC 01 80 c2 00 00 0e [ 7.623121] CPU 04| <6> rx: unhandled packet with MAC 1 80 c2 0 0 e [ 8.439352] CPU 04| <6> rx: received 1 packets on port 0 [ 8.439357] CPU 04| <6> rx: rx packet with MAC 01 80 c2 00 00 00 [ 8.439360] CPU 04| <6> rx: unhandled packet with MAC 1 80 c2 0 0 0 .......... (repeat above)

Client (Client and server are connected via a switch)

~/caladan$ sudo ./iokerneld ias nobw no_hw_qdel CPU 00| <6> entering 'iokernel' init phase CPU 00| <6> init -> base CPU 00| <5> cpu: detected 8 cores, 1 nodes CPU 00| <5> time: detected 3408 ticks / us CPU 00| <6> init: entering 'early' init CPU 00| <6> init: entering 'normal' init CPU 00| <6> init: entering 'late' init [ 0.000287] CPU 00| <6> init -> ksched [ 0.000304] CPU 00| <6> init -> sched [ 0.000312] CPU 00| <5> sched: CPU configuration... node 0: [0,4][1,5][2,6][3,7] [ 0.000319] CPU 00| <5> sched: dataplane on 4, control on 0 [ 0.000322] CPU 00| <6> init -> simple [ 0.000327] CPU 00| <6> init -> numa [ 0.000330] CPU 00| <6> init -> ias [ 0.000334] CPU 00| <6> init -> control [ 0.025872] CPU 00| <5> control: spawning control thread [ 0.025942] CPU 00| <6> init -> dpdk EAL: Detected 8 lcore(s) EAL: Detected 1 NUMA nodes EAL: Multi-process socket /var/run/dpdk/rte/mp_socket EAL: Selected IOVA mode 'PA' EAL: No available hugepages reported in hugepages-1048576kB EAL: Probing VFIO support... EAL: VFIO support initialized EAL: PCI device 0000:00:1f.6 on NUMA socket -1 EAL: Invalid NUMA socket, default to 0 EAL: probe driver: 8086:15b7 net_e1000_em EAL: PCI device 0000:01:00.0 on NUMA socket -1 EAL: Invalid NUMA socket, default to 0 EAL: probe driver: 8086:10fb net_ixgbe EAL: PCI device 0000:01:00.1 on NUMA socket -1 EAL: Invalid NUMA socket, default to 0 EAL: probe driver: 8086:10fb net_ixgbe [ 1.487067] CPU 04| <6> init -> rx EAL: WARNING! Base virtual address hint (0x1100805000 != 0x7f8c3558d000) not respected! EAL: This may cause issues with mapping memory into secondary processes [ 1.519877] CPU 04| <6> init -> tx [ 1.522722] CPU 04| <6> init -> dp_clients [ 1.522735] CPU 04| <6> init -> dpdk_late [ 3.142901] CPU 04| <5> dpdk: driver: net_ixgbe port 0 MAC: 00 1b 21 89 6e ac [ 3.142910] CPU 04| <6> init -> hw_timestamp [ 3.142915] CPU 04| <5> main: core 4 running dataplane. [Ctrl+C to quit] [ 3.570855] CPU 04| <6> rx: received 1 packets on port 0 [ 3.570861] CPU 04| <6> rx: rx packet with MAC 01 80 c2 00 00 0e [ 3.570866] CPU 04| <6> rx: unhandled packet with MAC 1 80 c2 0 0 e [ 3.571357] CPU 04| <6> rx: received 1 packets on port 0 [ 3.571360] CPU 04| <6> rx: rx packet with MAC 01 80 c2 00 00 00 [ 3.571364] CPU 04| <6> rx: unhandled packet with MAC 1 80 c2 0 0 0 [ 4.905497] CPU 04| <6> rx: received 1 packets on port 0 [ 4.905501] CPU 04| <6> rx: rx packet with MAC 01 80 c2 00 00 0e [ 4.905505] CPU 04| <6> rx: unhandled packet with MAC 1 80 c2 0 0 e [ 5.716392] CPU 04| <6> rx: received 1 packets on port 0 [ 5.716397] CPU 04| <6> rx: rx packet with MAC 01 80 c2 00 00 00 [ 5.716401] CPU 04| <6> rx: unhandled packet with MAC 1 80 c2 0 0 0 [ 5.900380] CPU 04| <6> rx: received 1 packets on port 0 [ 5.900384] CPU 04| <6> rx: rx packet with MAC 01 80 c2 00 00 0e [ 5.900388] CPU 04| <6> rx: unhandled packet with MAC 1 80 c2 0 0 e [ 6.954245] CPU 04| <6> rx: received 1 packets on port 0 [ 6.954249] CPU 04| <6> rx: rx packet with MAC 01 80 c2 00 00 0e [ 6.954252] CPU 04| <6> rx: unhandled packet with MAC 1 80 c2 0 0 e [ 7.716403] CPU 04| <6> rx: received 1 packets on port 0 [ 7.716407] CPU 04| <6> rx: rx packet with MAC 01 80 c2 00 00 00 [ 7.716411] CPU 04| <6> rx: unhandled packet with MAC 1 80 c2 0 0 0 [ 9.716431] CPU 04| <6> rx: received 1 packets on port 0 [ 9.716436] CPU 04| <6> rx: rx packet with MAC 01 80 c2 00 00 00 [ 9.716441] CPU 04| <6> rx: unhandled packet with MAC 1 80 c2 0 0 0 [ 11.716449] CPU 04| <6> rx: received 1 packets on port 0 [ 11.716453] CPU 04| <6> rx: rx packet with MAC 01 80 c2 00 00 00 [ 11.716458] CPU 04| <6> rx: unhandled packet with MAC 1 80 c2 0 0 0 [ 13.716467] CPU 04| <6> rx: received 1 packets on port 0 [ 13.716471] CPU 04| <6> rx: rx packet with MAC 01 80 c2 00 00 00 [ 13.716475] CPU 04| <6> rx: unhandled packet with MAC 1 80 c2 0 0 0 [ 15.716480] CPU 04| <6> rx: received 1 packets on port 0 [ 15.716486] CPU 04| <6> rx: rx packet with MAC 01 80 c2 00 00 00 [ 15.716490] CPU 04| <6> rx: unhandled packet with MAC 1 80 c2 0 0 0 [ 17.716504] CPU 04| <6> rx: received 1 packets on port 0 [ 17.716509] CPU 04| <6> rx: rx packet with MAC 01 80 c2 00 00 00 [ 17.716513] CPU 04| <6> rx: unhandled packet with MAC 1 80 c2 0 0 0 [ 19.716519] CPU 04| <6> rx: received 1 packets on port 0 [ 19.716524] CPU 04| <6> rx: rx packet with MAC 01 80 c2 00 00 00 [ 19.716528] CPU 04| <6> rx: unhandled packet with MAC 1 80 c2 0 0 0 [ 21.716542] CPU 04| <6> rx: received 1 packets on port 0 [ 21.716546] CPU 04| <6> rx: rx packet with MAC 01 80 c2 00 00 00 [ 21.716550] CPU 04| <6> rx: unhandled packet with MAC 1 80 c2 0 0 0 [ 23.289722] CPU 04| <6> tx: transmitted 1 packets on port 0 [ 23.716562] CPU 04| <6> rx: received 1 packets on port 0 [ 23.716567] CPU 04| <6> rx: rx packet with MAC 01 80 c2 00 00 00 [ 23.716571] CPU 04| <6> rx: unhandled packet with MAC 1 80 c2 0 0 0 [ 25.289688] CPU 04| <6> tx: transmitted 1 packets on port 0 [ 25.716579] CPU 04| <6> rx: received 1 packets on port 0 [ 25.716584] CPU 04| <6> rx: rx packet with MAC 01 80 c2 00 00 00 [ 25.716589] CPU 04| <6> rx: unhandled packet with MAC 1 80 c2 0 0 0 [ 26.289717] CPU 04| <6> tx: transmitted 1 packets on port 0 [ 27.289746] CPU 04| <6> tx: transmitted 1 packets on port 0 [ 27.716597] CPU 04| <6> rx: received 1 packets on port 0 [ 27.716602] CPU 04| <6> rx: rx packet with MAC 01 80 c2 00 00 00 [ 27.716605] CPU 04| <6> rx: unhandled packet with MAC 1 80 c2 0 0 0 [ 29.716613] CPU 04| <6> rx: received 1 packets on port 0 [ 29.716618] CPU 04| <6> rx: rx packet with MAC 01 80 c2 00 00 00 [ 29.716622] CPU 04| <6> rx: unhandled packet with MAC 1 80 c2 0 0 0 [ 31.716632] CPU 04| <6> rx: received 1 packets on port 0 [ 31.716637] CPU 04| <6> rx: rx packet with MAC 01 80 c2 00 00 00 [ 31.716640] CPU 04| <6> rx: unhandled packet with MAC 1 80 c2 0 0 0 [ 33.716647] CPU 04| <6> rx: received 1 packets on port 0 [ 33.716652] CPU 04| <6> rx: rx packet with MAC 01 80 c2 00 00 00 [ 33.716656] CPU 04| <6> rx: unhandled packet with MAC 1 80 c2 0 0 0 [ 33.720804] CPU 04| <6> rx: received 1 packets on port 0 [ 33.720808] CPU 04| <6> rx: rx packet with MAC 01 80 c2 00 00 00 [ 33.720811] CPU 04| <6> rx: unhandled packet with MAC 1 80 c2 0 0 0 [ 36.717163] CPU 04| <6> rx: received 1 packets on port 0 [ 36.717167] CPU 04| <6> rx: rx packet with MAC 01 80 c2 00 00 00 [ 36.717171] CPU 04| <6> rx: unhandled packet with MAC 1 80 c2 0 0 0 [ 36.900714] CPU 04| <6> rx: received 1 packets on port 0 [ 36.900718] CPU 04| <6> rx: rx packet with MAC 01 80 c2 00 00 0e [ 36.900721] CPU 04| <6> rx: unhandled packet with MAC 1 80 c2 0 0 e ........

joshuafried commented 3 years ago

Thanks! Could you also check dmesg and see if there are any relevant messages there (perhaps from igb_uio)? Can you try unbinding and re-binding the NIC to the igb_uio driver between experiments? You can use the dpdk-setup.sh script in dpdk/usertools

Yi-ran commented 3 years ago

Thanks! Could you also check dmesg and see if there are any relevant messages there (perhaps from igb_uio)? Can you try unbinding and re-binding the NIC to the igb_uio driver between experiments? You can use the dpdk-setup.sh script in dpdk/usertools

Thanks. Unfortunately, I do not notice any special warning messages in dmesg. I also try unbinding and re-binding the NIC at both client and server, but the problem remains. I'll try to run under the Ubuntu version and kernel version as suggested by @inhocho89

inhocho89 commented 3 years ago

@Yi-ran You mentioned that it works fine after the machine reboot. Did you have to reboot the server, the client, or both?

Yi-ran commented 3 years ago

@Yi-ran You mentioned that it works fine after the machine reboot. Did you have to reboot the server, the client, or both?

Only rebooting the client is needed. I guess that the client encounters problems after sending requests? However, no other useful error output under the debug mode and in dmesg. By the way, I also test the Breakwater example application under Ubuntu 18.04 with Linux 4.15 (both client and server), but the problem still exists.

inhocho89 commented 3 years ago

Thanks for trying with an older Linux kernel. Based on the information you provided, the client cannot establish a TCP connection with the server even before sending requests. It looks like the TCP SYN packets from the client are dropped somewhere, but I'm not sure where and what causes this for now.

Caladan's synthetic application runs fine because it uses UDP by default. As Caladan and Breakwater share the same runtime API for TCP connection, I think you will encounter the same problem if you run Caladan's synthetic application with TCP. To run the synthetic application with TCP, you need to add a command-line argument --transport tcp.

Do you have any information which can be useful to determine where the TCP SYN packets are dropped (e.g. counters at client, server, switch)? I think DPDK (igb_uio driver), NIC, or switch may drop the packets.

+) Check firewalls too. They might block the packets.

Yi-ran commented 3 years ago

@joshuafried Hi, I recheck the synthetic application example in Calanda (Ubuntu 18.04 with Linux 4.15), it turns out that after the first successful run, the following runs also have packet drops. Because it uses UDP, it does not have the connection error like TCP uses in Breakwater. Also, no error messages in dmesg and rebinding the NIC does not help. Packets are not dropped by the switch, since no packets tx info in the debug output of iokerneld.

The output:

Distribution, Target, Actual, Dropped, Never Sent, Median, 90th, 99th, 99.9tht
zero, 669, 441, 2047, 2913, 2053062.0, inf, inf, inf, inf, 1618448074, 1323146
zero, 1999, 1999, 0, 0, 27.0, 29.0, 35.0, 40.0, 43.0, 1618448098, 118741284479
zero, 2628, 2628, 0, 3354, 28.0, 1179937.0, 1587158.0, 1657009.0, 1662081.0, 5
[ 82.380460] CPU 06| <6> arp: responding to arp request from IP 192.168.2.2
[ 94.226195] CPU 06| <6> arp: responding to arp request from IP 192.168.2.2
[ 94.226199] CPU 02| <6> arp: responding to arp request from IP 192.168.2.2
zero, 2647, 2550, 867, 12137, 103622.0, 1214256.0, inf, inf, inf, 1618448147,3
[115.408214] CPU 06| <6> arp: responding to arp request from IP 192.168.2.2
zero, 3261, 2810, 2968, 23485, 450190.0, inf, inf, inf, inf, 1618448173, 86963
zero, 2852, 2836, 127, 30697, 791102.0, 2049957.0, 2318952.0, inf, inf, 161840
[166.703801] CPU 06| <6> arp: responding to arp request from IP 192.168.2.2
zero, 2539, 2539, 0, 41767, 102222.0, 2383708.0, 2658246.0, 2685632.0, 268902

I guess the problem is not limited to TCP or Breakwater, maybe dropped by DPDK? But now I have no idea how to obtain the information of drop packets.

joshuafried commented 3 years ago

Thanks! Are there other processes/interrupts that are competing for CPU time on the cores used by the iokernel and the runtime? It seems from this output that one or both of iokernel and synthetic are being interrupted for long periods of time. Caladan generally assumes exclusive use of the cores that it is managing, and you may need to configure your OS to ensure that this is the case (using isolcpus, interrupt pinning/disabling irqbalance, etc).

Yi-ran commented 3 years ago

@joshuafried @inhocho89 The connection issue disappears after I remove the switch and directly connect the client and server.

I opened the DPDK log and no error TX stats. After that, I notice that each run, the real source MAC addresses in the log are different. At runtime/ioqueues.c, the generate_random_mac() function generates source MAC addresses randomly. As the switch maintains the MAC table, I guess it will drop packets if learned MAC address changes frequently. At iokernel/dpdk.c, in dpdk_port_init(), the original MAC address of the NIC is obtained by rte_eth_macaddr_get(port, &addr). As a result, there are two MAC addresses : the real MAC address (but is generated randomly) showed in output of ./apps/netbench/netbench and an original MAC address showed in output of ./iokerneld.

I wonder why a random generated MAC address is needed, why do not use the original MAC address of NIC? As for the problem I encountered, one direct way seems to use a fixed MAC address instead of a randomly generated MAC address.

joshuafried commented 3 years ago

Sounds good. I'm surprised the switch is not able to learn the MAC addresses quickly. We have never seen this problem with any switch that we've used. What is the model of this switch?

Each runtime instance on the machine has its own MAC so the iokernel (or hardware) can route inbound packets to the correct runtime. In theory we could have chosen to route packets to runtimes by IP address instead.

abelay commented 3 years ago

The host_mac option may help as well, as it would allow you to hardcode a MAC for each runtime.

Yi-ran commented 3 years ago

The real reason is that STP is default enabled in our switch after I check the log in the switch... (not due to different mac, I was wrong before) Now the problem solved. Thanks a lot.