Hi @JisunChae
Thanks for reporting this, it is a bit weird indeed. You can probably check the output of show errors in vppctl to see if anything weird appears there. If nothing clear shows up, maybe try doing a clear run; show run while the test is running to see what is happening within VPP.
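For reference, those checks could look roughly like this on the worker node (a sketch, assuming vppctl can reach VPP's CLI socket; the trace commands are an extra suggestion, not part of the advice above):
vppctl show errors
vppctl clear run
# run the iperf3 test, then:
vppctl show run
# optionally, capture a packet trace while traffic flows:
vppctl trace add dpdk-input 50
vppctl show trace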
In the meantime we'll try to reproduce this setup to get a sense of the throughput to expect. One thing to keep in mind is that EKS has a limitation of 5Gbps per flow, which over IPIP means that all the node-to-node traffic will be considered a single flow and thus bound to 5Gbps. Using VXLAN instead (and multiple iperf flows) might help here, but I haven't tested yet what figures it gives you.
But definitely the first thing to understand is why DPDK doesn't yield 5Gbps as well in your setup.
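As a concrete example of the multi-flow suggestion above, iperf3 can generate parallel streams with its -P flag (a sketch; the server address is a placeholder):
iperf3 -c <server_ip_address> -P 4 -t 10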
Hi, what's the EC2 instance type you are using?
I tested with m5.xlarge
I would like to ask you a few more questions.
Using VXLan instead (and multiple iperf flows) might help here
- First, I thought Calico/VPP needs to have BGP enabled, so I previously tried IPIP, which supports BGP. Now I would like to try the same process with VXLAN as you suggested. However, VXLAN seems to have the BGP option disabled, while according to the guide Calico/VPP needs BGP enabled, so these two requirements seem to conflict. I was wondering how to go from here.
- Second, nothing shows up when I run the show trace command. Does that mean packets didn't reach VPP at all? For your reference:
2022/08/02 00:52:05:078 notice plugin/load Loaded plugin: abf_plugin.so (Access Control List (ACL) Based Forwarding)
2022/08/02 00:52:05:081 notice plugin/load Loaded plugin: acl_plugin.so (Access Control Lists (ACL))
2022/08/02 00:52:05:082 notice plugin/load Loaded plugin: adl_plugin.so (Allow/deny list plugin)
2022/08/02 00:52:05:082 notice plugin/load Loaded plugin: af_xdp_plugin.so (AF_XDP Device Plugin)
2022/08/02 00:52:05:082 notice plugin/load Loaded plugin: arping_plugin.so (Arping (arping))
2022/08/02 00:52:05:083 notice plugin/load Loaded plugin: avf_plugin.so (Intel Adaptive Virtual Function (AVF) Device Driver)
2022/08/02 00:52:05:083 notice plugin/load Loaded plugin: builtinurl_plugin.so (vpp built-in URL support)
2022/08/02 00:52:05:083 notice plugin/load Loaded plugin: capo_plugin.so (Calico Policy)
2022/08/02 00:52:05:083 notice plugin/load Loaded plugin: cdp_plugin.so (Cisco Discovery Protocol (CDP))
2022/08/02 00:52:05:083 notice plugin/load Loaded plugin: cnat_plugin.so (CNat Translate)
2022/08/02 00:52:05:107 notice plugin/load Loaded plugin: crypto_ipsecmb_plugin.so (Intel IPSEC Multi-buffer Crypto Engine)
2022/08/02 00:52:05:107 notice plugin/load Loaded plugin: crypto_native_plugin.so (Intel IA32 Software Crypto Engine)
2022/08/02 00:52:05:108 notice plugin/load Loaded plugin: crypto_openssl_plugin.so (OpenSSL Crypto Engine)
2022/08/02 00:52:05:108 notice plugin/load Loaded plugin: crypto_sw_scheduler_plugin.so (SW Scheduler Crypto Async Engine plugin)
2022/08/02 00:52:05:108 notice plugin/load Loaded plugin: ct6_plugin.so (IPv6 Connection Tracker)
2022/08/02 00:52:05:108 notice plugin/load Loaded plugin: det44_plugin.so (Deterministic NAT (CGN))
2022/08/02 00:52:05:109 notice plugin/load Loaded plugin: dhcp_plugin.so (Dynamic Host Configuration Protocol (DHCP))
2022/08/02 00:52:05:109 notice plugin/load Loaded plugin: dns_plugin.so (Simple DNS name resolver)
2022/08/02 00:52:05:121 notice plugin/load Loaded plugin: dpdk_plugin.so (Data Plane Development Kit (DPDK))
2022/08/02 00:52:05:121 notice plugin/load Loaded plugin: dslite_plugin.so (Dual-Stack Lite)
2022/08/02 00:52:05:121 notice plugin/load Loaded plugin: flowprobe_plugin.so (Flow per Packet)
2022/08/02 00:52:05:121 notice plugin/load Loaded plugin: geneve_plugin.so (GENEVE Tunnels)
2022/08/02 00:52:05:121 notice plugin/load Loaded plugin: gtpu_plugin.so (GPRS Tunnelling Protocol, User Data (GTPv1-U))
2022/08/02 00:52:05:121 notice plugin/load Loaded plugin: hs_apps_plugin.so (Host Stack Applications)
2022/08/02 00:52:05:121 notice plugin/load Loaded plugin: hsi_plugin.so (Host Stack Intercept (HSI))
2022/08/02 00:52:05:122 notice plugin/load Loaded plugin: http_plugin.so (Hypertext Transfer Protocol (HTTP))
2022/08/02 00:52:05:122 notice plugin/load Loaded plugin: http_static_plugin.so (HTTP Static Server)
2022/08/02 00:52:05:122 notice plugin/load Loaded plugin: igmp_plugin.so (Internet Group Management Protocol (IGMP))
2022/08/02 00:52:05:122 notice plugin/load Loaded plugin: ikev2_plugin.so (Internet Key Exchange (IKEv2) Protocol)
2022/08/02 00:52:05:122 notice plugin/load Loaded plugin: ila_plugin.so (Identifier Locator Addressing (ILA) for IPv6)
2022/08/02 00:52:05:122 notice plugin/load Loaded plugin: ioam_plugin.so (Inbound Operations, Administration, and Maintenance (OAM))
2022/08/02 00:52:05:122 notice plugin/load Loaded plugin: l2tp_plugin.so (Layer 2 Tunneling Protocol v3 (L2TP))
2022/08/02 00:52:05:122 notice plugin/load Loaded plugin: l3xc_plugin.so (L3 Cross-Connect (L3XC))
2022/08/02 00:52:05:122 notice plugin/load Loaded plugin: lacp_plugin.so (Link Aggregation Control Protocol (LACP))
2022/08/02 00:52:05:123 notice plugin/load Loaded plugin: lb_plugin.so (Load Balancer (LB))
2022/08/02 00:52:05:123 notice plugin/load Plugin disabled (default): linux_cp_plugin.so
2022/08/02 00:52:05:123 notice plugin/load Plugin disabled (default): linux_cp_unittest_plugin.so
2022/08/02 00:52:05:123 notice plugin/load Plugin disabled (default): linux_nl_plugin.so
2022/08/02 00:52:05:123 notice plugin/load Loaded plugin: lisp_plugin.so (Locator ID Separation Protocol (LISP))
2022/08/02 00:52:05:123 notice plugin/load Plugin disabled (default): lisp_unittest_plugin.so
2022/08/02 00:52:05:123 notice plugin/load Loaded plugin: lldp_plugin.so (Link Layer Discovery Protocol (LLDP))
2022/08/02 00:52:05:123 notice plugin/load Loaded plugin: mactime_plugin.so (Time-based MAC Source Address Filter)
2022/08/02 00:52:05:123 notice plugin/load Loaded plugin: map_plugin.so (Mapping of Address and Port (MAP))
2022/08/02 00:52:05:123 notice plugin/load Loaded plugin: mdata_plugin.so (Buffer metadata change tracker.)
2022/08/02 00:52:05:124 notice plugin/load Loaded plugin: memif_plugin.so (Packet Memory Interface (memif) – Experimental)
2022/08/02 00:52:05:124 notice plugin/load Loaded plugin: mss_clamp_plugin.so (TCP MSS clamping plugin)
2022/08/02 00:52:05:124 notice plugin/load Loaded plugin: nat44_ei_plugin.so (IPv4 Endpoint-Independent NAT (NAT44 EI))
2022/08/02 00:52:05:124 notice plugin/load Loaded plugin: nat64_plugin.so (NAT64)
2022/08/02 00:52:05:124 notice plugin/load Loaded plugin: nat66_plugin.so (NAT66)
2022/08/02 00:52:05:125 notice plugin/load Loaded plugin: nat_plugin.so (Network Address Translation (NAT))
2022/08/02 00:52:05:125 notice plugin/load Loaded plugin: nsh_plugin.so (Network Service Header (NSH))
2022/08/02 00:52:05:125 notice plugin/load Loaded plugin: nsim_plugin.so (Network Delay Simulator)
2022/08/02 00:52:05:125 notice plugin/load Loaded plugin: pbl_plugin.so (Port based balancer (PBL))
2022/08/02 00:52:05:125 notice plugin/load Plugin disabled: ping_plugin.so
2022/08/02 00:52:05:125 notice plugin/load Loaded plugin: pnat_plugin.so (Policy 1:1 NAT)
2022/08/02 00:52:05:125 notice plugin/load Loaded plugin: pppoe_plugin.so (PPP over Ethernet (PPPoE))
2022/08/02 00:52:05:126 notice plugin/load Loaded plugin: prom_plugin.so (Prometheus Stats Exporter)
2022/08/02 00:52:05:126 notice plugin/load Plugin disabled (default): quic_plugin.so
2022/08/02 00:52:05:126 notice plugin/load Loaded plugin: rdma_plugin.so (RDMA IBverbs Device Driver)
2022/08/02 00:52:05:126 notice plugin/load Loaded plugin: srv6ad_plugin.so (Dynamic Segment Routing for IPv6 (SRv6) Proxy)
2022/08/02 00:52:05:127 notice plugin/load Loaded plugin: srv6adflow_plugin.so (Dynamic Segment Routing for IPv6 (SRv6) Proxy)
2022/08/02 00:52:05:127 notice plugin/load Loaded plugin: srv6am_plugin.so (Masquerading Segment Routing for IPv6 (SRv6) Proxy)
2022/08/02 00:52:05:127 notice plugin/load Loaded plugin: srv6as_plugin.so (Static Segment Routing for IPv6 (SRv6) Proxy)
2022/08/02 00:52:05:127 notice plugin/load Loaded plugin: srv6mobile_plugin.so (SRv6 GTP Endpoint Functions)
2022/08/02 00:52:05:127 notice plugin/load Loaded plugin: stn_plugin.so (VPP Steals the NIC (STN) for Container Integration)
2022/08/02 00:52:05:127 notice plugin/load Loaded plugin: svs_plugin.so (Source Virtual Routing and Forwarding (VRF) Select)
2022/08/02 00:52:05:127 notice plugin/load Loaded plugin: tlsmbedtls_plugin.so (Transport Layer Security (TLS) Engine, Mbedtls Based)
2022/08/02 00:52:05:127 notice plugin/load Loaded plugin: tlsopenssl_plugin.so (Transport Layer Security (TLS) Engine, OpenSSL Based)
2022/08/02 00:52:05:127 notice plugin/load Loaded plugin: tlspicotls_plugin.so (Transport Layer Security (TLS) Engine, Picotls Based)
2022/08/02 00:52:05:128 notice plugin/load Loaded plugin: urpf_plugin.so (Unicast Reverse Path Forwarding (uRPF))
2022/08/02 00:52:05:128 notice plugin/load Loaded plugin: vmxnet3_plugin.so (VMWare Vmxnet3 Device Driver)
2022/08/02 00:52:05:128 notice plugin/load Loaded plugin: vrrp_plugin.so (VRRP v3 (RFC 5798))
2022/08/02 00:52:05:128 notice plugin/load Loaded plugin: wireguard_plugin.so (Wireguard Protocol)
2022/08/02 00:52:05:428 notice dpdk EAL init args: --in-memory --no-telemetry --file-prefix vpp -a 0000:00:05.0
2022/08/02 00:52:05:908 notice vat-plug/load Loaded plugin: acl_test_plugin.so
2022/08/02 00:52:05:908 notice vat-plug/load Loaded plugin: adl_test_plugin.so
2022/08/02 00:52:05:908 notice vat-plug/load Loaded plugin: af_xdp_test_plugin.so
2022/08/02 00:52:05:908 notice vat-plug/load Loaded plugin: arping_test_plugin.so
2022/08/02 00:52:05:908 notice vat-plug/load Loaded plugin: avf_test_plugin.so
2022/08/02 00:52:05:908 notice vat-plug/load Loaded plugin: builtinurl_test_plugin.so
2022/08/02 00:52:05:908 notice vat-plug/load Loaded plugin: capo_test_plugin.so
2022/08/02 00:52:05:908 notice vat-plug/load Loaded plugin: cdp_test_plugin.so
2022/08/02 00:52:05:908 notice vat-plug/load Loaded plugin: ct6_test_plugin.so
2022/08/02 00:52:05:909 notice vat-plug/load Loaded plugin: dhcp_test_plugin.so
2022/08/02 00:52:05:909 notice vat-plug/load Loaded plugin: dns_test_plugin.so
2022/08/02 00:52:05:909 notice vat-plug/load Loaded plugin: flowprobe_test_plugin.so
2022/08/02 00:52:05:909 notice vat-plug/load Loaded plugin: geneve_test_plugin.so
2022/08/02 00:52:05:909 notice vat-plug/load Loaded plugin: gtpu_test_plugin.so
2022/08/02 00:52:05:909 notice vat-plug/load Loaded plugin: http_static_test_plugin.so
2022/08/02 00:52:05:909 notice vat-plug/load Loaded plugin: ikev2_test_plugin.so
2022/08/02 00:52:05:909 notice vat-plug/load Loaded plugin: ioam_test_plugin.so
2022/08/02 00:52:05:909 notice vat-plug/load Loaded plugin: l2tp_test_plugin.so
2022/08/02 00:52:05:909 notice vat-plug/load Loaded plugin: lacp_test_plugin.so
2022/08/02 00:52:05:909 notice vat-plug/load Loaded plugin: lb_test_plugin.so
2022/08/02 00:52:05:909 notice vat-plug/load Loaded plugin: lisp_test_plugin.so
2022/08/02 00:52:05:909 notice vat-plug/load Loaded plugin: lldp_test_plugin.so
2022/08/02 00:52:05:909 notice vat-plug/load Loaded plugin: mactime_test_plugin.so
2022/08/02 00:52:05:909 notice vat-plug/load Loaded plugin: mdata_test_plugin.so
2022/08/02 00:52:05:910 notice vat-plug/load Loaded plugin: memif_test_plugin.so
2022/08/02 00:52:05:910 notice vat-plug/load Loaded plugin: nsh_test_plugin.so
2022/08/02 00:52:05:910 notice vat-plug/load Loaded plugin: nsim_test_plugin.so
2022/08/02 00:52:05:910 notice vat-plug/load Loaded plugin: pppoe_test_plugin.so
2022/08/02 00:52:05:910 notice vat-plug/load Loaded plugin: rdma_test_plugin.so
2022/08/02 00:52:05:910 notice vat-plug/load Loaded plugin: stn_test_plugin.so
2022/08/02 00:52:05:910 notice vat-plug/load Loaded plugin: tlsopenssl_test_plugin.so
2022/08/02 00:52:05:910 notice vat-plug/load Loaded plugin: vlib_vlibapi_test_plugin.so
2022/08/02 00:52:05:910 notice vat-plug/load Loaded plugin: vmxnet3_test_plugin.so
2022/08/02 00:52:05:910 notice vat-plug/load Loaded plugin: vnet_arp_test_plugin.so
2022/08/02 00:52:05:910 notice vat-plug/load Loaded plugin: vnet_interface_test_plugin.so
2022/08/02 00:52:05:910 notice vat-plug/load Loaded plugin: vnet_ip6_nd_test_plugin.so
2022/08/02 00:52:05:910 notice vat-plug/load Loaded plugin: vnet_ip_test_plugin.so
2022/08/02 00:52:05:910 notice vat-plug/load Loaded plugin: vnet_ipsec_test_plugin.so
2022/08/02 00:52:05:910 notice vat-plug/load Loaded plugin: vnet_l2_test_plugin.so
2022/08/02 00:52:05:910 notice vat-plug/load Loaded plugin: vnet_session_test_plugin.so
2022/08/02 00:52:05:910 notice vat-plug/load Loaded plugin: vnet_sr_mpls_test_plugin.so
2022/08/02 00:52:05:910 notice vat-plug/load Loaded plugin: vpp_api_test_plugin.so
2022/08/02 00:52:05:911 notice vat-plug/load Loaded plugin: vrrp_test_plugin.so
2022/08/02 00:52:05:911 notice dpdk EAL: Detected CPU lcores: 4
2022/08/02 00:52:05:911 notice dpdk EAL: Detected NUMA nodes: 1
2022/08/02 00:52:05:911 notice dpdk EAL: Detected static linkage of DPDK
2022/08/02 00:52:05:911 notice dpdk EAL: Selected IOVA mode 'PA'
2022/08/02 00:52:05:911 notice dpdk EAL: No available 1048576 kB hugepages reported
2022/08/02 00:52:05:911 notice dpdk EAL: No free 1048576 kB hugepages reported on node 0
2022/08/02 00:52:05:911 notice dpdk EAL: No available 1048576 kB hugepages reported
2022/08/02 00:52:05:911 notice dpdk EAL: VFIO support initialized
2022/08/02 00:52:05:911 notice dpdk EAL: Using IOMMU type 8 (No-IOMMU)
2022/08/02 00:52:05:911 notice dpdk EAL: Probe PCI driver: net_ena (1d0f:ec20) device: 0000:00:05.0 (socket 0)
2022/08/02 00:52:05:911 notice dpdk ena_mtu_set(): MTU set to: 9008
2022/08/02 00:52:07:075 error interface/rx-q setting rx mode on the interface VirtualFunctionEthernet0/5/0 queue-id 0 failed.
dpdk_interface_rx_mode_change: unsupported op (is the interface up?)
2022/08/02 00:52:07:076 notice ip6/link enable: VirtualFunctionEthernet0/5/0
2022/08/02 00:52:07:076 error interface hw_add_del_mac_address: dpdk_add_del_mac_address: mac address add/del failed: -95
2022/08/02 00:52:07:076 error interface hw_add_del_mac_address: dpdk_add_del_mac_address: mac address add/del failed: -95
2022/08/02 00:52:07:076 error interface hw_add_del_mac_address: dpdk_add_del_mac_address: mac address add/del failed: -95
2022/08/02 00:52:07:076 error interface hw_add_del_mac_address: dpdk_add_del_mac_address: mac address add/del failed: -95
2022/08/02 00:52:07:077 error interface hw_add_del_mac_address: dpdk_add_del_mac_address: mac address add/del failed: -95
2022/08/02 00:52:07:077 error interface hw_add_del_mac_address: dpdk_add_del_mac_address: mac address add/del failed: -95
2022/08/02 00:52:07:078 notice ip/neighbor add: VirtualFunctionEthernet0/5/0, 169.254.169.254
2022/08/02 00:52:07:078 notice ip/neighbor add: VirtualFunctionEthernet0/5/0, 192.168.64.1
2022/08/02 00:52:07:218 notice ip6/link enable: tap0
2022/08/02 00:52:07:219 notice ip/neighbor add: tap0, 169.254.0.1
2022/08/02 00:52:07:219 notice ip/neighbor add: tap0, fc00:ffff:ffff:ffff:ca11:c000:fd10:fffe
Iperf3 test log
vpp# show errors
Count Node Reason Severity
1 ip6-icmp-input router advertisements sent error
53 acl-plugin-out-ip4-fa new sessions added error
26228 acl-plugin-out-ip4-fa existing session packets error
26281 acl-plugin-out-ip4-fa checked packets error
197 acl-plugin-out-ip4-fa restart session timer error
178 acl-plugin-in-ip4-fa new sessions added error
29764 acl-plugin-in-ip4-fa existing session packets error
29942 acl-plugin-in-ip4-fa checked packets error
216 acl-plugin-in-ip4-fa restart session timer error
3 arp-proxy ARP replies sent error
2 arp-reply ARP replies sent error
19929 ipip4-input packets decapsulated error
1 ip6-input Multicast RPF check failed error
I tried the following commands:
clear run -> run the iperf3 test -> show run
I'm unsure whether the log below indicates that VPP and DPDK receive any packets.
vpp# show run
Thread 0 vpp_main (lcore 1)
Time 9.0, 10 sec internal node vector rate 0.00 loops/sec 503787.45
vector rates in 0.0000e0, out 0.0000e0, drop 0.0000e0, punt 0.0000e0
Name State Calls Vectors Suspends Clocks Vectors/Call
acl-plugin-fa-cleaner-process any wait 0 0 36 5.44e3 0.00
acl-plugin-fa-worker-cleaner-pinterrupt wa 18 0 0 3.68e3 0.00
api-rx-from-ring any wait 0 0 1 4.34e4 0.00
cnat-scanner-process any wait 0 0 9 7.99e3 0.00
dpdk-process any wait 0 0 3 1.31e5 0.00
fib-walk any wait 0 0 5 9.21e3 0.00
ip4-full-reassembly-expire-wal any wait 0 0 1 2.72e3 0.00
ip4-sv-reassembly-expire-walk any wait 0 0 1 5.40e3 0.00
ip6-full-reassembly-expire-wal any wait 0 0 1 3.83e3 0.00
ip6-mld-process any wait 0 0 9 4.27e3 0.00
ip6-ra-process any wait 0 0 9 3.42e3 0.00
ip6-sv-reassembly-expire-walk any wait 0 0 1 5.37e3 0.00
statseg-collector-process time wait 0 0 1 2.74e6 0.00
unix-cli-local:0 active 0 0 9 7.74e4 0.00
unix-epoll-input polling 621328 0 0 3.57e4 0.00
virtio-pre-input polling 621328 0 0 2.26e2 0.00
wg-timer-manager any wait 0 0 899 1.49e3 0.00
---------------
Thread 1 vpp_wk_0 (lcore 2)
Time 9.0, 10 sec internal node vector rate 3.49 loops/sec 4805837.35
vector rates in 6.3161e3, out 6.6319e3, drop 0.0000e0, punt 0.0000e0
Name State Calls Vectors Suspends Clocks Vectors/Call
VirtualFunctionEthernet0/5/0-o active 3062 6274 0 1.91e2 2.05
VirtualFunctionEthernet0/5/0-t active 3062 6274 0 1.19e3 2.05
acl-plugin-fa-worker-cleaner-pinterrupt wa 64 0 0 2.31e3 0.00
acl-plugin-in-ip4-fa active 3052 6272 0 4.83e2 2.06
acl-plugin-out-ip4-fa active 13037 50617 0 2.79e2 3.88
arp-input active 1 1 0 2.32e3 1.00
arp-reply active 1 1 0 3.36e4 1.00
cnat-input-ip4 active 28982 107349 0 2.07e2 3.70
cnat-output-ip4 active 19070 63031 0 1.80e2 3.31
dpdk-input polling 48577742 50618 0 1.42e5 0.00
ethernet-input active 13117 50748 0 2.33e2 3.87
gso-ip4 active 6178 12571 0 1.88e2 2.03
interface-output active 16098 56891 0 7.39e1 3.53
ip4-input active 15946 56732 0 1.76e2 3.56
ip4-input-no-checksum active 13036 50617 0 1.56e2 3.88
ip4-lookup active 28982 107349 0 1.23e2 3.70
ip4-midchain active 2981 6142 0 4.98e2 2.06
ip4-punt-redirect active 145 157 0 8.39e2 1.08
ip4-punt active 145 157 0 6.39e2 1.08
ip4-receive active 13036 50617 0 1.41e2 3.88
ip4-rewrite active 16098 56889 0 1.36e2 3.53
ip4-udp-lookup active 3 3 0 2.61e3 1.00
ipip4-input active 12894 50460 0 2.56e2 3.91
lookup-ip4-dst active 6178 12571 0 2.71e2 2.03
tap0-output active 145 157 0 6.52e2 1.08
tap0-tx active 148 160 0 1.31e4 1.08
tun2-output active 12894 50460 0 9.19e1 3.91
tun2-tx active 15734 53300 0 2.93e3 3.39
tunnel-output active 2981 6142 0 2.15e2 2.06
unix-epoll-input polling 47393 0 0 3.25e3 0.00
virtio-input interrupt wa 3065 6272 0 8.52e2 2.05
virtio-pre-input polling 48577742 0 0 1.95e2 0.00
Hi, what's the EC2 instance type you are using?
I tested with m5.xlarge
With m5.xlarge this is what I am getting:
# iperf3 -c 10.10.143.4 -p 5002
Connecting to host 10.10.143.4, port 5002
[ 5] local 10.10.143.3 port 47500 connected to 10.10.143.4 port 5002
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 1.70 GBytes 14.6 Gbits/sec 7 3.05 MBytes
[ 5] 1.00-2.00 sec 1.84 GBytes 15.8 Gbits/sec 1 3.05 MBytes
[ 5] 2.00-3.00 sec 1.90 GBytes 16.4 Gbits/sec 0 3.05 MBytes
[ 5] 3.00-4.00 sec 1.81 GBytes 15.6 Gbits/sec 0 3.05 MBytes
[ 5] 4.00-5.00 sec 1.83 GBytes 15.7 Gbits/sec 0 3.05 MBytes
[ 5] 5.00-6.00 sec 1.88 GBytes 16.2 Gbits/sec 0 3.05 MBytes
[ 5] 6.00-7.00 sec 1.90 GBytes 16.3 Gbits/sec 0 3.05 MBytes
[ 5] 7.00-8.00 sec 1.90 GBytes 16.3 Gbits/sec 1 3.05 MBytes
[ 5] 8.00-9.00 sec 1.88 GBytes 16.1 Gbits/sec 0 3.05 MBytes
[ 5] 9.00-10.00 sec 1.83 GBytes 15.7 Gbits/sec 0 3.05 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 18.5 GBytes 15.9 Gbits/sec 9 sender
[ 5] 0.00-10.00 sec 18.5 GBytes 15.9 Gbits/sec receiver
iperf Done.
#
Are you sure you used m5.xlarge? Could you share the output of the lscpu command on the worker node? Also, could you please share the iperf3 command that you used?
I just double-checked and all the configurations are correct. I ran the script from the following URL (https://projectcalico.docs.tigera.io/getting-started/kubernetes/vpp/getting-started):
cat <<EOF | eksctl create nodegroup -f -
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-calico-cluster
  region: us-east-2
managedNodeGroups:
- name: my-calico-cluster-ng
  desiredCapacity: 2
  instanceType: m5.xlarge
  labels: {role: worker}
  preBootstrapCommands:
  - sudo curl -o /tmp/init_eks.sh "https://raw.githubusercontent.com/projectcalico/vpp-dataplane/master/scripts/init_eks.sh"
  - sudo chmod +x /tmp/init_eks.sh
  - sudo /tmp/init_eks.sh
EOF
lscpu output:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 2
Core(s) per socket: 2
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz
Stepping: 4
CPU MHz: 3102.604
BogoMIPS: 4999.99
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 33792K
NUMA node0 CPU(s): 0-3
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke
I tested with two different nodes, one running the iperf3 server and the other the client.
run server : kubectl -n iperf-test exec -it iperf3-server -- iperf3 -s
tcp traffic : kubectl -n iperf-test exec -it <client_pod_name> -- iperf3 -c <server_ip_address>
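For reference, the server pod IP to use as <server_ip_address> can be fetched like this (a sketch, assuming the iperf3-server pod defined in the manifests below):
kubectl -n iperf-test get pod iperf3-server -o jsonpath='{.status.podIP}'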
apiVersion: v1
kind: Pod
metadata:
  labels:
    app: iperf3-server
    role: iperf3-server
  name: iperf3-server
  namespace: iperf-test
spec:
  containers:
  - name: iperf3
    image: clearlinux/iperf:3
    command: ['/bin/sh', '-c', 'sleep 1d']
    ports:
    - containerPort: 5201
apiVersion: apps/v1
kind: DaemonSet
metadata:
  labels:
    app: iperf3-client
    role: iperf3-client
  name: iperf3
  namespace: iperf-test
spec:
  selector:
    matchLabels:
      app: iperf3-client
  template:
    metadata:
      labels:
        app: iperf3-client
    spec:
      containers:
      - command:
        - /bin/sh
        - "-c"
        - "sleep 1d"
        image: "clearlinux/iperf:3"
        name: iperf3
        ports:
        - containerPort: 5201
Here's the raw script I executed:
eksctl create cluster --name my-calico-cluster --without-nodegroup
kubectl delete daemonset -n kube-system aws-node
kubectl apply -f https://projectcalico.docs.tigera.io/manifests/tigera-operator.yaml
kubectl apply -f https://raw.githubusercontent.com/projectcalico/vpp-dataplane/v3.23.0/yaml/calico/installation-eks.yaml
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Namespace
metadata:
  name: calico-vpp-dataplane
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: calico-vpp-node-sa
  namespace: calico-vpp-dataplane
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: calico-vpp-node-role
rules:
- apiGroups:
  - ""
  resources:
  - pods
  - nodes
  - namespaces
  verbs:
  - get
- apiGroups:
  - ""
  resources:
  - endpoints
  - services
  verbs:
  - watch
  - list
  - get
- apiGroups:
  - ""
  resources:
  - configmaps
  verbs:
  - get
- apiGroups:
  - ""
  resources:
  - nodes/status
  verbs:
  - patch
  - update
- apiGroups:
  - networking.k8s.io
  resources:
  - networkpolicies
  verbs:
  - watch
  - list
- apiGroups:
  - ""
  resources:
  - pods
  - namespaces
  - serviceaccounts
  verbs:
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - pods/status
  verbs:
  - patch
- apiGroups:
  - crd.projectcalico.org
  resources:
  - globalfelixconfigs
  - felixconfigurations
  - bgppeers
  - globalbgpconfigs
  - bgpconfigurations
  - ippools
  - ipamblocks
  - globalnetworkpolicies
  - globalnetworksets
  - networkpolicies
  - networksets
  - clusterinformations
  - hostendpoints
  - blockaffinities
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - nodes
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - crd.projectcalico.org
  resources:
  - blockaffinities
  - ipamblocks
  - ipamhandles
  verbs:
  - get
  - list
  - create
  - update
  - delete
- apiGroups:
  - crd.projectcalico.org
  resources:
  - ipamconfigs
  verbs:
  - get
- apiGroups:
  - crd.projectcalico.org
  resources:
  - blockaffinities
  verbs:
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: calico-vpp-node
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: calico-vpp-node-role
subjects:
- kind: ServiceAccount
  name: calico-vpp-node-sa
  namespace: calico-vpp-dataplane
---
apiVersion: v1
data:
  service_prefix: 10.100.0.0/16
  vpp_config_template: |-
    unix {
      nodaemon
      full-coredump
      cli-listen /var/run/vpp/cli.sock
      pidfile /run/vpp/vpp.pid
      exec /etc/vpp/startup.exec
    }
    api-trace { on }
    cpu {
      workers 1
    }
    socksvr {
      socket-name /var/run/vpp/vpp-api.sock
    }
    dpdk {
      dev __PCI_DEVICE_ID__ { num-rx-queues 1 num-tx-queues 1 }
    }
    plugins {
      plugin default { enable }
      plugin dpdk_plugin.so { enable }
      plugin calico_plugin.so { enable }
      plugin ping_plugin.so { disable }
    }
    buffers {
      buffers-per-numa 131072
    }
  vpp_dataplane_interface: eth0
  vpp_uplink_driver: dpdk
kind: ConfigMap
metadata:
  name: calico-vpp-config
  namespace: calico-vpp-dataplane
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  labels:
    k8s-app: calico-vpp-node
  name: calico-vpp-node
  namespace: calico-vpp-dataplane
spec:
  selector:
    matchLabels:
      k8s-app: calico-vpp-node
  template:
    metadata:
      labels:
        k8s-app: calico-vpp-node
    spec:
      containers:
      - env:
        - name: CALICOVPP_HOOK_BEFORE_VPP_RUN
          value: echo 'sudo systemctl stop network ; sudo systemctl kill network' | chroot /host
        - name: CALICOVPP_HOOK_VPP_RUNNING
          value: echo 'sudo systemctl start network' | chroot /host
        - name: CALICOVPP_HOOK_VPP_DONE_OK
          value: echo 'sudo systemctl stop network ; sudo systemctl kill network ; sudo systemctl start network' | chroot /host
        - name: CALICOVPP_HOOK_VPP_ERRORED
          value: echo 'sudo systemctl stop network ; sudo systemctl kill network ; sudo systemctl start network' | chroot /host
        - name: CALICOVPP_NATIVE_DRIVER
          valueFrom:
            configMapKeyRef:
              key: vpp_uplink_driver
              name: calico-vpp-config
        - name: CALICOVPP_IP_CONFIG
          value: linux
        - name: CALICOVPP_INTERFACE
          valueFrom:
            configMapKeyRef:
              key: vpp_dataplane_interface
              name: calico-vpp-config
        - name: CALICOVPP_CONFIG_TEMPLATE
          valueFrom:
            configMapKeyRef:
              key: vpp_config_template
              name: calico-vpp-config
        - name: SERVICE_PREFIX
          valueFrom:
            configMapKeyRef:
              key: service_prefix
              name: calico-vpp-config
        - name: DATASTORE_TYPE
          value: kubernetes
        - name: WAIT_FOR_DATASTORE
          value: "true"
        - name: NODENAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: CALICOVPP_CORE_PATTERN
          value: /var/lib/vpp/vppcore.%e.%p
        image: docker.io/calicovpp/vpp:v3.23.0
        imagePullPolicy: IfNotPresent
        name: vpp
        resources:
          limits:
            hugepages-2Mi: 512Mi
          requests:
            cpu: 500m
            memory: 512Mi
        securityContext:
          privileged: true
        volumeMounts:
        - mountPath: /lib/firmware
          name: lib-firmware
        - mountPath: /var/run/vpp
          name: vpp-rundir
        - mountPath: /var/lib/vpp
          name: vpp-data
        - mountPath: /etc/vpp
          name: vpp-config
        - mountPath: /dev
          name: devices
        - mountPath: /sys
          name: hostsys
        - mountPath: /run/netns/
          mountPropagation: Bidirectional
          name: netns
        - mountPath: /host
          name: host-root
      - env:
        - name: DATASTORE_TYPE
          value: kubernetes
        - name: WAIT_FOR_DATASTORE
          value: "true"
        - name: NODENAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: SERVICE_PREFIX
          valueFrom:
            configMapKeyRef:
              key: service_prefix
              name: calico-vpp-config
        image: docker.io/calicovpp/agent:v3.23.0
        imagePullPolicy: IfNotPresent
        name: agent
        resources:
          requests:
            cpu: 250m
        securityContext:
          privileged: true
        volumeMounts:
        - mountPath: /var/run/calico
          name: var-run-calico
          readOnly: false
        - mountPath: /var/lib/calico/felix-plugins
          name: felix-plugins
          readOnly: false
        - mountPath: /var/run/vpp
          name: vpp-rundir
        - mountPath: /run/netns/
          mountPropagation: Bidirectional
          name: netns
      hostNetwork: true
      hostPID: true
      nodeSelector:
        kubernetes.io/os: linux
      priorityClassName: system-node-critical
      serviceAccountName: calico-vpp-node-sa
      terminationGracePeriodSeconds: 10
      tolerations:
      - effect: NoSchedule
        operator: Exists
      - key: CriticalAddonsOnly
        operator: Exists
      - effect: NoExecute
        operator: Exists
      volumes:
      - hostPath:
          path: /lib/firmware
        name: lib-firmware
      - hostPath:
          path: /var/run/vpp
        name: vpp-rundir
      - hostPath:
          path: /var/lib/vpp
          type: DirectoryOrCreate
        name: vpp-data
      - hostPath:
          path: /etc/vpp
        name: vpp-config
      - hostPath:
          path: /dev
        name: devices
      - hostPath:
          path: /sys
        name: hostsys
      - hostPath:
          path: /var/run/calico
        name: var-run-calico
      - hostPath:
          path: /run/netns
        name: netns
      - hostPath:
          path: /var/lib/calico/felix-plugins
        name: felix-plugins
      - hostPath:
          path: /
        name: host-root
  updateStrategy:
    rollingUpdate:
      maxUnavailable: 1
    type: RollingUpdate
EOF
cat <<EOF | eksctl create nodegroup -f -
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-calico-cluster
  region: us-east-1
managedNodeGroups:
- name: my-calico-cluster-ng
  desiredCapacity: 2
  instanceType: m5.xlarge
  labels: {role: worker}
  preBootstrapCommands:
  - sudo curl -o /tmp/init_eks.sh "https://raw.githubusercontent.com/projectcalico/vpp-dataplane/master/scripts/init_eks.sh"
  - sudo chmod +x /tmp/init_eks.sh
  - sudo /tmp/init_eks.sh
  ssh:
    publicKeyPath: /home/ec2-user/.ssh/id_rsa.pub
EOF
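Not part of the original script, but a quick sanity check after deployment could look like this (kubectl picks one pod behind the daemonset for the logs command):
kubectl -n calico-vpp-dataplane get pods -o wide
kubectl -n calico-vpp-dataplane logs ds/calico-vpp-node -c vpp --tail=20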
Thank you for sharing such detailed information. We are able to reproduce the issue and are looking into it. Will keep you posted.
@onong It would be a big help if you could respond back.
Hi @JisunChae, we are still investigating and it is still unclear what is going on here. What we do know with some certainty is that VPP and DPDK are not the culprits. This is based on testing just VPP + DPDK in AWS, and we are also in touch with the good folks who own the DPDK ENA PMD, and they confirm that DPDK is working fine in their tests. That leaves the EKS environment, especially the customised AMI used in EKS. We are still digging into it.
Hello,
Could it be that performance has degraded due to the absence of HugePages memory for DPDK?
What I see in @JisunChae's VPP trace:
2022/08/02 00:52:05:911 notice dpdk EAL: No available 1048576 kB hugepages reported
2022/08/02 00:52:05:911 notice dpdk EAL: No free 1048576 kB hugepages reported on node 0
2022/08/02 00:52:05:911 notice dpdk EAL: No available 1048576 kB hugepages reported
This is because the calico-vpp-node DaemonSet in installation-eks.yaml is missing a hugepages volume mount. I'd suggest patching it by adding something like:
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: calico-vpp-node
spec:
  template:
    spec:
      containers:
      - name: vpp
        volumeMounts:
        - mountPath: /hugepages-2Mi
          name: hugepage-2mi
      volumes:
      - name: hugepage-2mi
        emptyDir:
          medium: HugePages-2Mi
More on configuring hugepages: https://kubernetes.io/docs/tasks/manage-hugepages/scheduling-hugepages/
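One possible way to apply such a patch (a sketch, assuming the snippet above is saved as hugepages-patch.yaml and a kubectl version that supports --patch-file):
kubectl -n calico-vpp-dataplane patch daemonset calico-vpp-node --patch-file hugepages-patch.yaml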
Hi @aireaire, thank you for looking into it. The message in question is harmless - it is just reporting the absence of 1GB hugepages. VPP by default uses 2MB hugepages, and the EKS worker nodes are configured with 2MB hugepages at node creation time.
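For reference, the 2MB hugepage pool on a worker node can be checked with something like:
grep Huge /proc/meminfo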
Hi @JisunChae, I think we may have found the root cause behind the issue and we are working to sort it out. However, it is a bit convoluted at the moment and might take a while longer to sort out completely. In the meantime, may I suggest that you try out Calico/VPP release v0.16.0-calicov3.20.0 if feasible for your purposes? It does not suffer from the perf degradation. You can follow the instructions at:
https://projectcalico.docs.tigera.io/archive/v3.20/getting-started/kubernetes/vpp/getting-started
Or you can still follow the steps from v3.23.0 but use the following manifest yaml appropriate for calico/vpp v0.16.0-calicov3.20.0:
Hi @onong, I tried the first of the two options you provided, and I ran into the error below:
[✖] exceeded max wait time for StackCreateComplete waiter
Error: failed to create nodegroups for cluster "my-calico-vpp-cluster"
ERROR: Failed to create nodegroup. Refer to error logs above.
With the second option, I got the results below (screenshots: "within the same node" and "different nodes").
The speed clearly increased, but it's still slower than the test without VPP and DPDK. It doesn't seem fast enough, so I'm wondering how you got your result.
Hi @JisunChae, could you try again after adding the following to the vpp_config_template section in the manifest yaml file:
buffers {
  buffers-per-numa 131072
}
The speed clearly increased but it's still slower than the test without vpp and dpdk.
Could you elaborate a bit on "test without vpp and dpdk"? Are you comparing against native aws vpc cni or stock calico cni?
Hi @onong, the buffers option you've described doesn't make much difference.
For your reference, I'm adding results from the tests with the native AWS CNI and the Calico CNI (without VPP and DPDK):
- native aws cni
- calico cni without vpp and dpdk
Could you elaborate a bit on "test without vpp and dpdk"? Are you comparing against native aws vpc cni or stock calico cni?
Hi @onong. I'm working with @JisunChae. We appreciate all the help.
What we are trying to do is to get a clear picture of the performance difference between Calico/VPP and the alternatives using the traditional network stack.
Thus, "test without vpp and dpdk" means exactly what you mentioned. We want to compare Calico/VPP against the native AWS CNI and the vanilla Calico CNI.
We've tested the three CNIs in the same setting, where iperf3 pods on two physical nodes communicate with each other. The result shows that there is not much difference - the results under all CNIs are around 4.5 Gbits/sec.
Please let us know if we are missing something.
Hi @JisunChae
I forgot to mention that you would also need to restart the calico/vpp daemonset. Sorry about that :)
kubectl rollout restart -n calico-vpp-dataplane ds/calico-vpp-node
Make sure that /etc/vpp/startup.conf has the buffers config lines, and then you can check whether the buffer change got applied in VPP. On the EKS worker nodes do the following:
sudo docker ps | grep vpp-manager
sudo docker exec -it <container id> bash
vppctl show buffers
If this does not work, then the other way would be to re-deploy the worker nodes afresh.
Hi @bj3680, thanks for clarifying. About a year ago, we did this comparison study between the native aws vpc cni, stock calico and calico with VPP+DPDK on m5.xlarge worker nodes, albeit with Calico/VPP v0.15.0 instead of v0.16.0. We used KNB (https://github.com/InfraBuilder/k8s-bench-suite), which in turn uses iperf. We were interested in an overall performance comparison, but especially in the encrypted traffic scenario, since VPP has a really high-performance IPSec implementation.
- Calico/VPP+DPDK is on par with or better than native aws vpc cni and stock calico for unencrypted traffic (tcp/udp).
- Calico/VPP+DPDK+IPSec slightly outperforms stock calico+wireguard for encrypted traffic (tcp/udp).
I am attaching the relevant slides.
It would be interesting to see how things are now. Please do share your findings.
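For anyone repeating the comparison, a minimal knb invocation could look like this (a sketch based on the k8s-bench-suite README; node names are placeholders):
knb --verbose --client-node node1 --server-node node2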
Closing it as this issue seems stale. Feel free to re-open if the problem comes back again on newer releases.
Environment
Issue description
I tested the speed using iperf3 and expected the Calico CNI with VPP and DPDK to be faster than the one without VPP and DPDK. However, the result was the opposite.
-with VPP and DPDK
-without VPP and DPDK
To Reproduce
Steps to reproduce the behavior: I followed the guide at the address below and modified a few lines.
dpdk {
  dev __PCI_DEVICE_ID__ { num-rx-queues 1 num-tx-queues 1 }
}
plugins {
  plugin default { enable }
  plugin dpdk_plugin.so { enable }
  plugin calico_plugin.so { enable }
  plugin ping_plugin.so { disable }
}
buffers {
  buffers-per-numa 131072
}
vpp_dataplane_interface: eth0
vpp_uplink_driver: dpdk
kind: ConfigMap
metadata:
  name: calico-vpp-config
  namespace: calico-vpp-dataplane
Expected behavior
Described above.
Additional context
Configuration of the EKS worker node.