openvswitch / ovs-issues

Issue tracker repo for Open vSwitch

The datapath prints "cpu_id mismatch with handler threads" #318

Closed danieldin95 closed 8 months ago

danieldin95 commented 8 months ago

The kernel log (printk) shows:

[Tue Jan 30 22:31:11 2024] openvswitch: cpu_id mismatch with handler threads
[Tue Jan 30 22:31:11 2024] openvswitch: cpu_id mismatch with handler threads
[Tue Jan 30 22:31:11 2024] openvswitch: cpu_id mismatch with handler threads
[Tue Jan 30 22:31:11 2024] openvswitch: cpu_id mismatch with handler threads
[Tue Jan 30 22:31:11 2024] openvswitch: cpu_id mismatch with handler threads
[Tue Jan 30 22:31:11 2024] openvswitch: cpu_id mismatch with handler threads

ovs-vswitchd log:

2024-01-18T06:31:55.172Z|00082|ofproto_dpif_upcall|INFO|Setting n-handler-threads to 26, setting n-revalidator-threads to 10
2024-01-18T07:00:05.835Z|00032|ofproto_dpif_upcall|INFO|Overriding n-handler-threads to 36, setting n-revalidator-threads to 10
2024-01-18T07:08:00.135Z|00054|ofproto_dpif_upcall|INFO|Overriding n-handler-threads to 36, setting n-revalidator-threads to 10
2024-01-18T07:08:01.577Z|00229|ofproto_dpif_upcall|INFO|Setting n-handler-threads to 26, setting n-revalidator-threads to 10

On our OVS deployment, we set up a netdev datapath for DPDK and also have a system datapath for the kernel. We set isolcpus on the CPUs of the DPDK device's NUMA node.

[root@node-2 ~]# cat /proc/cmdline
BOOT_IMAGE=(hd0,gpt3)/vmlinuz-4.18.0-147.5.1.es8_24.aarch64 root=/dev/mapper/os-root ro isolcpus=2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47 default_hugepagesz=512M hugepagesz=512M hugepages=936 transparent_hugepage=never cgroup.memory=nokmem iommu.passthrough=1 ixgbe.allow_unsupported_sfp=1 biosdevname=0 rootdelay=90 nomodeset intel_idle.max_cstate=0 processor.max_cstate=0 crashkernel=736M rd.lvm.lv=os/root net.ifnames=1 console=tty0
[root@node-2 ~]#
[root@node-2 ~]# ovs-vsctl list interface enp4s0 | grep -e ^options -e ^other_config
options             : {dpdk-devargs="0000:04:00.0", flow-ctrl-autoneg="true", n_rxq="44"}
other_config        : {pmd-rxq-affinity="0:2,1:3,2:4,3:5,4:6,5:7,6:8,7:9,8:10,9:11,10:12,11:13,12:14,13:15,14:16,15:17,16:18,17:19,18:20,19:21,20:22,21:23,22:26,23:27,24:28,25:29,26:30,27:31,28:32,29:33,30:34,31:35,32:36,33:37,34:38,35:39,36:40,37:41,38:42,39:43,40:44,41:45,42:46,43:47"}
[root@node-2 ~]#
danieldin95 commented 8 months ago

Does anybody have an idea how to fix this? I want count_cpu_cores to return the number of CPUs that are actually online, including those listed in isolcpus. The commit at https://github.com/openvswitch/ovs/commit/be15ec48d7669902f7f7ca2f76fda190b0ccfa5a excludes isolcpus, and I think that leads the datapath to compute the wrong n_pids for the handlers.

u32 ovs_dp_get_upcall_portid(const struct datapath *dp, uint32_t cpu_id)
{
    struct dp_nlsk_pids *dp_nlsk_pids;

    dp_nlsk_pids = rcu_dereference(dp->upcall_portids);

    if (dp_nlsk_pids) {
        if (cpu_id < dp_nlsk_pids->n_pids) {
            return dp_nlsk_pids->pids[cpu_id];
        } else if (dp_nlsk_pids->n_pids > 0 && cpu_id >= dp_nlsk_pids->n_pids) {
            /* If the number of netlink PIDs is mismatched with the number of
             * CPUs as seen by the kernel, log this and send the upcall to an
             * arbitrary socket (0) in order to not drop packets
             */
            pr_info_ratelimited("cpu_id mismatch with handler threads");
            return dp_nlsk_pids->pids[cpu_id % dp_nlsk_pids->n_pids];
        } else {
            return 0;
        }
    } else {
        return 0;
    }
}
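
To make the fallback branch in the function above easier to follow, here is a minimal userspace sketch of the same selection logic. It is not the kernel code: `struct pid_table` and `pick_upcall_portid` are hypothetical names introduced for illustration, and the `mismatch` flag stands in for the `pr_info_ratelimited` call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct dp_nlsk_pids. */
struct pid_table {
    uint32_t n_pids;        /* number of entries in pids[] */
    const uint32_t *pids;   /* one netlink port ID per entry */
};

/* Mirrors the quoted selection logic: index directly when cpu_id is in
 * range, otherwise wrap around with a modulo. *mismatch is set to 1 in
 * the branch where the kernel would log the message. */
static uint32_t pick_upcall_portid(const struct pid_table *t,
                                   uint32_t cpu_id, int *mismatch)
{
    assert(mismatch != NULL);
    *mismatch = 0;

    if (t == NULL || t->n_pids == 0) {
        return 0;                       /* no table installed */
    }
    if (cpu_id < t->n_pids) {
        return t->pids[cpu_id];         /* normal case */
    }
    *mismatch = 1;                      /* cpu_id >= n_pids */
    return t->pids[cpu_id % t->n_pids];
}
```

Assuming, purely for illustration, a table with 26 entries, an upcall raised on CPU 47 (the highest CPU in the isolcpus list above) would be sent to entry 47 % 26 = 21 and would be flagged as a mismatch, while an upcall on CPU 3 would use entry 3 directly.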
chaudron commented 8 months ago

There is a blog post explaining this message; maybe it clears things up: https://developers.redhat.com/articles/2023/07/20/how-balance-cpu-upcall-dispatch-mode-open-vswitch

Make sure your version of OVS has the mentioned patches included.

CCing @Maickii

danieldin95 commented 8 months ago

@chaudron thx! It's very useful for me.