Status: Open. mestery opened this issue 2 years ago.
@stolsma for your review
Hmm, I don't understand... On my computer it just works. I don't have much time tomorrow to dig into it ☹️ but I will go full force on it on Monday...
I'll take a look at this later today after some meetings are finished.
For reference, here is the CPU information from the VM where I'm trying to run the container:
vagrant@ubuntu-focal:~/ipdk/build$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 126
model name : Intel(R) Core(TM) i7-1068NG7 CPU @ 2.30GHz
stepping : 5
cpu MHz : 2304.000
cache size : 8192 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 4
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq vmx ssse3 cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti tpr_shadow flexpriority fsgsbase avx2 invpcid rdseed clflushopt md_clear flush_l1d arch_capabilities
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit
bogomips : 4608.00
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 126
model name : Intel(R) Core(TM) i7-1068NG7 CPU @ 2.30GHz
stepping : 5
cpu MHz : 2304.000
cache size : 8192 KB
physical id : 0
siblings : 4
core id : 1
cpu cores : 4
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq vmx ssse3 cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti tpr_shadow flexpriority fsgsbase avx2 invpcid rdseed clflushopt md_clear flush_l1d arch_capabilities
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit
bogomips : 4608.00
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
processor : 2
vendor_id : GenuineIntel
cpu family : 6
model : 126
model name : Intel(R) Core(TM) i7-1068NG7 CPU @ 2.30GHz
stepping : 5
cpu MHz : 2304.000
cache size : 8192 KB
physical id : 0
siblings : 4
core id : 2
cpu cores : 4
apicid : 2
initial apicid : 2
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq vmx ssse3 cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti tpr_shadow flexpriority fsgsbase avx2 invpcid rdseed clflushopt md_clear flush_l1d arch_capabilities
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit
bogomips : 4608.00
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
processor : 3
vendor_id : GenuineIntel
cpu family : 6
model : 126
model name : Intel(R) Core(TM) i7-1068NG7 CPU @ 2.30GHz
stepping : 5
cpu MHz : 2304.000
cache size : 8192 KB
physical id : 0
siblings : 4
core id : 3
cpu cores : 4
apicid : 3
initial apicid : 3
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq vmx ssse3 cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti tpr_shadow flexpriority fsgsbase avx2 invpcid rdseed clflushopt md_clear flush_l1d arch_capabilities
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit
bogomips : 4608.00
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
vagrant@ubuntu-focal:~/ipdk/build$
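To check the dump above for specific ISA extensions without reading every flags line by eye, here is a minimal Python sketch. It assumes x86 Linux, where /proc/cpuinfo prints one "flags" line per logical CPU; the helper names are hypothetical. On this VM it would report avx512f and avx512bw as missing, since neither appears in the flags above (avx2 does).

```python
"""Report ISA flags that are missing on at least one logical CPU in /proc/cpuinfo."""
from __future__ import annotations

from pathlib import Path


def cpu_flag_sets(cpuinfo_text: str) -> list[set[str]]:
    """Return one set of flags per 'flags' line (x86 Linux prints one per logical CPU)."""
    flag_sets = []
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flag_sets.append(set(value.split()))
    return flag_sets


def missing_flags(cpuinfo_text: str, wanted: list[str]) -> set[str]:
    """Return the wanted flags that at least one logical CPU does not advertise."""
    per_cpu = cpu_flag_sets(cpuinfo_text)
    if not per_cpu:
        # No x86-style 'flags' lines (e.g. ARM prints 'Features'): treat all as missing.
        return set(wanted)
    common = set.intersection(*per_cpu)
    return {flag for flag in wanted if flag not in common}


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():  # only present on Linux
        gaps = missing_flags(cpuinfo.read_text(), ["avx2", "avx512f", "avx512bw"])
        print("missing:", ", ".join(sorted(gaps)) or "none")
```

Intersecting across all CPUs matters mostly on hybrid or oddly configured hosts; on a uniform VM like this one every flags line is identical.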
Running strace on ovs-vswitchd in the container results in this:
mprotect(0x7fc7765ef000, 4096, PROT_READ) = 0
munmap(0x7fc7765b7000, 43358) = 0
set_tid_address(0x7fc7734d46d0) = 99
set_robust_list(0x7fc7734d46e0, 24) = 0
rt_sigaction(SIGRTMIN, {sa_handler=0x7fc775484bf0, sa_mask=[], sa_flags=SA_RESTORER|SA_SIGINFO, sa_restorer=0x7fc7754923c0}, NULL, 8) = 0
rt_sigaction(SIGRT_1, {sa_handler=0x7fc775484c90, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART|SA_SIGINFO, sa_restorer=0x7fc7754923c0}, NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
prlimit64(0, RLIMIT_STACK, NULL, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0
brk(NULL) = 0x55cf9f087000
brk(0x55cf9f0a8000) = 0x55cf9f0a8000
getrandom("\x79\x67\x2c\x34\x56\x58\x52\x8a", 8, 0) = 8
--- SIGILL {si_signo=SIGILL, si_code=ILL_ILLOPN, si_addr=0x7fc773632b4b} ---
+++ killed by SIGILL (core dumped) +++
Illegal instruction (core dumped)
root@bdb91ca96701:~#
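The trace ends in SIGILL with si_code ILL_ILLOPN: the CPU refused to execute an instruction, which is a common symptom of running code compiled for an ISA extension the CPU lacks. For scripts or CI checks that need to tell this apart from an ordinary failure, a minimal Python sketch can decode the exit status. The helper name is hypothetical, and the ovs-vswitchd command in the comment is only an illustration.

```python
"""Interpret the exit status of a process that may have been killed by a signal."""
import signal
import subprocess
import sys


def describe_exit(returncode: int, shell_style: bool = False) -> str:
    """Describe an exit status.

    subprocess reports death by signal N as a negative return code (-N).
    POSIX shells report it as 128 + N, so pass shell_style=True for values
    taken from $? (for example 132 for SIGILL, which is signal 4 on Linux).
    """
    signum = None
    if returncode < 0:
        signum = -returncode
    elif shell_style and returncode > 128:
        signum = returncode - 128
    if signum is not None:
        try:
            return f"killed by {signal.Signals(signum).name}"
        except ValueError:
            pass  # not a known signal number; fall back to the raw status
    return f"exited with status {returncode}"


if __name__ == "__main__":
    # Illustrative usage: python3 check_exit.py ovs-vswitchd --version
    if len(sys.argv) > 1:
        result = subprocess.run(sys.argv[1:])
        print(describe_exit(result.returncode))
```

The shell_style switch is explicit because a plain exit code above 128 is ambiguous: a program may legitimately exit with such a value.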
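Hmm... it looks like the machines we're building DPDK on support AVX512 (the build log below shows "__AVX512F__" and "__AVX512BW__" defined), so the resulting binaries may use AVX512 instructions, which I know my local CPU does not support (there is no avx512 flag in the cpuinfo above). I wonder if this is the issue?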
2022-02-23T14:33:47.3453979Z #17 428.0 Fetching value of define "__AVX2__" : 1 (cached)
2022-02-23T14:33:48.5429862Z #17 429.1 Fetching value of define "__AVX512F__" : 1 (cached)
2022-02-23T14:33:48.5430200Z #17 429.1 Fetching value of define "__AVX512BW__" : 1 (cached)
2022-02-23T14:33:48.5430979Z #17 429.1 Compiler for C supports arguments -mavx512f: YES (cached)
2022-02-23T14:33:48.5431387Z #17 429.1 Compiler for C supports arguments -mavx512bw: YES (cached)
2022-02-23T14:33:48.5431786Z #17 429.1 Compiler for C supports arguments -march=skylake-avx512: YES
2022-02-23T14:33:48.5432097Z #17 429.1 Fetching value of define "__AVX2__" : 1 (cached)
2022-02-23T14:33:48.5432357Z #17 429.1 Fetching value of define "__AVX512F__" : 1 (cached)
2022-02-23T14:33:48.5535212Z #17 429.1 Fetching value of define "__AVX512BW__" : 1 (cached)
2022-02-23T14:33:48.5537485Z #17 429.1 Compiler for C supports arguments -mavx512f: YES (cached)
2022-02-23T14:33:48.5539651Z #17 429.1 Compiler for C supports arguments -mavx512bw: YES (cached)
2022-02-23T14:33:48.5541813Z #17 429.1 Compiler for C supports arguments -march=skylake-avx512: YES (cached)
2022-02-23T14:33:48.5544007Z #17 429.1 Compiler for C supports arguments -Wno-unused-value: YES (cached)
2022-02-23T14:33:48.5546223Z #17 429.1 Compiler for C supports arguments -Wno-unused-but-set-variable: YES (cached)
2022-02-23T14:33:48.5548418Z #17 429.1 Compiler for C supports arguments -Wno-unused-variable: YES (cached)
2022-02-23T14:33:48.5550604Z #17 429.1 Compiler for C supports arguments -Wno-unused-parameter: YES (cached)
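These log lines show the compiler on the build machine had __AVX512F__ and __AVX512BW__ defined, i.e. its target settings enabled AVX-512 code generation (presumably through DPDK's default native machine target; that is an assumption, the log does not say why). A minimal Python sketch to cross-check such a log against a target CPU's flags follows. The regex only matches the meson line format shown above, and the mapping from macro name to cpuinfo flag is an assumption that holds for these x86 extensions.

```python
"""Cross-check ISA macros a meson build detected against a target CPU's flags."""
from __future__ import annotations

import re

# Matches meson output such as: Fetching value of define "__AVX512F__" : 1 (cached)
DEFINE_RE = re.compile(r'Fetching value of define "__(\w+)__" : 1\b')


def enabled_isa_defines(build_log: str) -> set[str]:
    """Return macros reported as defined to 1, without surrounding underscores, lower-cased.

    Assumption: for x86 extensions like these, that name matches the
    /proc/cpuinfo flag (__AVX512F__ -> avx512f, __SSE4_2__ -> sse4_2).
    """
    return {m.group(1).lower() for m in DEFINE_RE.finditer(build_log)}


def unsupported_on_target(build_log: str, target_flags: str) -> set[str]:
    """Extensions the build assumed that are absent from the target's cpuinfo 'flags' value."""
    return enabled_isa_defines(build_log) - set(target_flags.split())


if __name__ == "__main__":
    log = (
        'Fetching value of define "__AVX2__" : 1 (cached)\n'
        'Fetching value of define "__AVX512F__" : 1 (cached)\n'
    )
    print(sorted(unsupported_on_target(log, "fpu sse2 avx avx2")))  # prints ['avx512f']
```

Fed the log above and the flags from the VM's cpuinfo, this would single out avx512f and avx512bw.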
@Namrata-intel Do you know how to compile DPDK without AVX512 support, maybe falling back to SSE2 instead, which practically every x86-64 CPU supports?
You can try unsetting this in the config and rebuilding. It seems to be configurable, though I have never tried it:
CONFIG_RTE_ENABLE_AVX=y
CONFIG_RTE_ENABLE_AVX512=n
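If the build does read these keys, changing them amounts to editing KEY=VALUE lines in a make-style config file. A hedged Python sketch of that edit follows; the helper name and the config path passed on the command line are hypothetical. Caveat: as far as I know, these CONFIG_RTE_* keys belong to DPDK's legacy make-based build, which was removed in DPDK 20.11, and the build log above comes from meson, so they may have no effect on this build.

```python
"""Set KEY=VALUE entries in a make-style config file (e.g. DPDK's legacy .config)."""
from __future__ import annotations

import sys
from pathlib import Path


def set_config_options(text: str, options: dict[str, str]) -> str:
    """Rewrite every existing KEY=... line for the given keys; append keys not present."""
    seen = set()
    out = []
    for line in text.splitlines():
        key = line.split("=", 1)[0].strip() if "=" in line else None
        if key in options:
            out.append(f"{key}={options[key]}")
            seen.add(key)
        else:
            out.append(line)
    out.extend(f"{k}={v}" for k, v in options.items() if k not in seen)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Hypothetical usage: python3 set_config.py path/to/.config
    if len(sys.argv) == 2:
        path = Path(sys.argv[1])
        path.write_text(set_config_options(path.read_text(), {
            "CONFIG_RTE_ENABLE_AVX": "y",
            "CONFIG_RTE_ENABLE_AVX512": "n",
        }))
```

Every occurrence of a key is rewritten, not just the first, because in make the last assignment wins.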
@Namrata-intel Any pointers to where exactly in the build I would set this?
I think I figured it out, so I forked p4-dpdk-target to try my fix here: https://github.com/ipdk-io/p4-dpdk-target/commit/cd1ee1ed225e84ee779ff53c8d6daaf8ab32d2c1
If this works, I'll open a PR against the p4-dpdk-target repository.
Well, my attempted fix isn't working. The layers of build options, nested in git submodules with hard-coded build commands, are making this challenging for me. If anyone can help switch to a more generic set of CPU options that will work in virtual machines (VirtualBox does not support AVX512, for example), that would be amazing. Until then, we have to build locally and can't use the images in GHCR.
This should fix it: https://github.com/p4lang/p4-dpdk-target/pull/18
@Namrata-intel Do you know who can review that patch to p4-dpdk-target to disable AVX512?
I just spent a few hours looking into an issue that turned out to be caused by the images we build and push to GHCR not working.
Pull the GHCR Ubuntu 20.04 image like this:
Next, set up your ipdk.env file as shown:
Start the container:
Log in and note that ovs-vswitchd is not running; when you start it manually, you see a segfault:
However, when I pull down the IPDK code, build a container locally, and run that image, it works fine: