llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

OpenMP runtime detects incorrect processor topology on AMD CPUs #40073

Open tycho opened 5 years ago

tycho commented 5 years ago
Bugzilla Link 40727
Version unspecified
OS Linux
CC @RKSimon

Extended Description

The processor topology detection in the OpenMP runtime is confused by AMD CPUs:

$ env KMP_AFFINITY=scatter,verbose ./nbody --bodies 64 --no-crosscheck --iterations 1 --cycle-after 5
OMP: Info #211: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #212: KMP_AFFINITY: cpuid leaf 11 not supported - decoding legacy APIC ids.
OMP: Info #149: KMP_AFFINITY: Affinity capable, using global cpuid info
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: 0-15
OMP: Info #156: KMP_AFFINITY: 16 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #159: KMP_AFFINITY: 1 packages x 1 cores/pkg x 16 threads/core (1 total cores)
OMP: Info #213: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 thread 2
OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 thread 3
OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 0 thread 4
OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 thread 5
OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 0 thread 6
OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to package 0 thread 7
OMP: Info #171: KMP_AFFINITY: OS proc 8 maps to package 0 thread 8
OMP: Info #171: KMP_AFFINITY: OS proc 9 maps to package 0 thread 9
OMP: Info #171: KMP_AFFINITY: OS proc 10 maps to package 0 thread 10
OMP: Info #171: KMP_AFFINITY: OS proc 11 maps to package 0 thread 11
OMP: Info #171: KMP_AFFINITY: OS proc 12 maps to package 0 thread 12
OMP: Info #171: KMP_AFFINITY: OS proc 13 maps to package 0 thread 13
OMP: Info #171: KMP_AFFINITY: OS proc 14 maps to package 0 thread 14
OMP: Info #171: KMP_AFFINITY: OS proc 15 maps to package 0 thread 15
OMP: Info #144: KMP_AFFINITY: Threads may migrate across 1 innermost levels of machine
OMP: Info #249: KMP_AFFINITY: pid 7608 tid 7608 thread 0 bound to OS proc set 0-15
OMP: Info #249: KMP_AFFINITY: pid 7608 tid 7610 thread 1 bound to OS proc set 0-15
OMP: Info #249: KMP_AFFINITY: pid 7608 tid 7611 thread 2 bound to OS proc set 0-15
OMP: Info #249: KMP_AFFINITY: pid 7608 tid 7612 thread 3 bound to OS proc set 0-15
OMP: Info #249: KMP_AFFINITY: pid 7608 tid 7613 thread 4 bound to OS proc set 0-15
OMP: Info #249: KMP_AFFINITY: pid 7608 tid 7614 thread 5 bound to OS proc set 0-15
OMP: Info #249: KMP_AFFINITY: pid 7608 tid 7615 thread 6 bound to OS proc set 0-15
OMP: Info #249: KMP_AFFINITY: pid 7608 tid 7616 thread 7 bound to OS proc set 0-15
OMP: Info #249: KMP_AFFINITY: pid 7608 tid 7617 thread 8 bound to OS proc set 0-15
OMP: Info #249: KMP_AFFINITY: pid 7608 tid 7618 thread 9 bound to OS proc set 0-15
OMP: Info #249: KMP_AFFINITY: pid 7608 tid 7619 thread 10 bound to OS proc set 0-15
OMP: Info #249: KMP_AFFINITY: pid 7608 tid 7620 thread 11 bound to OS proc set 0-15
OMP: Info #249: KMP_AFFINITY: pid 7608 tid 7621 thread 12 bound to OS proc set 0-15
OMP: Info #249: KMP_AFFINITY: pid 7608 tid 7622 thread 13 bound to OS proc set 0-15
OMP: Info #249: KMP_AFFINITY: pid 7608 tid 7623 thread 14 bound to OS proc set 0-15
OMP: Info #249: KMP_AFFINITY: pid 7608 tid 7624 thread 15 bound to OS proc set 0-15

$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 43 bits physical, 48 bits virtual
CPU(s): 16
On-line CPU(s) list: 0-15
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 1
NUMA node(s): 1
Vendor ID: AuthenticAMD
CPU family: 23
Model: 8
Model name: AMD Ryzen 7 2700X Eight-Core Processor
Stepping: 2
CPU MHz: 4200.705
CPU max MHz: 3700.0000
CPU min MHz: 2200.0000
BogoMIPS: 7384.85
Virtualization: AMD-V
L1d cache: 32K
L1i cache: 64K
L2 cache: 512K
L3 cache: 8192K
NUMA node0 CPU(s): 0-15
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb hw_pstate sme ssbd sev ibpb vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 xsaves clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif overflow_recov succor smca

It looks like the OpenMP runtime needs to be taught about the AMD-specific CPUID leaf 0x8000001E. Or as an ugly fallback, learn topology from /proc/cpuinfo when leaf 0xb is unavailable?
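For reference, here is a minimal sketch of what decoding leaf 0x8000001E could look like. It is not code from the OpenMP runtime; the struct and function names are hypothetical. The field layout assumed here is the one AMD documents for family 17h (the family reported by lscpu above): EAX holds the extended APIC ID, EBX[7:0] the core ID, EBX[15:8] the threads per core minus one, ECX[7:0] the node ID, and ECX[10:8] the nodes per processor minus one. Family 15h uses EBX for compute-unit information instead, so a real implementation would have to check the family first. The decoder takes raw register values so it can be exercised without executing CPUID.

```cpp
#include <cstdint>

// Hypothetical container for the fields of CPUID leaf 0x8000001E,
// using the AMD family 17h layout (an assumption, see text above).
struct AmdTopoLeaf {
  std::uint32_t ext_apic_id;       // EAX[31:0]
  std::uint32_t core_id;           // EBX[7:0]
  std::uint32_t threads_per_core;  // EBX[15:8] + 1
  std::uint32_t node_id;           // ECX[7:0]
  std::uint32_t nodes_per_package; // ECX[10:8] + 1
};

// Decode raw register values returned by CPUID leaf 0x8000001E.
// The caller is expected to have verified that the leaf exists
// (the "topoext" CPU flag) before reading the registers.
constexpr AmdTopoLeaf decode_leaf_8000001e(std::uint32_t eax,
                                           std::uint32_t ebx,
                                           std::uint32_t ecx) {
  return AmdTopoLeaf{eax,
                     ebx & 0xFFu,
                     ((ebx >> 8) & 0xFFu) + 1u,
                     ecx & 0xFFu,
                     ((ecx >> 8) & 0x7u) + 1u};
}

// Synthetic register values for illustration, not captured from hardware:
// APIC ID 11, core 5, two threads per core, single node.
static_assert(decode_leaf_8000001e(0x0Bu, 0x0105u, 0x0u).core_id == 5, "");
static_assert(decode_leaf_8000001e(0x0Bu, 0x0105u, 0x0u).threads_per_core == 2, "");
```

With values like these for every logical processor, the runtime could group OS procs by core ID instead of treating all 16 as threads of one core.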

tycho commented 5 years ago

Neutering the legacy APIC ID stuff yields a much better result, as it falls back to parsing /proc/cpuinfo next:

OMP: Info #211: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #231: KMP_AFFINITY: cpuid leaf 11 not supported - parsing /proc/cpuinfo.
OMP: Info #148: KMP_AFFINITY: Affinity capable, using cpuinfo file
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: 0-15
OMP: Info #156: KMP_AFFINITY: 16 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #179: KMP_AFFINITY: 1 packages x 8 cores/pkg x 2 threads/core (8 total cores)
OMP: Info #213: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 0 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 1 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 1 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 0 core 2 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 2 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 0 core 3 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to package 0 core 3 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 8 maps to package 0 core 4 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 9 maps to package 0 core 4 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 10 maps to package 0 core 5 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 11 maps to package 0 core 5 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 12 maps to package 0 core 6 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 13 maps to package 0 core 6 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 14 maps to package 0 core 7 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 15 maps to package 0 core 7 thread 1
OMP: Info #144: KMP_AFFINITY: Threads may migrate across 1 innermost levels of machine
OMP: Info #249: KMP_AFFINITY: pid 30302 tid 30302 thread 0 bound to OS proc set 0,1
OMP: Info #249: KMP_AFFINITY: pid 30302 tid 30304 thread 1 bound to OS proc set 2,3
OMP: Info #249: KMP_AFFINITY: pid 30302 tid 30305 thread 2 bound to OS proc set 4,5
OMP: Info #249: KMP_AFFINITY: pid 30302 tid 30306 thread 3 bound to OS proc set 6,7
OMP: Info #249: KMP_AFFINITY: pid 30302 tid 30307 thread 4 bound to OS proc set 8,9
OMP: Info #249: KMP_AFFINITY: pid 30302 tid 30308 thread 5 bound to OS proc set 10,11
OMP: Info #249: KMP_AFFINITY: pid 30302 tid 30309 thread 6 bound to OS proc set 12,13
OMP: Info #249: KMP_AFFINITY: pid 30302 tid 30310 thread 7 bound to OS proc set 14,15
OMP: Info #249: KMP_AFFINITY: pid 30302 tid 30311 thread 8 bound to OS proc set 0,1
OMP: Info #249: KMP_AFFINITY: pid 30302 tid 30312 thread 9 bound to OS proc set 2,3
OMP: Info #249: KMP_AFFINITY: pid 30302 tid 30313 thread 10 bound to OS proc set 4,5
OMP: Info #249: KMP_AFFINITY: pid 30302 tid 30314 thread 11 bound to OS proc set 6,7
OMP: Info #249: KMP_AFFINITY: pid 30302 tid 30315 thread 12 bound to OS proc set 8,9
OMP: Info #249: KMP_AFFINITY: pid 30302 tid 30316 thread 13 bound to OS proc set 10,11
OMP: Info #249: KMP_AFFINITY: pid 30302 tid 30317 thread 14 bound to OS proc set 12,13
OMP: Info #249: KMP_AFFINITY: pid 30302 tid 30318 thread 15 bound to OS proc set 14,15
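To illustrate why /proc/cpuinfo is enough to recover the "1 packages x 8 cores/pkg x 2 threads/core" shape, here is a rough sketch of the idea. It is not the parser in kmp_affinity.cpp; the type and function names are made up for this example. It relies only on the "processor", "physical id" and "core id" fields that Linux prints for x86 CPUs, and assumes "physical id" appears before "core id" within each processor entry.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical summary of a /proc/cpuinfo-style text.
struct CpuinfoTopology {
  int packages = 0; // distinct "physical id" values
  int cores = 0;    // distinct (physical id, core id) pairs
  int threads = 0;  // number of "processor" entries
};

// Returns the integer after the ':' of a "key : value" line, or -1.
static int field_value(const std::string &line) {
  std::size_t colon = line.find(':');
  if (colon == std::string::npos)
    return -1;
  try {
    return std::stoi(line.substr(colon + 1));
  } catch (...) {
    return -1; // value missing or not a number
  }
}

// Count packages, cores and hardware threads from cpuinfo text.
CpuinfoTopology parse_cpuinfo(const std::string &text) {
  std::set<int> packages;
  std::set<std::pair<int, int>> cores;
  CpuinfoTopology topo;
  int pkg = -1; // "physical id" of the entry currently being read
  std::istringstream in(text);
  std::string line;
  while (std::getline(in, line)) {
    if (line.compare(0, 9, "processor") == 0) {
      ++topo.threads; // a new logical processor entry starts here
      pkg = -1;
    } else if (line.compare(0, 11, "physical id") == 0) {
      pkg = field_value(line);
      packages.insert(pkg);
    } else if (line.compare(0, 7, "core id") == 0) {
      cores.emplace(pkg, field_value(line));
    }
  }
  topo.packages = static_cast<int>(packages.size());
  topo.cores = static_cast<int>(cores.size());
  return topo;
}
```

Feeding it the contents of /proc/cpuinfo (read with std::ifstream, for example) on the machine above should yield 1 package, 8 cores and 16 threads, matching the lscpu output.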

Digging a bit further, the fallback order in the code (for __kmp_affinity_top_method == affinity_top_method_all) is this:

So it skips hwloc unless KMP_TOPOLOGY_METHOD=hwloc is set explicitly in the environment. Neutering the legacy APIC ID fallback causes it to use /proc/cpuinfo, which gives the good result above.
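The behaviour seen in the two logs can be modelled roughly as below. This is only a sketch of the order the logs suggest (x2APIC leaf 11, then legacy APIC IDs, then /proc/cpuinfo, with hwloc used only on explicit request); the real list in kmp_affinity.cpp may contain further entries, and every name here is hypothetical.

```cpp
// Hypothetical names; not the identifiers used in kmp_affinity.cpp.
enum class TopoMethod { Hwloc, X2Apic, LegacyApic, Cpuinfo, None };

struct Probe {
  bool hwloc_requested; // KMP_TOPOLOGY_METHOD=hwloc set in the environment
  bool leaf11_ok;       // cpuid leaf 11 (x2APIC ids) is supported
  bool legacy_apic_ok;  // legacy APIC id decoding reports success
  bool cpuinfo_ok;      // /proc/cpuinfo can be parsed
};

// The first method that reports success wins; later ones are never tried.
constexpr TopoMethod pick_method(const Probe &p) {
  return p.hwloc_requested  ? TopoMethod::Hwloc
         : p.leaf11_ok      ? TopoMethod::X2Apic
         : p.legacy_apic_ok ? TopoMethod::LegacyApic
         : p.cpuinfo_ok     ? TopoMethod::Cpuinfo
                            : TopoMethod::None;
}

// The Ryzen case from this report: leaf 11 is missing, and the legacy APIC
// path "succeeds" with a wrong answer, so /proc/cpuinfo is never reached.
static_assert(pick_method(Probe{false, false, true, true}) ==
                  TopoMethod::LegacyApic, "");
// With the legacy path neutered, /proc/cpuinfo is used instead.
static_assert(pick_method(Probe{false, false, false, true}) ==
                  TopoMethod::Cpuinfo, "");
```

The point of the model is that a method which succeeds with a wrong topology masks every better method behind it.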

So a couple possible actions here:

tycho commented 5 years ago

Built the runtime with hwloc support, but it doesn't seem to want to use it. Looking at the code, it appears to prefer decoding legacy APIC IDs over using hwloc for topology. That should probably be changed.

Going to try neutering the legacy APIC ID stuff and see what happens.

tycho commented 5 years ago

I just started looking at the kmp_affinity.cpp code and noticed hwloc is supported, but I didn't build with that. It probably would solve this problem without any additional code.