google / cpu_features

A cross platform C99 library to get cpu features at runtime.
Apache License 2.0
2.47k stars 262 forks source link

Add uarch support for AWS r7iz #296

Open WilliamTambellini opened 1 year ago

WilliamTambellini commented 1 year ago
[wtambellini@r7iz ~]$ cpu_features-9613390/bin/list_cpu_features 
arch            : x86
brand           : Intel(R) Xeon(R) Processor
family          :   6 (0x06)
model           : 143 (0x8F)
stepping        :   3 (0x03)
uarch           : X86_UNKNOWN
flags           : aes,avx,avx2,avx512bitalg,avx512bw,avx512cd,avx512dq,avx512f,avx512ifma,avx512vbmi,avx512vbmi2,avx512vl,avx512vnni,avx512vpopcntdq,bmi1,bmi2,cx16,erms,f16c,fma3,movbe,popcnt,rdrnd,sha,sse4_1,sse4_2,ssse3,vpclmulqdq

lscpu

[wtambellini@r7iz ~]$ lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              2
On-line CPU(s) list: 0,1
Thread(s) per core:  2
Core(s) per socket:  1
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               143
Model name:          Intel(R) Xeon(R) Processor
Stepping:            3
CPU MHz:             3667.317
BogoMIPS:            6000.00
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           48K
L1i cache:           32K
L2 cache:            2048K
L3 cache:            61440K
NUMA node0 CPU(s):   0,1
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves avx512_bf16 wbnoinvd ida arat avx512vbmi umip pku ospke waitpkg avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg tme avx512_vpopcntdq rdpid cldemote movdiri movdir64b md_clear serialize flush_l1d arch_capabilities

https://aws.amazon.com/ec2/instance-types/r7iz/

toor1245 commented 1 year ago

@WilliamTambellini, could you provide cpuid dump via cpuid -r -1?

toor1245 commented 1 year ago

@WilliamTambellini did you test with the latest commit? Since we already have Sapphire_Rapids detection: https://github.com/google/cpu_features/blob/main/src/impl_x86__base_implementation.inl#L618

also, we added detection movdiri movdir64b: https://github.com/google/cpu_features/blob/main/src/impl_x86__base_implementation.inl#L385-L386

WilliamTambellini commented 1 year ago

Hi @toor1245
Voila, retested with todays master on r7iz.2xl and

[wtambellini@ahuang-fc37 Release]$ ./list_cpu_features 
arch            : x86
brand           : Intel(R) Xeon(R) Processor
family          :   6 (0x06)
model           : 143 (0x8F)
stepping        :   7 (0x07)
uarch           : INTEL_SPR
flags           : adx,aes,avx,avx2,avx512_bf16,avx512_second_fma,avx512bitalg,avx512bw,avx512cd,avx512dq,avx512f,avx512ifma,avx512vbmi,avx512vbmi2,avx512vl,avx512vnni,avx512vpopcntdq,avx_vnni,bmi1,bmi2,clflushopt,clfsh,clwb,cx16,cx8,erms,f16c,fma3,fpu,gfni,lzcnt,mmx,movbe,movdir64b,movdiri,pclmulqdq,popcnt,rdrnd,rdseed,sha,ss,sse,sse2,sse3,sse4_1,sse4_2,ssse3,tsc,vaes,vpclmulqdq
cache_info      : {"level":1,"cache_type":"data","cache_size":49152,"ways":12,"line_size":64,"tlb_entries":64,"partitioning":1},{"level":1,"cache_type":"instruction","cache_size":32768,"ways":8,"line_size":64,"tlb_entries":64,"partitioning":1},{"level":2,"cache_type":"unified","cache_size":2097152,"ways":16,"line_size":64,"tlb_entries":2048,"partitioning":1},{"level":3,"cache_type":"unified","cache_size":62914560,"ways":15,"line_size":64,"tlb_entries":65536,"partitioning":1}

The uarch is now good but no amx flags. Have you added the code to detect amx flags ? Best W.

toor1245 commented 1 year ago

@WilliamTambellini, yes, we have amx flags detection

https://github.com/google/cpu_features/blob/main/src/impl_x86__base_implementation.inl#L228-L232 image

https://github.com/google/cpu_features/blob/main/src/impl_x86__base_implementation.inl#L447-L449 image

we don't have cpuid dump for Sapphire Rapids, could you send dump to check this flags via cpuid -r -1 command or /proc/cpuinfo

skaiware commented 1 year ago

Tks @toor1245 AWS (atm) disables AMX on VMs so that s why there is no amx flags in the output. Here is the cpuid output:

[wtambellini@r7iz2xl ~]$ cpuid -r -1
CPU:
   0x00000000 0x00: eax=0x0000001f ebx=0x756e6547 ecx=0x6c65746e edx=0x49656e69
   0x00000001 0x00: eax=0x000806f7 ebx=0x00080800 ecx=0xfffab20b edx=0x1f8bfbff
   0x00000002 0x00: eax=0x00feff01 ebx=0x000000f0 ecx=0x00000000 edx=0x00000000
   0x00000003 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x00000004 0x00: eax=0x0c004121 ebx=0x02c0003f ecx=0x0000003f edx=0x00000000
   0x00000004 0x01: eax=0x0c004122 ebx=0x01c0003f ecx=0x0000003f edx=0x00000000
   0x00000004 0x02: eax=0x0c004143 ebx=0x03c0003f ecx=0x000007ff edx=0x00000000
   0x00000004 0x03: eax=0x0c01c163 ebx=0x0380003f ecx=0x0000ffff edx=0x00000004
   0x00000004 0x04: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x00000005 0x00: eax=0x00000040 ebx=0x00000040 ecx=0x00000003 edx=0x00001020
   0x00000006 0x00: eax=0x00000006 ebx=0x00000000 ecx=0x00000001 edx=0x00000000
   0x00000007 0x00: eax=0x00000001 ebx=0xf1bf07ab ecx=0x1a407f7e edx=0xbc004400
   0x00000007 0x01: eax=0x00000030 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x00000008 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x00000009 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x0000000a 0x00: eax=0x08300805 ebx=0x00000000 ecx=0x00000005 edx=0x00008600
   0x0000000b 0x00: eax=0x00000001 ebx=0x00000002 ecx=0x00000100 edx=0x00000000
   0x0000000b 0x01: eax=0x00000003 ebx=0x00000008 ecx=0x00000201 edx=0x00000000
   0x0000000b 0x02: eax=0x00000000 ebx=0x00000000 ecx=0x00000002 edx=0x00000000
   0x0000000c 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x0000000d 0x00: eax=0x000002e7 ebx=0x00000a88 ecx=0x00000a88 edx=0x00000000
   0x0000000d 0x01: eax=0x0000000f ebx=0x00000988 ecx=0x00000000 edx=0x00000000
   0x0000000d 0x02: eax=0x00000100 ebx=0x00000240 ecx=0x00000000 edx=0x00000000
   0x0000000d 0x05: eax=0x00000040 ebx=0x00000440 ecx=0x00000000 edx=0x00000000
   0x0000000d 0x06: eax=0x00000200 ebx=0x00000480 ecx=0x00000000 edx=0x00000000
   0x0000000d 0x07: eax=0x00000400 ebx=0x00000680 ecx=0x00000000 edx=0x00000000
   0x0000000d 0x09: eax=0x00000008 ebx=0x00000a80 ecx=0x00000000 edx=0x00000000
   0x0000000e 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x0000000f 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x00000010 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x00000011 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x00000012 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x00000012 0x01: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x00000012 0x02: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x00000013 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x00000014 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x00000015 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x00000016 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x00000017 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x00000018 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x00000019 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x0000001a 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x0000001b 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x0000001b 0x01: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x0000001c 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x0000001d 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x0000001e 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x0000001f 0x00: eax=0x00000001 ebx=0x00000002 ecx=0x00000100 edx=0x00000000
   0x0000001f 0x01: eax=0x00000003 ebx=0x00000008 ecx=0x00000201 edx=0x00000000
   0x0000001f 0x02: eax=0x00000000 ebx=0x00000000 ecx=0x00000002 edx=0x00000000
   0x20000000 0x00: eax=0x00000001 ebx=0x00000002 ecx=0x00000100 edx=0x00000000
   0x40000000 0x00: eax=0x40000010 ebx=0x4b4d564b ecx=0x564b4d56 edx=0x0000004d
   0x40000001 0x00: eax=0x0100807b ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x40000002 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x40000003 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x40000004 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x40000005 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x40000006 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x40000007 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x40000008 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x40000009 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x4000000a 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x4000000b 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x4000000c 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x4000000d 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x4000000e 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x4000000f 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x40000010 0x00: eax=0x002dc6c0 ebx=0x000f4240 ecx=0x00000000 edx=0x00000000
   0x40000100 0x00: eax=0x00000001 ebx=0x00000002 ecx=0x00000100 edx=0x00000000
   0x80000000 0x00: eax=0x80000008 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x80000001 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000121 edx=0x2c100800
   0x80000002 0x00: eax=0x65746e49 ebx=0x2952286c ecx=0x6f655820 edx=0x2952286e
   0x80000003 0x00: eax=0x6f725020 ebx=0x73736563 ecx=0x0000726f edx=0x00000000
   0x80000004 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x80000005 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x80000006 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x08007040 edx=0x00000000
   0x80000007 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000100
   0x80000008 0x00: eax=0x00003030 ebx=0x00000200 ecx=0x00000000 edx=0x00000000
   0x80860000 0x00: eax=0x00000001 ebx=0x00000002 ecx=0x00000100 edx=0x00000000
   0xc0000000 0x00: eax=0x00000001 ebx=0x00000002 ecx=0x00000100 edx=0x00000000
[wtambellini@r7iz2xl ~]$ cat /proc/cpuinfo 
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model       : 143
model name  : Intel(R) Xeon(R) Processor
stepping    : 7
microcode   : 0x2a000080
cpu MHz     : 3835.538
cache size  : 61440 KB
physical id : 0
siblings    : 8
core id     : 0
cpu cores   : 4
apicid      : 0
initial apicid  : 0
fpu     : yes
fpu_exception   : yes
cpuid level : 31
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves avx_vnni avx512_bf16 wbnoinvd ida arat avx512vbmi umip pku ospke waitpkg avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg tme avx512_vpopcntdq rdpid cldemote movdiri movdir64b md_clear serialize flush_l1d arch_capabilities
bugs        : spectre_v1 spectre_v2 spec_store_bypass swapgs eibrs_pbrsb
bogomips    : 6000.00
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: