jdmccalpin / low-overhead-timers

Very low-overhead timer/counter interfaces for C on Intel 64 processors.
BSD 3-Clause "New" or "Revised" License

Seg faults with this AMD CPU :-( #4

Open simonhf opened 1 year ago

simonhf commented 1 year ago
$ git clone https://github.com/jdmccalpin/low-overhead-timers.git
$ cd low-overhead-timers/
$ cd LowOverheadTimersTests/
$ ./build_timer_tests.sh 
Intel icc compiler not found, skipping....
compiling externally linked version with gcc
compiling inlined version with gcc

$ ./timer_ovhd_inline.gcc.exe 
Nominal GHz -0.000000
programmable core counter width is 0 bits
fixed-function core counter width is 0 bits
Affinity set to cpu 4
  get_core_number returns 4
  get_socket_number returns 0
  full_rdtsc returns chip 0, core 4
Spinning for a short time to allow the processor to ramp up to full speed
Segmentation fault

$ ./timer_ovhd_external.gcc.exe 
Nominal GHz -0.000000
programmable core counter width is 0 bits
fixed-function core counter width is 0 bits
Affinity set to cpu 4
  get_core_number returns 4
  get_socket_number returns 0
  full_rdtsc returns chip 0, core 4
Spinning for a short time to allow the processor to ramp up to full speed
Segmentation fault (core dumped)

$ lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         43 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  32
  On-line CPU(s) list:   0-31
Vendor ID:               AuthenticAMD
  Model name:            AMD Ryzen 9 3950X 16-Core Processor
    CPU family:          23
    Model:               113
    Thread(s) per core:  2
    Core(s) per socket:  16
    Socket(s):           1
    Stepping:            0
    Frequency boost:     disabled
    CPU max MHz:         4761.2300
    CPU min MHz:         0.0000
    BogoMIPS:            6986.84
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl
                         nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a
                         misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibpb stibp vmmcall fsgsbase bmi1 avx2 smep
                         bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd arat npt
                         lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sev sev_es
Virtualization features: 
  Virtualization:        AMD-V
Caches (sum of all):     
  L1d:                   512 KiB (16 instances)
  L1i:                   512 KiB (16 instances)
  L2:                    8 MiB (16 instances)
  L3:                    64 MiB (4 instances)
NUMA:                    
  NUMA node(s):          1
  NUMA node0 CPU(s):     0-31
Vulnerabilities:         
  Itlb multihit:         Not affected
  L1tf:                  Not affected
  Mds:                   Not affected
  Meltdown:              Not affected
  Mmio stale data:       Not affected
  Retbleed:              Mitigation; untrained return thunk; SMT enabled with STIBP protection
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; Retpolines, IBPB conditional, STIBP always-on, RSB filling, PBRSB-eIBRS Not affected
  Srbds:                 Not affected
  Tsx async abort:       Not affected
jdmccalpin commented 1 year ago

Hmmm..... AMD does not support the Intel "trick" of using RDPMC with performance counter numbers (1<<30)+[0123] to access the fixed-function performance counters. AMD does appear to support the same fixed-function counters, but only via the MSR interfaces (which can only be accessed in kernel mode, so don't qualify as "low overhead" counters from user space).
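For readers unfamiliar with the "trick": on Intel processors, setting bit 30 of the RDPMC counter selector in ECX selects a fixed-function counter instead of a programmable one, so (1<<30)+0, +1 and +2 read instructions retired, unhalted core cycles and unhalted reference cycles. The sketch below assumes GCC/Clang inline assembly on x86; the names `FIXED_COUNTER_FLAG`, `fixed_counter_selector` and `read_pmc` are hypothetical and are not taken from the repository.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name: bit 30 of ECX selects Intel's fixed-function counters. */
#define FIXED_COUNTER_FLAG (1u << 30)

/* Intel fixed-function counter indices (assumed mapping to the three
 * rdpmc_* functions mentioned in this thread). */
enum fixed_counter {
    FIXED_INSTRUCTIONS     = 0, /* instructions retired */
    FIXED_ACTUAL_CYCLES    = 1, /* unhalted core cycles */
    FIXED_REFERENCE_CYCLES = 2  /* unhalted reference (TSC-rate) cycles */
};

/* Build the RDPMC selector (ECX value) for fixed-function counter `index`. */
static inline uint32_t fixed_counter_selector(unsigned index)
{
    assert(index <= 3);
    return FIXED_COUNTER_FLAG + index;
}

#if defined(__x86_64__) || defined(__i386__)
/* Read a performance counter with RDPMC. If the selector is not
 * implemented on this processor, or user-mode RDPMC is not enabled,
 * the CPU raises #GP and Linux delivers SIGSEGV to the process. */
static inline uint64_t read_pmc(uint32_t selector)
{
    uint32_t lo, hi;
    __asm__ volatile("rdpmc" : "=a"(lo), "=d"(hi) : "c"(selector));
    return ((uint64_t)hi << 32) | lo;
}
#endif
```

On a processor that does not implement these selectors, a call such as `read_pmc(fixed_counter_selector(FIXED_INSTRUCTIONS))` faults rather than returning a value, which is consistent with the segmentation faults reported above.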
Fixing this is going to take a modest amount of work -- the three functions rdpmc_instructions, rdpmc_actual_cycles, and rdpmc_reference_cycles will have to be removed for an AMD version. The harder part will be going through the CPUID-based routines and trying to figure out if alternative functionality is available in the AMD processors.
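One possible way to structure this is to read the CPUID vendor string once and gate the fixed-counter functions on it, and to decode CPUID leaf 0x0A explicitly so that zero counter widths (as in the "0 bits" lines in the log above) are treated as "no architectural PMU information" rather than used. This is a sketch assuming GCC/Clang's `<cpuid.h>`; `get_cpu_vendor`, `has_intel_fixed_rdpmc`, `struct perfmon_info` and `decode_leaf_0a` are hypothetical helpers, and the field layout follows Intel's definition of leaf 0x0A.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Copy the 12-character CPUID vendor string into buf (13 bytes).
 * Returns false on non-x86 builds or if CPUID leaf 0 is unavailable. */
static bool get_cpu_vendor(char buf[13])
{
    buf[0] = '\0';
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx))
        return false;
    /* The vendor string is returned in EBX, EDX, ECX order. */
    memcpy(buf, &ebx, 4);
    memcpy(buf + 4, &edx, 4);
    memcpy(buf + 8, &ecx, 4);
    buf[12] = '\0';
    return true;
#else
    return false;
#endif
}

/* Hypothetical guard: only Intel parts take the fixed-counter RDPMC path. */
static bool has_intel_fixed_rdpmc(const char *vendor)
{
    assert(vendor != NULL);
    return strcmp(vendor, "GenuineIntel") == 0;
}

/* Fields of CPUID leaf 0x0A (architectural performance monitoring). */
struct perfmon_info {
    unsigned version;     /* EAX[7:0]   */
    unsigned num_gp;      /* EAX[15:8]  programmable counters */
    unsigned gp_width;    /* EAX[23:16] programmable counter width */
    unsigned num_fixed;   /* EDX[4:0]   fixed-function counters */
    unsigned fixed_width; /* EDX[12:5]  fixed-function counter width */
};

static struct perfmon_info decode_leaf_0a(unsigned eax, unsigned edx)
{
    struct perfmon_info p;
    p.version     = eax & 0xff;
    p.num_gp      = (eax >> 8) & 0xff;
    p.gp_width    = (eax >> 16) & 0xff;
    p.num_fixed   = edx & 0x1f;
    p.fixed_width = (edx >> 5) & 0xff;
    return p;
}
```

With this split, an AMD build can skip the fixed-counter functions entirely when `has_intel_fixed_rdpmc` returns false, and report the counter widths as unknown when the decoded widths are zero.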

travisdowns commented 1 year ago

> Hmmm..... AMD does not support the Intel "trick" of using RDPMC with performance counter numbers (1<<30)+[0123] to access the fixed-function performance counters. AMD does appear to support the same fixed-function counters, but only via the MSR interfaces (which can only be accessed in kernel mode, so don't qualify as "low overhead" counters from user space).

Interesting. Do you have any reference for this? Seems like a pretty big limitation.

jdmccalpin commented 1 year ago

The "AMD64 Architecture Programmer's Manual, Volume 3: General Purpose and System Instructions" (document 24594, revision 3.35, June 2023), in the section on the RDPMC instruction, has a table listing the allowed counter numbers. Counters 0-5 are core counters, 6-9 are Northbridge counters, 10-15 are L3 counters, and 16-27 are additional Northbridge counters. Values > 27 are reserved.
It makes me very happy that the L3 and Northbridge counters are accessible using RDPMC, but there is no indication that the "fixed function" performance counters can be accessed this way.
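The table described above can be written as a small classifier, which also shows why the Intel-style selectors fail on this processor: (1<<30)+0 = 0x40000000 is far above 27, so it lands in the reserved range. This is a sketch only; the enum and function names are hypothetical, and which of the non-reserved ranges are actually implemented depends on the processor family.

```c
#include <assert.h>
#include <stdint.h>

/* RDPMC counter-number ranges on AMD64, per the APM Vol. 3 table
 * summarized above. */
enum amd_pmc_class {
    AMD_PMC_CORE,     /* 0-5:   core counters */
    AMD_PMC_NB,       /* 6-9:   Northbridge counters */
    AMD_PMC_L3,       /* 10-15: L3 counters */
    AMD_PMC_NB_EXTRA, /* 16-27: additional Northbridge counters */
    AMD_PMC_RESERVED  /* > 27:  reserved */
};

static enum amd_pmc_class amd_classify_rdpmc(uint32_t ecx)
{
    if (ecx <= 5)  return AMD_PMC_CORE;
    if (ecx <= 9)  return AMD_PMC_NB;
    if (ecx <= 15) return AMD_PMC_L3;
    if (ecx <= 27) return AMD_PMC_NB_EXTRA;
    return AMD_PMC_RESERVED;
}
```

A library could call such a check before issuing RDPMC on AMD and report "not available" for reserved numbers instead of faulting.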