rr-debugger / rr

Record and Replay Framework
http://rr-project.org/

Support Goldmont Plus aka. Gemini Lake #3098

Closed by Frederick888 2 years ago

Frederick888 commented 2 years ago

I've got a small box running an Intel J4105 CPU.

When running rr, I got the error below:

[FATAL /tmp/frederick/.cache/paru/clone/rr/src/rr-5.5.0/src/PerfCounters_x86.h:106:compute_cpu_microarch()] Intel CPU type 0x706a0 unknown

I noticed that Goldmont, which I assume is rather similar, is already supported, so I modified the code to map my CPU type to it:

diff --git a/src/PerfCounters_x86.h b/src/PerfCounters_x86.h
index db23cbe0..c3df82c4 100644
--- a/src/PerfCounters_x86.h
+++ b/src/PerfCounters_x86.h
@@ -62,8 +62,9 @@ static CpuMicroarch compute_cpu_microarch() {
     case 0x406c0:
     case 0x50670:
       return IntelSilvermont;
     case 0x506f0:
+    case 0x706a0:
       return IntelGoldmont;
     case 0x706e0:
     case 0x606a0:
       return IntelIcelake;

...and tested it on a hello-world C program; it looked OK.
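
For anyone checking the value in the patch: the case labels look like the EAX result of CPUID leaf 1 with the stepping bits masked off. Below is a minimal sketch that decodes such a value into family and model, assuming the mask is 0xF0FF0 (an inference from the labels, not confirmed from the rr source here). The names `CpuSignature`, `decode_signature`, and `format_signature` are hypothetical helpers, not part of rr.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper type: the two fields that survive the mask.
struct CpuSignature {
  uint32_t family;  // CPUID.1:EAX bits 8-11
  uint32_t model;   // extended model (bits 16-19) combined with model (bits 4-7)
};

// Assumption: the switch labels are EAX & 0xF0FF0, i.e. stepping (bits 0-3)
// and processor type (bits 12-13) are dropped, as is extended family.
const uint32_t kSignatureMask = 0xF0FF0;

inline CpuSignature decode_signature(uint32_t eax) {
  uint32_t masked = eax & kSignatureMask;
  uint32_t family = (masked >> 8) & 0xF;
  uint32_t base_model = (masked >> 4) & 0xF;
  uint32_t ext_model = (masked >> 16) & 0xF;
  // For family 6 and 15 the effective model is ext_model:base_model.
  uint32_t model = (family == 6 || family == 15)
                       ? ((ext_model << 4) | base_model)
                       : base_model;
  return CpuSignature{family, model};
}

// Formats a signature as e.g. "family 6 model 0x7a".
inline std::string format_signature(const CpuSignature& sig) {
  char buf[64];
  std::snprintf(buf, sizeof(buf), "family %u model 0x%02x",
                static_cast<unsigned>(sig.family),
                static_cast<unsigned>(sig.model));
  return std::string(buf);
}
```

Under that assumption, `decode_signature(0x706a0)` gives family 6, model 0x7A, and the existing Goldmont label `0x506f0` gives family 6, model 0x5F. So the patch adds a second family 6 model number to the Goldmont case. Whether the two models behave the same with respect to the performance counters rr relies on is not something this decoding can show.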

I'm pretty new to rr and I'm not sure whether this workaround is a proper solution. It'd be great if I could get some help mainlining it, if possible. Thanks :)

PS: perf list output

  branch-instructions OR branches                    [Hardware event]
  branch-misses                                      [Hardware event]
  bus-cycles                                         [Hardware event]
  cache-misses                                       [Hardware event]
  cache-references                                   [Hardware event]
  cpu-cycles OR cycles                               [Hardware event]
  instructions                                       [Hardware event]
  ref-cycles                                         [Hardware event]
  alignment-faults                                   [Software event]
  bpf-output                                         [Software event]
  cgroup-switches                                    [Software event]
  context-switches OR cs                             [Software event]
  cpu-clock                                          [Software event]
  cpu-migrations OR migrations                       [Software event]
  dummy                                              [Software event]
  emulation-faults                                   [Software event]
  major-faults                                       [Software event]
  minor-faults                                       [Software event]
  page-faults OR faults                              [Software event]
  task-clock                                         [Software event]
  duration_time                                      [Tool event]
  L1-dcache-loads                                    [Hardware cache event]
  L1-dcache-stores                                   [Hardware cache event]
  L1-icache-load-misses                              [Hardware cache event]
  L1-icache-loads                                    [Hardware cache event]
  LLC-load-misses                                    [Hardware cache event]
  LLC-loads                                          [Hardware cache event]
  LLC-store-misses                                   [Hardware cache event]
  LLC-stores                                         [Hardware cache event]
  branch-load-misses                                 [Hardware cache event]
  branch-loads                                       [Hardware cache event]
  dTLB-load-misses                                   [Hardware cache event]
  dTLB-loads                                         [Hardware cache event]
  dTLB-store-misses                                  [Hardware cache event]
  dTLB-stores                                        [Hardware cache event]
  iTLB-load-misses                                   [Hardware cache event]
  iTLB-loads                                         [Hardware cache event]
  branch-instructions OR cpu/branch-instructions/    [Kernel PMU event]
  branch-misses OR cpu/branch-misses/                [Kernel PMU event]
  bus-cycles OR cpu/bus-cycles/                      [Kernel PMU event]
  cache-misses OR cpu/cache-misses/                  [Kernel PMU event]
  cache-references OR cpu/cache-references/          [Kernel PMU event]
  cpu-cycles OR cpu/cpu-cycles/                      [Kernel PMU event]
  instructions OR cpu/instructions/                  [Kernel PMU event]
  ref-cycles OR cpu/ref-cycles/                      [Kernel PMU event]
  topdown-fetch-bubbles OR cpu/topdown-fetch-bubbles/ [Kernel PMU event]
  topdown-recovery-bubbles OR cpu/topdown-recovery-bubbles/ [Kernel PMU event]
  topdown-slots-issued OR cpu/topdown-slots-issued/  [Kernel PMU event]
  topdown-slots-retired OR cpu/topdown-slots-retired/ [Kernel PMU event]
  topdown-total-slots OR cpu/topdown-total-slots/    [Kernel PMU event]
  cstate_core/c1-residency/                          [Kernel PMU event]
  cstate_core/c3-residency/                          [Kernel PMU event]
  cstate_core/c6-residency/                          [Kernel PMU event]
  cstate_pkg/c10-residency/                          [Kernel PMU event]
  cstate_pkg/c2-residency/                           [Kernel PMU event]
  cstate_pkg/c3-residency/                           [Kernel PMU event]
  cstate_pkg/c6-residency/                           [Kernel PMU event]
  i915/actual-frequency/                             [Kernel PMU event]
  i915/bcs0-busy/                                    [Kernel PMU event]
  i915/bcs0-sema/                                    [Kernel PMU event]
  i915/bcs0-wait/                                    [Kernel PMU event]
  i915/interrupts/                                   [Kernel PMU event]
  i915/rc6-residency/                                [Kernel PMU event]
  i915/rcs0-busy/                                    [Kernel PMU event]
  i915/rcs0-sema/                                    [Kernel PMU event]
  i915/rcs0-wait/                                    [Kernel PMU event]
  i915/requested-frequency/                          [Kernel PMU event]
  i915/software-gt-awake-time/                       [Kernel PMU event]
  i915/vcs0-busy/                                    [Kernel PMU event]
  i915/vcs0-sema/                                    [Kernel PMU event]
  i915/vcs0-wait/                                    [Kernel PMU event]
  i915/vecs0-busy/                                   [Kernel PMU event]
  i915/vecs0-sema/                                   [Kernel PMU event]
  i915/vecs0-wait/                                   [Kernel PMU event]
  intel_pt//                                         [Kernel PMU event]
  msr/aperf/                                         [Kernel PMU event]
  msr/cpu_thermal_margin/                            [Kernel PMU event]
  msr/mperf/                                         [Kernel PMU event]
  msr/smi/                                           [Kernel PMU event]
  msr/tsc/                                           [Kernel PMU event]
  power/energy-cores/                                [Kernel PMU event]
  power/energy-gpu/                                  [Kernel PMU event]
  power/energy-pkg/                                  [Kernel PMU event]
  power/energy-ram/                                  [Kernel PMU event]

cache:
  core_reject_l2q.all                               
       [Requests rejected by the L2Q]
  dl1.replacement                                   
       [L1 Cache evictions for dirty data]
  fetch_stall.icache_fill_pending_cycles            
       [Cycles code-fetch stalled due to an outstanding ICache miss]
  l2_reject_xq.all                                  
       [Requests rejected by the XQ]
  longest_lat_cache.miss                            
       [L2 cache request misses]
  longest_lat_cache.reference                       
       [L2 cache requests]
  mem_load_uops_retired.dram_hit                    
       [Loads retired that came from DRAM (Precise event capable) Supports
        address when precise (Must be precise)]
  mem_load_uops_retired.hitm                        
       [Memory uop retired where cross core or cross module HITM occurred
        (Precise event capable) Supports address when precise (Must be
        precise)]
  mem_load_uops_retired.l1_hit                      
       [Load uops retired that hit L1 data cache (Precise event capable)
        Supports address when precise (Must be precise)]
  mem_load_uops_retired.l1_miss                     
       [Load uops retired that missed L1 data cache (Precise event capable)
        Supports address when precise (Must be precise)]
  mem_load_uops_retired.l2_hit                      
       [Load uops retired that hit L2 (Precise event capable) Supports address
        when precise (Must be precise)]
  mem_load_uops_retired.l2_miss                     
       [Load uops retired that missed L2 (Precise event capable) Supports
        address when precise (Must be precise)]
  mem_load_uops_retired.wcb_hit                     
       [Loads retired that hit WCB (Precise event capable) Supports address
        when precise (Must be precise)]
  mem_uops_retired.all                              
       [Memory uops retired (Precise event capable) Supports address when
        precise (Must be precise)]
  mem_uops_retired.all_loads                        
       [Load uops retired (Precise event capable) Supports address when
        precise (Must be precise)]
  mem_uops_retired.all_stores                       
       [Store uops retired (Precise event capable) Supports address when
        precise (Must be precise)]
  mem_uops_retired.lock_loads                       
       [Locked load uops retired (Precise event capable) Supports address when
        precise (Must be precise)]
  mem_uops_retired.split                            
       [Memory uops retired that split a cache-line (Precise event capable)
        Supports address when precise (Must be precise)]
  mem_uops_retired.split_loads                      
       [Load uops retired that split a cache-line (Precise event capable)
        Supports address when precise (Must be precise)]
  mem_uops_retired.split_stores                     
       [Stores uops retired that split a cache-line (Precise event capable)
        Supports address when precise (Must be precise)]
  offcore_response                                  
       [Requires MSR_OFFCORE_RESP[0,1] to specify request type and response.
        (duplicated for both MSRs)]
  offcore_response.any_data_rd.any_response         
       [Counts data reads (demand & prefetch) have any transaction responses
        from the uncore subsystem]
  offcore_response.any_data_rd.l2_hit               
       [Counts data reads (demand & prefetch) hit the L2 cache]
  offcore_response.any_data_rd.l2_miss.hitm_other_core
       [Counts data reads (demand & prefetch) miss the L2 cache with a snoop
        hit in the other processor module, data forwarding is required]
  offcore_response.any_data_rd.l2_miss.snoop_miss_or_no_snoop_needed
       [Counts data reads (demand & prefetch) true miss for the L2 cache with
        a snoop miss in the other processor module]
  offcore_response.any_data_rd.outstanding          
       [Counts data reads (demand & prefetch) outstanding, per cycle, from the
        time of the L2 miss to when any response is received]
  offcore_response.any_pf_data_rd.any_response      
       [Counts data reads generated by L1 or L2 prefetchers have any
        transaction responses from the uncore subsystem]
  offcore_response.any_pf_data_rd.l2_hit            
       [Counts data reads generated by L1 or L2 prefetchers hit the L2 cache]
  offcore_response.any_pf_data_rd.l2_miss.hitm_other_core
       [Counts data reads generated by L1 or L2 prefetchers miss the L2 cache
        with a snoop hit in the other processor module, data forwarding is
        required]
  offcore_response.any_pf_data_rd.l2_miss.snoop_miss_or_no_snoop_needed
       [Counts data reads generated by L1 or L2 prefetchers true miss for the
        L2 cache with a snoop miss in the other processor module]
  offcore_response.any_pf_data_rd.outstanding       
       [Counts data reads generated by L1 or L2 prefetchers outstanding, per
        cycle, from the time of the L2 miss to when any response is received]
  offcore_response.any_read.any_response            
       [Counts data read, code read, and read for ownership (RFO) requests
        (demand & prefetch) have any transaction responses from the uncore
        subsystem]
  offcore_response.any_read.l2_hit                  
       [Counts data read, code read, and read for ownership (RFO) requests
        (demand & prefetch) hit the L2 cache]
  offcore_response.any_read.l2_miss.hitm_other_core 
       [Counts data read, code read, and read for ownership (RFO) requests
        (demand & prefetch) miss the L2 cache with a snoop hit in the other
        processor module, data forwarding is required]
  offcore_response.any_read.l2_miss.snoop_miss_or_no_snoop_needed
       [Counts data read, code read, and read for ownership (RFO) requests
        (demand & prefetch) true miss for the L2 cache with a snoop miss in
        the other processor module]
  offcore_response.any_read.outstanding             
       [Counts data read, code read, and read for ownership (RFO) requests
        (demand & prefetch) outstanding, per cycle, from the time of the L2
        miss to when any response is received]
  offcore_response.any_request.any_response         
       [Counts requests to the uncore subsystem have any transaction responses
        from the uncore subsystem]
  offcore_response.any_request.l2_hit               
       [Counts requests to the uncore subsystem hit the L2 cache]
  offcore_response.any_request.l2_miss.hitm_other_core
       [Counts requests to the uncore subsystem miss the L2 cache with a snoop
        hit in the other processor module, data forwarding is required]
  offcore_response.any_request.l2_miss.snoop_miss_or_no_snoop_needed
       [Counts requests to the uncore subsystem true miss for the L2 cache
        with a snoop miss in the other processor module]
  offcore_response.any_request.outstanding          
       [Counts requests to the uncore subsystem outstanding, per cycle, from
        the time of the L2 miss to when any response is received]
  offcore_response.any_rfo.any_response             
       [Counts reads for ownership (RFO) requests (demand & prefetch) have any
        transaction responses from the uncore subsystem]
  offcore_response.any_rfo.l2_hit                   
       [Counts reads for ownership (RFO) requests (demand & prefetch) hit the
        L2 cache]
  offcore_response.any_rfo.l2_miss.hitm_other_core  
       [Counts reads for ownership (RFO) requests (demand & prefetch) miss the
        L2 cache with a snoop hit in the other processor module, data
        forwarding is required]
  offcore_response.any_rfo.l2_miss.snoop_miss_or_no_snoop_needed
       [Counts reads for ownership (RFO) requests (demand & prefetch) true
        miss for the L2 cache with a snoop miss in the other processor module]
  offcore_response.any_rfo.outstanding              
       [Counts reads for ownership (RFO) requests (demand & prefetch)
        outstanding, per cycle, from the time of the L2 miss to when any
        response is received]
  offcore_response.bus_locks.any_response           
       [Counts bus lock and split lock requests have any transaction responses
        from the uncore subsystem]
  offcore_response.bus_locks.l2_hit                 
       [Counts bus lock and split lock requests hit the L2 cache]
  offcore_response.bus_locks.l2_miss.hitm_other_core
       [Counts bus lock and split lock requests miss the L2 cache with a snoop
        hit in the other processor module, data forwarding is required]
  offcore_response.bus_locks.l2_miss.snoop_miss_or_no_snoop_needed
       [Counts bus lock and split lock requests true miss for the L2 cache
        with a snoop miss in the other processor module]
  offcore_response.bus_locks.outstanding            
       [Counts bus lock and split lock requests outstanding, per cycle, from
        the time of the L2 miss to when any response is received]
  offcore_response.corewb.any_response              
       [Counts the number of writeback transactions caused by L1 or L2 cache
        evictions have any transaction responses from the uncore subsystem]
  offcore_response.corewb.l2_hit                    
       [Counts the number of writeback transactions caused by L1 or L2 cache
        evictions hit the L2 cache]
  offcore_response.corewb.l2_miss.hitm_other_core   
       [Counts the number of writeback transactions caused by L1 or L2 cache
        evictions miss the L2 cache with a snoop hit in the other processor
        module, data forwarding is required]
  offcore_response.corewb.l2_miss.snoop_miss_or_no_snoop_needed
       [Counts the number of writeback transactions caused by L1 or L2 cache
        evictions true miss for the L2 cache with a snoop miss in the other
        processor module]
  offcore_response.corewb.outstanding               
       [Counts the number of writeback transactions caused by L1 or L2 cache
        evictions outstanding, per cycle, from the time of the L2 miss to when
        any response is received]
  offcore_response.demand_code_rd.any_response      
       [Counts demand instruction cacheline and I-side prefetch requests that
        miss the instruction cache have any transaction responses from the
        uncore subsystem]
  offcore_response.demand_code_rd.l2_hit            
       [Counts demand instruction cacheline and I-side prefetch requests that
        miss the instruction cache hit the L2 cache]
  offcore_response.demand_code_rd.l2_miss.hitm_other_core
       [Counts demand instruction cacheline and I-side prefetch requests that
        miss the instruction cache miss the L2 cache with a snoop hit in the
        other processor module, data forwarding is required]
  offcore_response.demand_code_rd.l2_miss.snoop_miss_or_no_snoop_needed
       [Counts demand instruction cacheline and I-side prefetch requests that
        miss the instruction cache true miss for the L2 cache with a snoop
        miss in the other processor module]
  offcore_response.demand_code_rd.outstanding       
       [Counts demand instruction cacheline and I-side prefetch requests that
        miss the instruction cache outstanding, per cycle, from the time of
        the L2 miss to when any response is received]
  offcore_response.demand_data_rd.any_response      
       [Counts demand cacheable data reads of full cache lines have any
        transaction responses from the uncore subsystem]
  offcore_response.demand_data_rd.l2_hit            
       [Counts demand cacheable data reads of full cache lines hit the L2
        cache]
  offcore_response.demand_data_rd.l2_miss.hitm_other_core
       [Counts demand cacheable data reads of full cache lines miss the L2
        cache with a snoop hit in the other processor module, data forwarding
        is required]
  offcore_response.demand_data_rd.l2_miss.snoop_miss_or_no_snoop_needed
       [Counts demand cacheable data reads of full cache lines true miss for
        the L2 cache with a snoop miss in the other processor module]
  offcore_response.demand_data_rd.outstanding       
       [Counts demand cacheable data reads of full cache lines outstanding,
        per cycle, from the time of the L2 miss to when any response is
        received]
  offcore_response.demand_rfo.any_response          
       [Counts demand reads for ownership (RFO) requests generated by a write
        to full data cache line have any transaction responses from the uncore
        subsystem]
  offcore_response.demand_rfo.l2_hit                
       [Counts demand reads for ownership (RFO) requests generated by a write
        to full data cache line hit the L2 cache]
  offcore_response.demand_rfo.l2_miss.hitm_other_core
       [Counts demand reads for ownership (RFO) requests generated by a write
        to full data cache line miss the L2 cache with a snoop hit in the
        other processor module, data forwarding is required]
  offcore_response.demand_rfo.l2_miss.snoop_miss_or_no_snoop_needed
       [Counts demand reads for ownership (RFO) requests generated by a write
        to full data cache line true miss for the L2 cache with a snoop miss
        in the other processor module]
  offcore_response.demand_rfo.outstanding           
       [Counts demand reads for ownership (RFO) requests generated by a write
        to full data cache line outstanding, per cycle, from the time of the
        L2 miss to when any response is received]
  offcore_response.full_streaming_stores.any_response
       [Counts full cache line data writes to uncacheable write combining
        (USWC) memory region and full cache-line non-temporal writes have any
        transaction responses from the uncore subsystem]
  offcore_response.full_streaming_stores.l2_hit     
       [Counts full cache line data writes to uncacheable write combining
        (USWC) memory region and full cache-line non-temporal writes hit the
        L2 cache]
  offcore_response.full_streaming_stores.l2_miss.hitm_other_core
       [Counts full cache line data writes to uncacheable write combining
        (USWC) memory region and full cache-line non-temporal writes miss the
        L2 cache with a snoop hit in the other processor module, data
        forwarding is required]
  offcore_response.full_streaming_stores.l2_miss.snoop_miss_or_no_snoop_needed
       [Counts full cache line data writes to uncacheable write combining
        (USWC) memory region and full cache-line non-temporal writes true miss
        for the L2 cache with a snoop miss in the other processor module]
  offcore_response.full_streaming_stores.outstanding
       [Counts full cache line data writes to uncacheable write combining
        (USWC) memory region and full cache-line non-temporal writes
        outstanding, per cycle, from the time of the L2 miss to when any
        response is received]
  offcore_response.pf_l1_data_rd.any_response       
       [Counts data cache line reads generated by hardware L1 data cache
        prefetcher have any transaction responses from the uncore subsystem]
  offcore_response.pf_l1_data_rd.l2_hit             
       [Counts data cache line reads generated by hardware L1 data cache
        prefetcher hit the L2 cache]
  offcore_response.pf_l1_data_rd.l2_miss.hitm_other_core
       [Counts data cache line reads generated by hardware L1 data cache
        prefetcher miss the L2 cache with a snoop hit in the other processor
        module, data forwarding is required]
  offcore_response.pf_l1_data_rd.l2_miss.snoop_miss_or_no_snoop_needed
       [Counts data cache line reads generated by hardware L1 data cache
        prefetcher true miss for the L2 cache with a snoop miss in the other
        processor module]
  offcore_response.pf_l1_data_rd.outstanding        
       [Counts data cache line reads generated by hardware L1 data cache
        prefetcher outstanding, per cycle, from the time of the L2 miss to
        when any response is received]
  offcore_response.pf_l2_data_rd.any_response       
       [Counts data cacheline reads generated by hardware L2 cache prefetcher
        have any transaction responses from the uncore subsystem]
  offcore_response.pf_l2_data_rd.l2_hit             
       [Counts data cacheline reads generated by hardware L2 cache prefetcher
        hit the L2 cache]
  offcore_response.pf_l2_data_rd.l2_miss.hitm_other_core
       [Counts data cacheline reads generated by hardware L2 cache prefetcher
        miss the L2 cache with a snoop hit in the other processor module, data
        forwarding is required]
  offcore_response.pf_l2_data_rd.l2_miss.snoop_miss_or_no_snoop_needed
       [Counts data cacheline reads generated by hardware L2 cache prefetcher
        true miss for the L2 cache with a snoop miss in the other processor
        module]
  offcore_response.pf_l2_data_rd.outstanding        
       [Counts data cacheline reads generated by hardware L2 cache prefetcher
        outstanding, per cycle, from the time of the L2 miss to when any
        response is received]
  offcore_response.pf_l2_rfo.any_response           
       [Counts reads for ownership (RFO) requests generated by L2 prefetcher
        have any transaction responses from the uncore subsystem]
  offcore_response.pf_l2_rfo.l2_hit                 
       [Counts reads for ownership (RFO) requests generated by L2 prefetcher
        hit the L2 cache]
  offcore_response.pf_l2_rfo.l2_miss.hitm_other_core
       [Counts reads for ownership (RFO) requests generated by L2 prefetcher
        miss the L2 cache with a snoop hit in the other processor module, data
        forwarding is required]
  offcore_response.pf_l2_rfo.l2_miss.snoop_miss_or_no_snoop_needed
       [Counts reads for ownership (RFO) requests generated by L2 prefetcher
        true miss for the L2 cache with a snoop miss in the other processor
        module]
  offcore_response.pf_l2_rfo.outstanding            
       [Counts reads for ownership (RFO) requests generated by L2 prefetcher
        outstanding, per cycle, from the time of the L2 miss to when any
        response is received]
  offcore_response.streaming_stores.any_response    
       [Counts any data writes to uncacheable write combining (USWC) memory
        region have any transaction responses from the uncore subsystem]
  offcore_response.streaming_stores.l2_hit          
       [Counts any data writes to uncacheable write combining (USWC) memory
        region hit the L2 cache]
  offcore_response.streaming_stores.l2_miss.hitm_other_core
       [Counts any data writes to uncacheable write combining (USWC) memory
        region miss the L2 cache with a snoop hit in the other processor
        module, data forwarding is required]
  offcore_response.streaming_stores.l2_miss.snoop_miss_or_no_snoop_needed
       [Counts any data writes to uncacheable write combining (USWC) memory
        region true miss for the L2 cache with a snoop miss in the other
        processor module]
  offcore_response.streaming_stores.outstanding     
       [Counts any data writes to uncacheable write combining (USWC) memory
        region outstanding, per cycle, from the time of the L2 miss to when
        any response is received]
  offcore_response.sw_prefetch.any_response         
       [Counts data cache lines requests by software prefetch instructions
        have any transaction responses from the uncore subsystem]
  offcore_response.sw_prefetch.l2_hit               
       [Counts data cache lines requests by software prefetch instructions hit
        the L2 cache]
  offcore_response.sw_prefetch.l2_miss.hitm_other_core
       [Counts data cache lines requests by software prefetch instructions
        miss the L2 cache with a snoop hit in the other processor module, data
        forwarding is required]
  offcore_response.sw_prefetch.l2_miss.snoop_miss_or_no_snoop_needed
       [Counts data cache lines requests by software prefetch instructions
        true miss for the L2 cache with a snoop miss in the other processor
        module]
  offcore_response.sw_prefetch.outstanding          
       [Counts data cache lines requests by software prefetch instructions
        outstanding, per cycle, from the time of the L2 miss to when any
        response is received]

frontend:
  decode_restriction.predecode_wrong                
       [Decode restrictions due to predicting wrong instruction length]
  icache.accesses                                   
       [References per ICache line. This event counts differently than Intel
        processors based on Silvermont microarchitecture]
  icache.hit                                        
       [References per ICache line that are available in the ICache (hit).
        This event counts differently than Intel processors based on
        Silvermont microarchitecture]
  icache.misses                                     
       [References per ICache line that are not available in the ICache
        (miss). This event counts differently than Intel processors based on
        Silvermont microarchitecture]
  ms_decoded.ms_entry                               
       [MS decode starts]

memory:
  machine_clears.memory_ordering                    
       [Machine clears due to memory ordering issue]
  misalign_mem_ref.load_page_split                  
       [Load uops that split a page (Precise event capable) (Must be precise)]
  misalign_mem_ref.store_page_split                 
       [Store uops that split a page (Precise event capable) (Must be precise)]

other:
  fetch_stall.all                                   
       [Cycles code-fetch stalled due to any reason]
  fetch_stall.itlb_fill_pending_cycles              
       [Cycles the code-fetch stalls and an ITLB miss is outstanding]
  hw_interrupts.masked                              
       [Cycles hardware interrupts are masked]
  hw_interrupts.pending_and_masked                  
       [Cycles pending interrupts are masked]
  hw_interrupts.received                            
       [Hardware interrupts received]
  issue_slots_not_consumed.any                      
       [Unfilled issue slots per cycle]
  issue_slots_not_consumed.recovery                 
       [Unfilled issue slots per cycle to recover]
  issue_slots_not_consumed.resource_full            
       [Unfilled issue slots per cycle because of a full resource in the
        backend]

pipeline:
  baclears.all                                      
       [BACLEARs asserted for any branch type]
  baclears.cond                                     
       [BACLEARs asserted for conditional branch]
  baclears.return                                   
       [BACLEARs asserted for return branch]
  br_inst_retired.all_branches                      
       [Retired branch instructions (Precise event capable) (Must be precise)]
  br_inst_retired.all_taken_branches                
       [Retired taken branch instructions (Precise event capable) (Must be
        precise)]
  br_inst_retired.call                              
       [Retired near call instructions (Precise event capable) (Must be
        precise)]
  br_inst_retired.far_branch                        
       [Retired far branch instructions (Precise event capable) (Must be
        precise)]
  br_inst_retired.ind_call                          
       [Retired near indirect call instructions (Precise event capable) (Must
        be precise)]
  br_inst_retired.jcc                               
       [Retired conditional branch instructions (Precise event capable) (Must
        be precise)]
  br_inst_retired.non_return_ind                    
       [Retired instructions of near indirect Jmp or call (Precise event
        capable) (Must be precise)]
  br_inst_retired.rel_call                          
       [Retired near relative call instructions (Precise event capable) (Must
        be precise)]
  br_inst_retired.return                            
       [Retired near return instructions (Precise event capable) (Must be
        precise)]
  br_inst_retired.taken_jcc                         
       [Retired conditional branch instructions that were taken (Precise event
        capable) (Must be precise)]
  br_misp_retired.all_branches                      
       [Retired mispredicted branch instructions (Precise event capable) (Must
        be precise)]
  br_misp_retired.ind_call                          
       [Retired mispredicted near indirect call instructions (Precise event
        capable) (Must be precise)]
  br_misp_retired.jcc                               
       [Retired mispredicted conditional branch instructions (Precise event
        capable) (Must be precise)]
  br_misp_retired.non_return_ind                    
       [Retired mispredicted instructions of near indirect Jmp or near
        indirect call (Precise event capable) (Must be precise)]
  br_misp_retired.return                            
       [Retired mispredicted near return instructions (Precise event capable)
        (Must be precise)]
  br_misp_retired.taken_jcc                         
       [Retired mispredicted conditional branch instructions that were taken
        (Precise event capable) (Must be precise)]
  cpu_clk_unhalted.core                             
       [Core cycles when core is not halted (Fixed event)]
  cpu_clk_unhalted.core_p                           
       [Core cycles when core is not halted]
  cpu_clk_unhalted.ref                              
       [Reference cycles when core is not halted]
  cpu_clk_unhalted.ref_tsc                          
       [Reference cycles when core is not halted (Fixed event)]
  cycles_div_busy.all                               
       [Cycles a divider is busy]
  cycles_div_busy.fpdiv                             
       [Cycles the FP divide unit is busy]
  cycles_div_busy.idiv                              
       [Cycles the integer divide unit is busy]
  inst_retired.any                                  
       [Instructions retired (Fixed event) (Must be precise)]
  inst_retired.any_p                                
       [Instructions retired (Precise event capable) (Must be precise)]
  inst_retired.prec_dist                            
       [Instructions retired - using Reduced Skid PEBS feature (Must be
        precise)]
  ld_blocks.4k_alias                                
       [Loads blocked because address has 4k partial address false dependence
        (Precise event capable) (Must be precise)]
  ld_blocks.all_block                               
       [Loads blocked (Precise event capable) (Must be precise)]
  ld_blocks.data_unknown                            
       [Loads blocked due to store data not ready (Precise event capable)
        (Must be precise)]
  ld_blocks.store_forward                           
       [Loads blocked due to store forward restriction (Precise event capable)
        (Must be precise)]
  ld_blocks.utlb_miss                               
       [Loads blocked because address in not in the UTLB (Precise event
        capable) (Must be precise)]
  machine_clears.all                                
       [All machine clears]
  machine_clears.disambiguation                     
       [Machine clears due to memory disambiguation]
  machine_clears.fp_assist                          
       [Machine clears due to FP assists]
  machine_clears.page_fault                         
       [Machines clear due to a page fault]
  machine_clears.smc                                
       [Self-Modifying Code detected]
  uops_issued.any                                   
       [Uops issued to the back end per cycle]
  uops_not_delivered.any                            
       [Uops requested but not-delivered to the back-end per cycle]
  uops_retired.any                                  
       [Uops retired (Precise event capable) (Must be precise)]
  uops_retired.fpdiv                                
       [Floating point divide uops retired (Precise Event Capable) (Must be
        precise)]
  uops_retired.idiv                                 
       [Integer divide uops retired (Precise Event Capable) (Must be precise)]
  uops_retired.ms                                   
       [MS uops retired (Precise event capable) (Must be precise)]

virtual memory:
  dtlb_load_misses.walk_completed_1gb               
       [Page walk completed due to a demand load to a 1GB page]
  dtlb_load_misses.walk_completed_2m_4m             
       [Page walk completed due to a demand load to a 2M or 4M page]
  dtlb_load_misses.walk_completed_4k                
       [Page walk completed due to a demand load to a 4K page]
  dtlb_load_misses.walk_pending                     
       [Page walks outstanding due to a demand load every cycle]
  dtlb_store_misses.walk_completed_1gb              
       [Page walk completed due to a demand data store to a 1GB page]
  dtlb_store_misses.walk_completed_2m_4m            
       [Page walk completed due to a demand data store to a 2M or 4M page]
  dtlb_store_misses.walk_completed_4k               
       [Page walk completed due to a demand data store to a 4K page]
  dtlb_store_misses.walk_pending                    
       [Page walks outstanding due to a demand data store every cycle]
  ept.walk_pending                                  
       [Page walks outstanding due to walking the EPT every cycle]
  itlb.miss                                         
       [ITLB misses]
  itlb_misses.walk_completed_1gb                    
       [Page walk completed due to an instruction fetch in a 1GB page]
  itlb_misses.walk_completed_2m_4m                  
       [Page walk completed due to an instruction fetch in a 2M or 4M page]
  itlb_misses.walk_completed_4k                     
       [Page walk completed due to an instruction fetch in a 4K page]
  itlb_misses.walk_pending                          
       [Page walks outstanding due to an instruction fetch every cycle]
  mem_uops_retired.dtlb_miss                        
       [Memory uops retired that missed the DTLB (Precise event capable)
        Supports address when precise (Must be precise)]
  mem_uops_retired.dtlb_miss_loads                  
       [Load uops retired that missed the DTLB (Precise event capable)
        Supports address when precise (Must be precise)]
  mem_uops_retired.dtlb_miss_stores                 
       [Store uops retired that missed the DTLB (Precise event capable)
        Supports address when precise (Must be precise)]
  tlb_flushes.stlb_any                              
       [STLB flushes]
  rNNN                                               [Raw hardware event descriptor]
  cpu/t1=v1[,t2=v2,t3 ...]/modifier                  [Raw hardware event descriptor]
  mem:<addr>[/len][:access]                          [Hardware breakpoint]
  sdt_rtld:init_complete                             [SDT event]
  sdt_rtld:init_start                                [SDT event]
  sdt_rtld:lll_lock_wait                             [SDT event]
  sdt_rtld:lll_lock_wait_private                     [SDT event]
  sdt_rtld:longjmp                                   [SDT event]
  sdt_rtld:longjmp_target                            [SDT event]
  sdt_rtld:map_complete                              [SDT event]
  sdt_rtld:map_start                                 [SDT event]
  sdt_rtld:reloc_complete                            [SDT event]
  sdt_rtld:reloc_start                               [SDT event]
  sdt_rtld:setjmp                                    [SDT event]
  sdt_rtld:unmap_complete                            [SDT event]
  sdt_rtld:unmap_start                               [SDT event]

Metric Groups:

khuey commented 2 years ago

Can you run make check on your machine and post which tests, if any, fail?

Frederick888 commented 2 years ago

@khuey I wasn't able to run make check on the latest master; it hung at the test below (the full log is attached nevertheless: rr_check.log):

440: Test command: /usr/bin/bash "source_dir/src/test/basic_test.run" "pid_ns_kill_threads_exit_wait" "" "bin_dir" "120"
440: Test timeout computed to be: 10000000

So instead I ran make check on 5.5.0. Failed tests were:

Full log: rr_check_5_5_0.log

khuey commented 2 years ago

Ok. It's a bit disconcerting that the tests hung, and that those other tests are failing for you, but the performance counters must be working in general; otherwise you would have hundreds of test failures instead of 7.

khuey commented 2 years ago

Can you send your diff as a PR and I'll merge it?

Frederick888 commented 2 years ago

@khuey Of course. https://github.com/rr-debugger/rr/pull/3099 :)
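
For anyone landing here later, the change boils down to one extra case label in the CPU type lookup. Below is a minimal standalone sketch of that lookup, not rr's actual code: the `Microarch` enum and both function names are hypothetical, and the `0xF0FF0` mask (dropping the stepping bits of the CPUID signature) is an assumption about how the type code in the error message is derived. The case values are the ones from the diff above.

```cpp
#include <cstdint>

// Hypothetical stand-in for rr's CpuMicroarch enum, which has many more entries.
enum class Microarch { Unknown, Silvermont, Goldmont, Icelake };

// Assumption: the CPU type code keeps the family, model and extended-model
// fields of the CPUID signature and drops the stepping (lowest four bits),
// so every stepping of one model maps to the same code.
inline std::uint32_t cpu_type_from_signature(std::uint32_t cpuid_eax) {
  return cpuid_eax & 0xF0FF0u;
}

// Maps a CPU type code to a microarchitecture, mirroring the switch in the diff.
inline Microarch classify(std::uint32_t cpu_type) {
  switch (cpu_type) {
    case 0x406c0:
    case 0x50670:
      return Microarch::Silvermont;
    case 0x506f0:
    case 0x706a0:  // Goldmont Plus (Gemini Lake), treated as Goldmont
      return Microarch::Goldmont;
    case 0x706e0:
    case 0x606a0:
      return Microarch::Icelake;
    default:
      // The real code aborts with "Intel CPU type ... unknown" at this point.
      return Microarch::Unknown;
  }
}
```

With this sketch, `classify(cpu_type_from_signature(0x706a1))` yields `Microarch::Goldmont`, where `0x706a1` is only an illustrative signature value. Reusing the Goldmont entry means Goldmont Plus also inherits Goldmont's counter configuration, which is why the `make check` results matter.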