rr-debugger / rr

Record and Replay Framework
http://rr-project.org/

Intel CPU type 0x706e0 unknown #2610

Closed mdavidsaver closed 4 years ago

mdavidsaver commented 4 years ago

The CPU is an Intel i5-1035G1.

I have RR 5.2.0 at present (from Debian stable). Apologies if this has already been added. It isn't obvious to me which entry would apply. I would expect to see "Ice Lake" in the list.

https://github.com/mozilla/rr/blob/e89f84c77a31ab6474fdd021517b33ff7cb0a256/src/PerfCounters.cc#L123-L143
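For readers wondering where "0x706e0" comes from: it looks like the CPUID leaf 1 EAX signature with the stepping bits cleared (that masking is an assumption about how rr derives its CPU type, not something stated in this thread). The sketch below splits such a value into family and model using the documented CPUID bit layout; `decode_cpuid_signature` is a hypothetical helper name, not part of rr.

```python
def decode_cpuid_signature(eax: int) -> dict:
    """Split a CPUID leaf-1 EAX signature into its documented bit fields."""
    stepping = eax & 0xF            # bits 0-3
    model = (eax >> 4) & 0xF        # bits 4-7
    family = (eax >> 8) & 0xF       # bits 8-11
    ext_model = (eax >> 16) & 0xF   # bits 16-19
    ext_family = (eax >> 20) & 0xFF # bits 20-27
    # The extended model applies to families 6 and 15;
    # the extended family applies only to family 15.
    display_family = family + ext_family if family == 0xF else family
    display_model = (ext_model << 4) | model if family in (0x6, 0xF) else model
    return {"family": display_family, "model": display_model, "stepping": stepping}


sig = decode_cpuid_signature(0x706E0)
print(f"family {sig['family']:#x}, model {sig['model']:#x}")
```

For 0x706e0 this gives family 0x6, model 0x7e, which is the model number commonly listed for Ice Lake client parts.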

khuey commented 4 years ago

No, we haven't added Ice Lake yet. It should be fairly straightforward (e.g. https://github.com/mozilla/rr/commit/262e285313fe47c995a73497391f981011975845#diff-fc2123fdd766dab1791fdbcbd769807f, though some of this code moved into src/PerfCounters_x86.h since this changeset). The actual values that go in the pmu_configs table are unlikely to have changed since Comet Lake. Can you test and send a PR?

mdavidsaver commented 4 years ago

#2611 takes the referenced commit as a template. However, this gives:

$ ./build/usr/bin/rr record echo true
[FATAL /home/mdavidsaver/source/rr/src/PerfCounters.cc:277:check_working_counters() errno: ENOENT] 
Got 11 branch events, expected at least 500.

The hardware performance counter seems to not be working. Check
that hardware performance counters are working by running
  perf stat -e r5101c4 true
and checking that it reports a nonzero number of events.
If performance counters seem to be working with 'perf', file an
rr issue, otherwise check your hardware/OS/VM configuration. Also
check that other software is not using performance counters on
this CPU.
=== Start rr backtrace:
./build/usr/bin/rr(_ZN2rr13dump_rr_stackEv+0x35)[0x55fb6b497e1b]
./build/usr/bin/rr(_ZN2rr15notifying_abortEv+0x53)[0x55fb6b497de1]
./build/usr/bin/rr(_ZN2rr12FatalOstreamD1Ev+0x30)[0x55fb6b32f8f8]
./build/usr/bin/rr(+0x380aa0)[0x55fb6b35aaa0]
./build/usr/bin/rr(+0x380bae)[0x55fb6b35abae]
./build/usr/bin/rr(+0x381068)[0x55fb6b35b068]
./build/usr/bin/rr(_ZN2rr12PerfCounters23default_ticks_semanticsEv+0xe)[0x55fb6b35b1b2]
./build/usr/bin/rr(_ZN2rr7SessionC2Ev+0xfd)[0x55fb6b440a81]
./build/usr/bin/rr(_ZN2rr13RecordSessionC1ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKSt6vectorIS6_SaIS6_EESD_RKNS_20DisableCPUIDFeaturesENS0_16SyscallBufferingEiNS_7BindCPUES8_PKNS_9TraceUuidEbb+0x3b)[0x55fb6b37252d]
./build/usr/bin/rr(_ZN2rr13RecordSession6createERKSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaIS7_EESB_RKNS_20DisableCPUIDFeaturesENS0_16SyscallBufferingEhNS_7BindCPUERKS7_PKNS_9TraceUuidEbb+0x99b)[0x55fb6b37218f]
./build/usr/bin/rr(+0x38bbf2)[0x55fb6b365bf2]
./build/usr/bin/rr(_ZN2rr13RecordCommand3runERSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaIS7_EE+0x396)[0x55fb6b36687e]
./build/usr/bin/rr(main+0x20c)[0x55fb6b4b1b1c]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xeb)[0x7f2fa3bfa09b]
./build/usr/bin/rr(_start+0x2a)[0x55fb6b2735ea]
=== End rr backtrace
Aborted

Running the requested perf incantation seems to give a non-zero count.

$ perf stat -e r5101c4 true

 Performance counter stats for 'true':

            28,869      r5101c4                                                     

       0.001646102 seconds time elapsed

       0.001410000 seconds user
       0.000000000 seconds sys

khuey commented 4 years ago

Reading the Intel SDM, it appears that Ice Lake did change the BR_INST_RETIRED.COND performance counter (shame, that had been stable for some time). Can you try using 5111c4 instead of 5101c4?
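To see what actually differs between the two raw codes, here is a minimal sketch that splits them according to the IA32_PERFEVTSELx register layout (event select in bits 0-7, unit mask in bits 8-15, control flags above that). `decode_raw_event` is a hypothetical helper for illustration and only decodes a few of the flag bits.

```python
def decode_raw_event(raw: int) -> dict:
    """Split a raw x86 event code (IA32_PERFEVTSELx layout) into fields."""
    return {
        "event": raw & 0xFF,           # bits 0-7: event select
        "umask": (raw >> 8) & 0xFF,    # bits 8-15: unit mask
        "usr": bool(raw & (1 << 16)),  # count while in user mode
        "os": bool(raw & (1 << 17)),   # count while in kernel mode
        "int": bool(raw & (1 << 20)),  # interrupt on counter overflow
        "en": bool(raw & (1 << 22)),   # counter enable
    }


old = decode_raw_event(0x5101C4)
new = decode_raw_event(0x5111C4)
# List the fields whose values differ between the two encodings.
changed = sorted(k for k in old if old[k] != new[k])
print(changed, hex(old["umask"]), hex(new["umask"]))
```

Both codes select event 0xc4 with the same flags (user mode, interrupt, enable); only the unit mask changes, from 0x01 to 0x11.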

khuey commented 4 years ago

This is not actually needed for rr's operation but I don't see any documented value for HW_INTERRUPTS.RECEIVED (which was previously 5301cb) on Ice Lake, which is surprising.

mdavidsaver commented 4 years ago

Ok, that at least seems to work. I can run forward and backward through a simple echo true.

khuey commented 4 years ago

Try make check? :)

mdavidsaver commented 4 years ago

After realizing that make check assumes an in-tree build, I get 2 failures.

...
99% tests passed, 2 tests failed out of 1221

Total Test time (real) = 344.08 sec

The following tests FAILED:
        724 - vsyscall (Failed)
        725 - vsyscall-no-syscallbuf (Failed)

khuey commented 4 years ago

I'm not certain what's going on with those, but it's unlikely to be related to your CPU; we'd see a lot more failures if the performance counters weren't working.

mdavidsaver commented 4 years ago

Maybe relevant.

$ uname -a
Linux md-laptop 5.6.0-0.bpo.2-amd64 #1 SMP Debian 5.6.14-2~bpo10+1 (2020-06-09) x86_64 GNU/Linux

khuey commented 4 years ago

Yeah it's possible that test is too dependent on kernel behavior.

khuey commented 4 years ago

Resolved by #2611