rr-debugger / rr

Record and Replay Framework
http://rr-project.org/

rr record fails for all executables with: check_for_ioc_period_bug() errno: EINVAL] ioctl(PERF_EVENT_IOC_PERIOD) failed #2786

Open rocurley opened 3 years ago

rocurley commented 3 years ago

Example backtrace:

$ rr record make
[FATAL /home/roger/git/rr/src/PerfCounters.cc:232:check_for_ioc_period_bug() errno: EINVAL] ioctl(PERF_EVENT_IOC_PERIOD) failed
=== Start rr backtrace:
rr(_ZN2rr13dump_rr_stackEv+0x5d)[0x558e07bab269]
rr(_ZN2rr15notifying_abortEv+0x57)[0x558e07bab207]
rr(_ZN2rr12FatalOstreamD1Ev+0x34)[0x558e07a03c76]
rr(+0x3cbe1a)[0x558e07a35e1a]
rr(+0x3cd543)[0x558e07a37543]
rr(+0x3cda97)[0x558e07a37a97]
rr(_ZN2rr12PerfCounters23default_ticks_semanticsEv+0x21)[0x558e07a37c39]
rr(_ZN2rr7SessionC2Ev+0x107)[0x558e07b40041]
rr(_ZN2rr13RecordSessionC1ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKSt6vectorIS6_SaIS6_EESD_RKNS_20DisableCPUIDFeaturesENS0_16SyscallBufferingEiNS_7BindCPUES8_PKNS_9TraceUuidEbb+0x65)[0x558e07a50e63]
rr(_ZN2rr13RecordSession6createERKSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaIS7_EESB_RKNS_20DisableCPUIDFeaturesENS0_16SyscallBufferingEhNS_7BindCPUERKS7_PKNS_9TraceUuidEbbb+0xb40)[0x558e07a5094c]
rr(+0x3d9e31)[0x558e07a43e31]
rr(_ZN2rr13RecordCommand3runERSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaIS7_EE+0x3da)[0x558e07a44bd0]
rr(main+0x21f)[0x558e07bc6f7b]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7fed4b1450b3]
rr(_start+0x2e)[0x558e0792761e]
=== End rr backtrace
Aborted (core dumped)

I get the same failure with the actual program I'm trying to debug, with ls, and with make.

This happens both on current master (https://github.com/rr-debugger/rr/commit/3f5262f90e63a8ba4d5ed4156b806495830aae2f) and on the version from my package manager (5.3.0-2). I'm on Ubuntu 20.04 with an i7-930 CPU. Happy to provide more details, but I'm not sure exactly what information would be helpful here.
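Several frames in the backtrace above carry no symbol name (for example `rr(+0x3cbe1a)`). As a rough sketch, the following Python helper parses frames in the format shown above and lists the module-relative offsets of the unresolved ones, which could then be passed to a tool such as `addr2line -e` against the matching rr binary. `Frame`, `parse_frame`, and `unresolved_offsets` are hypothetical names made up for illustration; they are not part of rr.

```python
import re
from typing import NamedTuple, Optional

# Matches frames such as "rr(_ZN2rr13dump_rr_stackEv+0x5d)[0x558e07bab269]"
# and symbol-less ones such as "rr(+0x3cbe1a)[0x558e07a35e1a]".
FRAME_RE = re.compile(
    r"^(?P<module>[^()\s]+)"
    r"\((?P<symbol>[^+)]*)\+(?P<offset>0x[0-9a-fA-F]+)\)"
    r"\[(?P<address>0x[0-9a-fA-F]+)\]$"
)


class Frame(NamedTuple):
    module: str            # binary or shared object the frame belongs to
    symbol: Optional[str]  # mangled symbol name, or None if unresolved
    offset: int            # offset from the symbol, or from the module base
    address: int           # absolute address at run time


def parse_frame(line):
    """Return a Frame for one backtrace line, or None if it is not a frame."""
    match = FRAME_RE.match(line.strip())
    if match is None:
        return None
    return Frame(
        module=match["module"],
        symbol=match["symbol"] or None,
        offset=int(match["offset"], 16),
        address=int(match["address"], 16),
    )


def unresolved_offsets(lines):
    """List (module, hex offset) for every frame that has no symbol name."""
    result = []
    for line in lines:
        frame = parse_frame(line)
        if frame is not None and frame.symbol is None:
            result.append((frame.module, hex(frame.offset)))
    return result
```

Feeding it the lines between `=== Start rr backtrace:` and `=== End rr backtrace` skips the marker lines and keeps only the frames whose symbol is missing.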

khuey commented 3 years ago

Is there any PMU-related output in dmesg from startup on your system?

An i7-930 is really old. rr theoretically supports Nehalem-architecture CPUs, but it's possible that kernel support for that hardware has regressed.

rocurley commented 3 years ago

Here's what I've got for PMU:

$ dmesg | grep -i pmu
[    0.275222] Performance Events: PEBS fmt1+, Nehalem events, 16-deep LBR, Intel PMU driver.
[    0.275703] NMI watchdog: Enabled. Permanently consumes one hw-PMU counter.
[   55.256868] PAX: PMU arbitration service v1.0.2 has been started.
[   56.379721] socperf3_0: SocPerf Driver: detected 8 CPUs in lwpmudrv_Load
[   56.379775] socperf3_0: PMU check enabled! F6.M1a.S5 index=-1
[   57.386075] sep5_16: [load] [lwpmu_Load@6327]: Major number is 510
[   57.386078] sep5_16: [load] [lwpmu_Load@6334]: Detected 8 total CPUs and 8 active CPUs.
[   57.388719] sep5_16: [load] [lwpmu_Load@6596]: PMU collection driver v5.16.4  has been loaded.
[   57.388722] sep5_16: [load] [lwpmu_Load@6606]: NMI will be used for handling PMU interrupts.
[   57.388726] sep5_16: [load] [PMU_LIST_Initialize@603]: PMU check enabled! F6.M1a.S5 index=-1 drv_type=PUBLIC
[   57.388727] sep5_16: [load] [PMU_LIST_Build_MSR_List@621]: No MSR list information detected!
[   57.388729] sep5_16: [load] [PMU_LIST_Build_PCI_List@650]: No PCI list information detected!
[   57.388731] sep5_16: [load] [PMU_LIST_Build_MMIO_List@687]: No MMIO list information detected!
[   58.416251] vtsspp: PMU: fixed counters: 3, general counters: 4

guillaume-roche commented 3 years ago

Hi,

I got the same issue while trying to run rr on QEMU.

 >rr record ./qemu-system-x86_64 <qemu args>
[FATAL /builddir/build/BUILD/rr-5.4.0/src/PerfCounters.cc:232:check_for_ioc_period_bug() errno: EINVAL] ioctl(PERF_EVENT_IOC_PERIOD) failed
=== Start rr backtrace:
rr(_ZN2rr13dump_rr_stackEv+0x5a)[0x556e0f1bfe2a]
rr(_ZN2rr15notifying_abortEv+0x4f)[0x556e0f1bfebf]
rr(+0x1e9549)[0x556e0f213549]
rr(+0xb9733)[0x556e0f0e3733]
rr(_ZN2rr12PerfCounters23default_ticks_semanticsEv+0x1e)[0x556e0f0e479e]
rr(_ZN2rr7SessionC1Ev+0x17a)[0x556e0f17c3da]
rr(_ZN2rr13RecordSessionC1ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKSt6vectorIS6_SaIS6_EESD_RKNS_20DisableCPUIDFeaturesENS0_16SyscallBufferingEiNS_7BindCPUES8_PKNS_9TraceUuidEbb+0x65)[0x556e0f0f9f55]
rr(_ZN2rr13RecordSession6createERKSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaIS7_EESB_RKNS_20DisableCPUIDFeaturesENS0_16SyscallBufferingEhNS_7BindCPUERKS7_PKNS_9TraceUuidEbbb+0x6ac)[0x556e0f0f776c]
rr(_ZN2rr13RecordCommand3runERSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaIS7_EE+0x938)[0x556e0f0eaf78]
rr(main+0x138)[0x556e0f065b88]
/lib64/libc.so.6(__libc_start_main+0xd5)[0x7fe42ae0fb75]
rr(_start+0x2e)[0x556e0f06877e]
=== End rr backtrace
[1]    61629 IOT instruction (core dumped)  rr record ./qemu-system-x86_64 [...]
>dmesg | grep -i pmu
[    0.350049] Performance Events: PEBS fmt1+, Nehalem events, 16-deep LBR, Intel PMU driver.
[    0.351421] NMI watchdog: Enabled. Permanently consumes one hw-PMU counter.

khuey commented 3 years ago

Realistically, the only way this is going to get fixed is if someone with this hardware figures out what's going on in the kernel. Several branches in _perf_event_period can return EINVAL; knowing which one is being taken is the first step. https://elixir.bootlin.com/linux/latest/source/kernel/events/core.c#L5448
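As a starting point for that investigation, it may help to reproduce the failing ioctl outside of rr. The sketch below opens a sampling perf event on the current thread and then issues `PERF_EVENT_IOC_PERIOD` with a few different periods. It makes several assumptions: it uses the generic `PERF_COUNT_HW_CPU_CYCLES` event rather than whatever event rr actually configures, the syscall number is hard-coded for x86-64, and the period values and the `pack_attr` / `try_period` names are arbitrary choices for illustration.

```python
import ctypes
import errno
import fcntl
import os
import platform
import struct

# Constants from <linux/perf_event.h>. The syscall number is x86-64 only.
SYS_PERF_EVENT_OPEN = 298
PERF_TYPE_HARDWARE = 0
PERF_COUNT_HW_CPU_CYCLES = 0        # assumption: not necessarily rr's event
PERF_ATTR_SIZE_VER0 = 64            # size of the original perf_event_attr
PERF_EVENT_IOC_PERIOD = 0x40082404  # _IOW('$', 4, __u64)
FLAG_EXCLUDE_KERNEL = 1 << 5        # bit positions in the attr flags word
FLAG_EXCLUDE_HV = 1 << 6


def pack_attr(sample_period):
    """Pack a VER0 perf_event_attr: type, size, config, sample_period,
    sample_type, read_format, flags, wakeup_events, bp_type, config1."""
    return struct.pack(
        "=IIQQQQQIIQ",
        PERF_TYPE_HARDWARE, PERF_ATTR_SIZE_VER0, PERF_COUNT_HW_CPU_CYCLES,
        sample_period, 0, 0, FLAG_EXCLUDE_KERNEL | FLAG_EXCLUDE_HV, 0, 0, 0,
    )


def try_period(initial_period, new_period):
    """Open a counter, change its period, and return (step, errno).

    errno is 0 on success. initial_period must be non-zero so that the
    event is a sampling event; otherwise the ioctl is rejected anyway.
    """
    if platform.system() != "Linux" or platform.machine() != "x86_64":
        return ("unsupported", errno.ENOSYS)
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    attr = ctypes.create_string_buffer(pack_attr(initial_period))
    # perf_event_open(attr, pid=0 (this thread), cpu=-1, group_fd=-1, flags=0)
    fd = libc.syscall(
        ctypes.c_long(SYS_PERF_EVENT_OPEN), attr,
        ctypes.c_int(0), ctypes.c_int(-1), ctypes.c_int(-1), ctypes.c_ulong(0),
    )
    if fd < 0:
        return ("perf_event_open", ctypes.get_errno())
    try:
        # The ioctl argument is a pointer to a 64-bit period value.
        fcntl.ioctl(fd, PERF_EVENT_IOC_PERIOD, struct.pack("=Q", new_period))
        return ("ioctl", 0)
    except OSError as exc:
        return ("ioctl", exc.errno)
    finally:
        os.close(fd)


if __name__ == "__main__":
    for period in (1, 1000, 1000000):
        step, code = try_period(0xFFFFFFFF, period)
        name = errno.errorcode.get(code, "OK") if code else "OK"
        print(f"new period {period}: {step} -> {name}")
```

If the ioctl step reports EINVAL here as well, the failure is reproducible without rr, which should make it easier to instrument or trace the kernel and see which branch is responsible. Whether the result depends on the period value is something this sketch can only suggest, not prove.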