rust-lang / rustc-perf

Website for graphing performance of rustc

Panic "could not find header" when running collector on Windows #1032

Open petrochenkov opened 2 years ago

petrochenkov commented 2 years ago

Perf tools (installed from ADK as suggested for Windows 8.1):

C:\Program Files (x86)\Windows Kits\10\Windows Performance Toolkit\xperf.exe
C:\Program Files (x86)\Windows Kits\10\bin\10.0.19041.0\x64\tracelog.exe
Microsoft (R) Windows (R) Performance Analyzer Version 10.0.19041
Microsoft (R) tracelog.exe (10.0.19041.685)

Output from running the collector (with some debug logging added). The collector has to be run from an admin console; otherwise you get other errors such as "error: rustc-perf-counters: The instance name passed was not recognized as valid by a WMI data provider".

.\target\x86_64-pc-windows-msvc\release\collector.exe bench_local "C:\Users\we\.cargo\bin\rustc.exe" 127
Benchmarking 127 for triple x86_64-unknown-linux-gnu
45 benchmarks remaining
Preparing await-call-tree
Running await-call-tree: Check + [Full, IncrFull, IncrUnchanged, IncrPatched]
[1/2]    100.0%
[2/2]    100.0%
[collector\src\] & = "P-Start"
[collector\src\] & = "P-Start"
[collector\src\] & = "P-End"
[collector\src\] & = "P-Start"
[collector\src\] & = "P-End"
[collector\src\] & = "CSwitch"
thread 'main' panicked at 'could not find header: Pmc', collector\src\
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

petrochenkov commented 2 years ago

profile_local runs successfully though.

.\target\x86_64-pc-windows-msvc\release\collector.exe profile_local self-profile "C:\Users\we\.cargo\bin\rustc.exe" 127 --include await-call-tree
Profiling 127 with SelfProfile
1 benchmark remaining
Preparing await-call-tree
Running await-call-tree: Check + [Full, IncrFull, IncrUnchanged, IncrPatched]
Running await-call-tree: Debug + [Full, IncrFull, IncrUnchanged, IncrPatched]
Running await-call-tree: Opt + [Full, IncrFull, IncrUnchanged, IncrPatched]

Mark-Simulacrum commented 2 years ago

cc @wesleywiser

wesleywiser commented 2 years ago

If the file is missing the PMC event, that usually means hardware performance counters aren't supported for some reason. It's possible that WPA on Win 8.1 just doesn't support them, but I think we'd be getting a different error if that were the case. The other possibility is that WPA doesn't support capturing the counters on this CPU at the HAL level.

@petrochenkov is this by any chance an AMD CPU?
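
To illustrate the failure mode described above, here is a minimal sketch of a header lookup that returns an error listing the headers that were actually seen, rather than panicking. It is not the collector's code: `find_header` and `MissingHeader` are hypothetical names, and the header list is copied from the debug output earlier in this thread.

```rust
use std::fmt;

/// Hypothetical error type for this sketch; not taken from the collector.
#[derive(Debug, PartialEq)]
struct MissingHeader {
    wanted: String,
    seen: Vec<String>,
}

impl fmt::Display for MissingHeader {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "could not find header: {} (headers present: {})",
            self.wanted,
            self.seen.join(", ")
        )
    }
}

/// Returns the index of `wanted` among the header names, or an error
/// that records which header names were actually present.
fn find_header(headers: &[&str], wanted: &str) -> Result<usize, MissingHeader> {
    headers
        .iter()
        .position(|h| h.trim() == wanted)
        .ok_or_else(|| MissingHeader {
            wanted: wanted.to_string(),
            seen: headers.iter().map(|h| h.trim().to_string()).collect(),
        })
}

fn main() {
    // Header names as they appear in the debug output in this thread.
    let headers = ["P-Start", "P-End", "CSwitch"];
    match find_header(&headers, "Pmc") {
        Ok(i) => println!("Pmc header at index {i}"),
        Err(e) => eprintln!("{e}"),
    }
}
```

Reporting the headers that were present would make it easier to tell from the error alone that the trace contains no PMC data at all, as opposed to a parsing problem.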

petrochenkov commented 2 years ago

@wesleywiser No, the CPU is Intel.

Details

```
Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz
Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
Microcode signature: 0000001A
HTT            * Hyperthreading enabled
CET            - Supports Control Flow Enforcement Technology
Kernel CET     - Kernel-mode CET Enabled
User CET       - User-mode CET Allowed
HYPERVISOR     - Hypervisor is present
VMX            * Supports Intel hardware-assisted virtualization
SVM            - Supports AMD hardware-assisted virtualization
X64            * Supports 64-bit mode

SMX            - Supports Intel trusted execution
SKINIT         - Supports AMD SKINIT
SGX            - Supports Intel SGX

NX             * Supports no-execute page protection
SMEP           * Supports Supervisor Mode Execution Prevention
SMAP           - Supports Supervisor Mode Access Prevention
PAGE1GB        * Supports 1 GB large pages
PAE            * Supports > 32-bit physical addresses
PAT            * Supports Page Attribute Table
PSE            * Supports 4 MB pages
PSE36          * Supports > 32-bit address 4 MB pages
PGE            * Supports global bit in page tables
SS             * Supports bus snooping for cache operations
VME            * Supports Virtual-8086 mode
RDWRFSGSBASE   * Supports direct GS/FS base access

FPU            * Implements i387 floating point instructions
MMX            * Supports MMX instruction set
MMXEXT         - Implements AMD MMX extensions
3DNOW          - Supports 3DNow! instructions
3DNOWEXT       - Supports 3DNow! extension instructions
SSE            * Supports Streaming SIMD Extensions
SSE2           * Supports Streaming SIMD Extensions 2
SSE3           * Supports Streaming SIMD Extensions 3
SSSE3          * Supports Supplemental SIMD Extensions 3
SSE4a          - Supports Streaming SIMDR Extensions 4a
SSE4.1         * Supports Streaming SIMD Extensions 4.1
SSE4.2         * Supports Streaming SIMD Extensions 4.2

AES            * Supports AES extensions
AVX            * Supports AVX instruction extensions
AVX2           * Supports AVX2 instruction extensions
AVX-512-F      - Supports AVX-512 Foundation instructions
AVX-512-DQ     - Supports AVX-512 double and quadword instructions
AVX-512-IFAMA  - Supports AVX-512 integer Fused multiply-add instructions
AVX-512-PF     - Supports AVX-512 prefetch instructions
AVX-512-ER     - Supports AVX-512 exponential and reciprocal instructions
AVX-512-CD     - Supports AVX-512 conflict detection instructions
AVX-512-BW     - Supports AVX-512 byte and word instructions
AVX-512-VL     - Supports AVX-512 vector length instructions
FMA            * Supports FMA extensions using YMM state
MSR            * Implements RDMSR/WRMSR instructions
MTRR           * Supports Memory Type Range Registers
XSAVE          * Supports XSAVE/XRSTOR instructions
OSXSAVE        * Supports XSETBV/XGETBV instructions
RDRAND         * Supports RDRAND instruction
RDSEED         - Supports RDSEED instruction

CMOV           * Supports CMOVcc instruction
CLFSH          * Supports CLFLUSH instruction
CX8            * Supports compare and exchange 8-byte instructions
CX16           * Supports CMPXCHG16B instruction
BMI1           * Supports bit manipulation extensions 1
BMI2           * Supports bit manipulation extensions 2
ADX            - Supports ADCX/ADOX instructions
DCA            - Supports prefetch from memory-mapped device
F16C           * Supports half-precision instruction
FXSR           * Supports FXSAVE/FXSTOR instructions
FFXSR          - Supports optimized FXSAVE/FSRSTOR instruction
MONITOR        * Supports MONITOR and MWAIT instructions
MOVBE          * Supports MOVBE instruction
ERMSB          * Supports Enhanced REP MOVSB/STOSB
PCLMULDQ       * Supports PCLMULDQ instruction
POPCNT         * Supports POPCNT instruction
LZCNT          * Supports LZCNT instruction
SEP            * Supports fast system call instructions
LAHF-SAHF      * Supports LAHF/SAHF instructions in 64-bit mode
HLE            * Supports Hardware Lock Elision instructions
RTM            * Supports Restricted Transactional Memory instructions

DE             * Supports I/O breakpoints including CR4.DE
DTES64         * Can write history of 64-bit branch addresses
DS             * Implements memory-resident debug buffer
DS-CPL         * Supports Debug Store feature with CPL
PCID           * Supports PCIDs and settable CR4.PCIDE
INVPCID        * Supports INVPCID instruction
PDCM           * Supports Performance Capabilities MSR
RDTSCP         * Supports RDTSCP instruction
TSC            * Supports RDTSC instruction
TSC-DEADLINE   * Local APIC supports one-shot deadline timer
TSC-INVARIANT  * TSC runs at constant rate
xTPR           * Supports disabling task priority messages

EIST           * Supports Enhanced Intel Speedstep
ACPI           * Implements MSR for power management
TM             * Implements thermal monitor circuitry
TM2            * Implements Thermal Monitor 2 control
APIC           * Implements software-accessible local APIC
x2APIC         * Supports x2APIC

CNXT-ID        - L1 data cache mode adaptive or BIOS

MCE            * Supports Machine Check, INT18 and CR4.MCE
MCA            * Implements Machine Check Architecture
PBE            * Supports use of FERR#/PBE# pin

PSN            - Implements 96-bit processor serial number

PREFETCHW      * Supports PREFETCHW instruction

Maximum implemented CPUID leaves: 0000000D (Basic), 80000008 (Extended).
Maximum implemented address width: 48 bits (virtual), 39 bits (physical).

Processor signature: 000306C3

Logical to Physical Processor Map:
**------  Physical Processor 0 (Hyperthreaded)
--**----  Physical Processor 1 (Hyperthreaded)
----**--  Physical Processor 2 (Hyperthreaded)
------**  Physical Processor 3 (Hyperthreaded)

Logical Processor to Socket Map:
********  Socket 0

Logical Processor to NUMA Node Map:
********  NUMA Node 0

No NUMA nodes.

Logical Processor to Cache Map:
**------  Data Cache          0, Level 1,   32 KB, Assoc   8, LineSize  64
**------  Instruction Cache   0, Level 1,   32 KB, Assoc   8, LineSize  64
**------  Unified Cache       0, Level 2,  256 KB, Assoc   8, LineSize  64
********  Unified Cache       1, Level 3,    8 MB, Assoc  16, LineSize  64
--**----  Data Cache          1, Level 1,   32 KB, Assoc   8, LineSize  64
--**----  Instruction Cache   1, Level 1,   32 KB, Assoc   8, LineSize  64
--**----  Unified Cache       2, Level 2,  256 KB, Assoc   8, LineSize  64
----**--  Data Cache          2, Level 1,   32 KB, Assoc   8, LineSize  64
----**--  Instruction Cache   2, Level 1,   32 KB, Assoc   8, LineSize  64
----**--  Unified Cache       3, Level 2,  256 KB, Assoc   8, LineSize  64
------**  Data Cache          3, Level 1,   32 KB, Assoc   8, LineSize  64
------**  Instruction Cache   3, Level 1,   32 KB, Assoc   8, LineSize  64
------**  Unified Cache       4, Level 2,  256 KB, Assoc   8, LineSize  64

Logical Processor to Group Map:
********  Group 0
```