google / benchmark

A microbenchmark support library
Apache License 2.0

[BUG] Core Affinity doesn't seem to work (Or Reporter wrong?) #1812

Open FabianSchuetze opened 3 weeks ago

FabianSchuetze commented 3 weeks ago

Describe the bug: I would like to run the benchmark on a particular CPU core. The docs say:

2. Set the benchmark program's task affinity to a fixed cpu. For example:
   ```sh
   taskset -c 0 ./mybenchmark
   ```

However, when I run the basic_test app, I see the following:

```
build git:(main) taskset -c 0 ./test/basic_test
2024-07-11T11:29:31+02:00
Running ./test/basic_test
Run on (24 X 4700 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x12)
  L1 Instruction 32 KiB (x12)
  L2 Unified 1280 KiB (x12)
  L3 Unified 25600 KiB (x1)
Load Average: 5.07, 4.92, 2.97
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
-----------------------------------------------------------------------------------------
Benchmark                                               Time             CPU   Iterations
-----------------------------------------------------------------------------------------
BM_empty                                            0.327 ns        0.325 ns   2148747493
BM_empty/threads:24                                 0.296 ns        0.328 ns   2130524928
...
```

I think it might be the reporting that is wrong here. Looking at `top`, I can verify that only core 0 is used. The code in [reporter](https://github.com/google/benchmark/blob/main/src/reporter.cc#L49C29-L49C37) seems to use a static number of cores.

System: Which OS, compiler, and compiler version are you using? (not specified)

To reproduce, follow these steps:

  1. Sync to commit: ea71a14891474943fc1f34d359f9e0e82476ffe1
  2. Configure: cmake -D BENCHMARK_DOWNLOAD_DEPENDENCIES=1 -S . -B build
  3. Build: cmake --build build/ -j20
  4. Run: taskset -c 0 ./build/test/basic_test

Expected behavior: I would expect the test to run only on core 0 and the output to read `Run on (1 X 4700 MHz CPU s)` instead of `Run on (24 X 4700 MHz CPU s)`.

LebedevRI commented 3 weeks ago

Doesn't core affinity only ensure that the main thread stays on the same CPU, not that the main thread is unable to start new threads?

https://man7.org/linux/man-pages/man1/taskset.1.html

       The taskset command is used to set or retrieve the CPU affinity
       of a running process given its pid, or to launch a new command
       with a given CPU affinity. CPU affinity is a scheduler property
       that "bonds" a process to a given set of CPUs on the system. The
       Linux scheduler will honor the given CPU affinity and the process
       will not run on any other CPUs.
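
Whether extra threads can escape the mask is easy to check empirically. The following is a minimal sketch assuming Linux with glibc (it uses `sched_getaffinity`, `sched_setaffinity`, `sched_getcpu` and the `CPU_*` macros from `<sched.h>`); the helper names `PinToFirstAllowedCpu` and `ObserveWorkers` are hypothetical and not part of benchmark. According to the pthread_create(3) man page, a new thread inherits a copy of its creator's affinity mask, so after the main thread is pinned, each worker should see exactly one allowed CPU, and it should be the same one.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // for CPU_COUNT and sched_getcpu (g++ usually predefines it)
#endif
#include <sched.h>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: pins the calling thread to the lowest-numbered CPU it
// is currently allowed to use, similar in spirit to `taskset -c <cpu>`.
// Returns that CPU number, or -1 on failure.
int PinToFirstAllowedCpu() {
  cpu_set_t allowed;
  CPU_ZERO(&allowed);
  if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0) return -1;
  for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
    if (!CPU_ISSET(cpu, &allowed)) continue;
    cpu_set_t one;
    CPU_ZERO(&one);
    CPU_SET(cpu, &one);
    return sched_setaffinity(0, sizeof(one), &one) == 0 ? cpu : -1;
  }
  return -1;
}

// What a spawned thread observes about its own placement.
struct ThreadObservation {
  int allowed_cpus;  // CPU_COUNT of the thread's own affinity mask
  int ran_on;        // sched_getcpu() sampled inside the thread
};

// Hypothetical helper: spawns `n` threads; each records its mask size and CPU.
std::vector<ThreadObservation> ObserveWorkers(int n) {
  assert(n >= 0);
  std::vector<ThreadObservation> obs(n);
  std::vector<std::thread> workers;
  for (int i = 0; i < n; ++i) {
    workers.emplace_back([&obs, i] {
      cpu_set_t mask;
      CPU_ZERO(&mask);
      // pid 0 means "the calling thread", i.e. this worker.
      sched_getaffinity(0, sizeof(mask), &mask);
      obs[i] = {CPU_COUNT(&mask), sched_getcpu()};
    });
  }
  for (auto& t : workers) t.join();
  return obs;
}
```

Calling `PinToFirstAllowedCpu()` and then `ObserveWorkers(4)` should yield four entries whose `allowed_cpus` is 1: the process is free to start new threads, but they inherit the pin.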

FabianSchuetze commented 3 weeks ago

I agree.

What's the meaning of Run on (24 X 4700 MHz CPU s) then?

So it does not reflect any threading decisions made by benchmark; instead, it enumerates the CPU cores of the test system? Likewise, when taskset is not used, `Run on (24 X 4700 MHz CPU s)` does not indicate that the benchmark runs in parallel on different cores.

LebedevRI commented 3 weeks ago

I think so, yes. https://github.com/google/benchmark/blob/main/src/sysinfo.cc seems to temporarily unset core affinity to read CPU frequencies, but I don't think it ever reports what the actual affinity is. But again, I'm not sure what happens with any extra threads that may be created (either by libbenchmark itself or by the snippet under measurement).
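
The "temporarily unset core affinity" step can be pictured as a scoped guard that saves the caller's mask, widens it, and restores it on scope exit. This is a hypothetical sketch assuming Linux/glibc; the class name `ScopedAffinityReset` is invented, and it is not the code in sysinfo.cc.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // for CPU_COUNT / CPU_EQUAL (g++ usually predefines it)
#endif
#include <sched.h>
#include <cassert>

// Hypothetical RAII guard: saves the calling thread's affinity mask,
// widens it to every CPU for the guard's lifetime, then restores it.
// It affects only the calling thread, not other threads in the process.
class ScopedAffinityReset {
 public:
  ScopedAffinityReset() {
    CPU_ZERO(&saved_);
    saved_ok_ = sched_getaffinity(0, sizeof(saved_), &saved_) == 0;
    cpu_set_t all;
    CPU_ZERO(&all);
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) CPU_SET(cpu, &all);
    // Linux intersects the request with the CPUs that exist and that any
    // cpuset allows, so asking for every bit should be accepted. If it
    // fails (e.g. in a sandbox), the original mask simply stays in place.
    if (saved_ok_) sched_setaffinity(0, sizeof(all), &all);
  }
  ~ScopedAffinityReset() {
    if (saved_ok_) sched_setaffinity(0, sizeof(saved_), &saved_);
  }
  ScopedAffinityReset(const ScopedAffinityReset&) = delete;
  ScopedAffinityReset& operator=(const ScopedAffinityReset&) = delete;

 private:
  cpu_set_t saved_;
  bool saved_ok_ = false;
};
```

Because the saved mask is restored in the destructor, a `taskset -c 0` pin would survive such a probe even on an early return. It also illustrates why the final report could query the affinity after the probe, not before.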

dmah42 commented 3 weeks ago

We could fix the PrintBasicContext section to report the number of CPUs actually used, if we obtained the current affinity somewhere, I guess?
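
A minimal sketch of the two numbers such a report might contrast, assuming Linux/glibc: `sched_getaffinity` plus `CPU_COUNT` for the CPUs the calling thread may use, and `sysconf(_SC_NPROCESSORS_ONLN)` for the CPUs online on the machine. The helpers `AffinityCpuCount`, `OnlineCpuCount` and `PrintCpuContext` are hypothetical, not benchmark's API, and the output line is only illustrative.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // for CPU_COUNT (g++ usually predefines it)
#endif
#include <sched.h>   // sched_getaffinity, CPU_ZERO, CPU_COUNT
#include <unistd.h>  // sysconf, _SC_NPROCESSORS_ONLN
#include <cassert>
#include <cstdio>

// Hypothetical helper: number of CPUs the calling thread may run on
// (1 under `taskset -c 0`), or -1 on error.
int AffinityCpuCount() {
  cpu_set_t mask;
  CPU_ZERO(&mask);
  if (sched_getaffinity(0, sizeof(mask), &mask) != 0) return -1;
  return CPU_COUNT(&mask);
}

// Hypothetical helper: number of CPUs currently online on the machine,
// independent of any affinity mask.
long OnlineCpuCount() { return sysconf(_SC_NPROCESSORS_ONLN); }

// Illustrative context line reporting both numbers side by side.
void PrintCpuContext(std::FILE* out, double mhz) {
  assert(out != nullptr);
  std::fprintf(out, "Run on (%ld X %.0f MHz CPU s), %d allowed by affinity\n",
               OnlineCpuCount(), mhz, AffinityCpuCount());
}
```

Under `taskset -c 0`, `AffinityCpuCount()` would return 1 while `OnlineCpuCount()` still returns the machine-wide count, which is exactly the distinction discussed above.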

FabianSchuetze commented 3 weeks ago

That would be wonderful, but I wonder whether that solves only half of the issue.

If no affinity is set (0xFFFFFFF is returned), should the benchmark report that it runs on all cores? I don't think the benchmark is scheduled to run in parallel on different cores, is it?

That said, I think reporting the affinity is useful, particularly when the system has a hybrid architecture with "performance" and "efficient" cores.
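
For the hybrid-core case, a count alone may not say enough; printing the allowed CPUs as a list would let a reader match them against the machine's topology. Below is a minimal sketch, assuming Linux/glibc, that renders a mask in the list syntax `taskset -c` accepts (for example `0-3,8,10-11`); `FormatCpuList` and `CurrentAffinityList` are hypothetical names. The mask by itself does not say which CPUs are performance or efficient cores; that mapping would have to come from separate topology information.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // for the CPU_* macros (g++ usually predefines it)
#endif
#include <sched.h>
#include <cassert>
#include <string>

// Hypothetical helper: renders a cpu_set_t as a comma-separated list of
// single CPUs and inclusive ranges, e.g. "0-3,8,10-11".
std::string FormatCpuList(const cpu_set_t& mask) {
  std::string out;
  int cpu = 0;
  while (cpu < CPU_SETSIZE) {
    if (!CPU_ISSET(cpu, &mask)) {
      ++cpu;
      continue;
    }
    int start = cpu;
    // Extend the run while the next CPU is also set.
    while (cpu + 1 < CPU_SETSIZE && CPU_ISSET(cpu + 1, &mask)) ++cpu;
    if (!out.empty()) out += ',';
    out += std::to_string(start);
    if (cpu > start) out += '-' + std::to_string(cpu);
    ++cpu;
  }
  return out;
}

// Hypothetical wrapper: the calling thread's current mask as a list,
// or an empty string on error.
std::string CurrentAffinityList() {
  cpu_set_t mask;
  CPU_ZERO(&mask);
  if (sched_getaffinity(0, sizeof(mask), &mask) != 0) return "";
  return FormatCpuList(mask);
}
```

Under `taskset -c 0`, `CurrentAffinityList()` would return `"0"`; with no affinity set it would list every online CPU.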