
Support for Machine-specific configuration and denoise configuration #257

Open smarr opened 3 months ago

smarr commented 3 months ago

With the latest machines coming into our benchmarking infrastructure being hybrid processors with efficiency and performance cores, i.e., big.LITTLE-style architectures, it becomes more desirable to configure on which cores benchmarks execute.

At the moment, denoise uses cset to enable shielding for a rather large number of cores based on a simple heuristic of leaving some room for the system.

from math import floor, log

def _shield_lower_bound(num_cores):
    # reserve the first few cores for the system;
    # the reservation grows logarithmically with the core count
    return int(floor(log(num_cores)))

def _shield_upper_bound(num_cores):
    return num_cores - 1

def _activate_shielding(num_cores):
    # e.g., for num_cores = 16: floor(log(16)) = 2, so cores 2-15
    # are shielded and cores 0 and 1 remain for the system
    min_cores = _shield_lower_bound(num_cores)
    max_cores = _shield_upper_bound(num_cores)
    core_spec = "%d-%d" % (min_cores, max_cores)

With hybrid architectures, but also with multi-socket systems, it is desirable to be more deliberate about what executes where. One may even want to compare the different types of cores.

So, I think a purely automatic approach to choosing the cores is not sufficient. Instead, it would be good to be able to configure these settings explicitly.

denoise currently knows the following configuration parameters:

  - core shielding via cset
  - the CPU frequency scaling governor
  - turbo boost
  - the perf sampling rate (perf_event_max_sample_rate)
  - process niceness

And ReBench's configuration system has the following priority list of configurations:

  1. benchmark
  2. benchmark suites
  3. executor
  4. experiment
  5. experiments
  6. runs (as defined by the root element)

Here, 1. overrides all other configurations, and 6. has the lowest priority.
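
To illustrate the priority rules with a minimal sketch (the suite and benchmark names are made up, and required suite keys such as gauge_adapter and command are elided):

runs:
  invocations: 10            # level 6: default for all runs

benchmark_suites:
  ExampleSuite:
    benchmarks:
      - Fib:
          invocations: 3     # level 1: overrides the runs-level default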

Since https://github.com/smarr/ReBench/pull/170, we also have the option to mark invocations, iterations, and warmup settings as "important" with a !. Though, that's not yet documented...
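
Roughly, this looks as follows (a sketch; since the feature is not yet documented, double-check the exact syntax against the pull request):

benchmark_suites:
  ExampleSuite:
    invocations: 5!    # the ! marks the value as "important", so it wins
                       # even against more specific configurations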

So, at this point, I am thinking of adding a new lowest level of priority to the list: machine. Then we have the priority list:

  1. benchmark
  2. benchmark suites
  3. executor
  4. experiment
  5. experiments
  6. runs
  7. machine

https://github.com/smarr/ReBench/pull/161 already introduced the notion of a machine so that we can filter by it and easily run benchmarks on specific machines.

With a new type of denoise setting in the configuration, as part of the run details (rebench-schema.yml), we could then do something like:

runs:
  denoise:
    shield: 1-5    # using cset's core-specification syntax

As well as:

machines:
  yuria1:
    denoise:
      shield: 7,8,9
    invocations: 4
  yuria2:
    denoise:
      shield: 1-3,40-50

Of course, this opens the possibility to also do:

benchmark_suites:
  ExampleSuite:
    invocations: 3
    denoise:
      nice: false

So, we may need to change the denoise settings frequently during an experiment. Though, because of https://github.com/smarr/ReBench/issues/249, we should rework how denoise settings are applied anyway. This should also take https://github.com/smarr/ReBench/issues/168 into account.

OctaveLarose commented 3 months ago

That sounds reasonable to me. Just make the default all cores and denoise enabled (like it currently is, correct?). You probably also want ReBench to emit a warning when it runs on an architecture that might not play nicely with the current default settings.

smarr commented 2 months ago

One of the issues I am not yet quite sure about is that we already have a notion of machine, or rather machines.

Introduced with https://github.com/smarr/ReBench/pull/161 and in the schema here: https://github.com/smarr/ReBench/blob/master/rebench/rebench-schema.yml#L140-L145

Having two independent notions of machine seems like a great source of confusion.

So, I am currently thinking I would want to keep these two features separate. The machine feature that was introduced by #161 is really only used to filter the set of experiments based on the "tag". It's very convenient for splitting a configuration into something that can be run on multiple machines.

However, it does not really relate to the machine itself.

Of course, another option would be to combine the two notions of machine. For the machine-based configuration, I am thinking we may want a rebench -m yuria1 command-line option, so that rebench activates the configuration of the selected machine for the execution.

This would then also make it possible to filter the benchmarks at the same time.
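
As a sketch of the combined behavior (yuria1, the shield values, and the -m option are just the proposal from above):

machines:
  yuria1:
    denoise:
      shield: 7,8,9
    invocations: 4

# proposed usage: activate yuria1's settings, and filter experiments
# by the machine at the same time:
#   rebench -m yuria1 rebench.conf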

By keeping things separate, we'd have more flexibility. On the one hand, one would not need to "tag" benchmarks for a specific machine, and could simply run the same set on multiple machines, with the corresponding configuration applied.

On the other hand, we don't really have any use case for a separate tagging mechanism. While it could be useful, for instance, to tag fast or slow benchmarks, or to distinguish things like latency vs. throughput even within the same benchmark suite, we have not really needed it so far.