jansel / opentuner

An extensible framework for program autotuning
http://opentuner.org/
MIT License

Configuration post-processing feature #152

Open qnbhd opened 2 years ago

qnbhd commented 2 years ago

OpenTuner finds good configurations, but they include many parameters, some of which may be redundant. It would be good to know which parameters in the final configuration are crucial and which do not have much impact on the objective function.

There are two potential user groups:

  1. Developers optimizing SW will be able to identify the most important parameters and eliminate inefficient or unstable ones. This helps to reduce the number of parameters, and thus the length of the command line and the dependency on SW features (e.g. fewer internal compiler optimization flags), potentially making the optimal configuration more stable and more widely applicable (e.g. fewer internal flags, more compiler versions supported).
  2. Compiler developers can evaluate the impact of specific compiler optimizations and improve the compiler source code if the behaviour is unexpected.

Example:

configuration = {a=1, b=2, c=3, d=4, e=0}, time=10.0, id=1

The current proposal is to include an OpenTuner technique which measures subsets of the configuration and tracks regressions/improvements, effectively finding the best and shortest configuration, as in the example and sketch below.

{a=1, b=2, c=3, d=4}, id=2, time=10.0  # -e from orig
{a=1, b=2, c=3, e=0}, id=5, time=11.5  # -d from orig
...
{a=1, b=2, d=4}, id=9, time=9.0  # -c from id=2
{a=1, d=4}, id=10, time=11.0
{a=1, b=2}, id=11, time=12.0
...
{a=1}, id=12, time=13.0
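
For concreteness, a minimal sketch of the measurement step behind this proposal, in plain Python. `measure(config)` is a hypothetical stand-in for running the program with a given configuration and returning its time; a real implementation would go through OpenTuner's measurement machinery:

```python
def drop_one_scan(config, measure):
    """Measure every sub-configuration obtained by removing a single
    parameter from `config`; return the results sorted by time."""
    results = []
    for key in config:
        sub = {k: v for k, v in config.items() if k != key}
        results.append((measure(sub), key, sub))
    results.sort(key=lambda r: r[0])  # fastest sub-configuration first
    return results
```

Applied to the original configuration, this produces the id=2 / id=5 style measurements above; applied again to the winner, the id=9 row, and so on.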

Report

Post-processing detects and removes negligible parameters from the original configuration, and the score may even improve. The first output shows the new configuration with the best result (better than the original, or within a threshold equal to the coefficient of variation).

Original cfg: {a=1, b=2, c=3, d=4, e=0}, time=10.0
New best cfg: {a=1, b=2, d=4}, time=9.0
The next table shows the impact of each parameter. The first row shows the new best configuration. Each following row shows the result after removing the next parameter with the smallest (precise or estimated) penalty.

| Id | Score | Penalty | Best   | Reduced option(s) | Penalty on best |
|----|-------|---------|--------|-------------------|-----------------|
| *7 | 9     | n/a     | 0.0%   |                   |                 |
| 8  | 11    | 22.2%   | 22.2%  | b                 | 22.2%           |
| 11 | 13    | 18.2%   | 44.4%  | d                 | 33.3%           |
| 13 | 20    | 53.8%   | 122.2% | a                 | 44.4%           |

Columns:

* Id: configuration id
* Score: measured result (time)
* Penalty: regression relative to the previous row
* Best: cumulative regression relative to the new best configuration
* Reduced option(s): the parameter removed at this step
* Penalty on best: regression when only this parameter is removed from the new best configuration
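
A rough sketch of how these rows could be computed, under the column interpretation above; `steps` (the removal trajectory) and `single_drop_penalty` are hypothetical containers collected during the search:

```python
def report_rows(steps, best_score, single_drop_penalty):
    """Build the rows below the best-configuration row.

    steps: removal order as (config_id, score, removed_param) tuples;
    single_drop_penalty: param -> regression when that parameter alone
    is removed from the new best configuration."""
    rows, prev = [], best_score
    for config_id, score, removed in steps:
        rows.append({
            'Id': config_id,
            'Score': score,
            'Penalty': (score - prev) / prev,           # vs previous row
            'Best': (score - best_score) / best_score,  # vs new best
            'Reduced option(s)': removed,
            'Penalty on best': single_drop_penalty[removed],
        })
        prev = score
    return rows

# report_rows([(8, 11, 'b'), (11, 13, 'd'), (13, 20, 'a')], 9,
#             {'b': 0.222, 'd': 0.333, 'a': 0.444})
# reproduces the 22.2% / 18.2% / 53.8% Penalty column above.
```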

qnbhd commented 2 years ago

Hello, @jansel! What do you think about this feature?

jansel commented 2 years ago

I like this idea. It sounds like a more general version of the flag_importance / flag_histogram functions in the gccflags example, which were used to generate Figure 7 in this paper. If dropping flags frequently finds better configs (I did not find this to be the case, but your search space might be different), perhaps you should add a search technique that drops flags randomly. I think the report functionality would also be useful.
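
A random-dropping technique could be as simple as the following standalone sketch (plain Python rather than the actual SearchTechnique plumbing; `measure` is a hypothetical helper that times a configuration):

```python
import random

def random_drop_search(config, measure, trials=100, drop_fraction=0.3, seed=0):
    """Repeatedly drop a random fraction of the flags, keeping the
    reduced configuration whenever it is no slower than the best."""
    rng = random.Random(seed)
    best, best_time = dict(config), measure(config)
    for _ in range(trials):
        keys = list(best)
        keep = rng.sample(keys, max(1, int(len(keys) * (1 - drop_fraction))))
        candidate = {k: best[k] for k in keep}
        t = measure(candidate)
        if t <= best_time:  # accept ties so the config keeps shrinking
            best, best_time = candidate, t
    return best, best_time
```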

Feel free to submit a PR if you want to try adding this feature.

qnbhd commented 2 years ago

@jansel We have a question about your histogram implementation and the random dropping of flags. What about GA techniques? During tuning with GA techniques, weak parameters will already have been excluded, so random dropping of flags should not give good results, right? We propose another technique that has proved useful in our past experiments. Here's a brief description:

Linear stage

First, calculate the reduction cost (penalty) on the metric of disabling each parameter separately; this gives us the order of exclusion for the next step.

Sort by reduction cost in descending order, and start disabling parameters from the original configuration in that order until the resulting regression becomes large enough (we will adjust a max-regression-threshold). After this step, a configuration containing only sufficiently important parameters remains, and we use it for the next step.
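
A sketch of this stage, assuming a hypothetical `measure(config)` helper and the reading that the cheapest removals are tried first:

```python
def linear_stage(config, measure, max_regression=0.05):
    """Rank parameters by the penalty of removing each one alone, then
    drop them (cheapest first) until the cumulative regression over
    the original configuration exceeds `max_regression`."""
    base = measure(config)
    penalty = {}
    for key in config:  # n measurements, hence "linear"
        sub = {k: v for k, v in config.items() if k != key}
        penalty[key] = (measure(sub) - base) / base
    current = dict(config)
    for key in sorted(penalty, key=penalty.get):
        trial = {k: v for k, v in current.items() if k != key}
        if (measure(trial) - base) / base <= max_regression:
            current = trial
        else:
            break  # regression became too large; stop here
    return current
```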

Quadratic stage

Take the configuration from the previous step, generate sub-configurations (each throwing out one parameter from the current one), and evaluate the results. Choose the sub-configuration with the best metric result and make it the current configuration. Repeat until we reach the empty configuration. This step takes into account that a parameter's importance depends on which other parameters remain.
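
A sketch of this stage as well (same hypothetical `measure` helper; it costs O(n^2) measurements, hence the name):

```python
def quadratic_stage(config, measure):
    """Greedily remove, at each step, the single parameter whose
    removal hurts the metric the least, down to the empty config."""
    current = dict(config)
    trajectory = [(dict(current), measure(current))]
    while current:
        candidates = []
        for key in current:
            sub = {k: v for k, v in current.items() if k != key}
            candidates.append((measure(sub), sub))
        time, current = min(candidates, key=lambda c: c[0])
        trajectory.append((dict(current), time))
    return trajectory  # afterwards pick the shortest config within threshold
```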

jansel commented 2 years ago

GA-like techniques with more random mutation could work here too. For example, something similar to debug_gcc_error, which keeps randomly dropping 50% of the flags (keeping the config if it still causes the error) until there are fewer than 8 flags, then does a linear search similar to what you originally described.

This has the property that the search time is O(lg(n)) rather than O(n), so it works best for search spaces with a huge number of flags where most don't matter. In general, if you have time, a more exhaustive search will do better -- but that doesn't scale to big search spaces.
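
As a standalone sketch of that halving strategy (`still_good` is a hypothetical predicate, e.g. "still reproduces the error" or "still within the time threshold"; the failure bound is an added safeguard, not part of the original description):

```python
import random

def halve_then_linear(flags, still_good, seed=0, max_failures=50):
    """Keep trying to drop a random 50% of the flags; once fewer than
    8 remain (or we give up), finish with a linear one-at-a-time scan."""
    rng = random.Random(seed)
    flags, failures = list(flags), 0
    while len(flags) >= 8 and failures < max_failures:
        half = rng.sample(flags, len(flags) // 2)
        if still_good(half):
            flags, failures = half, 0
        else:
            failures += 1
    for f in list(flags):  # linear cleanup pass
        reduced = [x for x in flags if x != f]
        if still_good(reduced):
            flags = reduced
    return flags
```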

You could imagine running something like simulated annealing, where you start by trying to drop a random 50% of the flags for some time, then 40%, 30%, 20%, etc. (according to some "cooling" schedule). You could tune the population size, your accept/reject criteria for a new configuration, and the cooling schedule for how aggressively you drop flags.
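
As a sketch, with a hypothetical `measure` helper and a simple Metropolis-style acceptance rule standing in for the accept/reject criteria:

```python
import math
import random

def annealed_drop(config, measure, schedule=(0.5, 0.4, 0.3, 0.2, 0.1),
                  trials_per_step=30, temperature=0.05, seed=0):
    """Drop fractions of the flags according to a cooling schedule,
    accepting mildly worse configs with temperature-dependent odds."""
    rng = random.Random(seed)
    current, cur_time = dict(config), measure(config)
    best, best_time = dict(current), cur_time
    for frac in schedule:  # cooling schedule for the drop size
        for _ in range(trials_per_step):
            keys = list(current)
            if len(keys) <= 1:
                return best, best_time
            keep = rng.sample(keys, max(1, round(len(keys) * (1 - frac))))
            cand = {k: current[k] for k in keep}
            t = measure(cand)
            delta = (t - cur_time) / cur_time  # relative regression
            if delta <= 0 or rng.random() < math.exp(-delta / temperature):
                current, cur_time = cand, t
                if t < best_time:
                    best, best_time = dict(cand), t
    return best, best_time
```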

Your Linear/Quadratic idea seems reasonable to try. In my opinion there is no wrong search technique. If a search technique works well for your problem, then you should use it -- though try out many of them to explore what is best in your case.