numbbo / coco

Numerical Black-Box Optimization Benchmarking Framework
https://numbbo.github.io/coco

error bars for cumulative distribution functions #1019

Open nikohansen opened 8 years ago

nikohansen commented 8 years ago

Error bars for the ECDF are missing ~~and could be computed by varying the number of simulated restarts per function-target pair~~. As the natural variation of the ECDF is routinely underestimated, error bars could prevent a good number of unjustified conclusions from forming in the minds of experimenters.

Specifically, the error bars should be vertical where the curve is flat and horizontal where the curve is steep, and placed at randomized positions (to avoid overlap between several graphs), perhaps just where the symbols are currently placed.

The appropriate approach is probably to generate N ECDFs ~~which bootstrap number-of-instances~~ from simulated runtimes based on bootstrapped `DataSet.evals` ~~(non-derandomized)~~ and to plot some observed percentiles, e.g. the [10, 48, 86]-th values from N=95 or the [5, 23, 41]-th values from N=45, to represent the interdecile range.
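
A minimal sketch of this percentile-of-bootstrapped-ECDFs idea (illustrative only, not cocopp code; `ecdf`, `ecdf_percentiles`, and the flat runtime-array layout are assumptions):

```python
import numpy as np

def ecdf(values, x_grid):
    """Fraction of `values` that are <= each point of `x_grid`."""
    return np.searchsorted(np.sort(values), x_grid, side='right') / len(values)

def ecdf_percentiles(runtimes, x_grid, n_bootstrap=95, ranks=(10, 48, 86),
                     rng=np.random.default_rng(1)):
    """At each point of `x_grid`, the `ranks`-th smallest of `n_bootstrap`
    bootstrapped ECDF values, e.g. ranks (10, 48, 86) out of N=95 to
    approximate the 10th, 50th, and 90th percentile of the ECDF."""
    curves = np.array([
        ecdf(rng.choice(runtimes, len(runtimes), replace=True), x_grid)
        for _ in range(n_bootstrap)])
    curves.sort(axis=0)  # sort the N ECDF values at each x-position
    return {k: curves[k - 1] for k in ranks}  # 1-based rank -> percentile curve
```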

nikohansen commented 7 years ago

As suggested by @brockho, a single simulated restart would reveal the dispersion of single runs. This is, however, different from revealing the deviation of the empirical distribution from the true distribution due to finite-size "sampling errors".

nikohansen commented 7 years ago

We probably do not want to bootstrap across different target values or problems, because problems and targets are a given, not a measurement. In other words, when combining two very different problems, the split of the distribution mass always remains at exactly the same place: the horizontal line defining this split does not move.

*[screen shot 2017-06-17]*

This is not the case if we plot different bootstraps of the data themselves instead of simulated restarts. In that case, the success probability varies, whereas with simulated restarts the "probability" can only be zero or one.
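
For reference, a sketch of a single simulated restart over one (problem, target) pair, which makes the zero-or-one behavior explicit (assuming `evals` holds one evaluation count per trial with `nan` marking unsuccessful trials, and `max_evals` the corresponding budgets; both are illustrative names, not cocopp code):

```python
import numpy as np

def simulated_restart_runtime(evals, max_evals, rng=np.random.default_rng()):
    """Runtime of one simulated run with restarts: trials are drawn
    uniformly at random with replacement; an unsuccessful trial (nan)
    adds its full budget, the first successful trial ends the run."""
    if np.all(np.isnan(evals)):
        return np.inf  # no trial ever succeeded: "probability" is zero
    runtime = 0.0
    while True:
        i = rng.integers(len(evals))
        if not np.isnan(evals[i]):
            return runtime + evals[i]  # success eventually certain: "one"
        runtime += max_evals[i]
```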

brockho commented 7 years ago

I just realized that we could, in principle, also investigate the dispersion of the ECDFs indirectly with the new functionality of background algorithms:

```python
In [1]: import cocopp
In [2]: cocopp.genericsettings.background_algorithms = 6 * ['BIPOP-CMA-ES_hansen_noiseless.tgz']
In [3]: cocopp.rungenericmany.main(['BIPOP-CMA-ES_hansen_noiseless.tgz'])
```

[Note that I needed to call rungenericmany explicitly to invoke the plots where the background algorithms are actually displayed.]

Unfortunately, the current functionality removes all of the above background algorithms from the display because their names are exactly the same as the foreground algorithm's.

If I copy the data six times to a different location (t/) and rename the folders, we get what we expect:

```python
In [4]: cocopp.genericsettings.background_algorithms = ['t/BIPOP-CMA-ES-1', 't/BIPOP-CMA-ES-2',
   ...:     't/BIPOP-CMA-ES-3', 't/BIPOP-CMA-ES-4', 't/BIPOP-CMA-ES-5', 't/BIPOP-CMA-ES-6']
In [5]: cocopp.rungenericmany.main(['BIPOP-CMA-ES_hansen_noiseless.tgz'])
```

*[three images of the resulting plots]*

nikohansen commented 7 years ago

To get the "right" variation, the `detEvals` method would need to bootstrap. I already implemented `def detEvals(self, targets, copy=True, bootstrap=False)` but haven't pushed it to the public repository yet. The slightly more annoying problem is passing the `bootstrap` argument around.
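
A minimal sketch of the resampling step such a flag could perform (the function name and the per-target data layout here are illustrative assumptions, not the actual cocopp code):

```python
import numpy as np

def bootstrapped_evals(evals_per_target, rng=np.random.default_rng()):
    """Illustrative stand-in for `detEvals(..., bootstrap=True)`: for each
    target, resample the per-instance evaluation counts with replacement,
    so that repeated calls vary like independently repeated experiments."""
    return [rng.choice(evals, size=len(evals), replace=True)
            for evals in evals_per_target]
```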

If there are not too many algorithms involved, this or displaying the envelope of these graphs should work. For (too) many algorithms, error bars are probably better.

nikohansen commented 1 year ago

From a discussion at Dagstuhl came the idea to use, for each problem, some lower and upper percentile of the trials and to plot the resulting lower and upper ECDFs.
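
A sketch of that idea (illustrative names, assuming `evals` holds one runtime per trial with `nan` marking failure); note that the high-percentile runtimes give the *lower* ECDF envelope, because larger runtimes shift the ECDF to the right:

```python
import numpy as np

def percentile_runtime(evals, q):
    """q-th percentile (0 < q <= 100) of the trials' runtimes for one
    (problem, target) pair; unsuccessful trials (nan) count as slower
    than any successful trial."""
    evals = np.sort(np.where(np.isnan(evals), np.inf, evals))
    return evals[int(np.ceil(q / 100 * len(evals))) - 1]

def envelope_runtimes(evals_per_pair, q_low=10, q_high=90):
    """Lower and upper percentile runtimes over all (problem, target)
    pairs; the ECDFs of these two runtime lists form the envelope."""
    return ([percentile_runtime(e, q_low) for e in evals_per_pair],
            [percentile_runtime(e, q_high) for e in evals_per_pair])
```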