nikohansen opened 8 years ago
As suggested by @brockho, a single simulated restart would reveal the dispersion of single runs. This is, however, different from revealing the deviation of the empirical distribution from the true distribution due to finite-size "sampling errors".
We probably do not want to bootstrap across different target values or problems, because problems and targets are given, not measured. In other words, when combining two very different problems, the split of the distribution masses will always remain at the exact same place: the horizontal line defining this split will not move.
This is not the case if we plot different bootstraps of the data themselves instead of simulated restarts. In that case the success probability varies, whereas with simulated restarts the "probability" can only be zero or one.
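To make the distinction concrete, here is a minimal self-contained sketch (all names and data are hypothetical illustrations, not cocopp code): with a fixed set of trials, the success probability is a fixed property of the data, while bootstrapping the trials makes it vary between replicates.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical evaluation counts of five trials on one problem-target
# pair; np.inf marks an unsuccessful trial.
evals = np.array([120., 150., np.inf, 200., np.inf])

def success_probability(trials):
    """Fraction of successful (finite) trials."""
    return np.mean(np.isfinite(trials))

# Simulated restarts always draw from the full, fixed set of trials, so
# the underlying success probability is a fixed property of the data:
p_fixed = success_probability(evals)  # 3/5 = 0.6 in every replicate

# Bootstrapping the trials resamples them with replacement, so the
# observed success probability varies between replicates:
p_boot = [success_probability(rng.choice(evals, size=evals.size))
          for _ in range(1000)]

print(p_fixed)               # 0.6
print(len(set(p_boot)) > 1)  # True: the bootstrap estimate varies
```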
I just realized that we could, in principle, also investigate the dispersion of the ECDFs indirectly with the new functionality of background algorithms:
```python
In [1]: import cocopp
In [2]: cocopp.genericsettings.background_algorithms = 6 * ['BIPOP-CMA-ES_hansen_noiseless.tgz']
In [3]: cocopp.rungenericmany.main(['BIPOP-CMA-ES_hansen_noiseless.tgz'])
```
[Note that I needed to call `rungenericmany` explicitly to invoke the plots where the actual background algorithms are displayed.]
Unfortunately, the current functionality removes all of the above background algorithms from the display, because their names are exactly the same as the foreground algorithm's.
If I copy the data six times to a different location (`t/`) and rename the folders, we get what we expect:
```python
In [4]: cocopp.genericsettings.background_algorithms = ['t/BIPOP-CMA-ES-1', 't/BIPOP-CMA-ES-2',
   ...:     't/BIPOP-CMA-ES-3', 't/BIPOP-CMA-ES-4', 't/BIPOP-CMA-ES-5', 't/BIPOP-CMA-ES-6']
In [5]: cocopp.rungenericmany.main(['BIPOP-CMA-ES_hansen_noiseless.tgz'])
```
To get the "right" variation, the `detEvals` method would need to bootstrap. I already implemented `def detEvals(self, targets, copy=True, bootstrap=False)`, but didn't push it to the public repository yet. The somewhat more annoying problem is passing the `bootstrap` argument around.
If not too many algorithms are involved, this, or displaying the envelope of these graphs, should work. For (too) many algorithms, error bars are probably better.
From a discussion at Dagstuhl came the idea to use, for each problem, some lower and upper percentile of the trials and to plot the resulting lower and upper ECDFs.
Error bars for the ECDF are missing ~~and could be computed by varying the number of simulated restarts per function-target pair~~. As the natural variation of the ECDF is routinely underestimated, this could help prevent a good number of unjustified conclusions drawn in the minds of experimenters.
Specifically, the error bars should be vertical where the curve is flat and horizontal where the curve is steep, and placed at randomized positions (to avoid overlap between several graphs), maybe just where the symbols are currently placed.
The appropriate approach is probably to generate `N` ECDFs ~~which bootstrap number-of-instances~~ simulated runtimes based on bootstrapped `DataSet.evals` ~~(non-derandomized)~~ and plot some observed percentiles, e.g. the [10, 48, 86]-th values from `N=95` or the [5, 23, 41]-th values from `N=45`, to represent the interdecile range.
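One possible reading of this, pointwise order statistics over `N` bootstrapped ECDF curves, can be sketched as follows (synthetic runtimes and a simplified ECDF for illustration; not cocopp code):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical simulated runtimes for one problem-target pair.
runtimes = rng.lognormal(mean=5.0, sigma=1.0, size=15)

def ecdf_on_grid(samples, grid):
    """Fraction of samples at or below each grid point."""
    return np.searchsorted(np.sort(samples), grid, side='right') / samples.size

N = 95                            # number of bootstrap replicates
grid = np.logspace(1, 4, 200)     # common evaluation-budget grid
curves = np.array([ecdf_on_grid(rng.choice(runtimes, runtimes.size), grid)
                   for _ in range(N)])

# Pointwise order statistics across the N curves; with N=95 the
# [10, 48, 86]-th values approximate the 10th, 50th and 90th percentiles.
curves.sort(axis=0)
lower, median, upper = curves[9], curves[47], curves[85]  # 0-based indices

# Each extracted curve is itself a valid, nondecreasing ECDF-like curve.
print(np.all(lower <= median) and np.all(median <= upper))  # True
```

Because all bootstrapped curves are nondecreasing, their pointwise order statistics are nondecreasing too, so the extracted percentile curves can be plotted like ordinary ECDFs.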