numbbo / coco

Numerical Black-Box Optimization Benchmarking Framework
https://numbbo.github.io/coco

[Bug report] logged data are incompatible with postprocessing #2282

Closed nikohansen closed 4 months ago

nikohansen commented 6 months ago

When benchmarking the same suite several times, as in

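### input (assumed values; the report does not show these definitions)
import cocoex  # experimentation module
import scipy.optimize  # provides the solver to be benchmarked

suite_name = "bbob"  # matches the suite named in the info file below
output_folder = "repeated-sweeps"  # hypothetical folder name
fmin = scipy.optimize.fmin  # assumed solver; consistent with the disp=False argument

### prepare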
suite = cocoex.Suite(suite_name, "", "")  # see https://numbbo.github.io/coco-doc/C/#suite-parameters
observer = cocoex.Observer(suite_name, "result_folder: " + output_folder)
minimal_print = cocoex.utilities.MiniPrint()

### go
for sweep in range(2):
    for problem in suite:  # this loop will take 2-3 minutes x budget_multiplier
        problem.observe_with(observer)  # generates the data for cocopp
        problem(problem.dimension * [0])  # improve comparability to existing data
        xopt = fmin(problem, problem.initial_solution_proposal(), disp=False)
        problem(xopt)  # make sure the returned solution is evaluated
        minimal_print(problem, final=problem.index == len(suite) - 1)

one of the info files looks like this:

suite = 'bbob', funcId = 7, DIM = 2, Precision = 1.000e-08, algId = 'ALG', coco_version = '2.6.4-dev207+ge83c48fe9d', logger = 'bbob', data_format = 'bbob-new2'
%
data_f7/bbobexp_f7_DIM2.dat, 1:12|7.4e+00, 2:45|5.4e+00, 3:46|1.2e+03, 4:48|2.8e+01, 5:44|2.6e+02
suite = 'bbob', funcId = 7, DIM = 3, Precision = 1.000e-08, algId = 'ALG', coco_version = '2.6.4-dev207+ge83c48fe9d', logger = 'bbob', data_format = 'bbob-new2'
%
data_f7/bbobexp_f7_DIM3.dat, 1:15|1.3e+01, 2:50|1.9e+02, 3:58|3.5e+02, 4:60|8.3e+01, 5:60|1.1e+02
suite = 'bbob', funcId = 7, DIM = 5, Precision = 1.000e-08, algId = 'ALG', coco_version = '2.6.4-dev207+ge83c48fe9d', logger = 'bbob', data_format = 'bbob-new2'
%
data_f7/bbobexp_f7_DIM5.dat, 1:21|1.3e+01, 2:86|3.8e+02, 3:89|7.3e+02, 4:90|1.6e+02, 5:84|3.3e+01
suite = 'bbob', funcId = 7, DIM = 10, Precision = 1.000e-08, algId = 'ALG', coco_version = '2.6.4-dev207+ge83c48fe9d', logger = 'bbob', data_format = 'bbob-new2'
%
data_f7/bbobexp_f7_DIM10.dat, 1:36|7.4e+02, 2:157|3.8e+02, 3:148|2.5e+02, 4:154|2.7e+02, 5:193|3.6e+02
suite = 'bbob', funcId = 7, DIM = 2, Precision = 1.000e-08, algId = 'ALG', coco_version = '2.6.4-dev207+ge83c48fe9d', logger = 'bbob', data_format = 'bbob-new2'
%
data_f7/bbobexp_f7_DIM2.dat, 1:44|7.4e+00, 2:44|5.4e+00, 3:48|1.2e+03, 4:40|2.2e+02, 5:62|4.9e-01
[...]

The problem is that the last entry triggers the reading-in of the very same five recorded data sets as the first entry, because both entries point to the very same data file.
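The collision described above can be checked mechanically. The following is a minimal sketch, not part of COCO or cocopp, that lists the data files referenced by more than one entry of an .info file. The function names are hypothetical, and the parsing assumes the three-line entry layout shown above: a header line of `key = value` pairs, a `%` line, and a line that starts with the data file name.

```python
from collections import defaultdict


def data_file_references(info_text):
    """Map each data file name to the 1-based line numbers that reference it."""
    refs = defaultdict(list)
    for lineno, line in enumerate(info_text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("%"):
            continue  # blank line or comment line
        first_field = line.split(",", 1)[0].strip()
        if "=" in first_field:
            continue  # header line with meta data such as suite = 'bbob'
        refs[first_field].append(lineno)
    return dict(refs)


def duplicated_data_files(info_text):
    """Return only the data files that more than one entry points to."""
    return {name: lines
            for name, lines in data_file_references(info_text).items()
            if len(lines) > 1}


# Abbreviated entries taken from the info file quoted above.
sample = """\
suite = 'bbob', funcId = 7, DIM = 2, Precision = 1.000e-08, algId = 'ALG'
%
data_f7/bbobexp_f7_DIM2.dat, 1:12|7.4e+00, 2:45|5.4e+00
suite = 'bbob', funcId = 7, DIM = 3, Precision = 1.000e-08, algId = 'ALG'
%
data_f7/bbobexp_f7_DIM3.dat, 1:15|1.3e+01, 2:50|1.9e+02
suite = 'bbob', funcId = 7, DIM = 2, Precision = 1.000e-08, algId = 'ALG'
%
data_f7/bbobexp_f7_DIM2.dat, 1:44|7.4e+00, 2:44|5.4e+00
"""

duplicates = duplicated_data_files(sample)
print(duplicates)
```

For the abbreviated sample, the DIM2 data file is reported with the line numbers of its two entries (3 and 9), which is exactly the situation where postprocessing reads the same data twice.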

It looks like the observer should never write to the same data file for two different entry lines of any .info file, because the information is virtually impossible to disentangle afterwards. This holds in particular because different .info files could point to the same data file when its name is fully determined by the meta data alone. Additionally, the processing of .info line entries should preferably not depend on their order.

Asking the user to create a new observer with a unique result folder name each time seems error-prone and puts the burden in the wrong place, hence it does not look like a viable option to me.

In other words, each .info file line entry must create a new, unique set of *dat files. IIRC, the *info and *dat filename conventions were only meant to simplify debugging and never meant to guarantee the semantics or meta data they suggest. All meta data are provided in the *.info files. To prevent the abuse of the *dat filename as meta information, it may be preferable to use entirely random names.
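The idea of entirely random names can be illustrated in a few lines. This is only a sketch of the general technique under stated assumptions, not the fix that went into COCO (whose logger is not shown in this report): the helper `unique_data_file` is hypothetical, names are drawn with `uuid.uuid4`, and the file is created in exclusive mode so that an existing file can never be reused.

```python
import tempfile
import uuid
from pathlib import Path


def unique_data_file(folder, suffix=".dat"):
    """Create and return a fresh, randomly named file in `folder`.

    The name carries no meta data, so it cannot be mistaken for any.
    """
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    while True:
        path = folder / (uuid.uuid4().hex + suffix)
        try:
            # mode "x" fails if the file exists, so a name is never reused
            path.open("x").close()
            return path
        except FileExistsError:
            continue  # extremely unlikely name clash, draw a new name


# Two entries for the same function and dimension get distinct files.
with tempfile.TemporaryDirectory() as tmp:
    first = unique_data_file(tmp)
    second = unique_data_file(tmp)
    distinct = first != second
    both_existed = first.exists() and second.exists()
```

With such a scheme, each .info entry would record the returned name next to its meta data, and a repeated sweep over the same problem would no longer append to a file that an earlier entry already points to.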

nikohansen commented 4 months ago

Should be fixed now.