numbbo / coco

Numerical Black-Box Optimization Benchmarking Framework

https://numbbo.github.io/coco

Extended test suites #1118

Open nikohansen opened 8 years ago

nikohansen commented 8 years ago

We should consider proposing extended versions of the bbob and bbob-biobj test suites.

Proposal for bbob-biobj: we add all within-group combinations of bbob functions that are not already in bbob-biobj and that do not combine a function with itself. This adds 4*(4+3+2+1-1) + (3+2+1-1) = 4*9 + 5 = 41 functions.

Rationale: this will add more diversity w.r.t. the used single objective functions (as suggested by @loshchil) and more functions where both objectives are from a similar "function domain" (as suggested by @Ulfgard).
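The count of 41 can be checked by enumeration. The following is a minimal sketch, assuming the usual bbob grouping (f1-f5, f6-f9, f10-f14, f15-f19, f20-f24) and assuming that bbob-biobj combines the ten functions f1, f2, f6, f8, f13, f14, f15, f17, f20 and f21; the names `BBOB_GROUPS`, `BIOBJ_BASE`, `existing_pairs` and `added_pairs` are illustrative and not part of COCO.

```python
from itertools import combinations

# Assumed bbob function groups, given as function indices f1..f24.
BBOB_GROUPS = [
    range(1, 6),    # separable
    range(6, 10),   # low or moderate conditioning
    range(10, 15),  # high conditioning, unimodal
    range(15, 20),  # multimodal with adequate global structure
    range(20, 25),  # multimodal with weak global structure
]

# Assumed single-objective functions that bbob-biobj combines.
BIOBJ_BASE = [1, 2, 6, 8, 13, 14, 15, 17, 20, 21]


def existing_pairs():
    """Pairs (i <= j) already in bbob-biobj, self-combinations included."""
    return {(a, b) for k, a in enumerate(BIOBJ_BASE) for b in BIOBJ_BASE[k:]}


def added_pairs():
    """Within-group pairs of distinct functions that bbob-biobj lacks."""
    existing = existing_pairs()
    return [pair
            for group in BBOB_GROUPS
            for pair in combinations(group, 2)
            if pair not in existing]


print(len(existing_pairs()), len(added_pairs()))  # 55 41
```

Each group of five contributes 10 - 1 = 9 new pairs and the group of four contributes 6 - 1 = 5, which matches 4*9 + 5 = 41.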

brockho commented 8 years ago

I agree with the proposal as discussed already at GECCO-2016 and in particular on the number of added functions.

loshchil commented 8 years ago

Does it mean that the data from this year will be abandoned?

On Sun, Aug 7, 2016 at 9:05 PM, Dimo Brockhoff notifications@github.com wrote:

I agree with the proposal as discussed already at GECCO-2016 and in particular on the number of added functions.

nikohansen commented 8 years ago

> Does it mean that the data from this year will be abandoned?

No, of course not.

brockho commented 7 years ago

I started to implement the extended bbob-biobj suite (called bbob-biobj-ext for now) based on the feature-refalg branch. The C code is running (the tests pass in the CI), and I have already checked that we can keep the first 10 instances (i.e. the checks that each instance produces an actual 2-objective problem pass). The following points are still missing:

- the regression test of the new suite (in code-experiments/test/regression-test/)
- the adaptation of the example_experiments, in particular in all languages other than C
- meaningful hypervolume reference values for the new functions f56-f96 (because we have to run experiments first of course)
- the adaptation of the postprocessing to the new suite, including new LaTeX templates

brockho commented 7 years ago

I was again a bit too optimistic wrt the CI tests :-( : on one Windows machine and on the Mac, the tests do not pass yet:

[...]
..f65..........f66..........f67..........f68..........f69..........f70..........f71......
....f72..........f73..........f74..........f75..........f76..........f77..........f78....
......f79.Assertion failed: ((nadir[i] - ideal[i]) > mo_discretization), function 
mo_normalize, file code-experiments/src/mo_utilities.c, line 54.

Build step 'Execute shell' marked build as failure

brockho commented 7 years ago

Remark: a closer look revealed that the second objective of f79 is the Weierstrass function, for which seemingly no nadir_value is available, probably because the optimum is not unique (the nadir_value is actually nan for this function; it is not clear why this did not already crash on my computer).
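To illustrate why a non-unique optimum makes the nadir point ambiguous, here is a toy sketch; it is not COCO code. The functions `f1`, `f2` and the helper `nadir_point` are made up for illustration. For two objectives, each nadir coordinate is one objective evaluated at the other objective's minimizer, so the result depends on which minimizer happens to be stored.

```python
def f1(x):
    return (x - 2.0) ** 2        # unique minimizer at x = 2


def f2(x):
    return (x * x - 1.0) ** 2    # two minimizers: x = -1 and x = +1


def nadir_point(first, second, x_opt_first, x_opt_second):
    """Nadir estimate from the two single-objective optima: each
    coordinate is one objective evaluated at the other's minimizer."""
    return (first(x_opt_second), second(x_opt_first))


# The same problem, but a different stored minimizer for f2:
print(nadir_point(f1, f2, 2.0, -1.0))  # (9.0, 9.0)
print(nadir_point(f1, f2, 2.0, 1.0))   # (1.0, 9.0)
```

Picking the right value would require knowing all minimizers of the second objective, which is presumably what makes this hard for a function with many equivalent optima.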

brockho commented 7 years ago

Update: the nan did not cause an error on my computer because it is not defined outside COCO and is thus defined through COCO as 8.888800e+088. After discussing with @nikohansen, we decided to simply take the Weierstrass function out of the `bbob-biobj-ext` suite, since its `best_parameter` vector is not unique and thus a nadir point is not easily computable within COCO. The updated `bbob-biobj-ext` suite with 92 functions is now available in my fork and the tests run through on all CI slaves.
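The different behavior across machines follows from how comparisons with NaN work. Below is a small sketch that mirrors the condition in the failed assertion, `(nadir[i] - ideal[i]) > mo_discretization`; the helper `range_is_valid` and the value of `MO_DISCRETIZATION` are assumptions for illustration, not taken from the COCO sources.

```python
import math

MO_DISCRETIZATION = 1e-13         # assumed value, for illustration only
COCO_NAN_PLACEHOLDER = 8.8888e88  # the stand-in value quoted above


def range_is_valid(ideal, nadir, discretization=MO_DISCRETIZATION):
    """Same condition as the assertion, checked for every objective."""
    return all((n - i) > discretization for i, n in zip(ideal, nadir))


ideal = (0.0, 0.0)
# Any comparison with a true NaN is false, so the assertion fails:
print(range_is_valid(ideal, (1.0, math.nan)))              # False
# A huge finite stand-in passes the check silently:
print(range_is_valid(ideal, (1.0, COCO_NAN_PLACEHOLDER)))  # True
```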

brockho commented 7 years ago

In order to keep the repository small, we need to put the (new and old) files of the regression test somewhere else (the new bbob-biobj-ext files are about 60 MB in size). The discussion of Jan 12, 2017 suggested that we keep them in the coco GForge svn repository and, in addition, provide them like any other data file on the GForge server, from where they are downloaded if the test itself does not find them locally.
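The "download if not found locally" behavior could look roughly like the sketch below. The function name `ensure_data_file`, the file name and the URL are hypothetical and stand in for the actual server location; the demo injects a fake fetcher so that it runs without network access.

```python
import os
import tempfile
from urllib.request import urlretrieve


def ensure_data_file(local_path, url, fetch=urlretrieve):
    """Return local_path, downloading it from url first if it is missing."""
    if not os.path.exists(local_path):
        os.makedirs(os.path.dirname(local_path) or ".", exist_ok=True)
        fetch(url, local_path)  # urlretrieve(url, filename) by default
    return local_path


def _fake_fetch(url, filename):
    """Offline stand-in for a download, used only in this demo."""
    with open(filename, "w") as f:
        f.write("placeholder regression data\n")


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "regression-test", "data.txt")  # hypothetical name
    url = "https://example.invalid/regression-test/data.txt"  # hypothetical URL
    ensure_data_file(path, url, fetch=_fake_fetch)  # fetched on first use
    ensure_data_file(path, url, fetch=_fake_fetch)  # second call is a no-op
    print(os.path.exists(path))  # True
```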

brockho commented 6 years ago

Updated list of things to do (possibly done already at the time of writing, to be checked):

- [x] update the regression test of the new suite (in code-experiments/test/regression-test/)
- [ ] adapt the example_experiments, in particular in all languages other than C
- [ ] update hypervolume reference values for the new functions f56-f92 (because we have to run experiments first of course)
  - [ ] the latter includes hv reference values to be updated also for the bbob-biobj suite (e.g. with the data from the 2017 algorithm of Simon)
  - [ ] this also means, in turn, that the 16 available data sets have to be updated
  - [ ] Addendum from COCO sprint on 10/24/2023: we might want to only update the hv values for all new functions/instances and the 40-D instances for the old bbob-biobj suite; that way, we need to touch fewer algorithm data sets when updating
- [ ] the adaptation of the postprocessing to the new suite, including new LaTeX templates
  - [ ] this includes a test for compatibility of the chosen data sets: the postprocessing should bail out when data sets with incompatible hv reference values are detected (hv reference values are currently written into each .dat file)
- [x] add plots for the new functions to the documentation
  - [x] update the documentation that is online/on arXiv
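The compatibility test mentioned in the list could be sketched as follows. This is only an illustration: the function `check_reference_values` and the dictionary layout are hypothetical, and reading the reference values out of the .dat files is not shown.

```python
def check_reference_values(data_sets):
    """Bail out if data sets disagree on an hv reference value.

    data_sets maps a data set name to a dict that maps a problem key,
    here (function, instance, dimension), to its hv reference value.
    """
    seen = {}  # problem key -> (reference value, data set name)
    for name, refs in data_sets.items():
        for problem, value in refs.items():
            if problem in seen and seen[problem][0] != value:
                other_value, other_name = seen[problem]
                raise ValueError(
                    f"incompatible hv reference value for {problem}: "
                    f"{other_name} has {other_value}, {name} has {value}")
            seen.setdefault(problem, (value, name))


# Two hypothetical data sets that agree on the shared problem:
check_reference_values({
    "alg-A": {(1, 1, 2): 0.5, (2, 1, 2): 0.7},
    "alg-B": {(1, 1, 2): 0.5},
})
print("compatible")
```

Exact equality is used on purpose in this sketch, on the assumption that compatible data sets carry reference values copied from the same table.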

See also #1873.

brockho commented 6 years ago

Add-on: we should make the order of the objective functions change pseudo-randomly with the instance number. The reason is that in the original bbob-biobj suite, the first objective function never has a larger function index than the second one, so the first objective is easier than the second in more cases than the reverse.
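One possible way to derive the order deterministically from the instance number is sketched below. It is an assumption about how this could be done, not the scheme used in COCO: `objective_order` is a hypothetical helper, and hashing is just one choice of reproducible pseudo-randomness.

```python
import hashlib


def objective_order(f_first, f_second, instance):
    """Return the two function indices, swapped for about half of the
    instances, reproducibly for a given (f_first, f_second, instance)."""
    key = f"{f_first}-{f_second}-{instance}".encode()
    swap = hashlib.sha256(key).digest()[0] & 1  # lowest bit of first byte
    return (f_second, f_first) if swap else (f_first, f_second)


# The same instance always yields the same order:
print(objective_order(1, 2, 5) == objective_order(1, 2, 5))  # True
```

Because the decision depends only on the function pair and the instance number, repeated runs and different language bindings could reproduce the same order as long as they use the same hash.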

brockho commented 4 years ago

I guess the closing of this issue was a mistake. Most issues above are still not addressed.