numbbo / coco

Numerical Black-Box Optimization Benchmarking Framework
https://numbbo.github.io/coco

Distribution of solutions for suite regression test #1850

Open nikohansen opened 5 years ago

nikohansen commented 5 years ago

The solutions are generated by this code

import numpy as np

def solution_array(dimension, number=10):
    """return an array of `dimension`-dimensional search points"""
    return (0.1 * (np.random.randn(number) / (np.abs(np.random.randn(number)) + 1e-6)) *
            np.random.randn(number, dimension).T).T

in regression-test/create/create_suite_data.py. It samples a normally distributed vector and multiplies it by a single small but heavy-tailed scalar (randn/randn follows a Cauchy distribution). About 5% of the resulting coordinate values lie outside [-1, 1] and about 1% outside [-5, 5]. These events are correlated across coordinates, so the number of infeasible solutions does not increase dramatically with dimension: 2% of solutions are outside [-5,5]^5 and 4% are outside [-5,5]^500. Furthermore, 1% of the vectors are longer than 5 x sqrt(dimension).
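These percentages can be checked empirically with a quick sketch (not part of the repository; the `solution_array` definition is copied from above, the sample sizes and seed are arbitrary choices for illustration):

```python
import numpy as np

def solution_array(dimension, number=10):
    """Search points scaled by a (0.1 x) heavy-tailed Cauchy-like
    factor randn/|randn|, as in create_suite_data.py."""
    return (0.1 * (np.random.randn(number) / (np.abs(np.random.randn(number)) + 1e-6)) *
            np.random.randn(number, dimension).T).T

np.random.seed(12345)
X = solution_array(20, number=20000)  # 20000 points in dimension 20

frac_out_1 = np.mean(np.abs(X) > 1)   # fraction of coordinates outside [-1, 1]
frac_out_5 = np.mean(np.abs(X) > 5)   # fraction of coordinates outside [-5, 5]
frac_long = np.mean(np.linalg.norm(X, axis=1) > 5 * np.sqrt(20))  # overlong vectors

print(frac_out_1, frac_out_5, frac_long)  # roughly 0.05, 0.01, 0.01
```

Note that the single shared Cauchy factor per point is what keeps the out-of-bounds fraction roughly dimension-independent: when that factor is large, many coordinates of the same point leave the box at once.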

Replacing 0.1 with 1 would increase the above small probabilities by about a factor of ten. Is changing this desired and/or necessary?

nikohansen commented 5 years ago

One aspect is that an upcoming mixed-integer suite would like to have (much?) larger numbers to play with.

ttusar commented 5 years ago

For the mixed-integer case, one would ideally take into account the region of interest (ROI), which depends on the variable index (unlike in the other suites).

ttusar commented 5 years ago

As to the question of whether this is necessary: probably not.

nikohansen commented 5 years ago

The following works to get about (100 - prcnt)% of values in [l, u]:

l + (u - l) * (0.5 + 0.1 * prcnt * solution_array(dimension, number))

Instead of iterating over the samples directly,

for x in solution_array(f.dimension):
    ...

one would write

for y in solution_array(f.dimension):
    # accept 5% "out of bounds" per coordinate
    x = f.lower_bounds + (f.upper_bounds - f.lower_bounds) * (0.5 + 0.1 * 5 * y)
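A minimal sketch of this mapping (with hypothetical scalar bounds `l`, `u` standing in for `f.lower_bounds` and `f.upper_bounds`, and `solution_array` copied from above), confirming that roughly prcnt% of coordinates end up outside [l, u]:

```python
import numpy as np

def solution_array(dimension, number=10):
    """Heavy-tailed sample generator from create_suite_data.py."""
    return (0.1 * (np.random.randn(number) / (np.abs(np.random.randn(number)) + 1e-6)) *
            np.random.randn(number, dimension).T).T

np.random.seed(3)
l, u, prcnt = -5.0, 5.0, 5           # illustrative bounds and target percentage
Y = solution_array(10, number=10000)

# affine map onto [l, u]; about prcnt% of coordinates land outside
X = l + (u - l) * (0.5 + 0.1 * prcnt * Y)

frac_out = np.mean((X < l) | (X > u))
print(frac_out)  # close to 0.05
```

The mapping works because a coordinate leaves [l, u] exactly when |0.1 * prcnt * y| > 0.5, i.e. for prcnt = 5 when |y| > 1, which by the statistics above happens for about 5% of coordinates.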
brockho commented 10 months ago

In the recent (re-)implementation of the bbob-noisy suite by @FMGS666, the solutions in the test were never outside [-5,5]^n (because they were created differently, by code independent of the old implementation), and thus we did not detect a major bug at first. So I would say it does matter to sample outside the region of interest. But if we have 2% of all samples outside, we should always see enough examples that are actually outside, simply because we typically test on so many functions, instances, and dimensions.