josejimenezluna / pyGPGO

Bayesian optimization for Python
http://pygpgo.readthedocs.io
MIT License

test_tSP_opt_nograd is flaky when all seed-setting code is removed #31

[Open] melonwater211 opened 3 years ago

melonwater211 commented 3 years ago

Introduction

test_tSP_opt_nograd in tests/test_surrogates.py seems to be flaky when all seed-setting code (e.g. np.random.seed(0) or tf.random.set_seed(0)) is commented out.

In commit 295a7b17b0358ee73ad554e487961635986fdadc, test_tSP_opt_nograd failed in ~5% of 500 runs, compared to 0% of 500 runs when the seed-setting code was left in place.

Motivation

Some tests are flaky with high failure rates, but the flakiness goes undetected while the seeds are fixed. We are trying to identify and stabilize such tests.

Environment

The tests were run using pytest 6.2.4 in a conda environment with Python 3.6.13. The OS used was Ubuntu 16.04.

Possible Solution and Discussion

A possible solution is to relax the values used in the assertions. Increasing the upper bound of the assertion on line 69 (the assertion that checks params['l']) from 0.5 to 0.6, and the upper bound of the assertion on line 70 (the assertion that checks params['sigmaf']) from 0.6 to 0.7, reduced the flakiness to ~3%. A sketch of the relaxed checks is shown below.
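For concreteness, here is a minimal sketch of the relaxed checks. The fitting code, the constructor arguments, and the getcovparams accessor are assumptions reconstructed from the test name; only the relaxed upper bounds of 0.6 and 0.7 come from the runs described above.

```python
import numpy as np
from pyGPGO.covfunc import squaredExponential
from pyGPGO.surrogates.tStudentProcess import tStudentProcess

# Hypothetical reconstruction of the relevant part of test_tSP_opt_nograd;
# only the relaxed upper bounds (0.6 and 0.7) come from the runs above.
rng = np.random.RandomState(0)               # seeded here only so the sketch is reproducible
X = rng.uniform(0, 1, size=(20, 1))
y = (np.sin(4 * np.pi * X) + rng.normal(0, 0.1, size=(20, 1))).ravel()

sexp = squaredExponential()
tsp = tStudentProcess(sexp, optimize=True)   # assumed constructor; hyperparameters optimized on fit
tsp.fit(X, y)

params = tsp.getcovparams()                  # assumed accessor for the optimized hyperparameters
assert params['l'] < 0.6                     # upper bound raised from 0.5
assert params['sigmaf'] < 0.7                # upper bound raised from 0.6
```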

Increasing the upper bounds of the assertions did not increase runtimes significantly.

Several of the failures appeared to be caused by params['l'] ending up at 0.0001, which may indicate a bug. In one run, params['sigmaf'] also reached an abnormally high value of ~0.94, which may likewise indicate a bug.

Please let us know if these solutions are feasible or if there are any other solutions that should be incorporated. We will be happy to raise a Pull Request to fix the tests and incorporate any feedback that you may have.

josejimenezluna commented 3 years ago

Dear @melonwater211,

Many thanks for noticing this.

As far as I understand from your comments, this only happens if the seed state is removed. The seed is set precisely to avoid optimization instability caused by the initial starting point of L-BFGS-B. This surrogate does not currently support gradient-based optimization, so the gradient is approximated via finite differences. If you're interested in providing a PR, I would suggest first investigating whether this call can be made more stable by playing around with the eps parameter.
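For reference, a minimal sketch of the kind of eps experiment suggested above, assuming the surrogate's hyperparameter optimization ultimately calls scipy.optimize.minimize with method='L-BFGS-B'; the objective below is a toy stand-in, not pyGPGO's actual marginal likelihood, and the bounds are illustrative only:

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_marginal_likelihood(theta):
    # Toy stand-in for the surrogate's negative marginal likelihood;
    # the real objective lives inside the tStudentProcess surrogate.
    l, sigmaf = theta
    return (l - 0.4) ** 2 + (sigmaf - 0.5) ** 2 + 0.01 * np.sin(50 * l)

x0 = np.array([1.0, 1.0])                    # starting point; instability depends on this when unseeded
for eps in (1e-8, 1e-6, 1e-4):
    res = minimize(
        neg_log_marginal_likelihood,
        x0,
        method='L-BFGS-B',
        bounds=[(1e-4, 2.0), (1e-4, 2.0)],   # illustrative bounds only
        options={'eps': eps},                # finite-difference step for the gradient approximation
    )
    print(eps, res.x, res.fun)
```

Larger eps values smooth out noise in the finite-difference gradient at the cost of accuracy, so sweeping it as above may help identify a more stable default.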

However, is there a specific case where you are experiencing instability with this surrogate on a real problem?

Let me know if this answers your question. Cheers,