Quite easy to find configurations in rbv_super benchmark with negative memory [HPOBench]

Game4Move78 commented 1 year ago

Randomly sampling 13 times is enough to discover a configuration with negative memory:

import ConfigSpace.hyperparameters as CSH
from hpobench.container.benchmarks.surrogates.yahpo_gym import YAHPOGymBenchmark

scenario = 'rbv2_super'
instance = 6
rng = 9

benchmark = YAHPOGymBenchmark(scenario=scenario, instance=str(instance), rng=rng)
configspace = benchmark.get_configuration_space(rng)
fidelityspace = benchmark.get_fidelity_space(rng)
# Take the first fidelity hyperparameter as the main fidelity
main_fidelity = fidelityspace.get_hyperparameters()[0]

# R is the maximum value of the main fidelity
if isinstance(main_fidelity, CSH.OrdinalHyperparameter):
    R = main_fidelity.sequence[-1]
elif isinstance(main_fidelity, (
        CSH.UniformFloatHyperparameter,
        CSH.UniformIntegerHyperparameter
)):
    R = main_fidelity.upper
else:
    raise NotImplementedError("Fidelity type not implemented")

config = configspace.sample_configuration()
fidelity = {main_fidelity.name: R}
num_tries = 0

# Resample until a configuration with non-positive memory is found
while benchmark.objective_function(config, fidelity)['info']['objectives']['memory'] > 0.:
    num_tries += 1
    config = configspace.sample_configuration()
    assert num_tries <= 100, 'Stop inf loop'

sumny commented 1 year ago

Hi @Game4Move78,

thank you very much for opening this issue. The problem with memory in the rbv2 scenarios (more precisely rbv2_super and rbv2_xgboost) is the following: during the raw data collection, the overall memory usage of the process was tracked via /usr/bin/time on the cluster (i.e., maximum resident set size). However, for xgboost models (especially with the dart booster) this measurement was apparently quite buggy: it did not always return meaningful values, and often returned values close to zero or even numerically zero. Additionally, when the surrogate is fitted on the raw data, scalers are applied to the targets, which in this case (xgboost and memory) can result in some overflow, so the surrogate can predict values slightly below 0 (you will notice that those negative memory estimates are usually very close to zero, i.e., around -1e-5).

Overall, this is of course far from ideal and currently there is no good workaround. We are aware of the issues with memory and rbv2_* (see "memory estimation for rbv2_*" at https://slds-lmu.github.io/yahpo_gym/frequently_asked.html) and will address this in the v2 version of Yahpo Gym (#65 #67), which however still needs some time. A potential workaround for now could be to restrict the configspace so that it does not allow the xgboost + dart combination, but ideally users should not rely on the memory objectives of rbv2_super and rbv2_xgboost for now (or should at least take them with a grain of salt). Sorry.
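As a rough illustration of that workaround, the sketch below rejects sampled configurations that combine xgboost with the dart booster. It works on plain dictionaries so that it runs without HPOBench. The key names learner_id and xgboost.booster, the value "dart", and the helpers is_xgboost_dart, sample_allowed and toy_sampler are assumptions made for this example and should be checked against the actual rbv2_super configuration space.

```python
import random
from typing import Any, Callable, Dict

# Hypothetical hyperparameter names and values: verify them against the
# real rbv2_super configuration space before relying on this filter.
LEARNER_KEY = "learner_id"
BOOSTER_KEY = "xgboost.booster"


def is_xgboost_dart(config: Dict[str, Any]) -> bool:
    """Return True if the configuration uses xgboost with the dart booster."""
    return config.get(LEARNER_KEY) == "xgboost" and config.get(BOOSTER_KEY) == "dart"


def sample_allowed(sampler: Callable[[], Dict[str, Any]], max_tries: int = 100) -> Dict[str, Any]:
    """Draw from `sampler` until a configuration passes the filter."""
    for _ in range(max_tries):
        config = sampler()
        if not is_xgboost_dart(config):
            return config
    raise RuntimeError(f"No allowed configuration found in {max_tries} tries")


def toy_sampler(rng: random.Random) -> Dict[str, Any]:
    """Stand-in for a real configuration space, used only for this demo."""
    learner = rng.choice(["xgboost", "ranger"])
    config: Dict[str, Any] = {LEARNER_KEY: learner}
    if learner == "xgboost":
        config[BOOSTER_KEY] = rng.choice(["gbtree", "dart"])
    return config


rng = random.Random(0)
config = sample_allowed(lambda: toy_sampler(rng))
print(config)
```

With a real benchmark, the sampler could wrap configspace.sample_configuration() and convert the result to a dictionary. Note that rejection sampling changes the effective search space only on the caller's side; an optimiser that samples internally would need the restriction built into the configuration space itself.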

I'll keep this issue open for visibility.

Edit: On a side note, the code above sets the repl fidelity (replication of CV folds) as the main fidelity parameter (instead of trainsize); not sure if you actually want this. Also, if you do not specify both fidelities in the call benchmark.objective_function(config, fidelity), HPOBench will use the default value for the fidelity parameter that is not provided (which again might not always be meaningful). Note that this double fidelity space (repl and trainsize) is specific to the rbv2_* scenarios.
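One way to avoid silently falling back to a default is to check the fidelity dictionary before every evaluation. The following is a minimal sketch in plain Python; the helper require_all_fidelities is hypothetical and not part of HPOBench, and only the fidelity names repl and trainsize come from this thread.

```python
from typing import Any, Dict, Iterable

# Fidelity names of the rbv2_* scenarios as mentioned in this thread.
RBV2_FIDELITIES = ("trainsize", "repl")


def require_all_fidelities(
    fidelity: Dict[str, Any],
    required: Iterable[str] = RBV2_FIDELITIES,
) -> Dict[str, Any]:
    """Raise if a required fidelity is missing, otherwise return the dict unchanged."""
    missing = [name for name in required if name not in fidelity]
    if missing:
        raise KeyError(f"Fidelity parameters not set explicitly: {missing}")
    return fidelity


# Passes: both fidelities are given explicitly.
print(require_all_fidelities({"trainsize": 1.0, "repl": 10}))

# Fails: only repl is given, so trainsize would fall back to a default.
try:
    require_all_fidelities({"repl": 10})
except KeyError as err:
    print("rejected:", err)
```

The checked dictionary can then be passed as the second argument of benchmark.objective_function(config, fidelity), as in the code above.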

Game4Move78 commented 1 year ago

@sumny Thank you very much for pointing out the issue with the choice of fidelities. How would you suggest I specify the fidelity space for optimisers that expect a single fidelity parameter when the benchmark has multiple fidelity parameters, as the rbv2_* benchmarks do? And for the fidelity parameters that are ignored, is it best to explicitly set them to their maximum value?

sumny commented 1 year ago

For the rbv2_* scenarios I would in general go with trainsize as the main fidelity and fix repl to 10.
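A minimal sketch of that setup for a single-fidelity optimiser: trainsize is the only fidelity that varies, and repl is always set to 10. The helper make_fidelity, the budget ladder, and the assumed trainsize range (0, 1] are illustrative choices, not taken from HPOBench or YAHPO Gym.

```python
from typing import Dict, List, Union

Number = Union[int, float]


def make_fidelity(trainsize: float, repl: int = 10) -> Dict[str, Number]:
    """Build a full rbv2_* fidelity dict from a single trainsize budget.

    Assumes trainsize is a fraction in (0, 1]; check the real fidelity
    space for the exact bounds.
    """
    if not 0.0 < trainsize <= 1.0:
        raise ValueError(f"trainsize must be in (0, 1], got {trainsize}")
    return {"trainsize": trainsize, "repl": repl}


# Hypothetical budget ladder, e.g. for a successive-halving style optimiser.
ladder: List[Dict[str, Number]] = [make_fidelity(t) for t in (0.1, 0.3, 1.0)]
for fidelity in ladder:
    print(fidelity)
```

Each dictionary specifies both fidelities, so it can be passed directly as the fidelity argument of benchmark.objective_function(config, fidelity) without relying on defaults.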

slds-lmu / yahpo_gym

Quite easy to find configurations in rbv_super benchmark with negative memory [HPOBench] #73