Improve handling of failed evaluations

AngelFP commented 9 months ago

Addresses #143, #144.

This PR adds a status parameter to Trial, which can take one of the values CANDIDATE, RUNNING, COMPLETED or FAILED. The status is also used to inform Ax whether a trial has failed, so that the failure can be handled properly by the surrogate model. In the Ax Service generators, failed trials can be labeled as either FAILED or ABANDONED; the latter means that the failed trial will not be suggested again. This behavior is controlled by the new parameter abandon_failed_trials (True by default).
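A minimal sketch of how such a status could be modelled and mapped onto the two Ax labels. TrialStatus mirrors the four values named above; the Trial dataclass, the ax_action_for helper and its return strings are hypothetical and only illustrate how abandon_failed_trials would select between FAILED and ABANDONED. Ax's Service API provides AxClient.log_trial_failure and AxClient.abandon_trial for these two cases, but the sketch does not call Ax.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Dict


class TrialStatus(Enum):
    """Lifecycle states of a trial (names taken from the PR description)."""
    CANDIDATE = auto()   # suggested by the generator, not yet submitted
    RUNNING = auto()     # evaluation in progress
    COMPLETED = auto()   # evaluation finished with valid objective values
    FAILED = auto()      # evaluation failed or returned NaN objectives


@dataclass
class Trial:
    """Hypothetical, stripped-down trial record carrying a status."""
    index: int
    parameters: Dict[str, float]
    status: TrialStatus = TrialStatus.CANDIDATE

    def mark_as(self, status: TrialStatus) -> None:
        self.status = status

    @property
    def failed(self) -> bool:
        return self.status is TrialStatus.FAILED


def ax_action_for(trial: Trial, abandon_failed_trials: bool = True) -> str:
    """Return how a finished trial would be reported to Ax (hypothetical helper).

    "abandon":  trial is ABANDONED, so its parameters are not suggested again.
    "fail":     trial is only marked FAILED.
    "complete": objective data is attached as usual.
    """
    if trial.failed:
        return "abandon" if abandon_failed_trials else "fail"
    if trial.status is TrialStatus.COMPLETED:
        return "complete"
    raise ValueError(f"Trial {trial.index} has not been evaluated yet.")
```

For example, a trial marked FAILED yields "abandon" with the default abandon_failed_trials=True and "fail" when the flag is set to False.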

With the proposed implementation, a trial is considered failed if either of these two conditions is met:

1. libEnsemble reports that the submitted task has failed. This only applies to the TemplateEvaluator.
2. The evaluation returns NaN for any of the objectives. This applies to all evaluators.

Case 2 also covers evaluations in which the evaluation or analysis function did not provide a value for an objective. Previously, such an objective was returned with a value of 0, which could mislead the optimizer.
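The two failure conditions, combined with prefilling the output with NaNs, can be sketched as follows. The output is modelled as a plain dict of objective names to floats; evaluate_trial, task_failed and the analysis callable are hypothetical stand-ins for the evaluator and the user's analysis function, not the optimas API. Skipping the analysis when the task has failed is also an assumption of this sketch.

```python
import math
from typing import Callable, Dict, List, Tuple


def evaluate_trial(
    objective_names: List[str],
    analysis: Callable[[], Dict[str, float]],
    task_failed: bool = False,
) -> Tuple[Dict[str, float], bool]:
    """Collect objective values and decide whether the trial failed.

    Returns the objective values and a flag that is True if the trial failed.
    """
    # Prefill every objective with NaN, so that a value the analysis never
    # writes stays NaN instead of defaulting to 0.
    outputs = {name: math.nan for name in objective_names}

    # Condition 1: the task itself was reported as failed (in the PR this
    # information comes from libEnsemble). Here the analysis is then skipped.
    if not task_failed:
        outputs.update(analysis())

    # Condition 2: any objective is NaN, including one that was never set.
    failed = task_failed or any(math.isnan(outputs[n]) for n in objective_names)
    return outputs, failed
```

With a zero-filled output, an objective missing from the analysis result would look like a valid value of 0; with the NaN prefill it is flagged as a failure instead.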

Changes

Implement new TrialStatus class.
Add status property to Trial and other related methods.
Add trial_status to history.
Prefill output array with NaNs.
Handle FAILED trials in the Ax generators. By default they are set as ABANDONED so that they are not suggested again. This behavior can be controlled with the abandon_failed_trials argument.
Remove unnecessary variables in sim_specs["out"].
In the generator, distinguish between completed and evaluated trials. An evaluated trial is one whose evaluation has completed or failed.
Apply a workaround to prevent the cwd from changing when running with threading comms.
Add option to mark trials as failed after completion (addresses #144).
Add new tests.
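The distinction between completed and evaluated trials, and the option to mark an already completed trial as failed, could look roughly like this. The TrialLog class and its methods are hypothetical; the sketch only encodes the definitions given above (an evaluated trial is one that has completed or failed). TrialStatus is redefined here so the sketch is self-contained.

```python
from enum import Enum
from typing import List


class TrialStatus(Enum):
    CANDIDATE = "candidate"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"


class TrialLog:
    """Hypothetical bookkeeping of trial statuses inside a generator."""

    def __init__(self) -> None:
        self._status: List[TrialStatus] = []

    def add(self, status: TrialStatus = TrialStatus.CANDIDATE) -> int:
        self._status.append(status)
        return len(self._status) - 1

    def set_status(self, index: int, status: TrialStatus) -> None:
        self._status[index] = status

    def mark_failed(self, index: int) -> None:
        """Mark a trial as FAILED after it has completed (cf. #144)."""
        if self._status[index] is not TrialStatus.COMPLETED:
            raise ValueError(f"Trial {index} has not completed.")
        self._status[index] = TrialStatus.FAILED

    @property
    def n_completed(self) -> int:
        return sum(s is TrialStatus.COMPLETED for s in self._status)

    @property
    def n_evaluated(self) -> int:
        # Evaluated = completed or failed.
        done = (TrialStatus.COMPLETED, TrialStatus.FAILED)
        return sum(s in done for s in self._status)
```

Keeping both counts lets a caller, for example, budget by evaluated trials while reporting only the completed ones; marking a completed trial as failed moves it from one count to the other without changing the number of evaluated trials.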

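The PR does not describe the cwd workaround itself. One common way to keep the working directory stable is to record it before a call that may change it and restore it afterwards, as in this minimal sketch; preserve_cwd is a hypothetical helper and not necessarily what the PR implements. Because the cwd is process-wide, this only restores the directory after the call; it does not isolate concurrent threads from each other.

```python
import os
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def preserve_cwd() -> Iterator[str]:
    """Restore the current working directory on exit, even if an error occurs."""
    original = os.getcwd()
    try:
        yield original
    finally:
        os.chdir(original)
```

Wrapping the code that may switch into a simulation directory in `with preserve_cwd():` guarantees that subsequent code runs in the original directory again.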
RemiLehe commented 7 months ago

Thanks for this PR @AngelFP. Could you fix the conflicts with the main branch?

AngelFP commented 7 months ago

Good that you noticed. Conflicts resolved :)

optimas-org / optimas

Improve handling of failed evaluations #154