optimas-org / optimas

Optimization at scale, powered by libEnsemble
https://optimas.readthedocs.io

Error handling: freeze instead of crash and backtrace #226

Closed: n01r closed this issue 1 month ago

n01r commented 2 months ago

If you use a FunctionEvaluator (and possibly other evaluator types too?) and an error occurs inside the evaluation function, e.g., a variable is uninitialized because of a typo, the optimas script just freezes. There is no crash or backtrace to tell the user what went wrong.

Reproducer: take an example from the docs and add a line that prints an uninitialized variable.

```python
import numpy as np


def eval_func_sf_moo(input_params, output_params):
    """Example multi-objective function."""
    x1 = input_params["x1"]
    x2 = input_params["x2"]
    result = -(x1 + 10 * np.cos(x1)) * (x2 + 5 * np.cos(x2))
    output_params["f1"] = result
    output_params["f2"] = result * 2
    output_params["p1"] = np.sin(x1) + np.cos(x2)

    # This line should raise a NameError and crash the script
    print(not_initialized_var)
```

This should fail with the error

```
    print(not_initialized_var)
          ^^^^^^^^^^^^^^^^^^^
NameError: name 'not_initialized_var' is not defined
```

but instead the execution just freezes.
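Independently of any fix in the libraries, one way to surface this kind of bug early is to call the evaluation function once directly, outside optimas, before starting the exploration. This is a minimal sketch under stated assumptions: `smoke_test` and `eval_func_with_typo` are hypothetical names, a plain `dict` stands in for the output record that optimas normally passes (this assumes the function only assigns output fields by name), and `math` replaces `np` only to keep the example dependency-free.

```python
import math


def eval_func_with_typo(input_params, output_params):
    """Stand-in for the reproducer above, using math instead of numpy."""
    x1 = input_params["x1"]
    x2 = input_params["x2"]
    output_params["f1"] = -(x1 + 10 * math.cos(x1)) * (x2 + 5 * math.cos(x2))
    print(not_initialized_var)  # deliberate bug, as in the reproducer


def smoke_test(func, sample_inputs):
    """Call `func` once, outside any optimizer, so exceptions surface immediately.

    A plain dict stands in for the output record; this assumes `func` only
    assigns output fields by name.
    """
    outputs = {}
    func(sample_inputs, outputs)
    return outputs


if __name__ == "__main__":
    try:
        smoke_test(eval_func_with_typo, {"x1": 0.5, "x2": -1.0})
    except NameError as err:
        print(f"Caught before launching the exploration: {err}")
```

Running this prints the `NameError` message immediately instead of hanging, because no worker or manager sits between the caller and the exception.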

n01r commented 2 months ago

Would it be possible to add error handling that passes the error on to the user and terminates the execution?
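The pattern being asked for is: a worker catches any exception, forwards the traceback to the coordinating side, and the coordinator re-raises it and stops the run. Below is a minimal, library-independent sketch of that pattern using a thread and a `queue.Queue`. It is not libEnsemble's implementation; `worker`, `manager`, and `WorkerError` are hypothetical names chosen for illustration.

```python
import queue
import threading
import traceback


class WorkerError(RuntimeError):
    """Raised by the manager when a worker reports a failure (hypothetical)."""


def worker(task_fn, args, results):
    """Run one task and post exactly one message: ("ok", value) or ("error", text)."""
    try:
        results.put(("ok", task_fn(*args)))
    except BaseException:
        # Catch everything so the manager is never left waiting for a reply,
        # and forward the full traceback as text instead of dying silently.
        results.put(("error", traceback.format_exc()))


def manager(task_fn, args):
    """Run task_fn(*args) in a worker thread and re-raise any reported error."""
    results = queue.Queue()
    thread = threading.Thread(target=worker, args=(task_fn, args, results))
    thread.start()
    status, payload = results.get()  # the worker always posts one message
    thread.join()
    if status == "error":
        # Terminate the run and show the user the worker's original traceback.
        raise WorkerError(f"Received error message from worker:\n{payload}")
    return payload


if __name__ == "__main__":
    print(manager(lambda a, b: a + b, (2, 3)))  # prints 5
    try:
        manager(lambda: not_initialized_var, ())  # deliberate NameError
    except WorkerError as err:
        print(err)  # includes "NameError: name 'not_initialized_var' is not defined"
```

The key design choice is that the worker always replies, with either a result or an error, so the waiting side never blocks on a message that will not come.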

delaossa commented 2 months ago

This might be related to this other issue, https://github.com/optimas-org/optimas/issues/218, which also freezes instead of crashing. In that case, the freezing problem was also observed when using TemplateEvaluator.

shuds13 commented 1 month ago

This is a bug with `gen_on_manager`, where the manager waits on the thread shutting down too soon, creating a deadlock. It should be fixed by https://github.com/Libensemble/libensemble/pull/1348
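As a generic illustration of how a wait between threads can turn into a silent freeze: if one side blocks on a result with no timeout and the other thread has already exited, nothing ever arrives. The sketch below shows a defensive pattern using only the standard library: poll with a short timeout and check whether the thread is still alive. It is not the code from the linked PR, and `wait_for_result` is a hypothetical helper.

```python
import queue
import threading
import time


def wait_for_result(results, thread, poll_interval=0.1, timeout=None):
    """Return the next item from `results` without ever blocking indefinitely.

    Polls the queue with a short timeout. Between polls it checks whether the
    worker thread is still alive: if the thread has exited and the queue is
    still empty, no result can ever arrive, so raise instead of hanging.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        try:
            return results.get(timeout=poll_interval)
        except queue.Empty:
            # Check liveness first: once the thread has exited, any put() it
            # made is already visible, so an empty queue means "never".
            if not thread.is_alive() and results.empty():
                raise RuntimeError("worker exited without sending a result")
            if deadline is not None and time.monotonic() > deadline:
                raise TimeoutError("no result from worker before the deadline")


if __name__ == "__main__":
    def silent_failure(results):
        raise ValueError("boom")  # the thread dies before posting anything

    results = queue.Queue()
    thread = threading.Thread(target=silent_failure, args=(results,))
    thread.start()
    try:
        wait_for_result(results, thread)
    except RuntimeError as err:
        print(f"Detected instead of freezing: {err}")
```

Polling trades a small amount of latency (up to `poll_interval`) for the guarantee that a dead thread is detected rather than waited on forever.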

shuds13 commented 1 month ago

Our fix is now included in libEnsemble releases from v1.4.0 onward.
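Because the fix only ships with libEnsemble v1.4.0 and later, a script can check the installed version before starting a long run. This is a minimal sketch using only the standard library (`importlib.metadata`); `parse_release` and `has_freeze_fix` are hypothetical helpers, and pre-release tags such as `rc1` are deliberately ignored. If the `packaging` library is available, `packaging.version.Version` is the more robust way to compare versions.

```python
import re
from importlib.metadata import PackageNotFoundError, version

MIN_LIBENSEMBLE = (1, 4, 0)  # per this thread, the fix ships in libEnsemble v1.4.0


def parse_release(version_string):
    """Return the leading numeric release tuple, e.g. '1.4.0rc1' -> (1, 4, 0)."""
    match = re.match(r"(\d+(?:\.\d+)*)", version_string)
    if match is None:
        raise ValueError(f"unrecognized version string: {version_string!r}")
    parts = tuple(int(p) for p in match.group(1).split("."))
    return parts + (0,) * (3 - len(parts))  # pad '1.4' to (1, 4, 0)


def has_freeze_fix(version_string):
    """True if the version is at least 1.4.0 (pre-release tags are ignored)."""
    return parse_release(version_string) >= MIN_LIBENSEMBLE


if __name__ == "__main__":
    try:
        installed = version("libensemble")
    except PackageNotFoundError:
        print("libEnsemble is not installed")
    else:
        status = "includes" if has_freeze_fix(installed) else "predates"
        print(f"libEnsemble {installed} {status} the v1.4.0 fix")
```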

n01r commented 1 month ago

Perfect, the error passing is working now. :)

Test output:

```
(optimas-wake-t) mgarten@nid200272:/pscratch/sd/m/mgarten/optimas/01_test_error_propagation> python3 run_test.py
[INFO 07-30 14:46:18] optimas.generators.base: Generated trial 0 with parameters {'x1': 3.8332079262180385, 'x2': -1.2069030801224425}
[0] 2024-07-30 14:46:18,771 libensemble.manager (ERROR): ---- Received error message from worker 1 ----
[0] 2024-07-30 14:46:18,771 libensemble.manager (ERROR): Message: NameError: name 'not_initialized_var' is not defined
[0] 2024-07-30 14:46:18,771 libensemble.manager (ERROR): Traceback (most recent call last):
  File "/global/cfs/cdirs/m4272/mgarten/sw/perlmutter/gpu/venvs/optimas-wake-t/lib/python3.11/site-packages/libensemble/worker.py", line 405, in run
    response = self._handle(Work)
               ^^^^^^^^^^^^^^^^^^
  File "/global/cfs/cdirs/m4272/mgarten/sw/perlmutter/gpu/venvs/optimas-wake-t/lib/python3.11/site-packages/libensemble/worker.py", line 351, in _handle
    calc_out, persis_info, calc_status = self._handle_calc(Work, calc_in)
                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/global/cfs/cdirs/m4272/mgarten/sw/perlmutter/gpu/venvs/optimas-wake-t/lib/python3.11/site-packages/libensemble/worker.py", line 271, in _handle_calc
    out = calc(calc_in, Work)
          ^^^^^^^^^^^^^^^^^^^
  File "/global/cfs/cdirs/m4272/mgarten/sw/perlmutter/gpu/venvs/optimas-wake-t/lib/python3.11/site-packages/libensemble/utils/runners.py", line 40, in run
    return self._result(calc_in, Work["persis_info"], Work["libE_info"])
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/global/cfs/cdirs/m4272/mgarten/sw/perlmutter/gpu/venvs/optimas-wake-t/lib/python3.11/site-packages/libensemble/utils/runners.py", line 34, in _result
    return self.f(*args)
           ^^^^^^^^^^^^^
  File "/global/cfs/cdirs/m4272/mgarten/sw/perlmutter/gpu/venvs/optimas-wake-t/lib/python3.11/site-packages/optimas/sim_functions.py", line 131, in run_function
    evaluation_func(input_values, libE_output[0])
  File "/pscratch/sd/m/mgarten/optimas/01_test_error_propagation/run_test.py", line 18, in eval_func_sf_moo
    print(not_initialized_var)
          ^^^^^^^^^^^^^^^^^^^
NameError: name 'not_initialized_var' is not defined
[0] 2024-07-30 14:46:18,775 libensemble.libE (ERROR): Manager exception raised .. aborting ensemble:
[0] 2024-07-30 14:46:18,775 libensemble.libE (ERROR): Dumping ensemble history with 0 sims evaluated:
Traceback (most recent call last):
  File "/pscratch/sd/m/mgarten/optimas/01_test_error_propagation/run_test.py", line 42, in
    exploration.run()
  File "/global/cfs/cdirs/m4272/mgarten/sw/perlmutter/gpu/venvs/optimas-wake-t/lib/python3.11/site-packages/optimas/explorations/base.py", line 212, in run
    history, persis_info, flag = libE(
                                 ^^^^^
  File "/global/cfs/cdirs/m4272/mgarten/sw/perlmutter/gpu/venvs/optimas-wake-t/lib/python3.11/site-packages/pydantic/validate_call_decorator.py", line 59, in wrapper_function
    return validate_call_wrapper(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/global/cfs/cdirs/m4272/mgarten/sw/perlmutter/gpu/venvs/optimas-wake-t/lib/python3.11/site-packages/pydantic/_internal/_validate_call.py", line 81, in __call__
    res = self.__pydantic_validator__.validate_python(pydantic_core.ArgsKwargs(args, kwargs))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/global/cfs/cdirs/m4272/mgarten/sw/perlmutter/gpu/venvs/optimas-wake-t/lib/python3.11/site-packages/libensemble/libE.py", line 262, in libE
    return libE_funcs[libE_specs.get("comms", "mpi")](
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/global/cfs/cdirs/m4272/mgarten/sw/perlmutter/gpu/venvs/optimas-wake-t/lib/python3.11/site-packages/libensemble/libE.py", line 509, in libE_local
    return manager(
           ^^^^^^^^
  File "/global/cfs/cdirs/m4272/mgarten/sw/perlmutter/gpu/venvs/optimas-wake-t/lib/python3.11/site-packages/libensemble/libE.py", line 310, in manager
    raise LoggedException(*e.args, "See error details above and in ensemble.log") from None
libensemble.manager.LoggedException: ('Received error message from worker 1', "NameError: name 'not_initialized_var' is not defined", 'See error details above and in ensemble.log')
```