optimas-org / optimas

Optimization at scale, powered by libEnsemble
https://optimas.readthedocs.io

Something wrong with Parameter types #218

Closed: delaossa closed this issue 1 month ago

delaossa commented 3 months ago

Optimas was not running a case that had previously worked. The only information I found was in libE_stats.txt, which said something like:

Manager     : Starting ensemble at: 2024-06-03 16:57:21.195
Worker     2: sim_id     1: sim Time: 0.011 Start: 2024-06-03 16:57:21.269 End: 2024-06-03 16:57:21.280 Status: Exception occurred
Worker     4: sim_id     3: sim Time: 0.010 Start: 2024-06-03 16:57:21.269 End: 2024-06-03 16:57:21.279 Status: Exception occurred
Manager     : Gen no     1: gen Time: 0.075 Start: 2024-06-03 16:57:21.212 End: 2024-06-03 16:57:21.287 Status: Persis gen finished

The program didn't actually crash; it just sat idle at this point. After many tries, I found that the problem was caused by one parameter defined with type int in the analyzed_parameters list passed to AxSingleFidelityGenerator:

Parameter('iteration', int)

If I remove this parameter (or change its type to float), everything works normally.
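Because libE_stats.txt was the only place the failure showed up, it can help to scan it for failed simulations instead of reading it by eye. The following is a small sketch, not part of Optimas or libEnsemble: the helper name failed_sims is hypothetical, and the regular expression assumes the per-sim line format shown in the excerpt above, which may differ between libEnsemble versions.

```python
import re

# Matches per-sim lines in libE_stats.txt such as:
# "Worker     2: sim_id     1: sim Time: 0.011 ... Status: Exception occurred"
# Assumption: format taken from the log excerpt; other versions may differ.
SIM_LINE = re.compile(r"Worker\s+(\d+):\s+sim_id\s+(\d+):.*Status:\s*(.+?)\s*$")


def failed_sims(lines):
    """Return (worker, sim_id) pairs whose status mentions an exception."""
    failures = []
    for line in lines:
        match = SIM_LINE.search(line)
        if match and "Exception" in match.group(3):
            failures.append((int(match.group(1)), int(match.group(2))))
    return failures


if __name__ == "__main__":
    # Sample lines copied from the log excerpt; in practice, pass the open
    # file object, e.g. failed_sims(open("libE_stats.txt")).
    sample = [
        "Worker     2: sim_id     1: sim Time: 0.011 Start: 2024-06-03 16:57:21.269 End: 2024-06-03 16:57:21.280 Status: Exception occurred",
        "Worker     4: sim_id     3: sim Time: 0.010 Start: 2024-06-03 16:57:21.269 End: 2024-06-03 16:57:21.279 Status: Exception occurred",
        "Manager     : Gen no     1: gen Time: 0.075 Start: 2024-06-03 16:57:21.212 End: 2024-06-03 16:57:21.287 Status: Persis gen finished",
    ]
    print(failed_sims(sample))  # [(2, 1), (4, 3)]
```

Manager lines are ignored because only lines starting with a worker and sim_id are matched.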

delaossa commented 2 months ago

To reproduce the problem, one just needs to add a parameter of integer type to the generator, e.g.:

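    from optimas.core import Objective, Parameter, VaryingParameter
    from optimas.generators import AxSingleFidelityGenerator

    var1 = VaryingParameter("x0", -50.0, 5.0)
    var2 = VaryingParameter("x1", -5.0, 15.0)
    obj = Objective("f", minimize=False)
    # An analyzed parameter with an integer dtype is enough to trigger the hang
    p_int = Parameter("p_int", dtype=int)
    gen = AxSingleFidelityGenerator(
        varying_parameters=[var1, var2],
        objectives=[obj],
        analyzed_parameters=[p_int],
    )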

This issue is difficult to debug because the program hangs without printing any error message. Also, I am not sure whether this is an Optimas issue or a libEnsemble one.

shuds13 commented 1 month ago

@delaossa I strongly suspect this is the same bug. Please check whether it is fixed by https://github.com/Libensemble/libensemble/pull/1348

delaossa commented 1 month ago

Thanks @shuds13! https://github.com/Libensemble/libensemble/pull/1348 seems to fix the missing error message. Now the example above prints the following error and exits:

[INFO 07-15 11:31:57] optimas.generators.base: Generated trial 0 with parameters {'x0': -33.0, 'x1': 2.9736173152923584}
[0]  2024-07-15 11:31:57,221 libensemble.manager (ERROR): ---- Received error message from worker 1 ----
[0]  2024-07-15 11:31:57,221 libensemble.manager (ERROR): Message: ValueError: cannot convert float NaN to integer
[0]  2024-07-15 11:31:57,221 libensemble.manager (ERROR): Traceback (most recent call last):
  File "/Users/delaossa/local/libensemble/build/__editable__.libensemble-1.3.0+dev-py3-none-any/libensemble/worker.py", line 405, in run
    response = self._handle(Work)
               ^^^^^^^^^^^^^^^^^^
  File "/Users/delaossa/local/libensemble/build/__editable__.libensemble-1.3.0+dev-py3-none-any/libensemble/worker.py", line 351, in _handle
    calc_out, persis_info, calc_status = self._handle_calc(Work, calc_in)
                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/delaossa/local/libensemble/build/__editable__.libensemble-1.3.0+dev-py3-none-any/libensemble/worker.py", line 271, in _handle_calc
    out = calc(calc_in, Work)
          ^^^^^^^^^^^^^^^^^^^
  File "/Users/delaossa/local/libensemble/build/__editable__.libensemble-1.3.0+dev-py3-none-any/libensemble/utils/runners.py", line 40, in run
    return self._result(calc_in, Work["persis_info"], Work["libE_info"])
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/delaossa/local/libensemble/build/__editable__.libensemble-1.3.0+dev-py3-none-any/libensemble/utils/runners.py", line 34, in _result
    return self.f(*args)
           ^^^^^^^^^^^^^
  File "/Users/delaossa/local/optimas/build/__editable__.optimas-0.6.0-py3-none-any/optimas/sim_functions.py", line 128, in run_function
    libE_output[name].fill(np.nan)
ValueError: cannot convert float NaN to integer
...

which makes it much easier to locate the actual problem.

The initialization of libE_output fills every field with np.nan, but NaN can only be stored in floating-point fields, so the fill fails as soon as an analyzed parameter has an integer dtype.

A fix for this issue is implemented in https://github.com/optimas-org/optimas/pull/231.
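The contents of that PR are not reproduced here. As one possible approach, sketched under the assumption that libE_output is a numpy structured array, the placeholder can be chosen per field dtype instead of always using NaN. The helper fill_missing and its choice of 0 for integers and False for booleans are hypothetical, not the Optimas implementation.

```python
import numpy as np


def fill_missing(array):
    """Fill every field of a structured array with a dtype-appropriate placeholder.

    Hypothetical helper: floating and complex fields get NaN, integer fields
    get 0, boolean fields get False, and any other field type is left as
    allocated. Uses .base so that subarray fields (e.g. float, (3,)) are
    classified by their element type.
    """
    for name in array.dtype.names:
        kind = array.dtype[name].base.kind
        if kind in "fc":  # float or complex: NaN is representable
            array[name].fill(np.nan)
        elif kind in "iu":  # signed/unsigned integer: no NaN, fall back to 0
            array[name].fill(0)
        elif kind == "b":  # boolean
            array[name].fill(False)
    return array


if __name__ == "__main__":
    out = fill_missing(np.zeros(2, dtype=[("f", float), ("p_int", int)]))
    print(out)
```

One trade-off of this design: a 0 placeholder cannot be told apart from a genuine 0 result, so a separate validity mask or an out-of-range sentinel may be preferable when that distinction matters.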