[Bug] Tune crashes with "RuntimeError: Trial#8161 has already finished and can not be updated" in ray 1.7.0

Search before asking

[X] I searched the issues and found no similar issues.

Ray Component

Ray Tune

What happened + What you expected to happen

After upgrading from ray 1.6.0 to 1.7.0, my script exits with an exception. With 1.6.0 it only emitted warnings.

Reproduction script

The script sets these environment variables:

os.environ["TUNE_DISABLE_AUTO_CALLBACK_LOGGERS"] = "1"  # https://github.com/ray-project/ray/issues/18903
os.environ["TUNE_DISABLE_AUTO_CALLBACK_SYNCER"] = "1"  # https://github.com/ray-project/ray/issues/18903
os.environ["TUNE_RESULT_BUFFER_LENGTH"] = "0"  # if 0, report trial results immediately so that trials don't run speculatively

Warnings and exception from the script:

2021-10-09 19:49:42,867 WARNING ray_trial_executor.py:772 -- Over the last 60 seconds, the Tune event loop has been backlogged processing new results. Consider increasing your period of result reporting to improve performance.                                                                                           
2021-10-09 19:49:43,407 WARNING util.py:166 -- The `on_step_end` operation took 0.535 s, which may be a performance bottleneck.                                                                                                                                                                                              
2021-10-09 19:50:01,586 WARNING util.py:166 -- The `on_step_end` operation took 0.530 s, which may be a performance bottleneck.                                                                                                                                                                                              
2021-10-09 19:50:07,729 WARNING util.py:166 -- The `on_step_end` operation took 0.527 s, which may be a performance bottleneck.                                                                                                                                                                                              
2021-10-09 19:50:13,967 WARNING util.py:166 -- The `on_step_end` operation took 0.530 s, which may be a performance bottleneck.                                                                                                                                                                                              
2021-10-09 19:50:20,777 WARNING util.py:166 -- The `on_step_end` operation took 0.541 s, which may be a performance bottleneck.                                                                                                                                                                                              
2021-10-09 19:50:33,234 WARNING util.py:166 -- The `on_step_end` operation took 0.545 s, which may be a performance bottleneck.                                                                                                                                                                                              
2021-10-09 19:50:43,765 WARNING ray_trial_executor.py:772 -- Over the last 60 seconds, the Tune event loop has been backlogged processing new results. Consider increasing your period of result reporting to improve performance.                                                                                           
2021-10-09 19:50:51,181 WARNING util.py:166 -- The `on_step_end` operation took 0.552 s, which may be a performance bottleneck.                                                                                                                                                                                              
2021-10-09 19:50:57,041 WARNING util.py:166 -- The `on_step_end` operation took 0.544 s, which may be a performance bottleneck.                                                                                                                                                                                              
2021-10-09 19:51:03,231 WARNING util.py:166 -- The `on_step_end` operation took 0.551 s, which may be a performance bottleneck.                                                                                                                                                                                              
2021-10-09 19:51:09,011 WARNING util.py:166 -- The `on_step_end` operation took 0.538 s, which may be a performance bottleneck.                                                                                                                                                                                              
2021-10-09 19:51:14,703 WARNING util.py:166 -- The `on_step_end` operation took 0.565 s, which may be a performance bottleneck.                                                                                                                                                                                              
2021-10-09 19:51:37,931 WARNING util.py:166 -- The `on_step_end` operation took 0.533 s, which may be a performance bottleneck.                                                                                                                                                                                              
2021-10-09 19:51:45,019 WARNING ray_trial_executor.py:772 -- Over the last 60 seconds, the Tune event loop has been backlogged processing new results. Consider increasing your period of result reporting to improve performance.                                                                                           
/home/toaster/PROGS/miniconda3/envs/puma-lab/lib/python3.7/site-packages/optuna/trial/_trial.py:592: UserWarning: The reported value is ignored because this `step` 1 is already reported.
/home/toaster/PROGS/miniconda3/envs/puma-lab/lib/python3.7/site-packages/optuna/trial/_trial.py:592: UserWarning: The reported value is ignored because this `step` 1 is already reported.
  step
2021-10-09 19:51:57,036 ERROR trial_runner.py:924 -- Trial trainable_71ecb376: Error processing event.
Traceback (most recent call last):
  File "/home/toaster/PROGS/miniconda3/envs/puma-lab/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 898, in _process_trial
    decision = self._process_trial_result(trial, result)
  File "/home/toaster/PROGS/miniconda3/envs/puma-lab/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 951, in _process_trial_result
    trial.trial_id, result=flat_result)
  File "/home/toaster/PROGS/miniconda3/envs/puma-lab/lib/python3.7/site-packages/ray/tune/suggest/search_generator.py", line 132, in on_trial_complete
    trial_id=trial_id, result=result, error=error)
  File "/home/toaster/PROGS/miniconda3/envs/puma-lab/lib/python3.7/site-packages/ray/tune/suggest/optuna.py", line 400, in on_trial_complete
    self._ot_study.tell(ot_trial, val, state=ot_trial_state)
  File "/home/toaster/PROGS/miniconda3/envs/puma-lab/lib/python3.7/site-packages/optuna/study/study.py", line 662, in tell
    self._storage.set_trial_values(trial_id, values)
  File "/home/toaster/PROGS/miniconda3/envs/puma-lab/lib/python3.7/site-packages/optuna/storages/_in_memory.py", line 330, in set_trial_values
    self.check_trial_is_updatable(trial_id, trial.state)
  File "/home/toaster/PROGS/miniconda3/envs/puma-lab/lib/python3.7/site-packages/optuna/storages/_base.py", line 723, in check_trial_is_updatable
    "Trial#{} has already finished and can not be updated.".format(trial.number)
RuntimeError: Trial#8161 has already finished and can not be updated.
Traceback (most recent call last):
  File "/home/toaster/PROGS/miniconda3/envs/puma-lab/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 898, in _process_trial
    decision = self._process_trial_result(trial, result)
  File "/home/toaster/PROGS/miniconda3/envs/puma-lab/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 951, in _process_trial_result
    trial.trial_id, result=flat_result)
  File "/home/toaster/PROGS/miniconda3/envs/puma-lab/lib/python3.7/site-packages/ray/tune/suggest/search_generator.py", line 132, in on_trial_complete
    trial_id=trial_id, result=result, error=error)
  File "/home/toaster/PROGS/miniconda3/envs/puma-lab/lib/python3.7/site-packages/ray/tune/suggest/optuna.py", line 400, in on_trial_complete
    self._ot_study.tell(ot_trial, val, state=ot_trial_state)
  File "/home/toaster/PROGS/miniconda3/envs/puma-lab/lib/python3.7/site-packages/optuna/study/study.py", line 662, in tell
    self._storage.set_trial_values(trial_id, values)
  File "/home/toaster/PROGS/miniconda3/envs/puma-lab/lib/python3.7/site-packages/optuna/storages/_in_memory.py", line 330, in set_trial_values
    self.check_trial_is_updatable(trial_id, trial.state)
  File "/home/toaster/PROGS/miniconda3/envs/puma-lab/lib/python3.7/site-packages/optuna/storages/_base.py", line 723, in check_trial_is_updatable
    "Trial#{} has already finished and can not be updated.".format(trial.number)
RuntimeError: Trial#8161 has already finished and can not be updated.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "notebooks/factor/price_prediction.py", line 168, in <module>
    reuse_actors=True
  File "/home/toaster/PROGS/miniconda3/envs/puma-lab/lib/python3.7/site-packages/ray/tune/tune.py", line 581, in run
    runner.step()
  File "/home/toaster/PROGS/miniconda3/envs/puma-lab/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 705, in step
    self._process_events(timeout=timeout)
  File "/home/toaster/PROGS/miniconda3/envs/puma-lab/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 863, in _process_events
    self._process_trial(trial)
  File "/home/toaster/PROGS/miniconda3/envs/puma-lab/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 925, in _process_trial
    self._process_trial_failure(trial, traceback.format_exc())
  File "/home/toaster/PROGS/miniconda3/envs/puma-lab/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 1139, in _process_trial_failure
    self._search_alg.on_trial_complete(trial.trial_id, error=True)
  File "/home/toaster/PROGS/miniconda3/envs/puma-lab/lib/python3.7/site-packages/ray/tune/suggest/search_generator.py", line 132, in on_trial_complete
    trial_id=trial_id, result=result, error=error)
  File "/home/toaster/PROGS/miniconda3/envs/puma-lab/lib/python3.7/site-packages/ray/tune/suggest/optuna.py", line 400, in on_trial_complete
    self._ot_study.tell(ot_trial, val, state=ot_trial_state)
  File "/home/toaster/PROGS/miniconda3/envs/puma-lab/lib/python3.7/site-packages/optuna/study/study.py", line 664, in tell
    self._storage.set_trial_state(trial_id, state)
  File "/home/toaster/PROGS/miniconda3/envs/puma-lab/lib/python3.7/site-packages/optuna/storages/_in_memory.py", line 223, in set_trial_state
    self.check_trial_is_updatable(trial_id, trial.state)
  File "/home/toaster/PROGS/miniconda3/envs/puma-lab/lib/python3.7/site-packages/optuna/storages/_base.py", line 723, in check_trial_is_updatable
    "Trial#{} has already finished and can not be updated.".format(trial.number)
RuntimeError: Trial#8161 has already finished and can not be updated.
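
For readers following the traceback: the sketch below is a toy model of the call pattern it shows, not Ray's or Optuna's actual code. `ToyStorage` and `process_result` are hypothetical names; the only things taken from the log are the error message and the fact that the failure handler completes the same trial a second time. It assumes the trial had already been marked finished before the failing result arrived, which the log suggests but does not prove.

```python
class ToyStorage:
    """Toy stand-in for a study storage (hypothetical, not Optuna's class)."""

    def __init__(self):
        self._finished = set()

    def tell(self, trial_number, value=None):
        # Same rule as the check in the traceback: finished trials are read-only.
        if trial_number in self._finished:
            raise RuntimeError(
                "Trial#{} has already finished and can not be updated.".format(
                    trial_number
                )
            )
        self._finished.add(trial_number)


def process_result(storage, trial_number, value):
    """Hypothetical model: process a result; on error, mark the trial failed."""
    try:
        storage.tell(trial_number, value)  # normal completion path
    except RuntimeError:
        storage.tell(trial_number)  # failure path tells the same trial again


storage = ToyStorage()
process_result(storage, 8161, 1.0)  # first completion succeeds

outer = inner = None
try:
    process_result(storage, 8161, 2.0)  # duplicate result for a finished trial
except RuntimeError as exc:
    outer = exc              # raised by the failure path
    inner = exc.__context__  # original error from the completion path

print(inner)
print("During handling of the above exception, another exception occurred:")
print(outer)
```

Running it gives the same shape as the log above: the second exception is raised while the first is being handled, and both carry the identical message.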

Anything else

Conda's env.yaml:

name: puma-lab
channels:
  - pyviz
  - conda-forge
  - defaults
dependencies:
  - _libgcc_mutex=0.1=conda_forge
  - _openmp_mutex=4.5=1_gnu
  - abseil-cpp=20210324.2=h9c3ff4c_0
  - alembic=1.7.3=pyhd8ed1ab_0
  - alsa-lib=1.2.3=h516909a_0
  - anyio=3.3.0=py37h89c1867_0
  - argcomplete=1.12.3=pyhd8ed1ab_2
  - argon2-cffi=20.1.0=py37h5e8e339_2
  - arrow-cpp=5.0.0=py37hdf48254_5_cpu
  - async_generator=1.10=py_0
  - attrs=21.2.0=pyhd8ed1ab_0
  - autopage=0.4.0=pyhd8ed1ab_0
  - aws-c-cal=0.5.11=h95a6274_0
  - aws-c-common=0.6.2=h7f98852_0
  - aws-c-event-stream=0.2.7=h3541f99_13
  - aws-c-io=0.10.5=hfb6a706_0
  - aws-checksums=0.1.11=ha31a3da_7
  - aws-sdk-cpp=1.8.186=hb4091e7_3
  - babel=2.9.1=pyh44b312d_0
  - backcall=0.2.0=pyh9f0ad1d_0
  - backports=1.0=py_2
  - backports.functools_lru_cache=1.6.4=pyhd8ed1ab_0
  - backports.zoneinfo=0.2.1=py37h5e8e339_4
  - bleach=4.1.0=pyhd8ed1ab_0
  - bokeh=2.3.3=py37h89c1867_0
  - brotlipy=0.7.0=py37h5e8e339_1001
  - bzip2=1.0.8=h7f98852_4
  - c-ares=1.17.2=h7f98852_0
  - ca-certificates=2021.5.30=ha878542_0
  - certifi=2021.5.30=py37h89c1867_0
  - cffi=1.14.6=py37hc58025e_0
  - chardet=4.0.0=py37h89c1867_1
  - charset-normalizer=2.0.0=pyhd8ed1ab_0
  - click=8.0.1=py37h89c1867_0
  - clickhouse-cityhash=1.0.2.3=py37h3340039_2
  - clickhouse-driver=0.2.1=py37h5e8e339_0
  - cliff=3.9.0=pyhd8ed1ab_0
  - cloudpickle=2.0.0=pyhd8ed1ab_0
  - cmaes=0.8.2=pyh44b312d_0
  - cmd2=2.2.0=py37h89c1867_0
  - colorama=0.4.4=pyh9f0ad1d_0
  - colorcet=2.0.6=pyhd8ed1ab_0
  - colorlog=6.4.1=py37h89c1867_0
  - conda=4.10.3=py37h89c1867_1
  - conda-package-handling=1.7.3=py37h5e8e339_0
  - cramjam=2.3.1=py37h5e8e339_1
  - cryptography=3.4.7=py37h5d9358c_0
  - cycler=0.10.0=py_2
  - cytoolz=0.11.0=py37h5e8e339_3
  - dask=2021.9.0=pyhd8ed1ab_0
  - dask-core=2021.9.0=pyhd8ed1ab_0
  - datashader=0.13.0=pyh6c4a22f_0
  - datashape=0.5.4=py_1
  - dbus=1.13.6=h48d8840_2
  - debugpy=1.4.1=py37hcd2ae1e_0
  - decorator=5.1.0=pyhd8ed1ab_0
  - defusedxml=0.7.1=pyhd8ed1ab_0
  - distributed=2021.9.0=py37h89c1867_0
  - entrypoints=0.3=py37hc8dfbb8_1002
  - expat=2.4.1=h9c3ff4c_0
  - fastparquet=0.7.1=py37hb1e94ed_0
  - filelock=3.0.12=pyh9f0ad1d_0
  - fontconfig=2.13.1=hba837de_1005
  - freetype=2.10.4=h0708190_1
  - fsspec=2021.8.1=pyhd8ed1ab_0
  - gettext=0.19.8.1=h0b5b191_1005
  - gflags=2.2.2=he1b5a44_1004
  - gitdb=4.0.7=pyhd8ed1ab_0
  - gitpython=3.1.23=pyhd8ed1ab_1
  - glib=2.68.4=h9c3ff4c_0
  - glib-tools=2.68.4=h9c3ff4c_0
  - glog=0.5.0=h48cff8f_0
  - greenlet=1.1.1=py37hcd2ae1e_0
  - grpc-cpp=1.40.0=h850795e_0
  - gst-plugins-base=1.18.5=hf529b03_0
  - gstreamer=1.18.5=h76c114f_0
  - heapdict=1.0.1=py_0
  - holoviews=1.14.5=py_0
  - hvplot=0.7.3=py_0
  - icu=68.1=h58526e2_0
  - idna=3.1=pyhd3deb0d_0
  - importlib-metadata=4.8.1=py37h89c1867_0
  - importlib_metadata=4.8.1=hd8ed1ab_0
  - importlib_resources=5.2.2=pyhd8ed1ab_0
  - ipykernel=6.4.1=py37h6531663_0
  - ipympl=0.7.0=pyhd8ed1ab_0
  - ipython=7.27.0=py37h6531663_0
  - ipython_genutils=0.2.0=py_1
  - ipywidgets=7.6.5=pyhd8ed1ab_0
  - jbig=2.1=h7f98852_2003
  - jedi=0.18.0=py37h89c1867_2
  - jinja2=3.0.1=pyhd8ed1ab_0
  - joblib=1.0.1=pyhd8ed1ab_0
  - jpeg=9d=h36c2ea0_0
  - json5=0.9.5=pyh9f0ad1d_0
  - jsonschema=3.2.0=py37hc8dfbb8_1
  - jupyter-server-mathjax=0.2.3=pyhd8ed1ab_0
  - jupyter_client=7.0.2=pyhd8ed1ab_0
  - jupyter_contrib_core=0.3.3=py_2
  - jupyter_contrib_nbextensions=0.5.1=py37hc8dfbb8_1
  - jupyter_core=4.7.1=py37h89c1867_0
  - jupyter_highlight_selected_word=0.2.0=py37h89c1867_1002
  - jupyter_latex_envs=1.4.6=py37h89c1867_1001
  - jupyter_nbextensions_configurator=0.4.1=py37h89c1867_2
  - jupyter_server=1.11.0=pyhd8ed1ab_0
  - jupyterlab=3.1.11=pyhd8ed1ab_0
  - jupyterlab-git=0.32.2=pyhd8ed1ab_0
  - jupyterlab_pygments=0.1.2=pyh9f0ad1d_0
  - jupyterlab_server=2.8.1=pyhd8ed1ab_0
  - jupyterlab_widgets=1.0.2=pyhd8ed1ab_0
  - kiwisolver=1.3.2=py37h2527ec5_0
  - krb5=1.19.2=hcc1bbae_0
  - lcms2=2.12=hddcbb42_0
  - ld_impl_linux-64=2.36.1=hea4e1c9_2
  - lerc=2.2.1=h9c3ff4c_0
  - libarchive=3.5.2=hccf745f_0
  - libblas=3.9.0=11_linux64_openblas
  - libbrotlicommon=1.0.9=h7f98852_5
  - libbrotlidec=1.0.9=h7f98852_5
  - libbrotlienc=1.0.9=h7f98852_5
  - libcblas=3.9.0=11_linux64_openblas
  - libclang=11.1.0=default_ha53f305_1
  - libcurl=7.78.0=h2574ce0_0
  - libdeflate=1.7=h7f98852_5
  - libedit=3.1.20191231=he28a2e2_2
  - libev=4.33=h516909a_1
  - libevent=2.1.10=hcdb4288_3
  - libffi=3.3=h58526e2_2
  - libgcc-ng=11.1.0=hc902ee8_8
  - libgfortran-ng=11.1.0=h69a702a_8
  - libgfortran5=11.1.0=h6c583b3_8
  - libglib=2.68.4=h3e27bee_0
  - libgomp=11.1.0=hc902ee8_8
  - libiconv=1.16=h516909a_0
  - liblapack=3.9.0=11_linux64_openblas
  - libllvm11=11.1.0=hf817b99_2
  - libnghttp2=1.43.0=h812cca2_0
  - libogg=1.3.4=h7f98852_1
  - libopenblas=0.3.17=pthreads_h8fe5266_1
  - libopus=1.3.1=h7f98852_1
  - libpng=1.6.37=h21135ba_2
  - libpq=13.3=hd57d9b9_0
  - libprotobuf=3.16.0=h780b84a_0
  - libsodium=1.0.18=h36c2ea0_1
  - libsolv=0.7.19=h780b84a_5
  - libssh2=1.10.0=ha56f1ee_0
  - libstdcxx-ng=11.1.0=h56837e0_8
  - libta-lib=0.4.0=h516909a_0
  - libthrift=0.14.2=he6d91bd_1
  - libtiff=4.3.0=hf544144_1
  - libutf8proc=2.6.1=h7f98852_0
  - libuuid=2.32.1=h7f98852_1000
  - libuv=1.42.0=h7f98852_0
  - libvorbis=1.3.7=h9c3ff4c_0
  - libwebp-base=1.2.1=h7f98852_0
  - libxcb=1.13=h7f98852_1003
  - libxkbcommon=1.0.3=he3ba5ed_0
  - libxml2=2.9.12=h72842e0_0
  - libxslt=1.1.33=h15afd5d_2
  - llvmlite=0.37.0=py37h9d7f4d0_0
  - locket=0.2.0=py_2
  - lxml=4.6.3=py37h77fd288_0
  - lz4-c=1.9.3=h9c3ff4c_1
  - lzo=2.10=h516909a_1000
  - mako=1.1.5=pyhd8ed1ab_0
  - mamba=0.15.3=py37h7f483ca_0
  - markdown=3.3.4=pyhd8ed1ab_0
  - markupsafe=2.0.1=py37h5e8e339_0
  - matplotlib=3.4.3=py37h89c1867_0
  - matplotlib-base=3.4.3=py37h1058ff1_0
  - matplotlib-inline=0.1.3=pyhd8ed1ab_0
  - mistune=0.8.4=py37h5e8e339_1004
  - modin-core=0.10.2=py37h89c1867_3
  - modin-ray=0.10.2=py37h89c1867_3
  - msgpack-python=1.0.2=py37h2527ec5_1
  - multipledispatch=0.6.0=py_0
  - mysql-common=8.0.25=ha770c72_2
  - mysql-libs=8.0.25=hfa10184_2
  - nb_conda_kernels=2.3.1=py37h89c1867_0
  - nbclassic=0.3.1=pyhd8ed1ab_1
  - nbclient=0.5.4=pyhd8ed1ab_0
  - nbconvert=6.1.0=py37h89c1867_0
  - nbdime=3.1.0=pyhd8ed1ab_0
  - nbformat=5.1.3=pyhd8ed1ab_0
  - ncurses=6.2=h58526e2_4
  - nest-asyncio=1.5.1=pyhd8ed1ab_0
  - notebook=6.4.3=pyha770c72_0
  - nspr=4.30=h9c3ff4c_0
  - nss=3.69=hb5efdd6_0
  - numba=0.54.0=py37h2d894fd_0
  - numpy=1.20.3=py37h038b26d_1
  - olefile=0.46=pyh9f0ad1d_1
  - openjpeg=2.4.0=hb52868f_1
  - openssl=1.1.1l=h7f98852_0
  - optuna=2.9.1=pyhd8ed1ab_0
  - orc=1.6.10=h58a87f1_0
  - packaging=21.0=pyhd8ed1ab_0
  - pandas=1.3.2=py37he8f5f7f_0
  - pandoc=2.14.2=h7f98852_0
  - pandocfilters=1.4.2=py_1
  - panel=0.12.1=py_0
  - param=1.11.1=pyh6c4a22f_0
  - parquet-cpp=1.5.1=1
  - parso=0.8.2=pyhd8ed1ab_0
  - partd=1.2.0=pyhd8ed1ab_0
  - patsy=0.5.2=pyhd8ed1ab_0
  - pbr=5.6.0=pyhd8ed1ab_0
  - pcre=8.45=h9c3ff4c_0
  - pexpect=4.8.0=py37hc8dfbb8_1
  - pickle5=0.0.11=py37h5e8e339_0
  - pickleshare=0.7.5=py37hc8dfbb8_1002
  - pillow=8.3.2=py37h0f21c89_0
  - pip=21.2.4=pyhd8ed1ab_0
  - prettytable=2.2.0=pyhd8ed1ab_0
  - prometheus_client=0.11.0=pyhd8ed1ab_0
  - prompt-toolkit=3.0.20=pyha770c72_0
  - psutil=5.8.0=py37h5e8e339_1
  - pthread-stubs=0.4=h36c2ea0_1001
  - ptyprocess=0.7.0=pyhd3deb0d_0
  - pyarrow=5.0.0=py37h58331f5_5_cpu
  - pycosat=0.6.3=py37h5e8e339_1006
  - pycparser=2.20=pyh9f0ad1d_2
  - pyct=0.4.6=py_0
  - pyct-core=0.4.6=py_0
  - pygments=2.10.0=pyhd8ed1ab_0
  - pykalman=0.9.5=py_1
  - pyopenssl=20.0.1=pyhd8ed1ab_0
  - pyparsing=2.4.7=pyh9f0ad1d_0
  - pyperclip=1.8.2=pyhd8ed1ab_2
  - pyqt=5.12.3=py37h89c1867_7
  - pyqt-impl=5.12.3=py37he336c9b_7
  - pyqt5-sip=4.19.18=py37hcd2ae1e_7
  - pyqtchart=5.12=py37he336c9b_7
  - pyqtwebengine=5.12.1=py37he336c9b_7
  - pyrsistent=0.17.3=py37h5e8e339_2
  - pysocks=1.7.1=py37h89c1867_3
  - python=3.7.10=hffdb5ce_100_cpython
  - python-dateutil=2.8.2=pyhd8ed1ab_0
  - python_abi=3.7=2_cp37m
  - pytz=2021.1=pyhd8ed1ab_0
  - pyviz_comms=2.1.0=py_0
  - pyyaml=5.4.1=py37h5e8e339_1
  - pyzmq=22.2.1=py37h336d617_0
  - qt=5.12.9=hda022c4_4
  - re2=2021.09.01=h9c3ff4c_0
  - readline=8.1=h46c0cb4_0
  - redis-py=3.5.3=pyh9f0ad1d_0
  - reproc=14.2.3=h7f98852_0
  - reproc-cpp=14.2.3=h9c3ff4c_0
  - requests=2.26.0=pyhd8ed1ab_0
  - requests-unixsocket=0.2.0=py_0
  - ruamel_yaml=0.15.80=py37h5e8e339_1004
  - s2n=1.0.10=h9b69904_0
  - scikit-learn=0.24.2=py37hf0f1638_1
  - send2trash=1.8.0=pyhd8ed1ab_0
  - setproctitle=1.1.10=py37h5e8e339_1004
  - setuptools=58.0.4=py37h89c1867_0
  - six=1.16.0=pyh6c4a22f_0
  - smmap=3.0.5=pyh44b312d_0
  - snappy=1.1.8=he1b5a44_3
  - sniffio=1.2.0=py37h89c1867_1
  - sortedcontainers=2.4.0=pyhd8ed1ab_0
  - sqlalchemy=1.4.25=py37h5e8e339_0
  - sqlite=3.36.0=h9cd32fc_1
  - statsmodels=0.12.2=py37hb1e94ed_0
  - stevedore=3.4.0=py37h89c1867_0
  - ta-lib=0.4.19=py37ha21ca33_2
  - tabulate=0.8.9=pyhd8ed1ab_0
  - tblib=1.7.0=pyhd8ed1ab_0
  - tensorboardx=2.4=pyhd8ed1ab_0
  - terminado=0.12.1=py37h89c1867_0
  - testpath=0.5.0=pyhd8ed1ab_0
  - threadpoolctl=2.2.0=pyh8a188c0_0
  - thrift=0.13.0=py37hcd2ae1e_2
  - tk=8.6.11=h27826a3_1
  - toolz=0.11.1=py_0
  - tornado=6.1=py37h5e8e339_1
  - tqdm=4.62.2=pyhd8ed1ab_0
  - traitlets=5.1.0=pyhd8ed1ab_0
  - typing_extensions=3.10.0.0=pyha770c72_0
  - tzdata=2021a=he74cb21_1
  - tzlocal=3.0=py37h89c1867_2
  - urllib3=1.26.6=pyhd8ed1ab_0
  - wcwidth=0.2.5=pyh9f0ad1d_2
  - webencodings=0.5.1=py_1
  - websocket-client=0.57.0=py37h89c1867_4
  - wheel=0.37.0=pyhd8ed1ab_1
  - widgetsnbextension=3.5.1=py37h89c1867_4
  - xarray=0.19.0=pyhd8ed1ab_1
  - xeus=2.0.0=h7d0c39e_0
  - xeus-python=0.13.0=py37h4b46df4_1
  - xeus-python-shell=0.1.5=pyhd8ed1ab_0
  - xorg-libxau=1.0.9=h7f98852_0
  - xorg-libxdmcp=1.1.3=h7f98852_0
  - xz=5.2.5=h516909a_1
  - yaml=0.2.5=h516909a_0
  - zeromq=4.3.4=h9c3ff4c_1
  - zict=2.0.0=py_0
  - zipp=3.5.0=pyhd8ed1ab_0
  - zlib=1.2.11=h516909a_1010
  - zstandard=0.15.2=py37h5e8e339_0
  - zstd=1.5.0=ha95c52a_0
  - pip:
    - absl-py==0.13.0
    - aiohttp==3.7.4.post0
    - aiohttp-cors==0.7.0
    - aioredis==1.3.1
    - async-timeout==3.0.1
    - autograd==1.3
    - bayesian-optimization==1.2.0
    - blessings==1.7
    - cachetools==4.2.2
    - cma==2.7.0
    - colorful==0.5.4
    - cython==0.29.24
    - future==0.18.2
    - google-api-core==1.31.2
    - google-auth==1.35.0
    - google-auth-oauthlib==0.4.6
    - googleapis-common-protos==1.53.0
    - gpustat==0.6.0
    - gpy==1.10.0
    - gpytorch==1.5.1
    - grpcio==1.40.0
    - hebo==0.1.0
    - hiredis==2.0.0
    - multidict==5.1.0
    - nevergrad==0.4.3.post8
    - nvidia-ml-py3==7.352.0
    - oauthlib==3.1.1
    - opencensus==0.7.13
    - opencensus-context==0.1.2
    - paramz==0.9.5
    - protobuf==3.17.3
    - py-spy==0.3.9
    - pyasn1==0.4.8
    - pyasn1-modules==0.2.8
    - pymoo==0.4.2.2
    - ray==1.7.0
    - requests-oauthlib==1.3.0
    - rsa==4.7.2
    - scipy==1.5.4
    - sklearn==0.0
    - tensorboard==2.6.0
    - tensorboard-data-server==0.6.1
    - tensorboard-plugin-wit==1.8.0
    - torch==1.9.1
    - werkzeug==2.0.1
    - yarl==1.6.3

Are you willing to submit a PR?

[ ] Yes I am willing to submit a PR!

@krfricke here you go:

import os
import random

os.environ["TUNE_DISABLE_AUTO_CALLBACK_LOGGERS"] = "1"  # https://github.com/ray-project/ray/issues/18903
os.environ["TUNE_DISABLE_AUTO_CALLBACK_SYNCER"] = "1"  # https://github.com/ray-project/ray/issues/18903
os.environ["TUNE_RESULT_BUFFER_LENGTH"] = "0"  

import numpy as np
import ray
from ray import tune
from ray.tune.suggest import optuna

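def evaluation_fn():
    # Return a random integer score in [1, 10000].
    return random.randint(1, 10_000)

def easy_objective(config, data):
    # Report a single result; the trial ends when this function returns.
    intermediate_score = evaluation_fn()
    tune.report(mean_loss=intermediate_score)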

if __name__ == "__main__":
    ray.init(address='auto', _redis_password='xxx')
    df = np.zeros(10_000_000)
    search_optuna = optuna.OptunaSearch()
    analysis = tune.run(
        tune.with_parameters(easy_objective, data=df),
        name="test",
        metric="mean_loss",
        mode="max",
        search_alg=search_optuna,
        num_samples=-1,
        config={
            "width": tune.uniform(0, 20),
            "height": tune.uniform(-100, 100)
        },
        reuse_actors=True,
        fail_fast=True,
        verbose=1
    )

On my 3-node cluster it crashes with:

== Status ==                                                                   
Memory usage on this node: 15.1/31.3 GiB                                                                                                                      
Using FIFO scheduling algorithm.                                               
Resources requested: 51.0/52 CPUs, 0/2 GPUs, 0.0/102.14 GiB heap, 0.0/47.77 GiB objects (0.0/1.0 accelerator_type:GT, 0.0/1.0 accelerator_type:G)                                                                                                                                                                            
Current best trial: a485397c with mean_loss=10000 and parameters={'width': 7.240691732613056, 'height': -58.746246442990405}                                                                                                                                                                                                 
Result logdir: /home/toaster/ray_results/test                                                                                                                 
Number of trials: 27650/infinite (1 PENDING, 51 RUNNING, 27598 TERMINATED)                                                                                    

2021-10-14 20:42:07,566 ERROR trial_runner.py:846 -- Trial easy_objective_57d58564: Error processing event.                                                                                                                                                                                                                  
Traceback (most recent call last):                                             
  File "/home/toaster/PROGS/miniconda3/envs/puma-lab/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 820, in _process_trial                                                                                                                                                                                      
    decision = self._process_trial_result(trial, result)                                                                                                      
  File "/home/toaster/PROGS/miniconda3/envs/puma-lab/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 873, in _process_trial_result                                                                                                                                                                               
    trial.trial_id, result=flat_result)                                        
  File "/home/toaster/PROGS/miniconda3/envs/puma-lab/lib/python3.7/site-packages/ray/tune/suggest/search_generator.py", line 132, in on_trial_complete                                                                                                                                                                       
    trial_id=trial_id, result=result, error=error)                                                                                                            
  File "/home/toaster/PROGS/miniconda3/envs/puma-lab/lib/python3.7/site-packages/ray/tune/suggest/optuna.py", line 385, in on_trial_complete                                                                                                                                                                                 
    self._ot_study.tell(ot_trial, val, state=ot_trial_state)                                                                                                  
  File "/home/toaster/PROGS/miniconda3/envs/puma-lab/lib/python3.7/site-packages/optuna/study/study.py", line 662, in tell                                                                                                                                                                                                   
    self._storage.set_trial_values(trial_id, values)                                                                                                          
  File "/home/toaster/PROGS/miniconda3/envs/puma-lab/lib/python3.7/site-packages/optuna/storages/_in_memory.py", line 330, in set_trial_values                                                                                                                                                                               
    self.check_trial_is_updatable(trial_id, trial.state)                                                                                                      
  File "/home/toaster/PROGS/miniconda3/envs/puma-lab/lib/python3.7/site-packages/optuna/storages/_base.py", line 723, in check_trial_is_updatable                                                                                                                                                                            
    "Trial#{} has already finished and can not be updated.".format(trial.number)                                                                              
RuntimeError: Trial#5562 has already finished and can not be updated.                                                                                         
Traceback (most recent call last):                                             
  File "/home/toaster/PROGS/miniconda3/envs/puma-lab/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 820, in _process_trial                                                                                                                                                                                      
    decision = self._process_trial_result(trial, result)                                                                                                      
  File "/home/toaster/PROGS/miniconda3/envs/puma-lab/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 873, in _process_trial_result                                                                                                                                                                               
    trial.trial_id, result=flat_result)                                        
  File "/home/toaster/PROGS/miniconda3/envs/puma-lab/lib/python3.7/site-packages/ray/tune/suggest/search_generator.py", line 132, in on_trial_complete                                                                                                                                                                       
    trial_id=trial_id, result=result, error=error)                                                                                                            
  File "/home/toaster/PROGS/miniconda3/envs/puma-lab/lib/python3.7/site-packages/ray/tune/suggest/optuna.py", line 385, in on_trial_complete                                                                                                                                                                                 
    self._ot_study.tell(ot_trial, val, state=ot_trial_state)                                                                                                  
  File "/home/toaster/PROGS/miniconda3/envs/puma-lab/lib/python3.7/site-packages/optuna/study/study.py", line 662, in tell                                                                                                                                                                                                   
    self._storage.set_trial_values(trial_id, values)                                                                                                          
  File "/home/toaster/PROGS/miniconda3/envs/puma-lab/lib/python3.7/site-packages/optuna/storages/_in_memory.py", line 330, in set_trial_values                                                                                                                                                                               
    self.check_trial_is_updatable(trial_id, trial.state)                                                                                                      
  File "/home/toaster/PROGS/miniconda3/envs/puma-lab/lib/python3.7/site-packages/optuna/storages/_base.py", line 723, in check_trial_is_updatable                                                                                                                                                                            
    "Trial#{} has already finished and can not be updated.".format(trial.number)                                                                              
RuntimeError: Trial#5562 has already finished and can not be updated.                                                                                         

During handling of the above exception, another exception occurred:                                                                                           

Traceback (most recent call last):                                             
  File "test.py", line 44, in <module>                                         
    verbose=1                                                                  
  File "/home/toaster/PROGS/miniconda3/envs/puma-lab/lib/python3.7/site-packages/ray/tune/tune.py", line 588, in run                                                                                                                                                                                                         
    runner.step()                                                              
  File "/home/toaster/PROGS/miniconda3/envs/puma-lab/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 627, in step                                                                                                                                                                                                
    self._process_events(timeout=timeout)                                                                                                                     
  File "/home/toaster/PROGS/miniconda3/envs/puma-lab/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 785, in _process_events                                                                                                                                                                                     
    self._process_trial(trial)                                                 
  File "/home/toaster/PROGS/miniconda3/envs/puma-lab/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 847, in _process_trial                                                                                                                                                                                      
    self._process_trial_failure(trial, traceback.format_exc())                                                                                                
  File "/home/toaster/PROGS/miniconda3/envs/puma-lab/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 1058, in _process_trial_failure                                                                                                                                                                             
    self._search_alg.on_trial_complete(trial.trial_id, error=True)                                                                                            
  File "/home/toaster/PROGS/miniconda3/envs/puma-lab/lib/python3.7/site-packages/ray/tune/suggest/search_generator.py", line 132, in on_trial_complete                                                                                                                                                                       
    trial_id=trial_id, result=result, error=error)                                                                                                            
  File "/home/toaster/PROGS/miniconda3/envs/puma-lab/lib/python3.7/site-packages/ray/tune/suggest/optuna.py", line 385, in on_trial_complete                                                                                                                                                                                 
    self._ot_study.tell(ot_trial, val, state=ot_trial_state)                                                                                                  
  File "/home/toaster/PROGS/miniconda3/envs/puma-lab/lib/python3.7/site-packages/optuna/study/study.py", line 664, in tell                                                                                                                                                                                                   
    self._storage.set_trial_state(trial_id, state)                                                                                                            
  File "/home/toaster/PROGS/miniconda3/envs/puma-lab/lib/python3.7/site-packages/optuna/storages/_in_memory.py", line 223, in set_trial_state                                                                                                                                                                                
    self.check_trial_is_updatable(trial_id, trial.state)                                                                                                      
  File "/home/toaster/PROGS/miniconda3/envs/puma-lab/lib/python3.7/site-packages/optuna/storages/_base.py", line 723, in check_trial_is_updatable                                                                                                                                                                            
    "Trial#{} has already finished and can not be updated.".format(trial.number)                                                                              
RuntimeError: Trial#5562 has already finished and can not be updated.
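
For illustration only: both tracebacks above end in the same `check_trial_is_updatable` guard, once via `set_trial_values` and once via `set_trial_state`. That suggests the same trial is being reported as complete more than once. The sketch below is neither Ray's nor Optuna's code. `InMemoryStudy`, `FinishedTrialError`, and `tell_once` are hypothetical names used to mimic the guard and to show one way a caller could make a repeated report harmless.

```python
class FinishedTrialError(RuntimeError):
    """Hypothetical error raised when a finished trial is updated again."""


class InMemoryStudy:
    """Toy stand-in for a study storage; not Optuna's implementation."""

    def __init__(self):
        # Maps trial number -> final value (None when reported as an error).
        self._finished = {}

    def tell(self, number, value=None):
        # Mimics the guard seen in the traceback: a finished trial is frozen.
        if number in self._finished:
            raise FinishedTrialError(
                "Trial#{} has already finished and can not be updated.".format(number)
            )
        self._finished[number] = value


def tell_once(study, number, value=None):
    """Report a result, ignoring a repeated report for the same trial.

    Returns True if the result was recorded, False if the trial had
    already finished.
    """
    try:
        study.tell(number, value)
    except FinishedTrialError:
        return False
    return True


study = InMemoryStudy()
first = tell_once(study, 5562, 0.42)  # normal completion
second = tell_once(study, 5562)       # e.g. a failure handler reporting again
```

With the unguarded `study.tell(5562)` the second call raises. With `tell_once` it is skipped, which is the behaviour ray 1.6.0 appeared to have when it only logged warnings.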

[Bug] Tune crashes with "RuntimeError: Trial#8161 has already finished and can not be updated" in ray 1.7.0 #19274