ray-project / tune-sklearn

A drop-in replacement for Scikit-Learn’s GridSearchCV / RandomizedSearchCV -- but with cutting edge hyperparameter tuning techniques.
https://docs.ray.io/en/master/tune/api_docs/sklearn.html
Apache License 2.0

Neverending ray tuning #153

Closed DanielAtKrypton closed 3 years ago

DanielAtKrypton commented 3 years ago

Problem description

I am setting up a test framework for ray tune but unfortunately I got stuck when I was trying to tune the learning rate hyperparameter of a pipelined network.

The test code can be found here.

I noticed when debugging the test that the tuning spawns many threads as can be seen at the call stack to the left and below:

[Screenshot: call stack in the debugger]

Although I have already installed gpustat by running pip install gpustat, there is still a message warning me to install it. These threads stay open for hours and there is no other feedback in the terminal.
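
A warning like this can appear when the interpreter running the tests is not the one the package was installed into. Whether that is the cause here is an assumption; the thread does not confirm it. A minimal sketch of how it could be checked from inside the test process (the helper name `is_importable` is hypothetical):

```python
import importlib.util
import sys


def is_importable(module_name: str) -> bool:
    """Return True if the current interpreter can locate the top-level module."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Shows which Python executable is actually running the code
    print("interpreter:", sys.executable)
    print("gpustat importable:", is_importable("gpustat"))
```

If the printed interpreter path is not the virtual environment's Python, the package would need to be installed for that interpreter instead.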

Is there anything I am missing to make the learning rate hyperparameter tuning work smoothly here?

Environment information

VS Code

[screenshot]

Python dependencies:

requirements lock

Yard1 commented 3 years ago

What happens if you set n_jobs=1 in TuneGridSearchCV?

DanielAtKrypton commented 3 years ago

> What happens if you set n_jobs=1 in TuneGridSearchCV?

The result is the same behaviour.

Yard1 commented 3 years ago

Can you try cv=2?

DanielAtKrypton commented 3 years ago

> Can you try cv=2?

Sure. I still got the same behavior with n_jobs=1 and cv=2:

best_score, best_params = tsp.tune_grid_search(
    selected_tunable_params,
    flights_dataset,
    n_jobs=1,
    cv=2,
    scoring="accuracy",
    verbose=2,
    use_gpu=False)

DanielAtKrypton commented 3 years ago

My pip list within the virtual environment:

Package                        Version     Location
------------------------------ ----------- -------------------------------------------------------
aiohttp                        3.7.3
aiohttp-cors                   0.7.0
aioredis                       1.3.1
alabaster                      0.7.12
argon2-cffi                    20.1.0
astroid                        2.4.2
async-generator                1.10
async-timeout                  3.0.1
atomicwrites                   1.4.0
attrs                          20.3.0
autopep8                       1.5.4
Babel                          2.9.0
backcall                       0.2.0
beautifulsoup4                 4.9.3
bleach                         3.2.1
blessings                      1.7
bump2version                   1.0.1
bumpversion                    0.6.0
cachetools                     4.1.1
certifi                        2020.11.8
cffi                           1.14.4
chardet                        3.0.4
click                          7.1.2
colorama                       0.4.4
colorful                       0.5.4
commonmark                     0.9.1
coverage                       5.3
cycler                         0.10.0
dataclasses                    0.6
decorator                      4.4.2
defusedxml                     0.6.0
docutils                       0.16
entrypoints                    0.3
filelock                       3.0.12
flights-time-series-dataset    1.0.0
future                         0.18.2
google                         3.0.0
google-api-core                1.23.0
google-auth                    1.23.0
googleapis-common-protos       1.52.0
gpustat                        0.6.0
grpcio                         1.33.2
hiredis                        1.1.0
idna                           2.10
imagesize                      1.2.0
importlib-metadata             3.1.0
iniconfig                      1.1.1
ipykernel                      5.3.4
ipython                        7.19.0
ipython-genutils               0.2.0
isort                          5.6.4
jedi                           0.17.2
Jinja2                         2.11.2
joblib                         0.17.0
json5                          0.9.5
jsonschema                     3.2.0
jupyter-client                 6.1.7
jupyter-core                   4.7.0
jupyterlab                     2.2.9
jupyterlab-pygments            0.1.2
jupyterlab-server              1.2.0
keyring                        21.5.0
kiwisolver                     1.3.1
lazy-object-proxy              1.4.3
lxml                           4.6.2
MarkupSafe                     1.1.1
matplotlib                     3.3.3
mccabe                         0.6.1
mistune                        0.8.4
msgpack                        1.0.0
multidict                      5.0.2
nbclient                       0.5.1
nbconvert                      6.0.7
nbformat                       5.0.8
nbsphinx                       0.8.0
nest-asyncio                   1.4.3
notebook                       6.1.5
numpy                          1.19.0
nvidia-ml-py3                  7.352.0
opencensus                     0.7.11
opencensus-context             0.1.2
oze-dataset                    1.0.0
packaging                      20.7
pandas                         1.1.4
pandocfilters                  1.4.3
parameterized                  0.7.4
parso                          0.7.1
pickleshare                    0.7.5
Pillow                         8.0.1
pip                            20.2.4
pip-tools                      5.4.0
pkginfo                        1.6.1
pluggy                         0.13.1
prometheus-client              0.9.0
prompt-toolkit                 3.0.8
protobuf                       3.14.0
psutil                         5.7.3
py                             1.9.0
py-spy                         0.3.3
pyasn1                         0.4.8
pyasn1-modules                 0.2.8
pycodestyle                    2.6.0
pycparser                      2.20
Pygments                       2.7.2
pylint                         2.6.0
pyparsing                      2.4.7
pyrsistent                     0.17.3
pytest                         6.1.2
pytest-cov                     2.10.1
python-dateutil                2.8.1
python-dotenv                  0.15.0
pytz                           2020.4
pywin32                        300
pywin32-ctypes                 0.2.0
pywinpty                       0.5.7
PyYAML                         5.3.1
pyzmq                          20.0.0
ray                            1.0.1.post1
readme-renderer                28.0
recommonmark                   0.6.0
redis                          3.4.1
requests                       2.25.0
requests-toolbelt              0.9.1
rfc3986                        1.4.0
rsa                            4.6
rstcheck                       3.3.1
scikit-learn                   0.23.2
scipy                          1.5.4
seaborn                        0.11.0
Send2Trash                     1.5.0
setuptools                     41.2.0
six                            1.15.0
sklearn                        0.0
skorch                         0.9.0
snowballstemmer                2.0.0
soupsieve                      2.0.1
Sphinx                         3.3.1
sphinx-autodoc-typehints       1.11.1
sphinx-rtd-theme               0.5.0
sphinxcontrib-applehelp        1.0.2
sphinxcontrib-devhelp          1.0.2
sphinxcontrib-htmlhelp         1.0.3
sphinxcontrib-jsmath           1.0.1
sphinxcontrib-qthelp           1.0.3
sphinxcontrib-serializinghtml  1.1.4
sphinxcontrib-svg2pdfconverter 1.1.0
tabulate                       0.8.7
tensorboardX                   2.1
terminado                      0.9.1
testpath                       0.4.4
threadpoolctl                  2.1.0
time-series-dataset            0.0.2
time-series-models             0.1.1
time-series-predictor          2.2.0       c:\users\daniel\workspaces\python\time_series_predictor
toml                           0.10.2
torch                          1.7.0+cu110
tornado                        6.1
tqdm                           4.54.0
traitlets                      5.0.5
tune-sklearn                   0.1.0
twine                          3.2.0
typed-ast                      1.4.1
typing-extensions              3.7.4.3
urllib3                        1.26.2
wcwidth                        0.2.5
webencodings                   0.5.1
wheel                          0.35.1
wrapt                          1.12.1
yarl                           1.6.3
zipp                           3.4.0

Yard1 commented 3 years ago

Can you try updating tune-sklearn to the version on GitHub?

pip install -U git+https://github.com/ray-project/tune-sklearn.git

Also, please make sure that your Ray version is up to date.

DanielAtKrypton commented 3 years ago

> Can you try updating tune-sklearn to the version on github? pip install -U git+https://github.com/ray-project/tune-sklearn.git And also please make sure that your Ray version is up to date.

After I updated with the command above, it went to version 0.0.8. Now the test crashes with the following info:

Windows fatal exception: stack overflow

Thread 0x00003438 (most recent call first):
  File "C:\Python37\lib\threading.py", line 300 in wait
  File "C:\Python37\lib\threading.py", line 552 in wait
  File "c:\Users\Daniel\.vscode\extensions\ms-python.python-2020.11.371526539\pythonFiles\lib\python\debugpy\_vendored\pydevd\pydevd.py", line 232 in _on_run
  File "c:\Users\Daniel\.vscode\extensions\ms-python.python-2020.11.371526539\pythonFiles\lib\python\debugpy\_vendored\pydevd\_pydevd_bundle\pydevd_daemon_thread.py", line 46 in run
  File "C:\Python37\lib\threading.py", line 926 in _bootstrap_inner
  File "C:\Python37\lib\threading.py", line 890 in _bootstrap

Thread 0x00004a5c (most recent call first):
  File "C:\Python37\lib\threading.py", line 300 in wait
  File "C:\Python37\lib\threading.py", line 552 in wait
  File "c:\Users\Daniel\.vscode\extensions\ms-python.python-2020.11.371526539\pythonFiles\lib\python\debugpy\_vendored\pydevd\pydevd.py", line 186 in _on_run
  File "c:\Users\Daniel\.vscode\extensions\ms-python.python-2020.11.371526539\pythonFiles\lib\python\debugpy\_vendored\pydevd\_pydevd_bundle\pydevd_daemon_thread.py", line 46 in run
  File "C:\Python37\lib\threading.py", line 926 in _bootstrap_inner
  File "C:\Python37\lib\threading.py", line 890 in _bootstrap

Thread 0x00005e20 (most recent call first):
  File "C:\Python37\lib\threading.py", line 296 in wait
  File "C:\Python37\lib\threading.py", line 552 in wait
  File "c:\Users\Daniel\.vscode\extensions\ms-python.python-2020.11.371526539\pythonFiles\lib\python\debugpy\_vendored\pydevd\_pydevd_bundle\pydevd_timeout.py", line 43 in _on_run
  File "c:\Users\Daniel\.vscode\extensions\ms-python.python-2020.11.371526539\pythonFiles\lib\python\debugpy\_vendored\pydevd\_pydevd_bundle\pydevd_daemon_thread.py", line 46 in run
  File "C:\Python37\lib\threading.py", line 926 in _bootstrap_inner
  File "C:\Python37\lib\threading.py", line 890 in _bootstrap

Thread 0x000067e4 (most recent call first):
  File "c:\Users\Daniel\.vscode\extensions\ms-python.python-2020.11.371526539\pythonFiles\lib\python\debugpy\_vendored\pydevd\_pydevd_bundle\pydevd_comm.py", line 210 in _read_line
  File "c:\Users\Daniel\.vscode\extensions\ms-python.python-2020.11.371526539\pythonFiles\lib\python\debugpy\_vendored\pydevd\_pydevd_bundle\pydevd_comm.py", line 228 in _on_run
  File "c:\Users\Daniel\.vscode\extensions\ms-python.python-2020.11.371526539\pythonFiles\lib\python\debugpy\_vendored\pydevd\_pydevd_bundle\pydevd_daemon_thread.py", line 46 in run
  File "C:\Python37\lib\threading.py", line 926 in _bootstrap_inner
  File "C:\Python37\lib\threading.py", line 890 in _bootstrap

Thread 0x0000337c (most recent call first):
  File "C:\Python37\lib\threading.py", line 300 in wait
  File "C:\Python37\lib\queue.py", line 179 in get
  File "c:\Users\Daniel\.vscode\extensions\ms-python.python-2020.11.371526539\pythonFiles\lib\python\debugpy\_vendored\pydevd\_pydevd_bundle\pydevd_comm.py", line 339 in _on_run
  File "c:\Users\Daniel\.vscode\extensions\ms-python.python-2020.11.371526539\pythonFiles\lib\python\debugpy\_vendored\pydevd\_pydevd_bundle\pydevd_daemon_thread.py", line 46 in run
  File "C:\Python37\lib\threading.py", line 926 in _bootstrap_inner
  File "C:\Python37\lib\threading.py", line 890 in _bootstrap

Current thread 0x00005d54 (most recent call first):
  File "c:\Users\Daniel\.vscode\extensions\ms-python.python-2020.11.371526539\pythonFiles\lib\python\debugpy\_vendored\pydevd\_pydevd_bundle\pydevd_trace_dispatch_regular.py", line 364 in __call__
  File "c:\Users\Daniel\Workspaces\Python\time_series_predictor\.env\lib\site-packages\sklearn\utils\_pprint.py", line 335 in _safe_repr
  File "c:\Users\Daniel\Workspaces\Python\time_series_predictor\.env\lib\site-packages\sklearn\utils\_pprint.py", line 172 in format
  File "C:\Python37\lib\pprint.py", line 393 in _repr
  File "C:\Python37\lib\pprint.py", line 161 in _format
  File "C:\Python37\lib\pprint.py", line 144 in pformat
  File "c:\Users\Daniel\Workspaces\Python\time_series_predictor\time_series_predictor\sklearn\base.py", line 281 in __repr__
  File "c:\Users\Daniel\Workspaces\Python\time_series_predictor\.env\lib\site-packages\sklearn\utils\_pprint.py", line 437 in _safe_repr
  File "c:\Users\Daniel\Workspaces\Python\time_series_predictor\.env\lib\site-packages\sklearn\utils\_pprint.py", line 172 in format
  File "C:\Python37\lib\pprint.py", line 393 in _repr
  File "C:\Python37\lib\pprint.py", line 161 in _format
  File "C:\Python37\lib\pprint.py", line 144 in pformat
  File "c:\Users\Daniel\Workspaces\Python\time_series_predictor\time_series_predictor\sklearn\base.py", line 281 in __repr__
  File "c:\Users\Daniel\Workspaces\Python\time_series_predictor\.env\lib\site-packages\sklearn\utils\_pprint.py", line 437 in _safe_repr
  File "c:\Users\Daniel\Workspaces\Python\time_series_predictor\.env\lib\site-packages\sklearn\utils\_pprint.py", line 172 in format
  File "C:\Python37\lib\pprint.py", line 393 in _repr
  File "C:\Python37\lib\pprint.py", line 161 in _format
  File "C:\Python37\lib\pprint.py", line 144 in pformat
  File "c:\Users\Daniel\Workspaces\Python\time_series_predictor\time_series_predictor\sklearn\base.py", line 281 in __repr__
  File "c:\Users\Daniel\Workspaces\Python\time_series_predictor\.env\lib\site-packages\sklearn\utils\_pprint.py", line 437 in _safe_repr
  File "c:\Users\Daniel\Workspaces\Python\time_series_predictor\.env\lib\site-packages\sklearn\utils\_pprint.py", line 172 in format
  File "C:\Python37\lib\pprint.py", line 393 in _repr
  File "C:\Python37\lib\pprint.py", line 161 in _format
  File "C:\Python37\lib\pprint.py", line 144 in pformat
  File "c:\Users\Daniel\Workspaces\Python\time_series_predictor\time_series_predictor\sklearn\base.py", line 281 in __repr__
  File "c:\Users\Daniel\Workspaces\Python\time_series_predictor\.env\lib\site-packages\sklearn\utils\_pprint.py", line 437 in _safe_repr
  File "c:\Users\Daniel\Workspaces\Python\time_series_predictor\.env\lib\site-packages\sklearn\utils\_pprint.py", line 172 in format
  File "C:\Python37\lib\pprint.py", line 393 in _repr
  File "C:\Python37\lib\pprint.py", line 161 in _format
  File "C:\Python37\lib\pprint.py", line 144 in pformat
  File "c:\Users\Daniel\Workspaces\Python\time_series_predictor\time_series_predictor\sklearn\base.py", line 281 in __repr__
  File "c:\Users\Daniel\Workspaces\Python\time_series_predictor\.env\lib\site-packages\sklearn\utils\_pprint.py", line 437 in _safe_repr
  File "c:\Users\Daniel\Workspaces\Python\time_series_predictor\.env\lib\site-packages\sklearn\utils\_pprint.py", line 172 in format

I am using ray version 1.0.1.post1
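
The repeating frames in the current thread (`__repr__`, then `pformat`, then `_safe_repr`, then `__repr__` again) look like unbounded recursion inside a `__repr__` method. The following stdlib-only sketch illustrates that general pattern and one way to avoid it. The class names are hypothetical, and this does not reproduce the actual code in time_series_predictor or scikit-learn.

```python
import pprint


class RecursiveRepr:
    """Hypothetical class whose __repr__ pretty-prints the object itself."""

    def __init__(self, lr):
        self.lr = lr

    def __repr__(self):
        # pprint falls back to repr(obj) for types it does not know,
        # which re-enters this method and never terminates.
        return pprint.pformat(self)


class SafeRepr(RecursiveRepr):
    def __repr__(self):
        # Pretty-print the parameters instead of the object itself.
        return f"{type(self).__name__}({pprint.pformat(vars(self))})"


def repr_recurses(obj) -> bool:
    """Return True if repr(obj) hits Python's recursion limit."""
    try:
        repr(obj)
    except RecursionError:
        return True
    return False


if __name__ == "__main__":
    print(repr_recurses(RecursiveRepr(0.1)))
    print(repr(SafeRepr(0.1)))
```

In a plain interpreter this kind of loop normally ends in a `RecursionError`; under a debugger's trace function the native stack can run out first, which would be consistent with the "stack overflow" message above, though that is an inference from the trace rather than a confirmed diagnosis.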

Yard1 commented 3 years ago

That's quite odd. @richardliaw, @inventormc any ideas?

You can revert to the previous version by uninstalling tune-sklearn and installing it normally again.

DanielAtKrypton commented 3 years ago

I reinstalled tune-sklearn and got version tune-sklearn-0.1.0. The behaviour is now the same as I previously reported here.
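
Since the installed versions changed several times during this thread, it may help to print them from the same interpreter that runs the tests. A minimal sketch, assuming Python 3.8+ for `importlib.metadata` (on Python 3.7 the `importlib-metadata` backport listed above provides the same function); the helper name `installed_version` is hypothetical:

```python
from importlib import metadata  # Python 3.8+
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    for name in ("ray", "tune-sklearn", "scikit-learn"):
        print(name, installed_version(name))
```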

richardliaw commented 3 years ago

Hey @DanielAtKrypton, what are the commands to reproduce your stack?

Also, can you try running this outside vscode (i.e., just using a terminal)?

DanielAtKrypton commented 3 years ago

> Hey @DanielAtKrypton, what are the commands to reproduce your stack?
>
> Also, can you try running this outside vscode (i.e., just using a terminal)?

Sure, I just started the test:

[screenshot]

I will leave it processing for now...

richardliaw commented 3 years ago

Can you try instead with pytest -s -v?

DanielAtKrypton commented 3 years ago

> can you try instead with pytest -s -v ?

Sure. Here is the output: [screenshot]

richardliaw commented 3 years ago

OK got it; can you now try, in a python terminal:


import ray

ray.init()  # start a local Ray instance


@ray.remote
def hello_world():
    print("hi")
    return "hi"


# Submit the task and block until its result is available
print(ray.get(hello_world.remote()))

DanielAtKrypton commented 3 years ago

There you go:

[screenshot]

richardliaw commented 3 years ago

awesome, so now we know that the fundamental problem seems to be in ray core.

Can you try ray stop and run it again?

DanielAtKrypton commented 3 years ago

Still running. I will update as soon as I have other output from the terminal...

[screenshot]

richardliaw commented 3 years ago

OK got it, so ray.init() is just hanging forever?

DanielAtKrypton commented 3 years ago

> OK got it, so ray.init() is just hanging forever?

Yes, unfortunately it is.
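
When a call may hang indefinitely like this, running it in a child process with a timeout makes the hang detectable instead of waiting for hours. A minimal stdlib sketch; the helper name `run_with_timeout` is hypothetical, and the snippet passed in stands in for whatever code is under suspicion (for example, the hello-world above):

```python
import subprocess
import sys
from typing import Tuple


def run_with_timeout(code: str, timeout_s: float) -> Tuple[bool, str]:
    """Run `code` in a fresh interpreter; return (finished, stdout)."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising this exception
        return False, ""
    return True, result.stdout.strip()


if __name__ == "__main__":
    # Replace the snippet with the code that is suspected of hanging
    print(run_with_timeout("print('hi')", timeout_s=30))
```

Only the direct child is killed on timeout; any background processes it started may outlive it, so a cleanup step such as ray stop could still be needed afterwards.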

richardliaw commented 3 years ago

OK. Can you try:

pip install -U [latest wheel link for Windows], as found here:

https://docs.ray.io/en/master/installation.html#daily-releases-nightlies

and if that doesn't work, try downgrading with pip install ray==1.0.0?

DanielAtKrypton commented 3 years ago

I installed the latest wheel for Windows and Python 3.7.

Now it is behaving like this:

[screenshot]

I tried to open the dashboard in my browser but the browser was unable to connect there.

DanielAtKrypton commented 3 years ago

And ray status reports:

[screenshot]

richardliaw commented 3 years ago

try ray stop a couple times, then try the hello world again?

DanielAtKrypton commented 3 years ago

> try ray stop a couple times, then try the hello world again?

I tried a couple times. It starts and hangs forever...

[screenshot]

richardliaw commented 3 years ago

Unfortunately this is a ray issue, and I'll close this and continue discussion on the ray side.

DanielAtKrypton commented 3 years ago

The source of this problem is being discussed here.