microsoft / nni

An open source AutoML toolkit to automate the machine learning lifecycle, including feature engineering, neural architecture search, model compression, and hyperparameter tuning.
https://nni.readthedocs.io
MIT License
13.97k stars, 1.81k forks

Dispatcher crash with TPE KeyError #5798

[Open] Fripplebubby opened this issue 2 months ago

Fripplebubby commented 2 months ago

Describe the issue: The dispatcher occasionally crashes for unknown reasons, and when it does, my experiment stops running.

Environment:

Configuration:

Log message:



How to reproduce it:

It has happened more than once, intermittently, across different experiments. I tried lowering the trial concurrency to 1 to avoid it, but the crash still occurs.

In this example, trial 45 evidently triggered the crash. In the web UI, trial 45 is shown as succeeded, with a recorded metric value. Yet when TPE tries to look up that trial's parameters, it apparently cannot find them, which presumably raises the KeyError in the title.
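One plausible mechanism, offered only as a hypothesis since the log message was not captured: a tuner that stores each trial's parameters in a dict keyed by parameter id, and removes the entry when the final metric arrives, raises a KeyError if the same result is delivered twice or the id was never recorded. That failure would not depend on parallelism, which would fit the observation that a concurrency of 1 does not help. The class and method names here (`HypotheticalTuner`, `receive_result_strict`, `receive_result_guarded`) are invented for illustration and are not NNI's TPE implementation or API.

```python
import logging

logger = logging.getLogger("hypothetical_tuner")


class HypotheticalTuner:
    """Toy tuner (NOT NNI's TPE) that pairs each reported metric with the
    parameters it generated, keyed by a parameter id."""

    def __init__(self):
        self._pending = {}  # parameter_id -> parameters still awaiting a result
        self.history = []   # (parameters, value) pairs a model would be fit on

    def generate_parameters(self, parameter_id):
        params = {"x": parameter_id}  # placeholder instead of real sampling
        self._pending[parameter_id] = params
        return params

    def receive_result_strict(self, parameter_id, value):
        # dict.pop(key) raises KeyError if the id was never generated or was
        # already consumed, e.g. when the same final metric arrives twice.
        params = self._pending.pop(parameter_id)
        self.history.append((params, value))

    def receive_result_guarded(self, parameter_id, value):
        # Defensive variant: log and skip unknown ids instead of crashing.
        params = self._pending.pop(parameter_id, None)
        if params is None:
            logger.warning("ignoring result for unknown parameter_id %s", parameter_id)
            return False
        self.history.append((params, value))
        return True


if __name__ == "__main__":
    tuner = HypotheticalTuner()
    tuner.generate_parameters(45)
    tuner.receive_result_strict(45, 0.93)
    try:
        tuner.receive_result_strict(45, 0.93)  # duplicate report for trial 45
    except KeyError as exc:
        print("KeyError:", exc)
```

If this is what happens inside the dispatcher, the guarded variant would keep the experiment alive, but it only hides the symptom; the real question would be why a result arrives for an id the tuner no longer tracks.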
sertreet commented 1 month ago

+1, me too.

Lionelsy commented 3 weeks ago

Same problem when using TPE.

redLinmumu commented 1 week ago

Same issue here.