microsoft / nni

An open source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.
https://nni.readthedocs.io
MIT License

ERROR (nni.runtime.msg_dispatcher_base/Thread-1) #5739

Open Moreau14 opened 5 months ago

Moreau14 commented 5 months ago

dispatcher log:

ERROR (nni.runtime.msg_dispatcher_base/Thread-1) 68
Traceback (most recent call last):
  File "/home/mm/anaconda3/envs/CLNode_env/lib/python3.8/site-packages/nni/runtime/msg_dispatcher_base.py", line 108, in command_queue_worker
    self.process_command(command, data)
  File "/home/mm/anaconda3/envs/CLNode_env/lib/python3.8/site-packages/nni/runtime/msg_dispatcher_base.py", line 154, in process_command
    command_handlers[command](data)
  File "/home/mm/anaconda3/envs/CLNode_env/lib/python3.8/site-packages/nni/runtime/msg_dispatcher.py", line 148, in handle_report_metric_data
    self._handle_final_metric_data(data)
  File "/home/mm/anaconda3/envs/CLNode_env/lib/python3.8/site-packages/nni/runtime/msg_dispatcher.py", line 201, in _handle_final_metric_data
    self.tuner.receive_trial_result(id_, _trial_params[id_], value, customized=customized,
  File "/home/mm/anaconda3/envs/CLNode_env/lib/python3.8/site-packages/nni/algorithms/hpo/tpe_tuner.py", line 197, in receive_trial_result
    params = self._running_params.pop(parameter_id)
KeyError: 68

NNImanager log:

ERROR (WsChannel.default) Channel closed. Ignored command { type: 'GE', content: '1' }
[2024-01-25 11:08:42] WARNING (WsConnection.default) Missing pong
[2024-01-25 11:08:47] WARNING (WsConnection.default) Missing pong
[2024-01-25 11:08:47] ERROR (WsConnection.default) Failed sending command. Drop connection: Error: WebSocket is not open: readyState 3 (CLOSED)
    at sendAfterClose (/home/mm/anaconda3/envs/CLNode_env/lib/python3.8/site-packages/nni_node/node_modules/express-ws/node_modules/ws/lib/websocket.js:988:17)
    at WebSocket.send (/home/mm/anaconda3/envs/CLNode_env/lib/python3.8/site-packages/nni_node/node_modules/express-ws/node_modules/ws/lib/websocket.js:405:7)
    at node:internal/util:375:7
    at new Promise (<anonymous>)
    at bound send (node:internal/util:361:12)
    at WsConnection.sendAsync (/home/mm/anaconda3/envs/CLNode_env/lib/python3.8/site-packages/nni_node/common/command_channel/websocket/connection.js:92:16)
    at WsConnection.heartbeat (/home/mm/anaconda3/envs/CLNode_env/lib/python3.8/site-packages/nni_node/common/command_channel/websocket/connection.js:144:18)
    at listOnTimeout (node:internal/timers:569:17)
    at process.processTimers (node:internal/timers:512:7)
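
For readers trying to interpret the dispatcher traceback above: the final frame is a plain `dict.pop` with no default, which raises `KeyError` whenever the given parameter id is no longer (or never was) in the dictionary. The sketch below is a hypothetical stand-in, not NNI's implementation; `ToyTuner` and `generate` are made-up names, and only `_running_params`, `receive_trial_result`, and the id `68` are taken from the log. It shows one way such a pop can fail (a final result arriving twice for the same id); the thread does not establish that this is what actually happened.

```python
# Hypothetical stand-in for a tuner's bookkeeping; NOT NNI's actual code.
class ToyTuner:
    def __init__(self):
        # Maps parameter_id -> parameters of trials that are still running.
        self._running_params = {}

    def generate(self, parameter_id, params):
        # Remember the parameters handed out for this trial.
        self._running_params[parameter_id] = params

    def receive_trial_result(self, parameter_id, value):
        # dict.pop without a default raises KeyError for an unknown id.
        params = self._running_params.pop(parameter_id)
        return params, value


tuner = ToyTuner()
tuner.generate(68, {"lr": 0.01})
tuner.receive_trial_result(68, 0.9)       # first final result: succeeds

caught = None
try:
    tuner.receive_trial_result(68, 0.9)   # same id again: entry is already gone
except KeyError as err:
    caught = err
    print("KeyError:", err)               # prints "KeyError: 68"
```

The same exception would appear if a result were reported for an id that this tuner instance never generated, for example after the dispatcher lost its in-memory state.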

v-JiangNan commented 2 weeks ago

I ran into the same problem: after running several trials, I got this error. I found that it was caused by running NNI inside a screen session.

Xiuchen519 commented 1 day ago

> I ran into the same problem: after running several trials, I got this error. I found that it was caused by running NNI inside a screen session.

Yes, you can't run NNI in tmux either.
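
To check whether an experiment is being launched from one of the environments mentioned in the comments above, a small helper can inspect the environment variables that these tools set: GNU screen exports `STY` and tmux exports `TMUX` inside their sessions. This is only a diagnostic sketch; `detect_multiplexer` is a hypothetical name, and nothing here is part of NNI.

```python
import os


def detect_multiplexer(env=None):
    """Return "tmux", "screen", or None based on session environment variables."""
    env = os.environ if env is None else env
    if env.get("TMUX"):   # set by tmux inside its sessions
        return "tmux"
    if env.get("STY"):    # set by GNU screen inside its sessions
        return "screen"
    return None


if __name__ == "__main__":
    found = detect_multiplexer()
    if found:
        print(f"Running inside {found}")
    else:
        print("No screen or tmux session detected")
```

Note that when one multiplexer is nested inside the other, both variables may be set; this sketch reports tmux first in that case.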