msuzen opened this issue 1 year ago
Bumping this, as I have the same issue.
Hello, I am also encountering this issue, with exactly the same error message. Judging from the logs, it happens right when the first trial finishes.
Me too. Is there any way to fix it?
Had the same issue.
Has anyone solved the problem?
I have the same problem.
I have the same problem.
I think I've found a way around this issue. In _nni/nni/runtime/command_channel/websocket/connection.py_, find the `receive` method of the `WsConnection` class, and pass _ignore_comments=False_ to the `nni.load` call inside it.
Thank you very much!
Does it look like this?
```python
def receive(self) -> Command | None:
    """
    Return the received message,
    or return None if the connection has been closed by the peer.
    """
    try:
        msg = _wait(self._ws.recv())
        _logger.debug(f'Received {msg}')
    except websockets.ConnectionClosed:  # type: ignore
        _logger.debug('Connection closed by server.')
        self._ws = None
        _decrease_refcnt()
        raise
    if msg is None:
        return None
    # The library seems to infer whether the message is text or binary,
    # so there is no guarantee about its type.
    if isinstance(msg, bytes):
        msg = msg.decode()
    # Workaround from this thread: disable comment stripping while decoding.
    return nni.load(msg, ignore_comments=False)
```
Yes, exactly. In my case, some strings in the message are probably not comments but are treated as comments during JSON decoding, which causes the failure. Setting `ignore_comments` to `False` made it work.
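To illustrate the failure mode described above, here is a toy sketch. It assumes, as the previous comment suggests, that stripping comments before JSON decoding is what corrupts the message. `naive_strip_comments` and `toy_load` are hypothetical helpers written only for this illustration; they are not NNI or json_tricks code, and the real comment stripper is likely smarter about quoted strings.

```python
import json


def naive_strip_comments(text: str) -> str:
    """Hypothetical, deliberately naive comment stripper.

    Drops everything from the first '#' or '//' on each line without
    checking whether the marker sits inside a JSON string value.
    """
    out_lines = []
    for line in text.splitlines():
        for marker in ("#", "//"):
            idx = line.find(marker)
            if idx != -1:
                line = line[:idx]
        out_lines.append(line)
    return "\n".join(out_lines)


def toy_load(msg: str, ignore_comments: bool = True):
    """Hypothetical loader with a flag shaped like the one discussed above."""
    if ignore_comments:
        msg = naive_strip_comments(msg)
    return json.loads(msg)


payload = '{"type": "metric", "value": "see http://localhost:8080"}'

try:
    toy_load(payload)  # stripping cuts the line at "//", leaving broken JSON
except json.JSONDecodeError as err:
    print("decode failed:", err.msg)

print(toy_load(payload, ignore_comments=False))
# → {'type': 'metric', 'value': 'see http://localhost:8080'}
```

With stripping enabled, the string value is truncated mid-way and decoding fails; with it disabled, the payload round-trips unchanged.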
Describe the issue:
A custom NAS job with PyTorch models fails with a command error from the NNI runtime (see the error message below). The job only completes if `exp.config.max_trial_number` is equal to `exp.config.trial_concurrency`.
Environment:
Error Message: