mila-iqia / milatools

Tools to connect to and interact with the Mila cluster
MIT License

[v0.0.18] Issue running the command `mila serve` #116

Open MinaArzaghi opened 6 months ago

MinaArzaghi commented 6 months ago

Make sure you can reproduce the issue with the latest version available

pip install milatools --upgrade
[milatools command e.g. mila code ...]

done

What command did you run?

mila serve lab --node cn-a007

Describe the bug


I'm not sure how to explain the error, I'm trying to connect to cn-a007 and I get:

Exception (client): Error reading SSH protocol banner
Traceback (most recent call last):
  File "/Users/minaz/opt/anaconda3/lib/python3.9/site-packages/paramiko/transport.py", line 2292, in _check_banner
    buf = self.packetizer.readline(timeout)
  File "/Users/minaz/opt/anaconda3/lib/python3.9/site-packages/paramiko/packet.py", line 374, in readline
    buf += self._read_timeout(timeout)
  File "/Users/minaz/opt/anaconda3/lib/python3.9/site-packages/paramiko/packet.py", line 611, in _read_timeout
    raise socket.timeout()
socket.timeout

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/minaz/opt/anaconda3/lib/python3.9/site-packages/paramiko/transport.py", line 2113, in run
    self._check_banner()
  File "/Users/minaz/opt/anaconda3/lib/python3.9/site-packages/paramiko/transport.py", line 2296, in _check_banner
    raise SSHException(
paramiko.ssh_exception.SSHException: Error reading SSH protocol banner

Traceback (most recent call last):
  File "/Users/minaz/opt/anaconda3/lib/python3.9/site-packages/paramiko/transport.py", line 2292, in _check_banner
    buf = self.packetizer.readline(timeout)
  File "/Users/minaz/opt/anaconda3/lib/python3.9/site-packages/paramiko/packet.py", line 374, in readline
    buf += self._read_timeout(timeout)
  File "/Users/minaz/opt/anaconda3/lib/python3.9/site-packages/paramiko/packet.py", line 611, in _read_timeout
    raise socket.timeout()
socket.timeout

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/minaz/opt/anaconda3/lib/python3.9/site-packages/milatools/cli/commands.py", line 43, in main
    auto_cli(milatools)
  File "/Users/minaz/opt/anaconda3/lib/python3.9/site-packages/coleo/cli.py", line 656, in auto_cli
    result = run_cli(entry, args, **kwargs)
  File "/Users/minaz/opt/anaconda3/lib/python3.9/site-packages/coleo/cli.py", line 628, in run_cli
    return call(opts=opts, args=args)
  File "/Users/minaz/opt/anaconda3/lib/python3.9/site-packages/coleo/cli.py", line 587, in thunk
    result = fn(*args)
  File "/Users/minaz/opt/anaconda3/lib/python3.9/site-packages/milatools/cli/commands.py", line 431, in lab
    _standard_server(
  File "/Users/minaz/opt/anaconda3/lib/python3.9/site-packages/milatools/cli/commands.py", line 561, in _standard_server
    remote = Remote("mila")
  File "/Users/minaz/opt/anaconda3/lib/python3.9/site-packages/milatools/cli/remote.py", line 84, in __init__
    connection.open()
  File "/Users/minaz/opt/anaconda3/lib/python3.9/site-packages/fabric/connection.py", line 636, in open
    self.client.connect(**kwargs)
  File "/Users/minaz/opt/anaconda3/lib/python3.9/site-packages/paramiko/client.py", line 451, in connect
    t.start_client(timeout=timeout)
  File "/Users/minaz/opt/anaconda3/lib/python3.9/site-packages/paramiko/transport.py", line 722, in start_client
    raise e
  File "/Users/minaz/opt/anaconda3/lib/python3.9/site-packages/paramiko/transport.py", line 2113, in run
    self._check_banner()
  File "/Users/minaz/opt/anaconda3/lib/python3.9/site-packages/paramiko/transport.py", line 2296, in _check_banner
    raise SSHException(
paramiko.ssh_exception.SSHException: Error reading SSH protocol banner

An error occured during the execution of the command `serve`. Please try updating milatools by running
  pip install milatools --upgrade
in the terminal. If the issue persists, consider filling a bug report at
  https://github.com/mila-iqia/milatools/issues/new?labels=serve%2C0.0.18&template=bug_report.md&title=%5Bv0.0.18%5D+Issue+running+the+command+%60mila+serve%60
Please provide the error traceback with the report (the red text above).


lebrice commented 6 months ago

This is the common "Error reading SSH protocol banner" error from paramiko, which I don't have a fix for yet (at least not until #107 or #105 is merged). For the moment, I'm sorry to say that your best bet is probably to just try again, to try `mila code` instead (which will most likely give you the same error), or to connect to a compute node manually with the "remote-ssh" extension of VS Code.
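
The "just try again" workaround can be scripted. What follows is only a sketch, not something milatools does today: `retry` and `open_mila_connection` are hypothetical helper names, and the sketch assumes that a `mila` host entry exists in your SSH config (as the `Remote("mila")` call in the traceback suggests). It retries the connection a few times and also passes paramiko's `banner_timeout` option through Fabric's `connect_kwargs`, which may or may not help with this particular error.

```python
import time


def retry(action, attempts=3, delay=2.0, retry_on=(Exception,), sleep=time.sleep):
    """Call `action()` until it succeeds or `attempts` tries are used up."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retry_on:
            if attempt == attempts:
                raise  # out of tries: surface the last error unchanged
            sleep(delay * attempt)  # wait a little longer after each failure


def open_mila_connection(banner_timeout=60):
    """Hypothetical helper: open a Fabric connection to the `mila` SSH host."""
    # Imported lazily so the retry helper above has no third-party dependency.
    from fabric import Connection
    from paramiko.ssh_exception import SSHException

    def connect():
        # Assumption: a longer banner timeout gives a slow login node more
        # time to answer; 60 seconds is an arbitrary illustrative value.
        connection = Connection(
            "mila", connect_kwargs={"banner_timeout": banner_timeout}
        )
        connection.open()
        return connection

    return retry(connect, attempts=3, delay=2.0, retry_on=(SSHException,))
```

Calling `open_mila_connection()` returns an open connection or re-raises the last `SSHException` after three failed attempts. This only checks that the login node is reachable; it does not replace `mila serve`.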