mila-iqia / milatools

Tools to connect to and interact with the Mila cluster
MIT License

[v0.0.16] Issue running the command `mila code`: socket.gaierror: [Errno 8] nodename nor servname provided, or not known #38

Open JACKHAHA363 opened 1 year ago

JACKHAHA363 commented 1 year ago

Make sure you can reproduce the issue with the latest version available

pip install milatools --upgrade
[milatools command e.g. mila code ...]

What command did you run?

 mila code --job 3034514 /home/mila/l/luyuchen/alpaca-lora

Describe the bug

I had an interactive job running, started with

salloc --gres=gpu:a100l --cpus-per-task=8 --time=12:00:00 --mem=64G

But when I tried to run `mila code` from my local laptop, I got

(base) ➜  ~ mila code --job 3034514 /home/mila/l/luyuchen/alpaca-lora
(mila) $ squeue --jobs 3034514 -ho %N
cn-g016
Traceback (most recent call last):
  File "/Users/yuchen/miniconda3/lib/python3.10/site-packages/milatools/cli/commands.py", line 42, in main
    auto_cli(milatools)
  File "/Users/yuchen/miniconda3/lib/python3.10/site-packages/coleo/cli.py", line 656, in auto_cli
    result = run_cli(entry, args, **kwargs)
  File "/Users/yuchen/miniconda3/lib/python3.10/site-packages/coleo/cli.py", line 628, in run_cli
    return call(opts=opts, args=args)
  File "/Users/yuchen/miniconda3/lib/python3.10/site-packages/coleo/cli.py", line 587, in thunk
    result = fn(*args)
  File "/Users/yuchen/miniconda3/lib/python3.10/site-packages/milatools/cli/commands.py", line 311, in code
    cnode = _find_allocation(remote)
  File "/Users/yuchen/miniconda3/lib/python3.10/site-packages/milatools/cli/commands.py", line 726, in _find_allocation
    return Remote(node_name)
  File "/Users/yuchen/miniconda3/lib/python3.10/site-packages/milatools/cli/remote.py", line 83, in __init__
    connection.open()
  File "/Users/yuchen/miniconda3/lib/python3.10/site-packages/fabric/connection.py", line 636, in open
    self.client.connect(**kwargs)
  File "/Users/yuchen/miniconda3/lib/python3.10/site-packages/paramiko/client.py", line 356, in connect
    to_try = list(self._families_and_addresses(hostname, port))
  File "/Users/yuchen/miniconda3/lib/python3.10/site-packages/paramiko/client.py", line 202, in _families_and_addresses
    addrinfos = socket.getaddrinfo(
  File "/Users/yuchen/miniconda3/lib/python3.10/socket.py", line 955, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 8] nodename nor servname provided, or not known

An error occured during the execution of the command `code`. Please try updating milatools by running
  pip install milatools --upgrade
in the terminal. If the issue persists, consider filling a bug report at https://github.com/mila-iqia/milatools/issues/new?labels=code%2C0.0.16&template=bug_report.md&title=%5Bv0.0.16%5D+Issue+running+the+command+%60mila+code%60
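
The traceback ends in `socket.getaddrinfo`, so one way to narrow this down might be to check whether the laptop can resolve the compute node's hostname at all. The snippet below is only a diagnostic sketch, not part of milatools: `can_resolve` is a hypothetical helper, and it assumes the hostname being looked up is the short node name printed by `squeue` (`cn-g016`) and that SSH uses port 22.

```python
import socket


def can_resolve(hostname: str, port: int = 22) -> bool:
    """Return True if the local resolver can look up `hostname`.

    Hypothetical helper for diagnosis only; it makes the same kind of
    `socket.getaddrinfo` call that fails at the bottom of the traceback.
    """
    try:
        socket.getaddrinfo(hostname, port)
    except socket.gaierror:
        # Same exception class as in the report (Errno 8 on macOS).
        return False
    return True


if __name__ == "__main__":
    # "cn-g016" is the node name from the squeue output above; it is
    # an assumption that this is the exact string passed to the resolver.
    for host in ("127.0.0.1", "cn-g016"):
        print(f"{host}: resolvable={can_resolve(host)}")
```

If `cn-g016` does not resolve locally, that would be consistent with the error above, though it does not by itself say whether the cause is the local SSH configuration or milatools.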

Desktop (please complete the following information):