mila-iqia / milatools

Tools to connect to and interact with the Mila cluster
MIT License

[v0.0.16] Issue when using the --persist flag for Mila code #34

Open anna-richter opened 1 year ago

anna-richter commented 1 year ago

What command did you run?

mila code /network/scratch/a/anna.richter/ --persist --alloc --gres=gpu:1 --partition=long --mem=32G --time=0-10:00:00

Describe the bug

The error occurs only when the --persist flag is used.

Traceback (most recent call last):
  File "C:\Users\Anna Richter\Documents\GitHub\Biasly_Mila\venv\lib\site-packages\milatools\cli\commands.py", line 42, in main
    auto_cli(milatools)
  File "C:\Users\Anna Richter\Documents\GitHub\Biasly_Mila\venv\lib\site-packages\coleo\cli.py", line 656, in auto_cli
    result = run_cli(entry, args, **kwargs)
  File "C:\Users\Anna Richter\Documents\GitHub\Biasly_Mila\venv\lib\site-packages\coleo\cli.py", line 628, in run_cli
    return call(opts=opts, args=args)
  File "C:\Users\Anna Richter\Documents\GitHub\Biasly_Mila\venv\lib\site-packages\coleo\cli.py", line 587, in thunk
    result = fn(*args)
  File "C:\Users\Anna Richter\Documents\GitHub\Biasly_Mila\venv\lib\site-packages\milatools\cli\commands.py", line 314, in code
    data, proc = cnode.ensure_allocation()
  File "C:\Users\Anna Richter\Documents\GitHub\Biasly_Mila\venv\lib\site-packages\milatools\cli\remote.py", line 251, in ensure_allocation     
    proc, results = self.extract(
  File "C:\Users\Anna Richter\Documents\GitHub\Biasly_Mila\venv\lib\site-packages\milatools\cli\remote.py", line 139, in extract
    proc = self.run(cmd, asynchronous=True, out_stream=qio, **kwargs)
  File "C:\Users\Anna Richter\Documents\GitHub\Biasly_Mila\venv\lib\site-packages\milatools\cli\remote.py", line 127, in run
    cmd = transform(cmd)
  File "C:\Users\Anna Richter\Documents\GitHub\Biasly_Mila\venv\lib\site-packages\milatools\cli\remote.py", line 234, in srun_transform_persist
    self.puttext(batch, batch_file)
  File "C:\Users\Anna Richter\Documents\GitHub\Biasly_Mila\venv\lib\site-packages\milatools\cli\remote.py", line 178, in puttext
    self.put(f.name, dest)
  File "C:\Users\Anna Richter\Documents\GitHub\Biasly_Mila\venv\lib\site-packages\milatools\cli\remote.py", line 170, in put
    return self.connection.put(src, dest)
  File "C:\Users\Anna Richter\Documents\GitHub\Biasly_Mila\venv\lib\site-packages\fabric\connection.py", line 870, in put
    return Transfer(self).put(*args, **kwargs)
  File "C:\Users\Anna Richter\Documents\GitHub\Biasly_Mila\venv\lib\site-packages\fabric\transfer.py", line 311, in put
    self.sftp.put(localpath=local, remotepath=remote)
  File "C:\Users\Anna Richter\Documents\GitHub\Biasly_Mila\venv\lib\site-packages\paramiko\sftp_client.py", line 758, in put
    with open(localpath, "rb") as fl:
PermissionError: [Errno 13] Permission denied: 'C:\\Users\\ANNARI~1\\AppData\\Local\\Temp\\tmp79b62pzx'
An error occured during the execution of the command `code`. Please try updating milatools by running
  pip install milatools --upgrade
in the terminal. If the issue persists, consider filling a bug report at https://github.com/mila-iqia/milatools/issues/new?labels=code%2C0.0.16&template=bug_report.md&title=%5Bv0.0.16%5D+Issue+running+the+command+%60mila+code%60

Desktop (please complete the following information):

Windows 10 Pro, version 22H2

lebrice commented 1 year ago

This seems relevant:

https://stackoverflow.com/questions/6416782/what-is-namedtemporaryfile-useful-for-on-windows

Especially because of these lines here: https://github.com/mila-iqia/milatools/blob/1e5f211abe94ad2abdc3d10143bb5d26f08c33ea/milatools/cli/remote.py#L175-L178

It seems like self.put uses the Connection object from fabric, which opens the temporary file a second time while puttext still has it open. This behaves differently on Windows: unlike on Unix, a file created with NamedTemporaryFile is apparently not guaranteed to be openable a second time while the first handle is still open, which would explain the PermissionError above (?!)
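A possible way around this, sketched below, is to close the temporary file before handing its path to the uploader, and to delete it manually afterwards. This is only an illustration and not the actual milatools code: `put_text` and its `put` parameter are hypothetical names, with `put` standing in for whatever performs the transfer (for example a wrapper around fabric's Connection.put).

```python
import os
import tempfile
from typing import Callable


def put_text(text: str, dest: str, put: Callable[[str, str], object]) -> None:
    """Write `text` to a temporary file, then pass its path to `put`.

    `put` is a hypothetical stand-in for the upload step; it receives
    (local_path, remote_path) and is expected to open local_path itself.
    """
    # delete=False keeps the file on disk after it is closed, so it can be
    # reopened by name. newline="\n" keeps Unix line endings in the script.
    with tempfile.NamedTemporaryFile(
        "w", delete=False, newline="\n", encoding="utf-8"
    ) as f:
        f.write(text)
    # The handle is closed here, so a second open() also works on Windows.
    try:
        put(f.name, dest)
    finally:
        # With delete=False, cleanup is our responsibility.
        os.remove(f.name)
```

The trade-off is that the file is no longer removed automatically, hence the explicit os.remove in the finally block.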

lebrice commented 1 year ago

Update: Running milatools from WSL on Windows (instead of PowerShell) works around this issue.