manuel-delverme opened this issue 3 years ago (status: Open)
/home/esac/projects/venv/bin/python /home/esac/projects/constrained_nn/train.py
experiment_id: [CLUSTER] 10xlry!!
HEAD is now at 969412d vanishing gradient
To github.com:manuel-delverme/constrained_nn.git
* [new tag] snapshot/master/98f67cdbd30f42e01b933020d0d26bfa3bda1fb2 -> snapshot/master/98f67cdbd30f42e01b933020d0d26bfa3bda1fb2
monitor your run on https://wandb.ai/
Switched to branch 'master'
/tmp/experiment_buddy-za1Cijk4ek
Slurmctld(primary) at slurm is UP
Slurmctld(backup) at slurmctl is DOWN
bash -l /tmp/experiment_buddy-za1Cijk4ek/run_experiment.sh git@github.com:manuel-delverme/constrained_nn.git train.py 98f67cdbd30f42e01b933020d0d26bfa3bda1fb2
0%| | 0/1 [00:00<?, ?it/s]
[DEPLOY LOG] Refreshing modules...
The following modules were not unloaded: (Use "module --force purge" to unload all):
1) gcc/7.4.0 2) Mila
[=== Module python/3.7 loaded ===]
[DEPLOY LOG] script realpath: /tmp/experiment_buddy-za1Cijk4ek/run_experiment.sh
[DEPLOY LOG] scripts home: /tmp/experiment_buddy-za1Cijk4ek
[DEPLOY LOG] cd /home/mila/d/delvermm/experiments/
[DEPLOY LOG] EXPERIMENT_FOLDER=./tmp.vxs375WLcC
[DEPLOY LOG] downloading source code from git@github.com:manuel-delverme/constrained_nn.git to ./tmp.vxs375WLcC
Cloning into './tmp.vxs375WLcC'...
Note: checking out '98f67cdbd30f42e01b933020d0d26bfa3bda1fb2'.
You are in 'detached HEAD' state. You can look around, make experimental changes and commit them, and you can discard any commits you make in this state without impacting any branches by performing another checkout.
If you want to create a new branch to retain commits you create, you may do so (now or later) by using -b with the checkout command again. Example:
git checkout -b <new-branch-name>
HEAD is now at 98f67cd [CLUSTER] 10xlry!!
[DEPLOY LOG] pwd is now /home/mila/d/delvermm/experiments/tmp.vxs375WLcC
[DEPLOY LOG] Using shared venv @ /home/mila/d/delvermm/venv
Requirement already satisfied: pip in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (21.1.2)
[DEPLOY LOG] installing experiment_buddy
WARNING: Ignoring invalid distribution -xperiment-buddy (/home/mila/d/delvermm/venv/lib/python3.7/site-packages)
Obtaining experiment_buddy from git+https://github.com/ministry-of-silly-code/experiment_buddy#egg=experiment_buddy
Updating /home/mila/d/delvermm/venv/src/experiment-buddy clone
Running command git fetch -q --tags
Running command git reset --hard -q 094e277a4d2396d0c40cb183c18771c900496886
Requirement already satisfied: GitPython in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from experiment_buddy) (3.1.14)
Requirement already satisfied: tensorboardX in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from experiment_buddy) (2.2)
Requirement already satisfied: matplotlib in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from experiment_buddy) (3.4.1)
Requirement already satisfied: wandb in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from experiment_buddy) (0.10.24)
Requirement already satisfied: fabric in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from experiment_buddy) (2.6.0)
Requirement already satisfied: cloudpickle in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from experiment_buddy) (1.2.2)
Requirement already satisfied: PyYaml in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from experiment_buddy) (5.4.1)
Requirement already satisfied: paramiko in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from experiment_buddy) (2.7.2)
Requirement already satisfied: tqdm in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from experiment_buddy) (4.59.0)
Requirement already satisfied: aiohttp in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from experiment_buddy) (3.7.4.post0)
Requirement already satisfied: attrs>=17.3.0 in /cvmfs/ai.mila.quebec/apps/x86_64/debian/python/3.7/lib/python3.7/site-packages (from aiohttp->experiment_buddy) (20.2.0)
Requirement already satisfied: yarl<2.0,>=1.0 in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from aiohttp->experiment_buddy) (1.6.3)
Requirement already satisfied: multidict<7.0,>=4.5 in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from aiohttp->experiment_buddy) (5.1.0)
Requirement already satisfied: chardet<5.0,>=2.0 in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from aiohttp->experiment_buddy) (3.0.4)
Requirement already satisfied: typing-extensions>=3.6.5 in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from aiohttp->experiment_buddy) (3.7.4.3)
Requirement already satisfied: async-timeout<4.0,>=3.0 in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from aiohttp->experiment_buddy) (3.0.1)
Requirement already satisfied: idna>=2.0 in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from yarl<2.0,>=1.0->aiohttp->experiment_buddy) (2.10)
Requirement already satisfied: pathlib2 in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from fabric->experiment_buddy) (2.3.5)
Requirement already satisfied: invoke<2.0,>=1.3 in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from fabric->experiment_buddy) (1.5.0)
Requirement already satisfied: cryptography>=2.5 in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from paramiko->experiment_buddy) (3.4.7)
Requirement already satisfied: pynacl>=1.0.1 in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from paramiko->experiment_buddy) (1.4.0)
Requirement already satisfied: bcrypt>=3.1.3 in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from paramiko->experiment_buddy) (3.2.0)
Requirement already satisfied: cffi>=1.1 in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from bcrypt>=3.1.3->paramiko->experiment_buddy) (1.14.5)
Requirement already satisfied: six>=1.4.1 in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from bcrypt>=3.1.3->paramiko->experiment_buddy) (1.15.0)
Requirement already satisfied: pycparser in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from cffi>=1.1->bcrypt>=3.1.3->paramiko->experiment_buddy) (2.20)
Requirement already satisfied: gitdb<5,>=4.0.1 in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from GitPython->experiment_buddy) (4.0.7)
Requirement already satisfied: smmap<5,>=3.0.1 in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from gitdb<5,>=4.0.1->GitPython->experiment_buddy) (4.0.0)
Requirement already satisfied: pyparsing>=2.2.1 in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from matplotlib->experiment_buddy) (2.4.7)
Requirement already satisfied: cycler>=0.10 in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from matplotlib->experiment_buddy) (0.10.0)
Requirement already satisfied: kiwisolver>=1.0.1 in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from matplotlib->experiment_buddy) (1.3.1)
Requirement already satisfied: numpy>=1.16 in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from matplotlib->experiment_buddy) (1.20.2)
Requirement already satisfied: python-dateutil>=2.7 in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from matplotlib->experiment_buddy) (2.8.1)
Requirement already satisfied: pillow>=6.2.0 in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from matplotlib->experiment_buddy) (7.2.0)
Requirement already satisfied: protobuf>=3.8.0 in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from tensorboardX->experiment_buddy) (3.15.7)
Requirement already satisfied: sentry-sdk>=0.4.0 in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from wandb->experiment_buddy) (1.0.0)
Requirement already satisfied: configparser>=3.8.1 in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from wandb->experiment_buddy) (5.0.2)
Requirement already satisfied: Click>=7.0 in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from wandb->experiment_buddy) (7.1.2)
Requirement already satisfied: subprocess32>=3.5.3 in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from wandb->experiment_buddy) (3.5.4)
Requirement already satisfied: pathtools in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from wandb->experiment_buddy) (0.1.2)
Requirement already satisfied: promise<3,>=2.0 in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from wandb->experiment_buddy) (2.3)
Requirement already satisfied: psutil>=5.0.0 in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from wandb->experiment_buddy) (5.8.0)
Requirement already satisfied: shortuuid>=0.5.0 in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from wandb->experiment_buddy) (1.0.1)
Requirement already satisfied: requests<3,>=2.0.0 in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from wandb->experiment_buddy) (2.25.1)
Requirement already satisfied: docker-pycreds>=0.4.0 in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from wandb->experiment_buddy) (0.4.0)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from requests<3,>=2.0.0->wandb->experiment_buddy) (1.26.4)
Requirement already satisfied: certifi>=2017.4.17 in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from requests<3,>=2.0.0->wandb->experiment_buddy) (2020.12.5)
WARNING: Ignoring invalid distribution -xperiment-buddy (/home/mila/d/delvermm/venv/lib/python3.7/site-packages)
WARNING: Error parsing requirements for experiment-buddy: [Errno 2] No such file or directory: '/home/mila/d/delvermm/venv/lib/python3.7/site-packages/experiment_buddy-0.0.1.dist-info/METADATA'
Installing collected packages: experiment-buddy
Attempting uninstall: experiment-buddy
Found existing installation: experiment-buddy 0.0.1
ERROR: Exception:
Traceback (most recent call last):
  File "/home/mila/d/delvermm/venv/lib/python3.7/site-packages/pip/_internal/cli/base_command.py", line 180, in _main
    status = self.run(options, args)
  File "/home/mila/d/delvermm/venv/lib/python3.7/site-packages/pip/_internal/cli/req_command.py", line 204, in wrapper
    return func(self, options, args)
  File "/home/mila/d/delvermm/venv/lib/python3.7/site-packages/pip/_internal/commands/install.py", line 402, in run
    pycompile=options.compile,
  File "/home/mila/d/delvermm/venv/lib/python3.7/site-packages/pip/_internal/req/__init__.py", line 71, in install_given_reqs
    auto_confirm=True
  File "/home/mila/d/delvermm/venv/lib/python3.7/site-packages/pip/_internal/req/req_install.py", line 671, in uninstall
    uninstalled_pathset = UninstallPathSet.from_dist(dist)
  File "/home/mila/d/delvermm/venv/lib/python3.7/site-packages/pip/_internal/req/req_uninstall.py", line 537, in from_dist
    link_pointer, dist.project_name, dist.location)
AssertionError: Egg-link /home/mila/d/delvermm/venv/src/experiment-buddy does not match installed location of experiment-buddy (at /home/mila/d/delvermm/venv/lib/python3.7/site-packages)
0%| | 0/1 [02:51<?, ?it/s]
Traceback (most recent call last):
  File "/home/esac/projects/constrained_nn/train.py", line 11, in <module>
    import config
  File "/home/esac/projects/constrained_nn/config.py", line 65, in <module>
    tb = experiment_buddy.deploy(
  File "/home/esac/projects/experiment_buddy/experiment_buddy/utils.py", line 46, in wrapped_f
    retr = f(*args, **kwargs)
  File "/home/esac/projects/experiment_buddy/experiment_buddy/experiment_buddy.py", line 209, in deploy
    _commit_and_sendjob(host, experiment_id, sweep_yaml, git_repo, project_name, proc_num, extra_slurm_headers, wandb_kwargs)
  File "/home/esac/projects/experiment_buddy/experiment_buddy/experiment_buddy.py", line 343, in _commit_and_sendjob
    ssh_session.run(ssh_command)
  File "<decorator-gen-3>", line 2, in run
  File "/home/esac/projects/venv/lib/python3.8/site-packages/fabric/connection.py", line 30, in opens
    return method(self, *args, **kwargs)
  File "/home/esac/projects/venv/lib/python3.8/site-packages/fabric/connection.py", line 723, in run
    return self._run(self._remote_runner(), command, **kwargs)
  File "/home/esac/projects/venv/lib/python3.8/site-packages/invoke/context.py", line 101, in _run
    return runner.run(command, **kwargs)
  File "/home/esac/projects/venv/lib/python3.8/site-packages/invoke/runners.py", line 363, in run
    return self._run_body(command, **kwargs)
  File "/home/esac/projects/venv/lib/python3.8/site-packages/invoke/runners.py", line 422, in _run_body
    return self.make_promise() if self._asynchronous else self._finish()
  File "/home/esac/projects/venv/lib/python3.8/site-packages/invoke/runners.py", line 489, in _finish
    raise UnexpectedExit(result)
invoke.exceptions.UnexpectedExit: Encountered a bad command exit code!

Command: 'bash -l /tmp/experiment_buddy-za1Cijk4ek/run_experiment.sh git@github.com:manuel-delverme/constrained_nn.git train.py 98f67cdbd30f42e01b933020d0d26bfa3bda1fb2'
Exit code: 2
Stdout: already printed
Stderr: already printed

Process finished with exit code 1

This happened when I tried to run two single deploys at the same time.
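
The log shows each deploy installing experiment_buddy into the same shared venv (`/home/mila/d/delvermm/venv`), and pip failing while uninstalling the existing copy. If the two simultaneous deploys did collide on that step (not confirmed here), one possible mitigation is to serialize the install behind a file lock. The sketch below only illustrates that idea and is not experiment_buddy's code: `exclusive_lock`, `run_serialized`, and the lock file name are hypothetical, and the command is a harmless stand-in for the real pip call. It relies on `fcntl.flock`, so it assumes a Unix host and a filesystem that honors advisory locks.

```python
import contextlib
import fcntl
import subprocess
import sys
import tempfile
from pathlib import Path


@contextlib.contextmanager
def exclusive_lock(lock_path):
    """Hold an exclusive advisory lock on lock_path while the block runs."""
    with open(lock_path, "w") as handle:
        # Blocks until any other holder of the lock releases it.
        fcntl.flock(handle, fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)


def run_serialized(command, lock_path):
    """Run command while holding the lock and return its exit code."""
    with exclusive_lock(lock_path):
        return subprocess.run(command, check=False).returncode


if __name__ == "__main__":
    # Hypothetical lock file; a real setup would keep it next to the shared venv.
    lock_file = Path(tempfile.gettempdir()) / "shared_venv_install.lock"
    # Stand-in for the install step that both deploys would otherwise run at once.
    stand_in = [sys.executable, "-c", "print('installing')"]
    print("exit code:", run_serialized(stand_in, lock_file))
```

With this pattern a second deploy would wait at `flock` until the first one finishes its install, instead of both modifying `site-packages` at once.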