ministry-of-silly-code / experiment_buddy

(boring) life of a deployment #52

Open manuel-delverme opened 3 years ago

manuel-delverme commented 3 years ago

I ran a timed run_experiment.sh; full output below:

/home/esac/projects/venv/bin/python /home/esac/projects/constrained_nn/train.py
experiment_id: [CLUSTER] faster_theta
HEAD is now at 969412d vanishing gradient
To github.com:manuel-delverme/constrained_nn.git
 * [new tag]         snapshot/master/0d90a073af91b2c9061fb66e753f7cfdde2d96ab -> snapshot/master/0d90a073af91b2c9061fb66e753f7cfdde2d96ab
Switched to branch 'master'
monitor your run on https://wandb.ai/
/tmp/experiment_buddy-X3wiF1mTAh
Slurmctld(primary) at slurm is UP
Slurmctld(backup) at slurmctl is DOWN
bash -l /tmp/experiment_buddy-X3wiF1mTAh/run_experiment.sh git@github.com:manuel-delverme/constrained_nn.git train.py 0d90a073af91b2c9061fb66e753f7cfdde2d96ab
  0%|          | 0/1 [00:00<?, ?it/s][DEPLOY LOG] Refreshing modules...
The following modules were not unloaded:
  (Use "module --force purge" to unload all):

  1) gcc/7.4.0   2) Mila

real    0m0.221s
user    0m0.081s
sys 0m0.016s
[=== Module python/3.7 loaded ===]

real    0m0.170s
user    0m0.095s
sys 0m0.000s
[DEPLOY LOG] script realpath: /tmp/experiment_buddy-X3wiF1mTAh/run_experiment.sh
[DEPLOY LOG] scripts home: /tmp/experiment_buddy-X3wiF1mTAh
[DEPLOY LOG] cd /home/mila/d/delvermm/experiments/
[DEPLOY LOG] EXPERIMENT_FOLDER=./tmp.2ZZ2eN9Fc5
[DEPLOY LOG] downloading source code from git@github.com:manuel-delverme/constrained_nn.git to ./tmp.2ZZ2eN9Fc5
Cloning into './tmp.2ZZ2eN9Fc5'...

real    1m2.310s
user    0m0.444s
sys 0m1.290s
Note: checking out '0d90a073af91b2c9061fb66e753f7cfdde2d96ab'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b <new-branch-name>

HEAD is now at 0d90a07 [CLUSTER] faster_theta
[DEPLOY LOG] pwd is now /home/mila/d/delvermm/experiments/tmp.2ZZ2eN9Fc5

real    0m0.563s
user    0m0.002s
sys 0m0.014s
[DEPLOY LOG] Using shared venv @ /home/mila/d/delvermm/venv
Requirement already satisfied: pip in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (21.1.2)

real    0m39.978s
user    0m1.965s
sys 0m0.688s
Looking in links: https://download.pytorch.org/whl/torch_stable.html
Collecting experiment_buddy
  Cloning ssh://****@github.com/ministry-of-silly-code/experiment_buddy.git to /tmp/pip-install-3oofsdh2/experiment-buddy_f3489c084dc9444bb379c52920d27f08
  Running command git clone -q 'ssh://****@github.com/ministry-of-silly-code/experiment_buddy.git' /tmp/pip-install-3oofsdh2/experiment-buddy_f3489c084dc9444bb379c52920d27f08
Requirement already satisfied: tqdm in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from -r requirements.txt (line 1)) (4.59.0)
Requirement already satisfied: torchvision==0.8.0 in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from -r requirements.txt (line 2)) (0.8.0)
Requirement already satisfied: GitPython in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from experiment_buddy->-r requirements.txt (line 3)) (3.1.14)
Requirement already satisfied: tensorboardX in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from experiment_buddy->-r requirements.txt (line 3)) (2.2)
Requirement already satisfied: matplotlib in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from experiment_buddy->-r requirements.txt (line 3)) (3.4.1)
Requirement already satisfied: wandb in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from experiment_buddy->-r requirements.txt (line 3)) (0.10.24)
Requirement already satisfied: fabric in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from experiment_buddy->-r requirements.txt (line 3)) (2.6.0)
Requirement already satisfied: cloudpickle in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from experiment_buddy->-r requirements.txt (line 3)) (1.2.2)
Requirement already satisfied: PyYaml in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from experiment_buddy->-r requirements.txt (line 3)) (5.4.1)
Requirement already satisfied: paramiko in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from experiment_buddy->-r requirements.txt (line 3)) (2.7.2)
Requirement already satisfied: aiohttp in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from experiment_buddy->-r requirements.txt (line 3)) (3.7.4.post0)
Requirement already satisfied: torch==1.7.0 in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from torchvision==0.8.0->-r requirements.txt (line 2)) (1.7.0+cu92)
Requirement already satisfied: numpy in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from torchvision==0.8.0->-r requirements.txt (line 2)) (1.20.2)
Requirement already satisfied: pillow>=4.1.1 in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from torchvision==0.8.0->-r requirements.txt (line 2)) (7.2.0)
Requirement already satisfied: dataclasses in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from torch==1.7.0->torchvision==0.8.0->-r requirements.txt (line 2)) (0.6)
Requirement already satisfied: future in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from torch==1.7.0->torchvision==0.8.0->-r requirements.txt (line 2)) (0.18.2)
Requirement already satisfied: typing-extensions in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from torch==1.7.0->torchvision==0.8.0->-r requirements.txt (line 2)) (3.7.4.3)
Requirement already satisfied: yarl<2.0,>=1.0 in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from aiohttp->experiment_buddy->-r requirements.txt (line 3)) (1.6.3)
Requirement already satisfied: multidict<7.0,>=4.5 in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from aiohttp->experiment_buddy->-r requirements.txt (line 3)) (5.1.0)
Requirement already satisfied: chardet<5.0,>=2.0 in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from aiohttp->experiment_buddy->-r requirements.txt (line 3)) (3.0.4)
Requirement already satisfied: async-timeout<4.0,>=3.0 in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from aiohttp->experiment_buddy->-r requirements.txt (line 3)) (3.0.1)
Requirement already satisfied: attrs>=17.3.0 in /cvmfs/ai.mila.quebec/apps/x86_64/debian/python/3.7/lib/python3.7/site-packages (from aiohttp->experiment_buddy->-r requirements.txt (line 3)) (20.2.0)
Requirement already satisfied: idna>=2.0 in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from yarl<2.0,>=1.0->aiohttp->experiment_buddy->-r requirements.txt (line 3)) (2.10)
Requirement already satisfied: invoke<2.0,>=1.3 in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from fabric->experiment_buddy->-r requirements.txt (line 3)) (1.5.0)
Requirement already satisfied: pathlib2 in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from fabric->experiment_buddy->-r requirements.txt (line 3)) (2.3.5)
Requirement already satisfied: cryptography>=2.5 in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from paramiko->experiment_buddy->-r requirements.txt (line 3)) (3.4.7)
Requirement already satisfied: pynacl>=1.0.1 in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from paramiko->experiment_buddy->-r requirements.txt (line 3)) (1.4.0)
Requirement already satisfied: bcrypt>=3.1.3 in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from paramiko->experiment_buddy->-r requirements.txt (line 3)) (3.2.0)
Requirement already satisfied: six>=1.4.1 in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from bcrypt>=3.1.3->paramiko->experiment_buddy->-r requirements.txt (line 3)) (1.15.0)
Requirement already satisfied: cffi>=1.1 in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from bcrypt>=3.1.3->paramiko->experiment_buddy->-r requirements.txt (line 3)) (1.14.5)
Requirement already satisfied: pycparser in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from cffi>=1.1->bcrypt>=3.1.3->paramiko->experiment_buddy->-r requirements.txt (line 3)) (2.20)
Requirement already satisfied: gitdb<5,>=4.0.1 in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from GitPython->experiment_buddy->-r requirements.txt (line 3)) (4.0.7)
Requirement already satisfied: smmap<5,>=3.0.1 in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from gitdb<5,>=4.0.1->GitPython->experiment_buddy->-r requirements.txt (line 3)) (4.0.0)
Requirement already satisfied: cycler>=0.10 in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from matplotlib->experiment_buddy->-r requirements.txt (line 3)) (0.10.0)
Requirement already satisfied: kiwisolver>=1.0.1 in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from matplotlib->experiment_buddy->-r requirements.txt (line 3)) (1.3.1)
Requirement already satisfied: python-dateutil>=2.7 in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from matplotlib->experiment_buddy->-r requirements.txt (line 3)) (2.8.1)
Requirement already satisfied: pyparsing>=2.2.1 in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from matplotlib->experiment_buddy->-r requirements.txt (line 3)) (2.4.7)
Requirement already satisfied: protobuf>=3.8.0 in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from tensorboardX->experiment_buddy->-r requirements.txt (line 3)) (3.15.7)
Requirement already satisfied: psutil>=5.0.0 in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from wandb->experiment_buddy->-r requirements.txt (line 3)) (5.8.0)
Requirement already satisfied: pathtools in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from wandb->experiment_buddy->-r requirements.txt (line 3)) (0.1.2)
Requirement already satisfied: Click>=7.0 in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from wandb->experiment_buddy->-r requirements.txt (line 3)) (7.1.2)
Requirement already satisfied: promise<3,>=2.0 in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from wandb->experiment_buddy->-r requirements.txt (line 3)) (2.3)
Requirement already satisfied: sentry-sdk>=0.4.0 in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from wandb->experiment_buddy->-r requirements.txt (line 3)) (1.0.0)
Requirement already satisfied: docker-pycreds>=0.4.0 in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from wandb->experiment_buddy->-r requirements.txt (line 3)) (0.4.0)
Requirement already satisfied: requests<3,>=2.0.0 in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from wandb->experiment_buddy->-r requirements.txt (line 3)) (2.25.1)
Requirement already satisfied: configparser>=3.8.1 in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from wandb->experiment_buddy->-r requirements.txt (line 3)) (5.0.2)
Requirement already satisfied: shortuuid>=0.5.0 in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from wandb->experiment_buddy->-r requirements.txt (line 3)) (1.0.1)
Requirement already satisfied: subprocess32>=3.5.3 in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from wandb->experiment_buddy->-r requirements.txt (line 3)) (3.5.4)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from requests<3,>=2.0.0->wandb->experiment_buddy->-r requirements.txt (line 3)) (1.26.4)
Requirement already satisfied: certifi>=2017.4.17 in /home/mila/d/delvermm/venv/lib/python3.7/site-packages (from requests<3,>=2.0.0->wandb->experiment_buddy->-r requirements.txt (line 3)) (2020.12.5)

real    1m6.293s
user    0m2.629s
sys 0m1.062s
[DEPLOY LOG] /opt/slurm/bin/sbatch /tmp/experiment_buddy-X3wiF1mTAh/srun_python.sh train.py
Submitted batch job 944856
100%|██████████| 1/1 [02:52<00:00, 172.24s/it]

Process finished with exit code 0
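
To see where the time goes, the `real` timings in the log above can be totalled with a short script. This is only a sketch: the timing values are copied from the log, but the step labels are my reading of which command each timing belongs to, and the names `STEPS`, `to_seconds` and `total_seconds` are made up for illustration.

```python
import re

# (label, `real` timing) pairs. The timings are copied from the log above;
# the labels are an assumption about which command each timing belongs to.
STEPS = [
    ("module purge", "0m0.221s"),
    ("module load python/3.7", "0m0.170s"),
    ("git clone", "1m2.310s"),
    ("git checkout + cd", "0m0.563s"),
    ("venv setup / pip check", "0m39.978s"),
    ("pip install -r requirements.txt", "1m6.293s"),
]


def to_seconds(stamp: str) -> float:
    """Convert a bash `time` value such as '1m2.310s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised timing: {stamp!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def total_seconds(steps) -> float:
    """Sum the wall-clock time of all timed steps."""
    return sum(to_seconds(stamp) for _, stamp in steps)


if __name__ == "__main__":
    total = total_seconds(STEPS)
    # Print the slowest steps first, with their share of the timed total.
    for label, stamp in sorted(STEPS, key=lambda s: to_seconds(s[1]), reverse=True):
        secs = to_seconds(stamp)
        print(f"{label:35s} {secs:8.3f}s  {100 * secs / total:5.1f}%")
    print(f"{'timed total':35s} {total:8.3f}s")
```

The timed steps add up to about 169.5 s of the 172.24 s reported by the progress bar, and the clone plus the two pip-related steps account for almost all of it.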

manuel-delverme commented 3 years ago

Cloning takes about 1 min (1m2.310s). Maybe we can use a shallow clone and fetch only the snapshot version? The pip install takes 1m6.293s, which should be reduced by @DrTtnk's PR.
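
A rough sketch of what the shallow clone could look like, assuming the deploy script is given the snapshot tag that is pushed before the run (the `snapshot/master/...` tag in the log) instead of the bare commit hash. The helper names `shallow_clone_command` and `shallow_clone` are hypothetical and not part of experiment_buddy; the git flags themselves (`--depth`, `--single-branch`, `--branch`, which also accepts tags) are standard.

```python
import subprocess
from typing import List


def shallow_clone_command(repo_url: str, ref: str, dest: str) -> List[str]:
    """Build a git command that fetches only the commit `ref` points to.

    `ref` must be a branch or tag name; `git clone --branch` does not
    accept a bare commit hash.
    """
    return [
        "git", "clone", "--quiet",
        "--depth", "1",        # history truncated to a single commit
        "--single-branch",     # do not fetch any other branches
        "--branch", ref,       # works for tags too (HEAD ends up detached)
        repo_url, dest,
    ]


def shallow_clone(repo_url: str, ref: str, dest: str) -> None:
    """Run the shallow clone and raise CalledProcessError if git fails."""
    subprocess.run(shallow_clone_command(repo_url, ref, dest), check=True)


if __name__ == "__main__":
    # Values taken from the log above; only the command is printed here.
    cmd = shallow_clone_command(
        "git@github.com:manuel-delverme/constrained_nn.git",
        "snapshot/master/0d90a073af91b2c9061fb66e753f7cfdde2d96ab",
        "./tmp.2ZZ2eN9Fc5",
    )
    print(" ".join(cmd))
```

Whether this helps depends on how much of the 1 min is history transfer rather than connection setup or large files in the current tree, so it would need to be timed on the cluster.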

manuel-delverme commented 3 years ago

This is without the double buddy install, which makes everything much, much slower.

DrTtnk commented 3 years ago

@manuel-delverme Do you have some big project to fork? The bigger, the better