iterative / terraform-provider-iterative

☁️ Terraform plugin for machine learning workloads: spot instance recovery & auto-termination | AWS, GCP, Azure, Kubernetes
https://registry.terraform.io/providers/iterative/iterative/latest/docs
Apache License 2.0

leo delete doesn't sync data back? #712

Closed dacbd closed 1 year ago

dacbd commented 1 year ago

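It looks like `leo delete` with `--output` and `--workdir` doesn't sync changes back to the local directory: the task modifies files on the remote machine, but the local working tree is unchanged after the delete.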

Example main.tf (uses https://github.com/iterative/magnetic-tiles-defect):

terraform {
  required_providers { iterative = { source = "iterative/iterative" } }
}
provider "iterative" {}

resource "iterative_task" "gpu-runner" {
  cloud = "aws"
  machine = "m+t4"
  timeout = 7200 #2 hrs
  region = "us-west-1"
  image = "nvidia"
  disk_size = 100
  permission_set = "arn:aws:iam::342840881361:instance-profile/tpi-vscode-example"
  storage {
    workdir = "."
    output = "."
  }
  environment = {
    "REPO_TOKEN" = ""
  }
  script = <<-END
    #!/bin/bash
    # set up project requirements
    apt-get update
    apt-get install python3.9 -y

    nvidia-smi

    # Run project
    pipenv install --skip-lock
    pipenv run dvc pull
    pipenv run dvc repro --force

    git status
  END
}

Commands executed (tpi-run.sh):

#!/bin/bash

leo_id=$(leo create \
  --cloud aws \
  --region us-west-1)

echo "id: $leo_id"

./leo read \
  --cloud aws \
  --region us-west-1 \
  --follow "$leo_id"

leo delete \
  --cloud aws \
  --region us-west-1 \
  --workdir . \
  --output . \
  "$leo_id"

Logs:

```
$ ./tpi-run.sh
INFO Using identifier tpi-absolutely-square-skunk-49ay0tcn-92xo2t3b
INFO Creating resources...
INFO [1/12] Parsing PermissionSet...
INFO [2/12] Importing DefaultVPC...
INFO [3/12] Importing DefaultVPCSubnets...
INFO [4/12] Reading Image...
INFO [5/12] Creating Bucket...
INFO [6/12] Creating SecurityGroup...
INFO [7/12] Creating KeyPair...
INFO [8/12] Reading Credentials...
INFO [9/12] Creating LaunchTemplate...
INFO [10/12] Creating AutoScalingGroup...
INFO [11/12] Uploading Directory...
INFO Transferring 99.55MB (853 files)...
INFO 9.079 MiB / 94.938 MiB, 10%, 929.045 KiB/s, ETA 1m34s (xfr#153/853)
INFO 19.638 MiB / 94.938 MiB, 21%, 1018.368 KiB/s, ETA 1m15s (xfr#296/853)
INFO 26.135 MiB / 94.938 MiB, 28%, 850.263 KiB/s, ETA 1m22s (xfr#390/853)
INFO 34.420 MiB / 94.938 MiB, 36%, 841.356 KiB/s, ETA 1m13s (xfr#497/853)
INFO 44.787 MiB / 94.938 MiB, 47%, 933.029 KiB/s, ETA 55s (xfr#635/853)
INFO 58.284 MiB / 94.939 MiB, 61%, 1.113 MiB/s, ETA 32s (xfr#756/853)
INFO 71.340 MiB / 94.939 MiB, 75%, 1.206 MiB/s, ETA 19s (xfr#844/853)
INFO 84.857 MiB / 94.939 MiB, 89%, 1.283 MiB/s, ETA 7s
INFO [12/12] Starting task...
INFO Creation completed
id: tpi-absolutely-square-skunk-49ay0tcn-92xo2t3b
INFO Reading resources... (this may happen several times)
INFO [1/9] Reading DefaultVPC...
INFO [2/9] Reading DefaultVPCSubnets...
INFO [3/9] Reading Image...
INFO [4/9] Reading Bucket...
INFO [5/9] Reading SecurityGroup...
INFO [6/9] Reading KeyPair...
INFO [7/9] Reading Credentials...
INFO [8/9] Reading LaunchTemplate...
INFO [9/9] Reading AutoScalingGroup...
INFO Read completed
Waiting for instance......................
Started tpi-task.service.
Get:1 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal InRelease [265 kB]
Get:2 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-updates InRelease [114 kB]
Get:3 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-backports InRelease [108 kB]
Hit:4 https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64 InRelease
Hit:5 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64 InRelease
Hit:6 https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64 InRelease
Get:7 http://security.ubuntu.com/ubuntu focal-security InRelease [114 kB]
Get:8 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal/main amd64 Packages [970 kB]
Get:9 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal/main Translation-en [506 kB]
Get:10 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal/main amd64 c-n-f Metadata [29.5 kB]
Get:11 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal/restricted amd64 Packages [22.0 kB]
Get:12 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal/restricted Translation-en [6212 B]
Get:13 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal/restricted amd64 c-n-f Metadata [392 B]
Get:14 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal/universe amd64 Packages [8628 kB]
Get:15 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal/universe Translation-en [5124 kB]
Get:16 http://security.ubuntu.com/ubuntu focal-security/main amd64 Packages [1822 kB]
Get:17 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal/universe amd64 c-n-f Metadata [265 kB]
Get:18 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal/multiverse amd64 Packages [144 kB]
Get:19 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal/multiverse Translation-en [104 kB]
Get:20 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal/multiverse amd64 c-n-f Metadata [9136 B]
Get:21 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-updates/main amd64 Packages [2197 kB]
Get:22 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-updates/main Translation-en [385 kB]
Get:23 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-updates/main amd64 c-n-f Metadata [16.0 kB]
Get:24 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-updates/restricted amd64 Packages [1381 kB]
Get:25 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-updates/restricted Translation-en [196 kB]
Get:26 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-updates/restricted amd64 c-n-f Metadata [600 B]
Get:27 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-updates/universe amd64 Packages [973 kB]
Get:28 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-updates/universe Translation-en [222 kB]
Get:29 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-updates/universe amd64 c-n-f Metadata [21.8 kB]
Get:30 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-updates/multiverse amd64 Packages [29.9 kB]
Get:31 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-updates/multiverse Translation-en [7940 B]
Get:32 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-updates/multiverse amd64 c-n-f Metadata [664 B]
Get:33 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-backports/main amd64 Packages [45.7 kB]
Get:34 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-backports/main Translation-en [16.3 kB]
Get:35 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-backports/main amd64 c-n-f Metadata [1420 B]
Get:36 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-backports/restricted amd64 c-n-f Metadata [116 B]
Get:37 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-backports/universe amd64 Packages [24.0 kB]
Get:38 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-backports/universe Translation-en [16.0 kB]
Get:39 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-backports/universe amd64 c-n-f Metadata [864 B]
Get:40 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-backports/multiverse amd64 c-n-f Metadata [116 B]
Get:41 http://security.ubuntu.com/ubuntu focal-security/main Translation-en [301 kB]
Get:42 http://security.ubuntu.com/ubuntu focal-security/main amd64 c-n-f Metadata [11.2 kB]
Get:43 http://security.ubuntu.com/ubuntu focal-security/restricted amd64 Packages [1289 kB]
Get:44 http://security.ubuntu.com/ubuntu focal-security/restricted Translation-en [183 kB]
Get:45 http://security.ubuntu.com/ubuntu focal-security/universe amd64 Packages [743 kB]
Get:46 http://security.ubuntu.com/ubuntu focal-security/universe Translation-en [137 kB]
Get:47 http://security.ubuntu.com/ubuntu focal-security/universe amd64 c-n-f Metadata [15.3 kB]
Fetched 26.4 MB in 5s (5247 kB/s)
Reading package lists...
Reading package lists...
Building dependency tree...
Reading state information...
The following packages were automatically installed and are no longer required:
  python-pip-whl python3-wheel
Use 'apt autoremove' to remove them.
The following additional packages will be installed:
  libpython3.9-minimal libpython3.9-stdlib python3.9-minimal
Suggested packages:
  python3.9-venv python3.9-doc binfmt-support
The following NEW packages will be installed:
  libpython3.9-minimal libpython3.9-stdlib python3.9 python3.9-minimal
0 upgraded, 4 newly installed, 0 to remove and 33 not upgraded.
Need to get 4979 kB of archives.
After this operation, 19.9 MB of additional disk space will be used.
Get:1 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-updates/universe amd64 libpython3.9-minimal amd64 3.9.5-3ubuntu0~20.04.1 [756 kB]
Get:2 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-updates/universe amd64 python3.9-minimal amd64 3.9.5-3ubuntu0~20.04.1 [2022 kB]
Get:3 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-updates/universe amd64 libpython3.9-stdlib amd64 3.9.5-3ubuntu0~20.04.1 [1778 kB]
Get:4 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-updates/universe amd64 python3.9 amd64 3.9.5-3ubuntu0~20.04.1 [423 kB]
debconf: unable to initialize frontend: Dialog
debconf: (TERM is not set, so the dialog frontend is not usable.)
debconf: falling back to frontend: Readline
debconf: unable to initialize frontend: Readline
debconf: (This frontend requires a controlling tty.)
debconf: falling back to frontend: Teletype
dpkg-preconfigure: unable to re-open stdin:
Fetched 4979 kB in 0s (34.1 MB/s)
Selecting previously unselected package libpython3.9-minimal:amd64.
(Reading database ... 147341 files and directories currently installed.)
Preparing to unpack .../libpython3.9-minimal_3.9.5-3ubuntu0~20.04.1_amd64.deb ...
Unpacking libpython3.9-minimal:amd64 (3.9.5-3ubuntu0~20.04.1) ...
Selecting previously unselected package python3.9-minimal.
Preparing to unpack .../python3.9-minimal_3.9.5-3ubuntu0~20.04.1_amd64.deb ...
Unpacking python3.9-minimal (3.9.5-3ubuntu0~20.04.1) ...
Selecting previously unselected package libpython3.9-stdlib:amd64.
Preparing to unpack .../libpython3.9-stdlib_3.9.5-3ubuntu0~20.04.1_amd64.deb ...
Unpacking libpython3.9-stdlib:amd64 (3.9.5-3ubuntu0~20.04.1) ...
Selecting previously unselected package python3.9.
Preparing to unpack .../python3.9_3.9.5-3ubuntu0~20.04.1_amd64.deb ...
Unpacking python3.9 (3.9.5-3ubuntu0~20.04.1) ...
Setting up libpython3.9-minimal:amd64 (3.9.5-3ubuntu0~20.04.1) ...
Setting up python3.9-minimal (3.9.5-3ubuntu0~20.04.1) ...
Setting up libpython3.9-stdlib:amd64 (3.9.5-3ubuntu0~20.04.1) ...
Setting up python3.9 (3.9.5-3ubuntu0~20.04.1) ...
Processing triggers for man-db (2.9.1-1) ...
Processing triggers for mime-support (3.64ubuntu1) ...
Tue Nov 8 01:24:00 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
| N/A   40C    P8    14W /  70W |      0MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
Creating a virtualenv for this project...
Pipfile: /opt/task/directory/Pipfile
Using /usr/bin/python3.9 (3.9.5) to create virtualenv...
⠴ Creating virtual environment...created virtual environment CPython3.9.5.final.0-64 in 1731ms
  creator Venv(dest=/root/.local/share/virtualenvs/directory-6uwWda-_, clear=False, no_vcs_ignore=False, global=False, describe=CPython3Posix)
  seeder FromAppData(download=False, pip=bundle, setuptools=bundle, wheel=bundle, via=copy, app_data_dir=/root/.local/share/virtualenv)
    added seed packages: pip==22.2.2, setuptools==65.3.0, wheel==0.37.1
  activators BashActivator,CShellActivator,FishActivator,NushellActivator,PowerShellActivator,PythonActivator
✔ Successfully created virtual environment!
Virtualenv location: /root/.local/share/virtualenvs/directory-6uwWda-_
Installing dependencies from Pipfile...
To activate this project's virtualenv, run pipenv shell.
Alternatively, run a command inside the virtualenv with pipenv run.
A models/
A data/MAGNETIC_TILE_SURFACE_DEFECTS/images/
A data/MAGNETIC_TILE_SURFACE_DEFECTS/masks/
A data/MAGNETIC_TILE_SURFACE_DEFECTS/train_images/
A data/MAGNETIC_TILE_SURFACE_DEFECTS/train_masks/
A data/MAGNETIC_TILE_SURFACE_DEFECTS/test_images/
A data/MAGNETIC_TILE_SURFACE_DEFECTS/test_masks/
7 files added and 785 files fetched
Running stage 'data_load':
> python src/stages/data_load.py --config=params.yaml
Matplotlib is building the font cache; this may take a moment.
100%|██████████| 392/392 [00:00<00:00, 495.12it/s]
Running stage 'data_split':
> python src/stages/data_split.py --config=params.yaml
Updating lock file 'dvc.lock'
Running stage 'train':
> python src/stages/train.py --config=params.yaml
/root/.local/share/virtualenvs/directory-6uwWda-_/lib/python3.9/site-packages/torch/_tensor.py:1142: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  ret = func(*args, **kwargs)
Downloading: "https://download.pytorch.org/models/resnet34-b627a593.pth" to /root/.cache/torch/hub/checkpoints/resnet34-b627a593.pth
100%|██████████| 83.3M/83.3M [00:00<00:00, 191MB/s]
INFO:dvclive:Report path (if generated): /opt/task/directory/training_metrics/report.html
epoch train_loss valid_loss time
0 0.395255 0.593410 00:13
epoch train_loss valid_loss time
0 0.345424 0.327121 00:11
1 0.304969 0.256029 00:10
2 0.306849 0.338111 00:10
3 0.291200 0.235138 00:10
4 0.313439 0.274647 00:10
5 0.282574 0.608300 00:10
6 0.235368 0.123192 00:11
7 0.190397 0.102500 00:11
8 0.158510 0.095538 00:11
9 0.143658 0.094383 00:11
Updating lock file 'dvc.lock'
Running stage 'evaluate':
> python src/stages/eval.py --config=params.yaml
/opt/task/directory/src/eval_utils.py:57: RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (`matplotlib.pyplot.figure`) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam `figure.max_open_warning`). Consider using `matplotlib.pyplot.close()`.
  fig, axarr = plt.subplots(1, 3)
100%|██████████| 78/78 [00:50<00:00, 1.56it/s]
Updating lock file 'dvc.lock'

To track the changes with git, run:

    git add dvc.lock

To enable auto staging, run:

    dvc config core.autostage true
Use `dvc push` to send your updates to remote storage.
On branch temp
Changes not staged for commit:
  (use "git add/rm <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   dvc.lock
        deleted:    main.tf
        modified:   metrics.json
        modified:   tpi-run.sh
        modified:   training_metrics.json
        modified:   training_metrics/report.html
        modified:   training_metrics/scalars/epoch.tsv
        modified:   training_metrics/scalars/eval/loss.tsv
        modified:   training_metrics/scalars/train/loss.tsv

Untracked files:
  (use "git add <file>..." to include in what will be committed)
        leo
        test.tf

no changes added to commit (use "git add" and/or "git commit -a")
tpi-task.service: Succeeded.
INFO Deleting resources...
INFO Reading resources... (this may happen several times)
INFO [1/9] Reading DefaultVPC...
INFO [2/9] Reading DefaultVPCSubnets...
INFO [3/9] Reading Image...
INFO [1/6] Deleting AutoScalingGroup...
INFO [2/6] Deleting LaunchTemplate...
INFO [3/6] Deleting KeyPair...
INFO [4/6] Deleting SecurityGroup...
INFO [5/6] Reading Credentials...
INFO [6/6] Deleting Bucket...
INFO Deletion completed
$ git status
On branch temp
nothing to commit, working tree clean
```
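
The evidence in the log is the mismatch between the remote `git status` (several modified files) and the clean local `git status` after the delete. Since DVC outputs are often git-ignored, a checksum manifest of the working directory may be a more complete check than `git status` alone. The following is only a sketch under that assumption: `snapshot_dir` and `manifests_differ` are hypothetical helper names, not part of `leo` or the provider, and they rely only on `find`, `cksum`, `sort` and `diff`.

```shell
#!/bin/bash
# Hypothetical helpers for checking whether a run changed any local files.
# Nothing here is part of leo or the provider.

# Print one "checksum size path" line per regular file under a directory,
# sorted by path and skipping the .git directory.
snapshot_dir() {
  (
    cd "$1" || exit 1
    find . -path ./.git -prune -o -type f -exec cksum {} + | LC_ALL=C sort -k3
  )
}

# Exit 0 when diff does not report the two manifest files as identical
# (they differ, or one could not be read); exit 1 when they are identical.
manifests_differ() {
  ! diff -q "$1" "$2" >/dev/null
}

# Intended use around the commands from the report (left commented out so the
# helpers can be sourced without starting a run):
#   before=$(mktemp); after=$(mktemp)
#   snapshot_dir . > "$before"
#   ... leo create / leo read / leo delete exactly as in the script above ...
#   snapshot_dir . > "$after"
#   if manifests_differ "$before" "$after"; then
#     diff "$before" "$after"    # lists the files that came back changed
#   else
#     echo "no local files changed: nothing was synced back"
#   fi
```

If the two manifests are identical after a run whose remote `git status` showed modifications, that would confirm the behaviour reported here independently of git's ignore rules.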