iterative / cml

♾️ CML - Continuous Machine Learning | CI/CD for ML
http://cml.dev
Apache License 2.0

Runner creating instance with ubuntu 18.04. How to get ubuntu 20? #1469

Closed by hugochinchilla 2 months ago

hugochinchilla commented 2 months ago

Hello, I need help. I've been trying to figure this out, and I cannot understand why my runner is being created with Ubuntu 18.04.

I traced the code from CML into the terraform-provider-iterative (TPI) project and tried to make use of the image parameter in TPI. I also found an undocumented --cloud-image flag in CML and tried to use it, but when I do, the task never completes because the instance never becomes ready.

This is how I'm creating the runner:

name: Run project on cloud

on: push

jobs:
  create-runner:
    runs-on: ubuntu-latest
    steps:
      - uses: iterative/setup-cml@v2
      - name: Deploy runner on EC2
        run: |
          cml runner launch \
              --cloud=aws \
              --cloud-region=eu-central-1 \
              --cloud-type=m4.large \
              --labels=cml-gpu \
              --reuse \
              --log=debug

  train-and-report:
    needs: create-runner
    runs-on: [self-hosted, cml-gpu]
    timeout-minutes: 120
    steps:
      - name: Get system info
        run: |
          ldd --version | head -n 1
          lsb_release -a

With the --cloud-image flag, I tried both --cloud-image="nvidia" and --cloud-image="ubuntu@898082745236:x86_64:Deep Learning AMI GPU CUDA 11.3.* (Ubuntu 20.04) *", but neither worked.

I got these values from here: https://github.com/iterative/terraform-provider-iterative/blob/main/task/aws/resources/data_source_image.go#L40-L41

hugochinchilla commented 2 months ago

After reading the code more carefully, I realized I only had to pass the image name to the --cloud-image flag.

After I passed --cloud-image="Deep Learning AMI GPU CUDA 11.2.1 (Ubuntu 20.04) 20220626", it finally started an instance with Ubuntu 20.04.
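
For anyone landing here, this is roughly what the launch step looks like with the flag added. It is only a sketch: every option except --cloud-image is copied unchanged from the workflow in my first comment, and the image name is the one that worked for me in eu-central-1, so it may not exist in other regions or may have been retired since.

```yaml
      - name: Deploy runner on EC2
        run: |
          # Same command as before; the only addition is --cloud-image,
          # which takes the plain AMI name (no "user@owner:arch:" prefix).
          cml runner launch \
              --cloud=aws \
              --cloud-region=eu-central-1 \
              --cloud-type=m4.large \
              --cloud-image="Deep Learning AMI GPU CUDA 11.2.1 (Ubuntu 20.04) 20220626" \
              --labels=cml-gpu \
              --reuse \
              --log=debug
```

The "Get system info" step in the train-and-report job (lsb_release -a) is a quick way to confirm which Ubuntu release the runner actually booted.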

hugochinchilla commented 2 months ago

Sorry, I forgot to close the issue with the previous comment.