loft-sh / devpod

Codespaces but open-source, client-only and unopinionated: Works with any IDE and lets you use any cloud, kubernetes or just localhost docker.
https://devpod.sh
Mozilla Public License 2.0

GPU Provisioning with GCP Provider fails #973

Closed hrittikhere closed 3 months ago

hrittikhere commented 3 months ago

What happened?
I am trying to create a simple GPU-based GCP VM (g2-standard-4) with the GCP Provider (v.0.0.7). It fails with error code 400 and the message “Instances with guest accelerators do not support live migration.”


What did you expect to happen instead?
Create a VM with a GPU successfully. The workaround I found is to pass these flags: --maintenance-policy TERMINATE --restart-on-failure (troubleshooting steps: https://groups.google.com/g/gce-discussion/c/e9K3h3fQuJk/m/UxyKqskLAQAJ).

~ ❯ gcloud compute instances create vm --machine-type=g2-standard-4 --zone=us-central1-a
ERROR: (gcloud.compute.instances.create) Could not fetch resource:
 - Instances with guest accelerators do not support live migration.

~ ❯ gcloud compute instances create vm  --machine-type=g2-standard-4 --zone=us-central1-a   --maintenance-policy TERMINATE --restart-on-failure
Created [https://www.googleapis.com/compute/v1/projects/hrittik-project/zones/us-central1-a/instances/vm].
NAME  ZONE           MACHINE_TYPE   PREEMPTIBLE  INTERNAL_IP  EXTERNAL_IP     STATUS
vm    us-central1-a  g2-standard-4               10.128.0.16  34.136.130.135  RUNNING
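
The workaround above could be wrapped in a small shell helper that adds the two scheduling flags only for GPU machine types. This is a minimal sketch, not how the DevPod GCP provider handles it: the function name `gpu_scheduling_flags` is hypothetical, and the list of machine-family prefixes (`a2-`, `a3-`, `g2-`) is an assumption that may not be exhaustive.

```shell
#!/bin/sh
# Hypothetical helper: print the extra gcloud flags that GPU machine types
# need, because instances with guest accelerators cannot live-migrate.
# The family prefixes below are an assumption; extend them as needed.
gpu_scheduling_flags() {
  case "$1" in
    a2-*|a3-*|g2-*)
      # Stop the VM on host maintenance and restart it afterwards.
      printf '%s\n' "--maintenance-policy=TERMINATE --restart-on-failure"
      ;;
    *)
      # Non-GPU machine types need no extra flags; print nothing.
      :
      ;;
  esac
}

# Show what would be appended for the machine type from this issue.
gpu_scheduling_flags g2-standard-4
```

The output can be appended unquoted so that it splits into two separate flags, for example: `gcloud compute instances create vm --machine-type="$MT" --zone=us-central1-a $(gpu_scheduling_flags "$MT")`.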

The purpose is to try Ollama with GPU and CUDA features. It works on CPU-based machines, but I wanted to push it further with a GPU during inference.

How can we reproduce the bug? (as minimally and precisely as possible)

Create a g2-standard-4 VM on GCP, or any other VM with a GPU.

My devcontainer.json:

{
    "name": "Python 3",
    "image": "mcr.microsoft.com/devcontainers/python:1-3.12-bullseye",
    "features": {
        "ghcr.io/devcontainers/features/docker-in-docker:2": {},
        "ghcr.io/devcontainers/features/nvidia-cuda:1": {},
        "ghcr.io/rio/features/k3d:1": {}
    },
    "customizations": {
        "vscode": {
            "extensions": [
                "ms-azuretools.vscode-docker",
                "GitHub.copilot"
            ]
        }
    }
}

Local Environment:

DevPod Provider:

Anything else we need to know?
Internal Chat