usegalaxy-eu / infrastructure

All of Galaxyproject EU's cloud infrastructure.

Update image and the flavor of Dilmurat's VM #173

Closed: sanjaysrikakulam closed this 1 year ago

sanjaysrikakulam commented 1 year ago

The previous image has issues installing CUDA and the NVIDIA drivers because of the broken CUDA repo GPG key. Manual debugging and installation attempts did not fix this, because the root disk lacks storage. This PR therefore updates the flavor along with the GPU image (using the same image as the one we use for our worker nodes).
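For reference, the change amounts to roughly the following in the Terraform resource. This is a minimal sketch reconstructed from the execution plan posted further down in this thread, not the actual diff. The attributes that the plan hides as unchanged are omitted here, so the block is illustrative rather than complete.

```hcl
# Sketch only: values are taken from the plan output in this thread.
resource "openstack_compute_instance_v2" "dilmurat" {
  name = "dilmurat dedicated VM"

  # Was "g1.c8m20g1"; the new flavor name is the old one with a "d50" suffix.
  flavor_name = "g1.c8m20g1d50"

  # Was "5f3fc2b3-0803-44cc-abe5-40335a6e6bd6".
  # According to the plan, changing image_id forces the instance to be replaced.
  image_id = "f5b82cb0-03b4-44f0-8ce5-33f15c53f89b"

  network {
    name = "public"
  }

  # (attributes the plan reports as unchanged are not shown)
}
```

Because the instance is replaced, the plan also replaces the attached `openstack_compute_volume_attach_v2.dilmurat-va`, whose `instance_id` changes.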

bgruening commented 1 year ago

Q: We do have a GPU image with prebuilt CUDA, don't we?

usegalaxy-eu-bot commented 1 year ago


Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
-/+ destroy and then create replacement

Terraform will perform the following actions:

  # openstack_compute_instance_v2.dilmurat must be replaced
-/+ resource "openstack_compute_instance_v2" "dilmurat" {
      ~ access_ip_v4        = "192.52.42.241" -> (known after apply)
      + access_ip_v6        = (known after apply)
      ~ all_metadata        = {} -> (known after apply)
      ~ all_tags            = [] -> (known after apply)
      ~ availability_zone   = "nova" -> (known after apply)
      ~ flavor_id           = "14e797a2-3146-42ab-aa78-218e247cad7a" -> (known after apply)
      ~ flavor_name         = "g1.c8m20g1" -> "g1.c8m20g1d50"
      ~ id                  = "980400a5-7d12-4da8-b7aa-46a259707a5f" -> (known after apply)
      ~ image_id            = "5f3fc2b3-0803-44cc-abe5-40335a6e6bd6" -> "f5b82cb0-03b4-44f0-8ce5-33f15c53f89b" # forces replacement
      ~ image_name          = "vggp-gpu-v60-j310-1fad751e0150-main" -> (known after apply)
        name                = "dilmurat dedicated VM"
      ~ region              = "Freiburg" -> (known after apply)
      - tags                = [] -> null
        # (6 unchanged attributes hidden)

      ~ network {
          ~ fixed_ip_v4    = "192.52.42.241" -> (known after apply)
          + fixed_ip_v6    = (known after apply)
          + floating_ip    = (known after apply)
          ~ mac            = "fa:16:3e:57:9a:a4" -> (known after apply)
            name           = "public"
          + port           = (known after apply)
          ~ uuid           = "60775850-0c04-4a6d-b607-ad1d75ee2900" -> (known after apply)
            # (1 unchanged attribute hidden)
        }
    }

  # openstack_compute_volume_attach_v2.dilmurat-va must be replaced
-/+ resource "openstack_compute_volume_attach_v2" "dilmurat-va" {
      ~ device      = "/dev/vdb" -> (known after apply)
      ~ id          = "980400a5-7d12-4da8-b7aa-46a259707a5f/981a98dc-cc05-4ed9-9a2b-e3018ccd627d" -> (known after apply)
      ~ instance_id = "980400a5-7d12-4da8-b7aa-46a259707a5f" -> (known after apply) # forces replacement
      ~ region      = "Freiburg" -> (known after apply)
        # (1 unchanged attribute hidden)
    }

Plan: 2 to add, 0 to change, 2 to destroy.

─────────────────────────────────────────────────────────────────────────────

Saved the plan to: tf.plan

To perform exactly these actions, run the following command to apply:
    terraform apply "tf.plan"

sanjaysrikakulam commented 1 year ago

> Q: We do have a GPU image with prebuilt CUDA, don't we?

The VGCN repo shows that the installation is turned off: https://github.com/usegalaxy-eu/vgcn/blob/00829423b35b5da3b7fbe4d49dfacea85d554d57/ansible-roles/group_vars/gpu.yml#L2-L5

Not sure why, though. So we inject the installation through cloud-init in our Terraform files.
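As an illustration of what injecting through cloud-init can look like, here is a minimal sketch that passes a cloud-config document via the `user_data` argument of `openstack_compute_instance_v2`. The resource name, the VM name, and the `runcmd` entry are hypothetical placeholders, not the contents of our actual Terraform files. The real driver and CUDA installation steps are deliberately not reproduced.

```hcl
# Hypothetical example resource; not the actual definition from this repo.
resource "openstack_compute_instance_v2" "gpu_vm_example" {
  name        = "example GPU VM"                          # hypothetical name
  flavor_name = "g1.c8m20g1d50"                           # flavor from the plan above
  image_id    = "f5b82cb0-03b4-44f0-8ce5-33f15c53f89b"    # image from the plan above

  network {
    name = "public"
  }

  # cloud-init reads this cloud-config document on first boot.
  # The command below is a placeholder standing in for the real
  # NVIDIA driver / CUDA installation steps.
  user_data = <<-EOF
    #cloud-config
    runcmd:
      - [ sh, -c, "echo 'install NVIDIA drivers and CUDA here'" ]
  EOF
}
```

Keeping the installation in `user_data` means the image itself can stay without CUDA, at the cost of a longer first boot.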