siderolabs / contrib

talos/sidero setup examples
Mozilla Public License 2.0

feat: make the aws tf code more modular #20

Closed frezbo closed 1 year ago

frezbo commented 1 year ago

Make the AWS Terraform code more modular so that it can be reused in CI.

frezbo commented 1 year ago

An example usage with a var file:

{
    "cluster_name": "talos-nvidia-test",
    "num_control_planes": 1,
    "num_workers": 0,
    "ami_id": "ami-034f35c36088696a8",
    "instance_type_control_plane": "t3.medium",
    "config_patch_files_worker": [
        "patch.yaml"
    ],
    "extra_tags": {
        "Project": "talos-nvidia-test",
        "Environment": "ci test",
        "Owner": "frezbo"
    },
    "node_groups": [
        {
            "name": "nvidia-t4",
            "num_instances": 2,
            "instance_type": "g4dn.xlarge",
            "tags": {
                "Type": "nvidia-t4"
            }
        },
        {
            "name": "nvidia-a100",
            "num_instances": 1,
            "instance_type": "p4d.24xlarge",
            "tags": {
                "Type": "nvidia-a100"
            },
            "config_patch_files": [
                "patch-a100.yaml"
            ]
        }
    ]
}

patch.yaml

machine:
  kernel:
    modules:
      - name: nvidia
      - name: nvidia_uvm
      - name: nvidia_drm
      - name: nvidia_modeset
  sysctls:
    net.core.bpf_jit_harden: 1
  install:
    extensions:
      - image: ghcr.io/frezbo/nvidia-container-toolkit:535.54.03-v1.13.5
      - image: ghcr.io/frezbo/nvidia-open-gpu-kernel-modules:535.54.03-v1.5.0-alpha.3-2-gc59245d-dirty

patch-a100.yaml

machine:
  install:
    extensions:
      - image: ghcr.io/frezbo/nvidia-fabricmanager:535.54.03
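
For readers wondering how the `node_groups` list in the var file above could map onto a Terraform input, here is a minimal sketch. It is an assumption based only on the keys shown in the example, not the declaration from this PR: the descriptions, defaults, and the use of `optional()` (which needs Terraform 1.3 or later) are illustrative.

```hcl
# Hypothetical variable declaration inferred from the example var file;
# the module's real definition may differ.
variable "node_groups" {
  description = "Extra groups of worker instances, each with its own count, instance type, tags, and config patches."
  type = list(object({
    name          = string
    num_instances = number
    instance_type = string
    # Optional because the "nvidia-t4" group in the example omits config_patch_files.
    tags               = optional(map(string), {})
    config_patch_files = optional(list(string), [])
  }))
  default = []
}
```

Assuming the JSON is saved under a name such as `vars.tfvars.json` (a made-up file name), it could then be passed with `terraform apply -var-file=vars.tfvars.json`.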

rsmitty commented 1 year ago

Generally fine with this, but I think calling them "node groups" is a bit confusing, since that's an EKS construct. "Worker groups" feels more appropriate, imo.

frezbo commented 1 year ago

Makes sense, I'll update.

frezbo commented 1 year ago

/m

DmitriyMV commented 1 year ago

/m