InstaSlice uses stable APIs and works with the NVIDIA GPU Operator to create MIG slices on demand.
Partitionable accelerators normally require their partitions to be created at node boot time; changing the partition layout later means evicting all workloads from the node before a new set of partitions can be created.
InstaSlice will help if
Integration with Kubernetes quota management
Integration with project Kueue
Emulator mode to test the InstaSlice first-fit placement strategy
Integration with vLLM, KServe, Deployments, Jobs, and StatefulSets
Install the NVIDIA GPU drivers and CUDA toolkit on the host.
Install the NVIDIA Container Toolkit (CTK).
Configure the NVIDIA Container Runtime as the default Docker runtime:
sudo nvidia-ctk runtime configure --runtime=docker --set-as-default
Restart Docker to apply the changes:
sudo systemctl restart docker
Configure the NVIDIA Container Runtime to use volume mounts to select devices to inject into a container:
sudo nvidia-ctk config --set accept-nvidia-visible-devices-as-volume-mounts=true --in-place
This sets accept-nvidia-visible-devices-as-volume-mounts=true in the /etc/nvidia-container-runtime/config.toml file.
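To confirm the setting landed in the file, a check along the following lines may help. This is only a sketch: has_volume_mounts_setting is a hypothetical helper name, not part of InstaSlice or the NVIDIA Container Toolkit, and it assumes the key is written as a plain `key = true` line in the TOML file.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with InstaSlice or the NVIDIA toolkit).
# Returns 0 when the given config file sets
# accept-nvidia-visible-devices-as-volume-mounts to true, non-zero otherwise.
has_volume_mounts_setting() {
  local config_file="${1:-/etc/nvidia-container-runtime/config.toml}"
  grep -Eq '^[[:space:]]*accept-nvidia-visible-devices-as-volume-mounts[[:space:]]*=[[:space:]]*true' "$config_file"
}

# Usage:
#   has_volume_mounts_setting && echo "volume-mount device selection is enabled"
```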
Verify that MIG is enabled on the GPU. The MIG M. field should read Enabled in the third row of the table:
nvidia-smi
Sun Aug 18 09:41:46 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.28.03 Driver Version: 560.28.03 CUDA Version: 12.6 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA A100-PCIE-40GB Off | 00000000:07:00.0 Off | On |
| N/A 27C P0 31W / 250W | 1MiB / 40960MiB | N/A Default |
| | | Enabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| MIG devices: |
+------------------+----------------------------------+-----------+-----------------------+
| GPU GI CI MIG | Memory-Usage | Vol| Shared |
| ID ID Dev | BAR1-Usage | SM Unc| CE ENC DEC OFA JPG |
| | | ECC| |
|==================+==================================+===========+=======================|
| No MIG devices found |
+-----------------------------------------------------------------------------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
nvidia-smi -i <gpu-id> -mig 1
Example:
nvidia-smi -i 0 -mig 1
Note: You may need to reboot the node for the changes to take effect. An asterisk beside MIG status (e.g. Enabled*) means the changes are pending and will be applied after a reboot.
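The pending state can also be read without scanning the table by eye. The sketch below assumes the nvidia-smi query fields mig.mode.current and mig.mode.pending are available on your driver; gpus_pending_mig_change is a hypothetical helper that only compares the two columns.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads CSV lines of "index, current, pending" on stdin
# and prints the index of every GPU whose pending MIG mode differs from its
# current one (i.e. a reboot or GPU reset is still needed).
gpus_pending_mig_change() {
  awk -F',' '{
    # Trim surrounding whitespace from each CSV field before comparing.
    for (i = 1; i <= NF; i++) gsub(/^[ \t]+|[ \t]+$/, "", $i)
    if ($2 != $3) print $1
  }'
}

# Usage (on a host with the NVIDIA driver installed):
#   nvidia-smi --query-gpu=index,mig.mode.current,mig.mode.pending \
#     --format=csv,noheader | gpus_pending_mig_change
```

An empty result suggests that no GPU is waiting on a reboot.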
Create a Kind cluster and install the NVIDIA GPU Operator:
bash ./deploy/setup.sh
Note: The validator pods nvidia-cuda-validator-* and nvidia-operator-validator-* of the GPU operator are expected to fail to initialize. This is because, with MIG enabled but without a MIG partition, they effectively have no GPU to run on.
kubectl get pod -n gpu-operator
NAME READY STATUS RESTARTS AGE
gpu-feature-discovery-lzcpv 2/2 Running 0 5m48s
gpu-operator-7b5587d878-vq2gw 1/1 Running 0 6m59s
gpu-operator-node-feature-discovery-gc-8478d46f4c-wvx29 1/1 Running 0 6m59s
gpu-operator-node-feature-discovery-master-688bb86496-cn97b 1/1 Running 0 6m59s
gpu-operator-node-feature-discovery-worker-7twxt 1/1 Running 0 6m52s
nvidia-container-toolkit-daemonset-gpn22 1/1 Running 0 6m13s
nvidia-cuda-validator-sjqgk 0/1 Init:CrashLoopBackOff 5 (111s ago) 4m54s
nvidia-dcgm-exporter-tlcpv 1/1 Running 0 6m7s
nvidia-device-plugin-daemonset-wbbhx 2/2 Running 0 5m53s
nvidia-operator-validator-h7ngh 0/1 Init:2/4 0 6m10s
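Because two pods are expected to fail, a plain "is everything Running?" check is misleading here. The following sketch filters them out; unexpected_unhealthy_pods is a hypothetical helper, and it assumes the column layout of kubectl get pod --no-headers shown above (name, ready, status, ...).

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `kubectl get pod --no-headers` lines on stdin.
# Skips the two validator pods that are expected to fail, prints the name of
# any other pod that is neither Running nor Completed, and exits 1 if it
# printed anything.
unexpected_unhealthy_pods() {
  awk '
    $1 ~ /^nvidia-(cuda|operator)-validator-/ { next }
    $3 != "Running" && $3 != "Completed" { print $1; bad = 1 }
    END { exit bad }
  '
}

# Usage:
#   kubectl get pod -n gpu-operator --no-headers | unexpected_unhealthy_pods \
#     && echo "GPU operator pods look as expected"
```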
IMG=<registry>/<controller-image>:<tag> IMG_DMST=<registry>/<daemonset-image>:<tag> make docker-build docker-push
Example:
IMG=quay.io/example/instaslice2-controller:1.0 IMG_DMST=quay.io/example/instaslice2-daemonset:1.0 make docker-build docker-push
Note: You can use Podman instead of Docker to build images; just set CONTAINER_TOOL=podman before the image-related make targets.
Cross-platform or multi-arch images can be built and pushed using make docker-buildx. When using Docker as your container tool, make sure to create a builder instance. Refer to Multi-platform images for documentation on building multi-platform images with Docker. You can change the destination platform(s) by setting PLATFORMS, e.g.:
PLATFORMS=linux/arm64,linux/amd64 make docker-buildx
make deploy
or with custom-built images:
IMG=<registry>/<controller-image>:<tag> IMG_DMST=<registry>/<daemonset-image>:<tag> make deploy
Example:
IMG=quay.io/example/instaslice2-controller:1.0 IMG_DMST=quay.io/example/instaslice2-daemonset:1.0 make deploy
The all-in-one command for building and deploying InstaSlice:
make docker-build docker-push deploy
Or with custom images:
IMG=<registry>/<controller-image>:<tag> IMG_DMST=<registry>/<daemonset-image>:<tag> make docker-build docker-push deploy
Example:
IMG=quay.io/example/instaslice2-controller:1.0 IMG_DMST=quay.io/example/instaslice2-daemonset:1.0 make docker-build docker-push deploy
kubectl get pod -n instaslice-system
NAME READY STATUS RESTARTS AGE
instaslice-operator-controller-daemonset-5lbqg 1/1 Running 0 101s
instaslice-operator-controller-manager-57b549784c-wkqq2 2/2 Running 0 101s
Note: If you encounter RBAC errors, you may need to grant yourself cluster-admin privileges or be logged in as admin.
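Rather than polling kubectl get pod by hand, you could block until the pods report Ready. The sketch below wraps kubectl wait; wait_for_instaslice is a hypothetical helper name, and the 120s default timeout is an arbitrary assumption rather than a project recommendation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: blocks until every pod in the namespace is Ready,
# or until the timeout expires (kubectl wait then returns non-zero).
wait_for_instaslice() {
  local namespace="${1:-instaslice-system}"
  local timeout="${2:-120s}"   # assumed default, adjust for your cluster
  kubectl wait --for=condition=Ready pod --all -n "$namespace" --timeout="$timeout"
}

# Usage:
#   wait_for_instaslice                        # defaults shown above
#   wait_for_instaslice instaslice-system 5m   # longer timeout
```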
kubectl apply -f ./samples/test-pod.yaml
pod/cuda-vectoradd-1 created
kubectl get pods
NAME READY STATUS RESTARTS AGE
cuda-vectoradd-1 1/1 Running 0 15s
and
kubectl logs cuda-vectoradd-1
GPU 0: NVIDIA A100-PCIE-40GB (UUID: GPU-1785aa6b-6edf-f58e-2e29-f6ccd30f306f)
MIG 1g.5gb Device 0: (UUID: MIG-2cc7f78c-04eb-5a3c-92c7-f423e3572bb8)
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
While the pod is running, you can observe the MIG slice created for it automatically:
nvidia-smi
Sun Aug 18 11:48:20 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.28.03 Driver Version: 560.28.03 CUDA Version: 12.6 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA A100-PCIE-40GB Off | 00000000:07:00.0 Off | On |
| N/A 32C P0 63W / 250W | 13MiB / 40960MiB | N/A Default |
| | | Enabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| MIG devices: |
+------------------+----------------------------------+-----------+-----------------------+
| GPU GI CI MIG | Memory-Usage | Vol| Shared |
| ID ID Dev | BAR1-Usage | SM Unc| CE ENC DEC OFA JPG |
| | | ECC| |
|==================+==================================+===========+=======================|
| 0 11 0 0 | 13MiB / 4864MiB | 14 0 | 1 0 0 0 0 |
| | 0MiB / 8191MiB | | |
+------------------+----------------------------------+-----------+-----------------------+
...
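For scripted checks, the listing form nvidia-smi -L is easier to parse than the table above. The sketch below counts MIG devices in that listing; count_mig_devices is a hypothetical helper, and it assumes MIG entries are printed as indented "MIG <profile> Device <n>:" lines, as in the pod log shown earlier.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `nvidia-smi -L` output on stdin and prints the
# number of MIG device lines it contains.
count_mig_devices() {
  # grep -c exits 1 when the count is zero; "|| true" keeps the helper's
  # exit status at 0 while the printed count still reflects the result.
  grep -Ec '^[[:space:]]*MIG [^[:space:]]+[[:space:]]+Device' || true
}

# Usage (on the GPU host):
#   nvidia-smi -L | count_mig_devices
```

While the sample pod is running you would expect a count of at least 1, and 0 again once the pod has been deleted and its slice removed.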
kubectl delete -f ./samples/test-pod.yaml
nvidia-smi
Sun Aug 18 13:34:55 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.28.03 Driver Version: 560.28.03 CUDA Version: 12.6 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA A100-PCIE-40GB Off | 00000000:07:00.0 Off | On |
| N/A 32C P0 61W / 250W | 1MiB / 40960MiB | N/A Default |
| | | Enabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| MIG devices: |
+------------------+----------------------------------+-----------+-----------------------+
| GPU GI CI MIG | Memory-Usage | Vol| Shared |
| ID ID Dev | BAR1-Usage | SM Unc| CE ENC DEC OFA JPG |
| | | ECC| |
|==================+==================================+===========+=======================|
| No MIG devices found |
+-----------------------------------------------------------------------------------------+
...
You can apply the samples (examples) from the samples directory:
kubectl apply -k samples/
Note: To try them out, make sure the samples use their default values.
kubectl delete -k samples/
make uninstall
make undeploy
kind delete cluster
Users (mainly developers) can run the InstaSlice operator in emulator mode as described here. So far, this has been tested only on a single-node cluster.
To run the e2e tests locally, run the following command:
make test-e2e-kind-emulated ; make cleanup-test-e2e-kind-emulated
These e2e tests create a local Kind cluster and run against it.
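Note that when two commands are joined with `;`, the shell reports the exit status of the last one, so a failed test run followed by a successful cleanup looks like success to a calling script or CI job. One way to keep the cleanup unconditional while preserving the test result is sketched below; run_with_cleanup is a hypothetical helper, not a target provided by the project's Makefile.

```shell
#!/usr/bin/env bash
# Hypothetical helper: runs a test command, always runs the cleanup command
# afterwards, and returns the exit status of the test command.
# Both arguments are simple command strings (split on whitespace).
run_with_cleanup() {
  local test_cmd="$1" cleanup_cmd="$2" status=0
  $test_cmd || status=$?
  $cleanup_cmd || true   # a cleanup failure does not mask the test result
  return "$status"
}

# Usage:
#   run_with_cleanup "make test-e2e-kind-emulated" \
#                    "make cleanup-test-e2e-kind-emulated"
```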
InstaSlice has been published on OperatorHub.
High-level overview of the main priorities for 2024/2025:
Future tasks:
Copyright 2024.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.