Closed chubei-urus closed 1 month ago
Welcome @chubei-urus!
It looks like this is your first PR to kubernetes/minikube 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.
You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.
You can also check if kubernetes/minikube has its own contribution guidelines.
You may want to refer to our testing guide if you run into trouble with your tests not passing.
If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!
Thank you, and welcome to Kubernetes. :smiley:
Hi @chubei-urus. Thanks for your PR.
I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test` on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.
Once the patch is verified, the new status will be reflected by the `ok-to-test` label.
I understand the commands that are listed here.
Can one of the admins verify this patch?
I'm new to the repo and don't know how this feature should be tested. Many thanks to anyone who can give some pointers!
Thank you @chubei-urus for creating this PR. Do you mind sharing a before/after example of running a workload with this PR, and how you verified that it was NOT using the graphics card before this PR?
Thank you for your quick reply. I'll create a minimal example.
/ok-to-test
kvm2 driver with docker runtime
+----------------+----------+---------------------+
| COMMAND | MINIKUBE | MINIKUBE (PR 19345) |
+----------------+----------+---------------------+
| minikube start | 49.8s | 49.4s |
| enable ingress | 26.5s | 25.0s |
+----------------+----------+---------------------+
docker driver with docker runtime
+----------------+----------+---------------------+
| COMMAND | MINIKUBE | MINIKUBE (PR 19345) |
+----------------+----------+---------------------+
| minikube start | 23.1s | 22.2s |
| enable ingress | 21.4s | 22.1s |
+----------------+----------+---------------------+
docker driver with containerd runtime
+----------------+----------+---------------------+
| COMMAND | MINIKUBE | MINIKUBE (PR 19345) |
+----------------+----------+---------------------+
| minikube start | 21.3s | 21.6s |
| enable ingress | 48.2s | 48.1s |
+----------------+----------+---------------------+
Here are the top 10 failed tests with the lowest flake rate in each environment.
| Environment | Test Name | Flake Rate |
| --- | --- | --- |
Besides the above, the following environments also have failed tests:
KVM_Linux_crio: 30 failed (gopogh)
Docker_Cloud_Shell: 5 failed (gopogh)
Docker_Linux_crio: 2 failed (gopogh)
QEMU_macOS: 97 failed (gopogh)
Docker_Linux_containerd_arm64: 1 failed (gopogh)
Docker_Linux_crio_arm64: 2 failed (gopogh)
To see the flake rates of all tests by environment, click here.
Start minikube with GPU support:

```shell
minikube start --gpus all
```

Then create `vulkan.yaml` with the following content:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: vulkan
spec:
  containers:
    - name: vulkan
      env:
        - name: NVIDIA_DRIVER_CAPABILITIES
          value: "graphics"
      image: dualvtable/vulkan-sample
      resources:
        limits:
          nvidia.com/gpu: 1
  restartPolicy: Never
```
Apply the manifest and check the pod logs:

```shell
kubectl apply -f vulkan.yaml
kubectl logs vulkan
```
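As an aside, one quick way to confirm that the GPU was actually registered with the node is to check the node's capacity for the `nvidia.com/gpu` resource. The real command needs a running cluster, so the sketch below inlines a trimmed sample of the node JSON (the sample values and the node name `minikube` are assumptions, not taken from this PR) and extracts the GPU count from it:

```shell
# Real check (needs a cluster started with --gpus all):
#   kubectl get node minikube -o jsonpath='{.status.capacity.nvidia\.com/gpu}'
# Trimmed, hypothetical sample of that node object, inlined so the
# snippet is self-contained and runnable without a cluster:
node_json='{"status":{"capacity":{"cpu":"8","memory":"16Gi","nvidia.com/gpu":"1"}}}'
# Pull out the nvidia.com/gpu count; a non-empty value means the
# NVIDIA device plugin advertised the GPU to the scheduler.
gpus=$(printf '%s' "$node_json" | sed -n 's|.*"nvidia.com/gpu":"\([0-9]*\)".*|\1|p')
echo "GPUs advertised: $gpus"
```

If the resource is missing or `0`, pods requesting `nvidia.com/gpu: 1` (like the `vulkan` pod above) will stay `Pending`.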
Before this PR, the logs look like:
```
computeheadless: /build/Vulkan/examples/computeheadless/computeheadless.cpp:181: VulkanExample::VulkanExample(): Assertion `res == VK_SUCCESS' failed.
/build/entrypoint.sh: line 4:    14 Done                    echo 'y'
        15 Aborted                 (core dumped) | ${EXAMPLES}/$i
renderheadless: /build/Vulkan/examples/renderheadless/renderheadless.cpp:211: VulkanExample::VulkanExample(): Assertion `res == VK_SUCCESS' failed.
/build/entrypoint.sh: line 4:    16 Done                    echo 'y'
        17 Aborted                 (core dumped) | ${EXAMPLES}/$i
```
After this PR, the logs look like:
```
Running headless compute example
GPU: NVIDIA GeForce RTX 4060 Laptop GPU
Compute input:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
Compute output:
0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181 6765 10946 17711 28657 46368 75025 121393 196418 317811 514229 832040 1346269
Finished. Press enter to terminate...
Running headless rendering example
GPU: NVIDIA GeForce RTX 4060 Laptop GPU
Framebuffer image saved to headless.ppm
Finished. Press enter to terminate...
```
```
(base) bei@bei-urus:~/minikube$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 24.04 LTS
Release:        24.04
Codename:       noble
(base) bei@bei-urus:~/minikube$ nvidia-smi
Tue Jul 30 10:08:20 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01             Driver Version: 535.183.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4060 ...    Off | 00000000:01:00.0  On |                  N/A |
| N/A   42C    P4              10W /  35W |    827MiB /  8188MiB |      2%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      2613      G   /usr/lib/xorg/Xorg                          271MiB |
|    0   N/A  N/A      2942      G   /usr/bin/gnome-shell                        172MiB |
|    0   N/A  N/A      3914      G   ...yOnDemand --variations-seed-version       91MiB |
|    0   N/A  N/A      4839      G   ...seed-version=20240729-050126.230000      109MiB |
|    0   N/A  N/A      6026      G   ...erProcess --variations-seed-version      137MiB |
+---------------------------------------------------------------------------------------+
```
Note that this is not the workload I was running, but I believe it shows the same issue.
@chubei-urus I could merge this PR, and if you like I would love to see a follow-up adding an integration test. https://github.com/kubernetes/minikube/issues/19486
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: chubei-urus, medyagh
The full list of commands accepted by this bot can be found here.
The pull request process is described here
Thank you! I'd like to add an integration test but have been busy with other things.
fixes #19318