kubevirt / kubevirtci

Contains cluster definitions and client tools to quickly spin up and destroy ephemeral and scalable k8s and ocp clusters for testing
Apache License 2.0

Introduce `KUBEVIRT_NUM_NUMA_NODES`, `KUBEVIRT_NUM_VCPU` and `KUBEVIRT_CPU_MANAGER_POLICY` #1171

Closed lyarwood closed 3 months ago

lyarwood commented 4 months ago

What this PR does / why we need it:

These new env variables are useful when testing dedicatedCpuPlacement and guestMappingPassthrough without requiring a physical host with multiple NUMA nodes.
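For instance (a minimal sketch; the values shown are illustrative examples rather than defaults), the new variables can be exported alongside the usual kubevirtci settings before bringing the cluster up:

```shell
# Example environment for an emulated two-node NUMA topology.
# The variable names come from this PR; the values are illustrative.
export KUBEVIRT_PROVIDER=k8s-1.28
export KUBEVIRT_NUM_VCPU=8                 # vCPUs per cluster node VM
export KUBEVIRT_NUM_NUMA_NODES=2           # emulated NUMA nodes per node VM
export KUBEVIRT_CPU_MANAGER_POLICY=static  # kubelet CPU manager policy
make cluster-up
```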

$ sudo dnf install numactl -y && numactl --hardware
[..]
available: 1 nodes (0)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
node 0 size: 63432 MB
node 0 free: 619 MB
node distances:
node   0 
  0:  10 
$ env | grep KUBEVIRT
KUBEVIRT_PROVIDER=k8s-1.28
KUBEVIRT_MEMORY_SIZE=16384M
KUBEVIRT_HUGEPAGES_2M=1024
KUBEVIRT_DEPLOY_CDI=true
KUBEVIRT_CPU_MANAGER_POLICY=static
KUBEVIRT_NUM_NUMA_NODES=2
KUBEVIRT_NUM_VCPU=8
KUBEVIRTCI_CONTAINER_SUFFIX=latest
KUBEVIRTCI_GOCLI_CONTAINER=quay.io/kubevirtci/gocli:latest
$ make cluster-up
[..]
$ ./cluster-up/ssh.sh node01
[..]
[vagrant@node01 ~]$ sudo dnf install numactl && numactl --hardware
[..]
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3
node 0 size: 7997 MB
node 0 free: 5610 MB
node 1 cpus: 4 5 6 7
node 1 size: 8018 MB
node 1 free: 5932 MB
node distances:
node   0   1 
  0:  10  20 
  1:  20  10 
$ cd ../kubevirt
$ rsync -av ../kubevirtci/_ci-configs/ ./_ci-configs/
$ rsync -av ../kubevirtci/cluster-up/ ./cluster-up
$ make cluster-sync
[..]
$ ./cluster-up/kubectl.sh patch kv/kubevirt -n kubevirt --type merge -p '{"spec":{"configuration":{"developerConfiguration":{"featureGates": ["CPUManager","NUMA"]}}}}'
$ ./cluster-up/kubectl.sh apply -f -<<EOF
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  name: example
spec:
  domain:
    cpu:
      cores: 5
      dedicatedCpuPlacement: true
      numa:
        guestMappingPassthrough: {}
    devices:
      disks:
        - disk:
            bus: virtio
          name: containerdisk
        - disk:
            bus: virtio
          name: cloudinitdisk
    resources:
      requests:
        memory: 1Gi
    memory:
      hugepages:
        pageSize: 2Mi
  volumes:
    - containerDisk:
        image: quay.io/containerdisks/fedora:39
      name: containerdisk
    - cloudInitNoCloud:
        userData: |
          #!/bin/sh
          mkdir -p  /home/fedora/.ssh
          curl https://github.com/lyarwood.keys > /home/fedora/.ssh/authorized_keys
          chown -R fedora: /home/fedora/.ssh
      name: cloudinitdisk
EOF
[..]
$ ./cluster-up/virtctl.sh ssh -lfedora example
fedora@example $ sudo dnf install numactl -y && numactl --hardware
[..]
available: 2 nodes (0-1)
node 0 cpus: 0
node 0 size: 502 MB
node 0 free: 293 MB
node 1 cpus: 1 2 3 4
node 1 size: 444 MB
node 1 free: 44 MB
node distances:
node   0   1 
  0:  10  20 
  1:  20  10 
$ ./cluster-up/kubectl.sh exec pods/virt-launcher-example-glcfz -- virsh vcpuinfo 1
selecting podman as container runtime
VCPU:           0
CPU:            1
State:          running
CPU time:       14.3s
CPU Affinity:   -y------

VCPU:           1
CPU:            4
State:          running
CPU time:       5.6s
CPU Affinity:   ----y---

VCPU:           2
CPU:            5
State:          running
CPU time:       2.8s
CPU Affinity:   -----y--

VCPU:           3
CPU:            6
State:          running
CPU time:       3.8s
CPU Affinity:   ------y-

VCPU:           4
CPU:            7
State:          running
CPU time:       5.2s
CPU Affinity:   -------y
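One way to read the `CPU Affinity` masks above (a sketch, assuming GNU grep): each character position corresponds to a host CPU, and `y` marks the CPU the vCPU is pinned to. For example, the mask `-y------` for vCPU 0 decodes to host CPU 1:

```shell
# Decode a libvirt CPU-affinity mask such as "-y------": the 0-based
# position of 'y' is the host CPU the vCPU is pinned to. Assumes GNU grep
# for the -b (byte offset) option.
mask='-y------'
idx=$(printf '%s' "$mask" | grep -ob 'y' | cut -d: -f1)
echo "host CPU: $idx"   # → host CPU: 1
```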
$ ./cluster-up/kubectl.sh get pods/virt-launcher-example-glcfz -o json | jq '.spec.containers[] | select(.name == "compute") | .resources'
selecting podman as container runtime
{
  "limits": {
    "cpu": "5",
    "devices.kubevirt.io/kvm": "1",
    "devices.kubevirt.io/tun": "1",
    "devices.kubevirt.io/vhost-net": "1",
    "hugepages-2Mi": "1Gi",
    "memory": "394264576"
  },
  "requests": {
    "cpu": "5",
    "devices.kubevirt.io/kvm": "1",
    "devices.kubevirt.io/tun": "1",
    "devices.kubevirt.io/vhost-net": "1",
    "ephemeral-storage": "50M",
    "hugepages-2Mi": "1Gi",
    "memory": "394264576"
  }
}
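A note on the resources above (my reading, not stated in the PR): since the guest's 1Gi of RAM is served from the `hugepages-2Mi` limit, the plain `memory` value covers only the virt-launcher infrastructure overhead, and it converts to a round number of MiB:

```shell
# With hugepages-backed guest memory, the plain "memory" limit covers only
# virt infrastructure overhead (the guest's 1Gi comes from hugepages-2Mi).
# 394264576 bytes is exactly 376 MiB:
echo $((394264576 / 1024 / 1024))   # → 376
```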

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged): Fixes #

Special notes for your reviewer:

Checklist

This checklist is not enforcing, but it's a reminder of items that could be relevant to every PR. Approvers are expected to review this list.

Release note:

NONE
kubevirt-bot commented 4 months ago

Skipping CI for Draft Pull Request. If you want CI signal for your change, please convert it to an actual PR. You can still manually trigger a test run with /test all

lyarwood commented 4 months ago

/hold cancel

lyarwood commented 4 months ago

/retest-required

+ ./cluster-up/ssh.sh node01 -- ip l show eth1
selecting podman as container runtime
Device "eth1" does not exist.

I can't reproduce this locally so trying another run to ensure it isn't transient in CI.

lyarwood commented 4 months ago

/retest-required

+ ../gocli/build/cli provision 1.30 --phases k8s
time="2024-04-16T17:06:15Z" level=info msg="Using remote image quay.io/kubevirtci/centos9:2404020400-62457f2"

Ah, I see it now: the issue is that the tests don't also rebuild the centos image, so the vm.sh script within it is stale?

I assume I need to break these changes out of https://github.com/kubevirt/kubevirtci/pull/1171/commits/2618f6633d8a7698a116f4b893b4c5cc32653c0e into their own PR, land that and have a fresh centos image created?

lyarwood commented 4 months ago

/hold

lyarwood commented 4 months ago

/hold cancel

brianmcarey commented 4 months ago

/cc

kubevirt-bot commented 3 months ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: brianmcarey

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

- ~~[OWNERS](https://github.com/kubevirt/kubevirtci/blob/main/OWNERS)~~ [brianmcarey]

Approvers can indicate their approval by writing `/approve` in a comment. Approvers can cancel approval by writing `/approve cancel` in a comment.

kubevirt-bot commented 3 months ago

@lyarwood: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| check-up-kind-1.27-vgpu | 4151e66316dfab1d8bf22f9c999c6620ddd09f7d | link | false | /test check-up-kind-1.27-vgpu |
Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository. I understand the commands that are listed [here](https://go.k8s.io/bot-commands).