kubevirt / kubevirtci

Contains cluster definitions and client tools to quickly spin up and destroy ephemeral and scalable k8s and ocp clusters for testing
Apache License 2.0
82 stars 119 forks source link

Convert the vm.sh script into a go binary #1164

Open Kuruyia opened 7 months ago

Kuruyia commented 7 months ago

What this PR does / why we need it: This adds a new Go CLI based on Cobra that will replace the vm.sh script used when provisioning a cluster to set up the VMs that will contain the k8s cluster nodes. This is done in order to increase the maintainability of this script in the future.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged): Fixes #1160

Special notes for your reviewer: The CLI flags were kept as-is so this can be used as a drop-in replacement.

I've already started to split the functions into multiple files so it is (hopefully) already a bit cleaner than the current shell script.

Checklist

This checklist is not enforcing, but it's a reminder of items that could be relevant to every PR. Approvers are expected to review this list.

Release note:

NONE
kubevirt-bot commented 7 months ago

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Once this PR has been reviewed and has the lgtm label, please assign awels for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files: - **[OWNERS](https://github.com/kubevirt/kubevirtci/blob/main/OWNERS)** Approvers can indicate their approval by writing `/approve` in a comment Approvers can cancel approval by writing `/approve cancel` in a comment
kubevirt-bot commented 7 months ago

Hi @Kuruyia. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
alicefr commented 6 months ago

@Kuruyia thanks for the work! Can you implement a couple of unit tests for the functions?

alicefr commented 6 months ago

@dhiller @brianmcarey do you have any idea how we could test this?

Kuruyia commented 6 months ago

@Kuruyia thanks for the work! Can you implement a couple of unit tests for the functions?

Hey, thanks for your review!

I'll address your change requests and add the unit tests this week. I guess we're only testing the functions that don't have any side effects for now?

It'll be more involved to test functions that execute qemu-img, try to create a /dev/kvm device, ...

alicefr commented 6 months ago

@Kuruyia thanks for the work! Can you implement a couple of unit tests for the functions?

Hey, thanks for your review!

I'll address your change requests and add the unit tests this week. I guess we're only testing the functions that don't have any side effects for now?

It'll be more involved to test functions that execute qemu-img, try to create a /dev/kvm device, ...

Since we didn't have any tests for the bash script, I would just add some simple ones. For example, some that check the qemu command line. The rest is a bit difficult because this run multiple binaries, for example, the part that creates the the iptables, I would skip it for now.

Another place where you can add unit tests is CalcNextDisk. In this case, you could for example declare type and a function type and overwrite it in the unit tests for files, err := os.ReadDir(searchDir). Something like

type readDirFunc func(n int) ([]os.DirEntry, error)
type DiskUtil struct {
  readDirFunc readDir
}

func NewDiskUtil() *DiskUtil {
  return &DiskUtil{
      readDir = os.ReadDir
  }
}

and in the tests something like this:

type mockDirEntry struct {
   name string
}

func newmockDirEntry(name string) os.DirEntry {
   return &mockDirEntry{name:name}
}

func (m *mockDirEntry) Name() string {
  return m.name
}
[...] // implement the rest of the function for DirEntry

func mockReadDir(n int) ([]os.DirEntry, error) {
    return os.DirEntry{
       newmockDirEntry("disk0.qcow2"),
       newmockDirEntry("disk1.qcow2"),
    }, nil
}

diskUtil := &DiskUtil{
      readDir = mockReadDir
  }
// test
diskUtil.CalcNextDisk(...)
dhiller commented 6 months ago

@dhiller @brianmcarey do you have any idea how we could test this?

@Kuruyia thank you for your work here!

@alicefr I agree with what you suggested. Since we just didn't (or rather couldn't) unit-test anything here, let's try to increase coverage of testing where we can - anything is better than nothing, as it is currently.

lyarwood commented 6 months ago

/cc

FWIW https://github.com/kubevirt/kubevirtci/pull/1174 is about to merge and will need to be covered here. I wasn't aware of this PR prior to starting this work sorry otherwise I would've extended it myself.

Kuruyia commented 6 months ago

/cc

FWIW #1174 is about to merge and will need to be covered here. I wasn't aware of this PR prior to starting this work sorry otherwise I would've extended it myself.

No problem, I was working on the unit tests today so I'll work on adding your modifications as well afterwards, thanks for letting me know.

Kuruyia commented 6 months ago

Hey! Sorry for the delay, I just added the unit tests as you asked, and ported here the NUMA feature that was recently added to vmcli.sh as well.

Here's what I think is left to do:

As this PR is getting bigger, it might be better to offload those tasks to separate PRs imho.

alicefr commented 6 months ago

Hey! Sorry for the delay, I just added the unit tests as you asked, and ported here the NUMA feature that was recently added to vmcli.sh as well.

Here's what I think is left to do:

  • run those unit tests in the CI
  • integrate this new CLI in place of the vmcli.sh shell script
  • support RHEL and RHEL-based distributions (when starting QEMU)

As this PR is getting bigger, it might be better to offload those tasks to separate PRs imho.

Yes, I think we could replace the bash script in a second PR. For example, we might need to think also how to transition form it to the golang binary. We might want to have a version with both to be able to switch to the old version if there are any problem. However, one thing that is necessary are the build of the binary and to copy it inside the provisioner image.

Kuruyia commented 6 months ago

Yes, I think we could replace the bash script in a second PR. For example, we might need to think also how to transition form it to the golang binary. We might want to have a version with both to be able to switch to the old version if there are any problem. However, one thing that is necessary are the build of the binary and to copy it inside the provisioner image.

Yup no problem, what do you think about adding a new stage at the beginning of the centos9 Dockerfile to build this binary, and then copy it in the current (single) stage?

alicefr commented 6 months ago

Yes, I think we could replace the bash script in a second PR. For example, we might need to think also how to transition form it to the golang binary. We might want to have a version with both to be able to switch to the old version if there are any problem. However, one thing that is necessary are the build of the binary and to copy it inside the provisioner image.

Yup no problem, what do you think about adding a new stage at the beginning of the centos9 Dockerfile to build this binary, and then copy it in the current (single) stage?

Yes, sound as a good idea!

Kuruyia commented 6 months ago

@alicefr I changed the tests to use Ginkgo instead, it is much cleaner now :)

I also modified the Dockerfile to split it into two stages. The first stage is in charge of building this new Go cli, and the second stage (which was the old Dockerfile) copies the binary from that first stage into /vmcli.

Kuruyia commented 5 months ago

This is looking great, some quick comments but this should be ready to come out of draft and possibly land shortly.

Can you provide a Makefile that we could then call from a presubmit as we do for gocli?

Do you want a Makefile that replaces the build.sh script?

alicefr commented 5 months ago

@Kuruyia please squash the commits that changes the vmcli into a single one, for example, the ones that inline the errors.

alicefr commented 5 months ago

@Kuruyia I think now what we need it is a plan to integrate this into main. /cc @dhiller @brianmcarey I will also bring this topic to the next community meeting

kubevirt-bot commented 5 months ago

@alicefr: GitHub didn't allow me to request PR reviews from the following users: also, community, will, bring, this, to, the, I, meeting.

Note that only kubevirt members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to [this](https://github.com/kubevirt/kubevirtci/pull/1164#issuecomment-2106949402): >@Kuruyia I think now what we need it is a plan to integrate this into main. >/cc @dhiller @brianmcarey I will also bring this topic to the next community meeting Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
alicefr commented 5 months ago

This is looking great, some quick comments but this should be ready to come out of draft and possibly land shortly. Can you provide a Makefile that we could then call from a presubmit as we do for gocli?

Do you want a Makefile that replaces the build.sh script?

@Kuruyia I don't quite understand why you need the build.sh. AFAIU, you added an additional building step and it should be automatically executed when we build the centos stream 9 node.

@lyarwood there is a difference here between gocli and vmcli. We build gocli on the host and then copy the binary, while here vmcli is built into the container. IMO, the second approach is better and we should also do the same for gocli. It is rare, but building gocli depends on the go version installed on the host. It could happen that the go version is too old. Building the binary in the container guarantees that we don't have any golang version mismatch.

Kuruyia commented 5 months ago

@Kuruyia please squash the commits that changes the vmcli into a single one, for example, the ones that inline the errors.

I squashed all the commits for the vmcli into a single one, and have another commit for the modifications done on the Dockerfile. Please tell me if you want a more granular squash.

@Kuruyia I don't quite understand why you need the build.sh. AFAIU, you added an additional building step and it should be automatically executed when we build the centos stream 9 node.

I don't really understand what targets are needed inside this Makefile. I assumed it would replace the build.sh script as the example @lyarwood provided shows the gocli Makefile used for building the container image.

alicefr commented 4 months ago

/lgtm

brianmcarey commented 4 months ago

/test ?

kubevirt-bot commented 4 months ago

@brianmcarey: The following commands are available to trigger required jobs:

The following commands are available to trigger optional jobs:

Use /test all to run the following jobs that were automatically triggered:

In response to [this](https://github.com/kubevirt/kubevirtci/pull/1164#issuecomment-2191526148): >/test ? Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.
brianmcarey commented 4 months ago

/test check-gocli /test check-provision-centos-base

kubevirt-bot commented 4 months ago

New changes are detected. LGTM label has been removed.

Kuruyia commented 4 months ago

/retest

kubevirt-bot commented 4 months ago

@Kuruyia: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to [this](https://github.com/kubevirt/kubevirtci/pull/1164#issuecomment-2197067342): >/retest Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.
brianmcarey commented 4 months ago

/test check-gocli /test check-provision-centos-base

brianmcarey commented 4 months ago

/ok-to-test

Kuruyia commented 3 months ago

/retest

kubevirt-bot commented 3 months ago

@Kuruyia: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to [this](https://github.com/kubevirt/kubevirtci/pull/1164#issuecomment-2200538793): >/retest Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.
brianmcarey commented 3 months ago

/test check-gocli /test check-provision-centos-base

brianmcarey commented 3 months ago

/ok-to-test

Kuruyia commented 2 months ago

I fixed the name of the base golang image, and tested cluster provisioning under Fedora 40 with Podman. It built successfully, so hopefully this time's the charm for the CI as well :)

/retest

kubevirt-bot commented 2 months ago

@Kuruyia: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to [this](https://github.com/kubevirt/kubevirtci/pull/1164#issuecomment-2292412206): >I fixed the name of the base golang image, and tested cluster provisioning under Fedora 40 with Podman. It built successfully, so hopefully this time's the charm for the CI as well :) > >/retest Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.
alicefr commented 2 months ago

/retest

alicefr commented 2 months ago

hi @Kuruyia thanks for the persistence. I'm in favor of this change. Kind reminder to @xpivarc @brianmcarey

kubevirt-bot commented 2 months ago

@Kuruyia: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
check-provision-centos-base 55c5735a9ecefbe2dffa3c4c8052c44a71b50221 link true /test check-provision-centos-base
check-up-kind-1.30-vgpu 0d8d7fa9fa4297105db441c6067fe174840ed6c2 link false /test check-up-kind-1.30-vgpu
check-up-kind-1.27-vgpu 0d8d7fa9fa4297105db441c6067fe174840ed6c2 link false /test check-up-kind-1.27-vgpu
Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository. I understand the commands that are listed [here](https://go.k8s.io/bot-commands).
Kuruyia commented 2 months ago

hi @Kuruyia thanks for the persistence. I'm in favor of this change. Kind reminder to @xpivarc @brianmcarey

No problem.

It looks like the check-up-kind-1.*-vgpu tests failed, but looking at those test histories, they just tend to do that? I see that the required check-provision-* tests also didn't run on the last commit (0d8d7fa). Could you just rerun those?

oshoval commented 2 months ago

Thanks for your contribution.

It looks like the check-up-kind-1.*-vgpu tests failed, but looking at those test histories, they just tend to do that? I see that the required check-provision-* tests also didn't run on the last commit (0d8d7fa). Could you just rerun those?

Centos9 folder which is the only changed folder doesnt affect kind-X providers. Jobs are running, here for example https://prow.ci.kubevirt.io/view/gs/kubevirt-prow/pr-logs/pull/kubevirt_kubevirtci/1164/check-provision-k8s-1.28/1824330643983568896 Prow / Tide takes care about it.

Please consider that check-provision uses podman, but the post submit itself uses docker, worth to check it as well, maybe for example by creating a draft PR that use docker to run presubmits and run rehearse there.