kubevirt / kubevirtci

Contains cluster definitions and client tools to quickly spin up and destroy ephemeral and scalable k8s and ocp clusters for testing
Apache License 2.0

Opts package #1217

Open aerosouund opened 1 month ago

aerosouund commented 1 month ago

What this PR does / why we need it:

Replace all bash script calls in the run command with the Go-native opts package

Part of: #1110

Special Notes: This branch is based off library-based-ssh (PR #1209), so it currently includes the same commits; once library-based ssh is merged, it will only introduce the commits of the opts package.
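
For context, a rough sketch of the pattern being introduced (hypothetical names, not the PR's actual code): each opt is a small Go type whose Exec method does the work that a bash script used to do, so the run command can compose opts instead of shelling out.

```go
package opts // hypothetical illustration, not the PR's actual code

import "fmt"

// Opt represents one piece of cluster setup that used to be a bash script
// invoked by the run command.
type Opt interface {
	Exec() error
}

// enableFeatureGate is an illustrative opt.
type enableFeatureGate struct {
	gate string
}

func (o *enableFeatureGate) Exec() error {
	// Perform the configuration natively in Go instead of shelling out.
	fmt.Printf("enabling feature gate %s\n", o.gate)
	return nil
}

// runOpts composes opts the way the run command previously chained scripts.
func runOpts(opts ...Opt) error {
	for _, o := range opts {
		if err := o.Exec(); err != nil {
			return err
		}
	}
	return nil
}
```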

Checklist

Release note:

Move bash scripts to the gocli
kubevirt-bot commented 1 month ago

Hi @aerosouund. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo, meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.
aerosouund commented 1 month ago

cc: @dhiller @brianmcarey @xpivarc @acardace

dhiller commented 4 weeks ago

/uncc @rmohr
/cc @brianmcarey

brianmcarey commented 4 weeks ago

/test ?

kubevirt-bot commented 4 weeks ago

@brianmcarey: The following commands are available to trigger required jobs:

The following commands are available to trigger optional jobs:

Use /test all to run the following jobs that were automatically triggered:

In response to [this](https://github.com/kubevirt/kubevirtci/pull/1217#issuecomment-2250120766):

> /test ?

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.

brianmcarey commented 4 weeks ago

/test check-gocli
/test check-provision-centos-base
/test check-provision-k8s-1.30

aerosouund commented 3 weeks ago

@dhiller all points addressed :) please take a look again and let me know

P.S.: I swapped the implementation of copying remote files to one based on github.com/bramvdbogaerde/go-scp. The whole reason we needed the original approach is that you couldn't scp, as root, the files that require that level of permission; since the introduction of the root key opt this is no longer the case.
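
Roughly, the new copy looks like this (a simplified sketch with hypothetical paths and addresses, not the exact code in the PR):

```go
package main

import (
	"context"
	"log"
	"os"

	scp "github.com/bramvdbogaerde/go-scp"
	"github.com/bramvdbogaerde/go-scp/auth"
	"golang.org/x/crypto/ssh"
)

func main() {
	// Connect as root using a private key; the key path and node address are hypothetical.
	clientConfig, err := auth.PrivateKey("root", "/path/to/root/key", ssh.InsecureIgnoreHostKey())
	if err != nil {
		log.Fatal(err)
	}

	client := scp.NewClient("192.168.66.101:22", &clientConfig)
	if err := client.Connect(); err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	f, err := os.Open("/path/to/local/manifest.yaml")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Copy the file to the node over scp with the requested permissions.
	if err := client.CopyFromFile(context.Background(), *f, "/etc/kubernetes/manifest.yaml", "0644"); err != nil {
		log.Fatal(err)
	}
}
```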

xpivarc commented 2 weeks ago

/hold To not merge accidentally

brianmcarey commented 1 week ago

@xpivarc including service manifests like prometheus in gocli means that we lose some flexibility in deploying different versions of services to different cluster providers. I'm not sure how much of an issue that might be.

xpivarc commented 1 week ago

> @xpivarc including service manifests like prometheus in gocli means that we lose some flexibility in deploying different versions of services to different cluster providers. I'm not sure how much of an issue that might be.

@brianmcarey Do we use it now? I am not aware... but I doubt that we lose it. The version can be templated, and then it is only a question of where we encode which version is tied to which provider...
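
For example, something along these lines (purely illustrative, not existing kubevirtci code) would let each provider pin its own stack version:

```go
package main

import (
	"os"
	"text/template"
)

// A purely illustrative mapping of provider -> monitoring stack version;
// the provider names and versions here are hypothetical.
var stackVersions = map[string]string{
	"k8s-1.30": "v0.73.0",
	"k8s-1.31": "v0.76.0",
}

// The manifest keeps a placeholder instead of a hard-coded version.
const manifest = `image: quay.io/prometheus-operator/prometheus-operator:{{ .Version }}
`

func main() {
	tmpl := template.Must(template.New("prometheus").Parse(manifest))
	// Render the manifest for the provider being provisioned.
	if err := tmpl.Execute(os.Stdout, struct{ Version string }{stackVersions["k8s-1.31"]}); err != nil {
		panic(err)
	}
}
```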

brianmcarey commented 1 week ago

> @xpivarc including service manifests like prometheus in gocli means that we lose some flexibility in deploying different versions of services to different cluster providers. I'm not sure how much of an issue that might be.
>
> @brianmcarey Do we use it now? I am not aware... but I doubt that we lose it. The version can be templated, and then it is only a question of where we encode which version is tied to which provider...

I have a case where I just want to update the monitoring stack on the k8s-1.31 provider, and I don't think I will be able to achieve this with how the manifests are handled at the moment. Here is the PR in question: https://github.com/kubevirt/kubevirtci/pull/1228

Of course, this update to the monitoring stack could probably be applied to all providers without a problem, but there are performance jobs that would be sensitive to changes to this stack.

aerosouund commented 1 week ago

@brianmcarey That's a valid point for sure. Can you tell me which performance jobs might be sensitive to the change? I want to measure the impact and check whether there is a way to avoid the degradation, or potentially explore building stack versioning into the PR.

brianmcarey commented 1 week ago

> @brianmcarey That's a valid point for sure. Can you tell me which performance jobs might be sensitive to the change? I want to measure the impact and check whether there is a way to avoid the degradation, or potentially explore building stack versioning into the PR.

Metric names are prone to change, which could cause these performance jobs to stop recording metrics of interest. Performance jobs like https://testgrid.k8s.io/kubevirt-periodics#periodic-kubevirt-e2e-k8s-1.29-sig-performance

aerosouund commented 1 week ago

@brianmcarey @xpivarc It seems (according to my very limited understanding of that part of kubevirt) that we mainly depend on just two metrics, and these metrics are generated by kubevirt itself in the tests you linked:

`kubevirt_vmi_phase_transition_time_from_creation_seconds_bucket`
`rest_client_requests_total`

This is according to this documentation snippet I found in docs/perf-scale-benchmarks.md:

  1. The collected metrics are categorized into two buckets, performance and scale
    1. Performance Metrics: This tells users how KubeVirt stack is performing. Examples include vmiCreationToRunningSecondsP50 and vmiCreationToRunningSecondsP95. This helps users understand how KubeVirt performance evolved over the releases; depending on the user deployment, the numbers will vary, because a real production workload could use other KubeVirt extension points like the device plugins, custom scheduler, different version of kubelet etc. These numbers are just a guidance for how the KubeVirt codebase is performing with minimal VMIs, provided all other variables(hardware, kubernetes version, cluster-size etc) remain the same.
    2. Scalability metrics: This helps users understand the KubeVirt scaling behaviors. Examples include, PATCH-pods-count for VMI, PATCH-virtualmachineinstances-count for VMI and UPDATE-virtualmachineinstances-count for VMI. These metrics are measured on the client side to understand the load generated to apiserver by the KubeVirt stack. This will help users and developers understand the cost of new features going into KubeVirt. It will also make end-users aware about the most expensive calls coming from KubeVirt in their deployment and potentially act on it.

Based on this we can conclude that it's safe to update the stack to any version we want, since we rely on kubevirt-native metrics. But to be safe I want to ask someone from sig-performance about this; is there anyone you can refer me to? If you know of any information contradicting this, let me know.

xpivarc commented 1 week ago

We also have a monitoring lane. I think we could merge this if there aren't multiple versions being maintained right now, and we can add that support in a follow-up. @brianmcarey WDYT?

brianmcarey commented 1 week ago

/test ?

kubevirt-bot commented 1 week ago

@brianmcarey: The following commands are available to trigger required jobs:

The following commands are available to trigger optional jobs:

Use /test all to run the following jobs that were automatically triggered:

In response to [this](https://github.com/kubevirt/kubevirtci/pull/1217#issuecomment-2278571651):

> /test ?

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.

brianmcarey commented 1 week ago

/test check-gocli
/test check-provision-k8s-1.30
/test check-provision-k8s-1.29

aerosouund commented 1 week ago

@brianmcarey This failure might have been because I updated the go version locally; I reverted it to 1.21. Locally my toolchain is 1.22.3, though, and I believe some packages actually require 1.22, but let's try to compile it as-is again.

brianmcarey commented 1 week ago

/test check-gocli
/test check-provision-k8s-1.30
/test check-provision-k8s-1.29

aerosouund commented 1 week ago

It's still trying to download 1.22 for some reason:

```
CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -ldflags "-X 'kubevirt.io/kubevirtci/cluster-provision/gocli/images.SUFFIX=:'" -o /home/prow/go/src/github.com/kubevirt/kubevirtci/cluster-provision/gocli/build/cli ./cmd/cli
go: downloading go1.22.3 (linux/amd64)
go: download go1.22.3: golang.org/toolchain@v0.0.1-go1.22.3.linux-amd64: verifying module: checksum database disabled by GOSUMDB=off
make: *** [Makefile:22: cli] Error 1
```

even though go.mod is now:

```
module kubevirt.io/kubevirtci/cluster-provision/gocli

go 1.21
```

aerosouund commented 1 week ago

@brianmcarey @xpivarc Detected the issue on a machine with no go 1.22: some packages that explicitly require 1.22 in vendor/modules.txt made the go toolchain upgrade itself while downloading the dependencies. The dependent packages are:

kubevirt.io/containerized-data-importer-api v1.59.1-0.20240610172909-253d5a6e7f95
k8s.io/pod-security-admission v0.22.0-beta.0.0.20240531013614-68e02f3c6735
k8s.io/client-go v0.30.1
k8s.io/apimachinery v0.30.3
k8s.io/api v0.30.3
istio.io/api v1.22.0
github.com/rook/rook/pkg/apis v0.0.0-20240801164516-9c98467f3c32

What is the possibility of bumping the go version during the build? If there is none, I will try to downgrade these packages.
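
For reference, this is standard Go 1.21+ behaviour: with the default GOTOOLCHAIN=auto, the go command downloads and switches to a newer toolchain whenever a module in the build requires one, which is what the log above shows. A sketch of the usual way to pin this down (the version numbers are only illustrative):

```
// go.mod (illustrative): declare the toolchain explicitly so every machine
// builds with the same one instead of downloading whatever a dependency demands
go 1.22

toolchain go1.22.3
```

Alternatively, running the build with GOTOOLCHAIN=local forces the locally installed go to be used and fails fast instead of downloading a toolchain.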

aerosouund commented 1 week ago

I'm not fully sure either; it might just be that GOSUMDB is off. On all the machines I managed to compile this on, GOSUMDB='sum.golang.org' was set. Is there any reason why it's off in the first place?

aerosouund commented 1 week ago

@brianmcarey I have verified that not using the gosumdb was the problem; I have added it to the Makefile and fixed the merge conflict from my side. I don't know why we wouldn't use the gosumdb, at least when building the gocli. If there are any problems with this, let me know.
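
For context on why this matters: the toolchain is fetched as a module and has to be verified against the checksum database, and with GOSUMDB=off that verification cannot happen, which appears to be exactly the error in the log above. A hypothetical version of the build invocation with the default checksum database re-enabled (not the repo's actual Makefile line):

```sh
# sum.golang.org is the default Go checksum database
GOSUMDB=sum.golang.org CGO_ENABLED=0 GOOS=linux GOARCH=amd64 \
	go build -o build/cli ./cmd/cli
```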

brianmcarey commented 1 week ago

/test all

aerosouund commented 1 week ago

@brianmcarey It still fails to build because of the CDI API's required go version:

```
go: kubevirt.io/containerized-data-importer-api in vendor/modules.txt requires go >= 1.22.3 (running go 1.22.2)
```

brianmcarey commented 1 week ago

> @brianmcarey It still fails to build because of the CDI API's required go version: go: kubevirt.io/containerized-data-importer-api in vendor/modules.txt requires go >= 1.22.3 (running go 1.22.2)

Yes, I think we need to update the go version in the bootstrap image that runs those jobs; let me take a look.

aerosouund commented 1 week ago

@brianmcarey updated the monitoring stack and the scripts

brianmcarey commented 1 week ago

/test all

brianmcarey commented 1 week ago

/test check-gocli
/test check-provision-centos-base

xpivarc commented 1 week ago

/retest-required

xpivarc commented 1 week ago

/retest-required

xpivarc commented 6 days ago

/retest-required

brianmcarey commented 6 days ago

/test all

xpivarc commented 6 days ago

/retest-required

xpivarc commented 6 days ago

/retest-required

xpivarc commented 6 days ago

/retest-required

xpivarc commented 6 days ago

/test check-provision-k8s-1.31

xpivarc commented 6 days ago

/test check-provision-k8s-1.29

xpivarc commented 6 days ago

/test check-provision-k8s-1.29

xpivarc commented 6 days ago

/test check-provision-k8s-1.31

xpivarc commented 6 days ago

/test check-provision-k8s-1.28
/test check-provision-k8s-1.29

brianmcarey commented 5 days ago

/test check-provision-k8s-1.30

aerosouund commented 5 days ago

@brianmcarey @xpivarc The previous issues were eliminated, but there was another issue with IPv6 handling; this has now been dealt with. I also managed to replicate the CI setup exactly in my testing environment to catch those bugs. I was previously testing with make cluster-up, which obviously missed a few things.

I also cleaned up the commits. Please let's rerun the tests.

brianmcarey commented 4 days ago

/test check-provision-k8s-1.30

aerosouund commented 4 days ago

@brianmcarey 1.30 passed, can we please run the rest?

xpivarc commented 4 days ago

/test check-provision-k8s-1.31
/test check-provision-k8s-1.29

brianmcarey commented 3 days ago

/test all

aerosouund commented 3 days ago

@brianmcarey Any other areas you think we should address before an lgtm?

dhiller commented 3 hours ago

/test check-provision-k8s-1.31