openshift / vertical-pod-autoscaler-operator

An Operator for running the Vertical Pod Autoscaler on OpenShift
Apache License 2.0

PODAUTO-99: Updates for 4.16 #155

Closed jkyros closed 8 months ago

jkyros commented 8 months ago

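Updates for 4.16

- Update deps
  - edit (update to 4.16/0.29.0) and run hack/update-vendor.sh
- Update code and build to match updated deps
- Update go version to go 1.21 (go.mod, Dockerfile, images/ci/Dockerfile, Makefile, vet, lint)
- Rev version from 4.15 -> 4.16:
  `sed -i 's/4.15/4.16/g' $(git grep -l 4.15 manifests/) images/ci/bundle.Dockerfile hack/manifest-diff-upstream.sh hack/e2e.sh Makefile`
- Verify that Dockerfile.rhel7 & .ci-operator.yaml are up to date (they are. [Thanks, ART team](https://github.com/openshift/vertical-pod-autoscaler-operator/pull/151)!)
- Update manifests to match upstream

I also had to do some additional one-off surgery:

- The move to controller-runtime 0.17.0 messed with how the cache worked, so I had to fill in some of our NamespacedNames because the namespaced cache doesn't do it automatically anymore (I held us at 0.15.2 last time to postpone it until I understood the failure, but it's handled now)
- The CVO retired some of their helper functions (https://github.com/openshift/cluster-version-operator/pull/1012/commits/77075ae3f665f7775ce48de91ddad76d52accda6) but we were still using them, so I pulled them into our own `lib/resourcemerge`
- The upstream test suite changed and was pickier about the environment (node schedulability, directory mutability) so I had to adjust a couple of our command line options so the tests wouldn't fail immediately on setup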

openshift-ci-robot commented 8 months ago

@jkyros: This pull request references PODAUTO-99 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set.

In response to [this](https://github.com/openshift/vertical-pod-autoscaler-operator/pull/155):

> Will add details when I pull it out of draft, want to see if I can bump controller-runtime without breaking things.

Instructions for interacting with me using PR comments are available [here](https://prow.ci.openshift.org/command-help?repo=openshift%2Fvertical-pod-autoscaler-operator). If you have questions or suggestions related to my behavior, please file an issue against the [openshift-eng/jira-lifecycle-plugin](https://github.com/openshift-eng/jira-lifecycle-plugin/issues/new) repository.

openshift-ci[bot] commented 8 months ago

Skipping CI for Draft Pull Request. If you want CI signal for your change, please convert it to an actual PR. You can still manually trigger a test run with /test all

jkyros commented 8 months ago

/test all

jkyros commented 8 months ago

/test all

jkyros commented 8 months ago

/test e2e-aws-operator

jkyros commented 8 months ago

/test e2e-aws-operator

jkyros commented 8 months ago

Yep, that's what I was worried about. Just configuring the default namespaces in the new controller-runtime isn't good enough; it looks like we were freeloading off something else in the old behavior. I'll figure it out next week:

E0210 03:20:22.845247       1 status.go:316] Error getting VerticalPodAutoscalerController: unable to get: /default because of unknown namespace for the cache
W0210 03:20:22.845263       1 status.go:226] Operator status degraded: error checking VPA controllers status: unable to get: /default because of unknown namespace for the cache
E0210 03:20:37.844984       1 status.go:316] Error getting VerticalPodAutoscalerController: unable to get: /default because of unknown namespace for the cache
W0210 03:20:37.845001       1 status.go:226] Operator status degraded: error checking VPA controllers status: unable to get: /default because of unknown namespace for the cache
E0210 03:20:52.845005       1 status.go:316] Error getting VerticalPodAutoscalerController: unable to get: /default because of unknown namespace for the cache
W0210 03:20:52.845024       1 status.go:226] Operator status degraded: error checking VPA controllers status: unable to get: /default because of unknown namespace for the cache
E0210 03:21:07.845086       1 status.go:316] Error getting VerticalPodAutoscalerController: unable to get: /default because of unknown namespace for the cache
W0210 03:21:07.845103       1 status.go:226] Operator status degraded: error checking VPA controllers status: unable to get: /default because of unknown namespace for the cache
E0210 03:21:22.845002       1 status.go:316] Error getting VerticalPodAutoscalerController: unable to get: /default because of unknown namespace for the cache
W0210 03:21:22.845024       1 status.go:226] Operator status degraded: error checking VPA controllers status: unable to get: /default because of unknown namespace for the cache
jkyros commented 8 months ago

/test e2e-aws-operator

jkyros commented 8 months ago

Yeah, it looks like there were a couple of spots where we didn't fill the namespace into the NamespacedName. The old cache used to just assume the default namespace because it could only handle one, but the new cache doesn't do that anymore.

/test e2e-aws-operator

jkyros commented 8 months ago

/test e2e-aws-operator

jkyros commented 8 months ago

That might be a flake, but it's entirely possible the upstream test changed:

{ failed [FAILED] timed out waiting for the condition
In [SynchronizedBeforeSuite] at: /tmp/tmp.J4s7R4xkJU/src/k8s.io/autoscaler/vertical-pod-autoscaler/e2e/v1/e2e.go:220 @ 02/10/24 09:30:58.864
}

/test e2e-aws-operator

jkyros commented 8 months ago

CI is having trouble pulling images today. I gave it a bit; let's try again.

/test e2e-aws-operator

jkyros commented 8 months ago

/test e2e-aws-operator

jkyros commented 8 months ago

I don't know what changed in the upstream test, but it is now upset about the 3 masters that aren't schedulable.

/test e2e-aws-operator

jkyros commented 8 months ago

/test all

jkyros commented 8 months ago

Helps if I actually commit the right vendored deps :smile:

/test all

openshift-ci-robot commented 8 months ago

@jkyros: This pull request references PODAUTO-99 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set.

In response to [this](https://github.com/openshift/vertical-pod-autoscaler-operator/pull/155):

> Updates for 4.16
>
> - Update deps
>   - edit (update to 4.16/0.29.0) and run hack/update-vendor.sh
> - Update code and build to match updated deps
> - Update go version to go 1.21 (go.mod, Dockerfile, images/ci/Dockerfile, Makefile, vet, lint)
> - Rev version from 4.15 -> 4.16:
>   `sed -i 's/4.15/4.16/g' $(git grep -l 4.15 manifests/) images/ci/bundle.Dockerfile hack/manifest-diff-upstream.sh hack/e2e.sh Makefile`
> - Verify that Dockerfile.rhel7 & .ci-operator.yaml are up to date (they are. [Thanks, ART team](https://github.com/openshift/vertical-pod-autoscaler-operator/pull/151)!)
> - Update manifests to match upstream
>
> I also had to do some additional one-off surgery:
>
> - The move to controller-runtime 0.17.0 messed with how the cache worked, so I had to fill in some of our NamespacedNames because the namespaced cache doesn't do it automatically anymore (I held us at 0.15.2 last time to postpone it until I understood the failure, but it's handled now)
> - The CVO retired some of their helper functions (https://github.com/openshift/cluster-version-operator/pull/1012/commits/77075ae3f665f7775ce48de91ddad76d52accda6) but we were still using them, so I pulled them into our own `lib/resourcemerge`
> - The upstream test suite changed and was pickier about the environment (node schedulability, directory mutability) so I had to adjust a couple of our command line options so the tests wouldn't fail immediately on setup

Instructions for interacting with me using PR comments are available [here](https://prow.ci.openshift.org/command-help?repo=openshift%2Fvertical-pod-autoscaler-operator). If you have questions or suggestions related to my behavior, please file an issue against the [openshift-eng/jira-lifecycle-plugin](https://github.com/openshift-eng/jira-lifecycle-plugin/issues/new) repository.

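
A side note on the version-bump step quoted above: in the expression `s/4.15/4.16/g` the dot is a regular-expression wildcard, so it matches any single character, not only a literal period. The sketch below reproduces that behavior in Go and contrasts it with a literal replacement. The function names and the sample input string are hypothetical and are not taken from the repository; whether any tracked file actually contains such a string is not known from this thread.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// loosePattern treats "." as "any character", like the unescaped sed expression.
var loosePattern = regexp.MustCompile(`4.15`)

// bumpLoose mimics s/4.15/4.16/g with the dot left unescaped.
func bumpLoose(s string) string {
	return loosePattern.ReplaceAllString(s, "4.16")
}

// bumpLiteral replaces only the exact text "4.15".
func bumpLiteral(s string) string {
	return strings.ReplaceAll(s, "4.15", "4.16")
}

func main() {
	// Hypothetical input: a version string plus an unrelated token that
	// happens to contain "4", one arbitrary character, then "15".
	in := "version: 4.15, digest: ab4f15cd"

	fmt.Println(bumpLoose(in))   // version: 4.16, digest: ab4.16cd
	fmt.Println(bumpLiteral(in)) // version: 4.16, digest: ab4f15cd
}
```

With sed, the equivalent of the literal form is to escape the dot in the pattern (`4\.15`). Reviewing the resulting diff, as done in this PR, catches any unintended matches either way.
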
jkyros commented 8 months ago

/test all

jkyros commented 8 months ago

/test unit

jkyros commented 8 months ago

e2e-aws-olm and e2e-aws-operator won't pass until https://github.com/openshift/kubernetes-autoscaler/pull/286 merges

joelsmith commented 8 months ago

Whew! This one was trickier than the average version update. Thanks for tracking everything down and cleaning up the messes!

/lgtm

openshift-ci[bot] commented 8 months ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jkyros, joelsmith

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

- ~~[OWNERS](https://github.com/openshift/vertical-pod-autoscaler-operator/blob/master/OWNERS)~~ [jkyros,joelsmith]

Approvers can indicate their approval by writing `/approve` in a comment

Approvers can cancel approval by writing `/approve cancel` in a comment

openshift-ci-robot commented 8 months ago

/retest-required

Remaining retests: 0 against base HEAD a194962f316ab288e2f114ae138300e939f219cb and 2 for PR HEAD aca250fc5f71ef3e9319a16f0ffb08c9bb51ebf5 in total

joelsmith commented 8 months ago

/retest

openshift-ci[bot] commented 8 months ago

@jkyros: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository. I understand the commands that are listed [here](https://go.k8s.io/bot-commands).
openshift-bot commented 8 months ago

[ART PR BUILD NOTIFIER]

This PR has been included in build ose-vertical-pod-autoscaler-operator-container-v4.16.0-202402210939.p0.g662efe3.assembly.stream.el9 for distgit vertical-pod-autoscaler-operator. All builds following this will include this PR.