openshift / vertical-pod-autoscaler-operator

An Operator for running the Vertical Pod Autoscaler on OpenShift
Apache License 2.0

Revert "disable operator" now that the v0.0.0 tag has been pushed #4

Closed joelsmith closed 5 years ago

joelsmith commented 5 years ago

This reverts commit 94f9117170b3ad279a3cc72395466f8fcb2bdf64 from #3. Also:

joelsmith commented 5 years ago

We need to let this one soak for a while to make sure the pod comes up and doesn't crash loop. /hold

joelsmith commented 5 years ago

/test all

sjenning commented 5 years ago

seeing this in the operator logs: a failed reconciliation loop every 15s

I0630 15:57:50.713914       1 status.go:328] No VerticalPodAutoscalerController deployment. Reporting unavailable.
I0630 15:57:50.713942       1 status.go:253] Operator status progressing: updating to 0.0.1-2019-06-30-153733
I0630 15:58:05.713937       1 status.go:328] No VerticalPodAutoscalerController deployment. Reporting unavailable.
I0630 15:58:05.714389       1 status.go:253] Operator status progressing: updating to 0.0.1-2019-06-30-153733
I0630 15:58:20.713909       1 status.go:328] No VerticalPodAutoscalerController deployment. Reporting unavailable.
I0630 15:58:20.713933       1 status.go:253] Operator status progressing: updating to 0.0.1-2019-06-30-153733

joelsmith commented 5 years ago

@sjenning what else do we need to do before re-enabling the operator? I think I fixed the issue you saw in the logs, and it seems to pass the e2e test now.

sjenning commented 5 years ago

@joelsmith the recommender does appear to be deployed now. Two more things:

recommender logs indicate the VPA CRD is not registered

E0702 13:32:32.903086       1 reflector.go:134] k8s.io/autoscaler/vertical-pod-autoscaler/pkg/utils/vpa/api.go:95: Failed to list *v1beta2.VerticalPodAutoscaler: the server could not find the requested resource (get verticalpodautoscalers.autoscaling.k8s.io)
E0702 13:32:33.906839       1 reflector.go:134] k8s.io/autoscaler/vertical-pod-autoscaler/pkg/utils/vpa/api.go:95: Failed to list *v1beta2.VerticalPodAutoscaler: the server could not find the requested resource (get verticalpodautoscalers.autoscaling.k8s.io)
E0702 13:32:34.938393       1 reflector.go:134] k8s.io/autoscaler/vertical-pod-autoscaler/pkg/utils/vpa/api.go:95: Failed to list *v1beta2.VerticalPodAutoscaler: the server could not find the requested resource (get verticalpodautoscalers.autoscaling.k8s.io)

https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_vertical-pod-autoscaler-operator/4/pull-ci-openshift-vertical-pod-autoscaler-operator-master-e2e-aws/9/artifacts/e2e-aws/pods/openshift-vertical-pod-autoscaler_vertical-pod-autoscaler-operator-9f564b594-966xf_vertical-pod-autoscaler-operator.log

need to add the VPA CRD yaml to /install

additionally, the operator logs suggest that it redeploys the operand each time the watch is reestablished, approximately every 7m

update: it is not actually redeploying the operand, even though the log message, to me anyway, seems to indicate that it is

W0702 13:07:47.255432       1 reflector.go:289] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: watch of *v1.Deployment ended with: too old resource version: 14224 (16588)
I0702 13:07:48.258750       1 verticalpodautoscaler_controller.go:124] Reconciling VerticalPodAutoscalerController default
I0702 13:07:48.264285       1 verticalpodautoscaler_controller.go:184] Updated VerticalPodAutoscalerController deployment: openshift-vertical-pod-autoscaler/vpa-recommender-default
W0702 13:14:15.262798       1 reflector.go:289] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: watch of *v1.Deployment ended with: too old resource version: 38683 (63034)
I0702 13:14:16.265613       1 verticalpodautoscaler_controller.go:124] Reconciling VerticalPodAutoscalerController default
I0702 13:14:16.270427       1 verticalpodautoscaler_controller.go:184] Updated VerticalPodAutoscalerController deployment: openshift-vertical-pod-autoscaler/vpa-recommender-default
W0702 13:27:04.271300       1 reflector.go:289] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: watch of *v1.Deployment ended with: too old resource version: 63638 (97223)
I0702 13:27:05.274231       1 verticalpodautoscaler_controller.go:124] Reconciling VerticalPodAutoscalerController default
I0702 13:27:05.278963       1 verticalpodautoscaler_controller.go:184] Updated VerticalPodAutoscalerController deployment: openshift-vertical-pod-autoscaler/vpa-recommender-default

https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_vertical-pod-autoscaler-operator/4/pull-ci-openshift-vertical-pod-autoscaler-operator-master-e2e-aws/9/artifacts/e2e-aws/pods/openshift-vertical-pod-autoscaler_vertical-pod-autoscaler-operator-9f564b594-966xf_vertical-pod-autoscaler-operator.log

sjenning commented 5 years ago

/test e2e-aws once more for good measure

sjenning commented 5 years ago

TestInvalidRoleRefs is a new test that has been flaky since inception (https://bugzilla.redhat.com/show_bug.cgi?id=1727086) and redhat-operators was crashlooping. Once more. /test e2e-aws

sjenning commented 5 years ago

/retest

sjenning commented 5 years ago

/lgtm

openshift-ci-robot commented 5 years ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: joelsmith, sjenning

Needs approval from an approver in each of these files:
- ~~[OWNERS](https://github.com/openshift/vertical-pod-autoscaler-operator/blob/master/OWNERS)~~ [joelsmith,sjenning]

Approvers can indicate their approval by writing `/approve` in a comment
Approvers can cancel approval by writing `/approve cancel` in a comment

sjenning commented 5 years ago

/hold cancel

openshift-bot commented 5 years ago

/retest

Please review the full test history for this PR and help us cut down flakes.
