knative / operator

Combined operator for Knative.
Apache License 2.0

Failed Operator will leave Serving installation in a partial state. #1756

Open mmisztal1980 opened 3 months ago

mmisztal1980 commented 3 months ago

In what area(s)?

/area autoscale

What version of Knative?

1.11.x

kn version
Version:      v1.11.0
Build Date:   2023-07-27 07:42:56
Git Revision: b7508e67
Supported APIs:
* Serving
  - serving.knative.dev/v1 (knative-serving v1.11.0)
* Eventing
  - sources.knative.dev/v1 (knative-eventing v1.11.0)
  - eventing.knative.dev/v1 (knative-eventing v1.11.0)

Expected Behavior

Using kn service create 'hello-example' --image ghcr.io/knative/helloworld-go:latest --env TARGET="First", I expected to deploy a hello-world example to start playing with Knative.

Actual Behavior

kn service create 'hello-example' --image ghcr.io/knative/helloworld-go:latest --env TARGET="First"
Creating service 'hello-example' in namespace 'default':

  0.072s The Route is still working to reflect the latest desired specification.
  0.072s Configuration "hello-example" is waiting for a Revision to become ready.
  0.072s ...
  1.153s Revision "hello-example-00001" failed with message: Failed to create new replica set "hello-example-00001-deployment-7b56748d46": Unauthorized.
  1.166s Configuration "hello-example" does not have any ready Revision.
  1.176s ...
  1.179s Configuration "hello-example" is waiting for a Revision to become ready.

The process starts but doesn't complete. The pod is successfully scheduled in the default namespace and is ready; however, the Knative Service is not.

k get pods
NAME                                              READY   STATUS    RESTARTS   AGE
hello-example-00001-deployment-7b56748d46-mt5kk   2/2     Running   0          31s

Steps to Reproduce the Problem

apiVersion: v1
kind: Namespace
metadata:
  name: knative-serving
---
apiVersion: operator.knative.dev/v1beta1
kind: KnativeServing
metadata:
  name: knative-serving
  namespace: knative-serving

Any additional details and the investigation so far can be found on the CNCF Slack here.

dprotaso commented 3 months ago

Following up here: it looks like the default installation expects Istio, and when Istio is not installed the operator fails with Ready=False, reporting that the Istio resources are not present.

This halts the installation of the other manifests and leaves Serving in a weird state. For example, in the case above the mutating and validating webhooks were not installed. This allowed the user to create a Knative Service, and it reconciled, but when the PodAutoscaler was created, the annotation required to select which autoscaler to use was not defaulted.

Ideally it would be good to try to apply all the resources in the manifest and then report all errors the operator installation encounters.

But since the operator did report the failure, I think we could simply document how to check the installation in the docs.

I'll leave this issue open for @houshengbo to close out and make a docs issue.
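
For reference, a minimal sketch of what such an installation check might look like. It assumes the resource name and namespace from the manifest in this issue (knative-serving / knative-serving) and that kubectl has access to the cluster. The helper function names are hypothetical and not part of kn or the operator.

```shell
#!/usr/bin/env bash
# Sketch only: list every KnativeServing status condition that is not "True".
# The function names below are illustrative, not an official tool.

# Reads "Type=Status" lines on stdin and prints those whose status is not True.
# Returns 1 if at least one such condition was found, 0 otherwise.
not_ready_conditions() {
  local found=0 line
  while IFS= read -r line; do
    if [ -z "$line" ]; then
      continue
    fi
    case "$line" in
      *=True) ;;                      # healthy condition, skip it
      *) echo "$line"; found=1 ;;     # False or Unknown, report it
    esac
  done
  return "$found"
}

# Prints each condition of the KnativeServing resource as "Type=Status".
# Assumes the name and namespace used in the manifest from this issue.
knative_serving_conditions() {
  kubectl get knativeserving knative-serving -n knative-serving \
    -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'
}

# Usage (requires cluster access):
#   knative_serving_conditions | not_ready_conditions \
#     || echo "KnativeServing installation is incomplete"
```

If the operator stopped partway through, as described above, this should print Ready=False along with any other unmet conditions, before any Knative Service is created.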

github-actions[bot] commented 1 week ago

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.