salaboy opened 2 years ago
The service won't get marked as failed until the progress deadline expires (default is 10m). If you set a lower progress deadline, for example:
```yaml
spec:
  template:
    metadata:
      annotations:
        serving.knative.dev/progress-deadline: "10s"
        networking.knative.dev/visibility: cluster-local
```
then the service readiness status will be marked as "False" in about 40s (the 10s progress deadline plus a 30s grace period):
```console
$ kn service list
NAME    URL                                            LATEST   AGE     CONDITIONS   READY   REASON
hello   http://hello.default.10.100.121.103.sslip.io            2m10s   0 OK / 3     False   RevisionMissing : Configuration "hello" does not have any ready Revision.
```
```console
$ k get ksvc hello -o yaml
<snip>
status:
  conditions:
  - lastTransitionTime: "2022-06-22T18:14:58Z"
    message: 'Revision "hello-00001" failed with message: .'
    reason: RevisionFailed
    status: "False"
    type: ConfigurationsReady
  - lastTransitionTime: "2022-06-22T18:14:58Z"
    message: Configuration "hello" does not have any ready Revision.
    reason: RevisionMissing
    status: "False"
    type: Ready
  - lastTransitionTime: "2022-06-22T18:14:58Z"
    message: Configuration "hello" does not have any ready Revision.
    reason: RevisionMissing
    status: "False"
    type: RoutesReady
  latestCreatedRevisionName: hello-00001
  observedGeneration: 1
  url: http://hello.default.10.100.121.103.sslip.io
```
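As a quicker alternative to reading the full YAML, the failing condition can be pulled out with a jsonpath query (a sketch; assumes the service is named `hello`):

```shell
# Print just the Ready condition's status and message for the "hello" service
kubectl get ksvc hello \
  -o jsonpath='{range .status.conditions[?(@.type=="Ready")]}{.status}{": "}{.message}{"\n"}{end}'
```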
The ConfigurationsReady error message could be a little nicer, though... I think I did something similar for failing revisions, so it should be possible to piggyback off that...
@psschwei that is interesting... Do we know why the annotation makes the service never become ready, when it shouldn't affect the behavior?
Also, 10 minutes of waiting time to be ready sounds like a lot, but I am guessing that covers cases where downloading the container image might take a long time. Are there any other cases where we need to wait that long?
> 10 mins waiting time to be ready sounds like a lot
At one point in time we did have it much shorter, but it was changed to be in sync with the default Kubernetes value.
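For reference, that cluster-wide default (600s, matching Kubernetes' `progressDeadlineSeconds`) can also be lowered via the `config-deployment` ConfigMap in the `knative-serving` namespace; a sketch, with `120s` as an arbitrary example value (the per-revision annotation still overrides it):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-deployment
  namespace: knative-serving
data:
  # Cluster-wide default progress deadline; overridden per revision
  # by the serving.knative.dev/progress-deadline annotation
  progress-deadline: "120s"
```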
> Do we know why the annotation makes it never be ready when it shouldn't affect the behavior?
It looks like it's an issue with the deployment reconciliation... I see the following in the deployment events:
```console
$ k describe revision
<snip>
Events:
  Type     Reason         Age                From                 Message
  ----     ------         ----               ----                 -------
  Warning  InternalError  10m (x2 over 10m)  revision-controller  failed to update deployment "hello-00001-deployment": Operation cannot be fulfilled on deployments.apps "hello-00001-deployment": the object has been modified; please apply your changes to the latest version and try again
```
It also looks like the incorrect annotation is failing the SKS (ServerlessService) validation. From the webhook logs:
```json
{"severity":"ERROR","timestamp":"2022-06-23T20:43:41.248281768Z","logger":"webhook","caller":"validation/validation_admit.go:181","message":"Failed the resource specific validation","commit":"3573163","knative.dev/pod":"webhook-6fd4c9cbc4-rnv88","knative.dev/kind":"networking.internal.knative.dev/v1alpha1, Kind=ServerlessService","knative.dev/namespace":"default","knative.dev/name":"hello-00001","knative.dev/operation":"CREATE","knative.dev/resource":"networking.internal.knative.dev/v1alpha1, Resource=serverlessservices","knative.dev/subresource":"","knative.dev/userinfo":"{system:serviceaccount:knative-serving:controller 8a1a5e12-4059-46e3-b0d2-9f9ebb74aab1 [system:serviceaccounts system:serviceaccounts:knative-serving system:authenticated] map[authentication.kubernetes.io/pod-name:[autoscaler-56975b5bbb-4x625] authentication.kubernetes.io/pod-uid:[b98cf5b1-ee75-4217-b396-ef194ea82051]]}","stacktrace":"knative.dev/pkg/webhook/resourcesemantics/validation.validate\n\tknative.dev/pkg@v0.0.0-20220610014025-7d607d643ee2/webhook/resourcesemantics/validation/validation_admit.go:181\nknative.dev/pkg/webhook/resourcesemantics/validation.(reconciler).Admit\n\tknative.dev/pkg@v0.0.0-20220610014025-7d607d643ee2/webhook/resourcesemantics/validation/validation_admit.go:80\nknative.dev/pkg/webhook.admissionHandler.func1\n\tknative.dev/pkg@v0.0.0-20220610014025-7d607d643ee2/webhook/admission.go:117\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2084\nnet/http.(ServeMux).ServeHTTP\n\tnet/http/server.go:2462\nknative.dev/pkg/webhook.(Webhook).ServeHTTP\n\tknative.dev/pkg@v0.0.0-20220610014025-7d607d643ee2/webhook/webhook.go:262\nknative.dev/pkg/network/handlers.(Drainer).ServeHTTP\n\tknative.dev/pkg@v0.0.0-20220610014025-7d607d643ee2/network/handlers/drain.go:110\nnet/http.serverHandler.ServeHTTP\n\tnet/http/server.go:2916\nnet/http.(*conn).serve\n\tnet/http/server.go:1966"}
```
Note: the visibility label doesn't go on `spec.template.metadata.labels`, but on the `metadata` of the top-level Knative Service or Route. See: https://knative.dev/docs/serving/services/private-services
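In other words, a minimal sketch of the correct placement (the sample image is an illustrative choice, not from this issue):

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello
  labels:
    # Correct: visibility is a label on the top-level Service metadata,
    # not an annotation under spec.template.metadata
    networking.knative.dev/visibility: cluster-local
spec:
  template:
    spec:
      containers:
        - image: gcr.io/knative-samples/helloworld-go
```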
A few things to take away from this issue.
/triage accepted
When setting an annotation on a Knative Service that should actually be a label, the service never gets ready.
The revision for the new version of the Knative Service reports issues about not being observable.
/area networking /kind bug
**What version of Knative?**
1.6.x, with Kourier as ingress
**Expected Behavior**
Setting an unexpected annotation shouldn't break a Knative Service, and an error message should help us troubleshoot the issue.
**Actual Behavior**
The service breaks and never becomes ready.
**Steps to Reproduce the Problem**
Add the cluster-local visibility setting as an annotation instead of a label.
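For example, using the `kn` CLI and a sample image (both illustrative choices, not taken from the issue), this sketch reproduces the misplacement:

```shell
# Misplaced: visibility set as an annotation -- the service
# never becomes ready (reproduces this issue)
kn service create hello \
  --image gcr.io/knative-samples/helloworld-go \
  --annotation networking.knative.dev/visibility=cluster-local

# Correct form: visibility set as a label
kn service update hello \
  --label networking.knative.dev/visibility=cluster-local
```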