salaboy opened 2 years ago
The service won't get marked as failed until the progress deadline expires (default is 10m). If you set a lower progress deadline, for example:
```yaml
spec:
  template:
    metadata:
      annotations:
        serving.knative.dev/progress-deadline: "10s"
        networking.knative.dev/visibility: cluster-local
```
then the service readiness status will be marked as "False" in about 40s (the 10s progress deadline plus a 30s grace period):
```console
$ kn service list
NAME    URL                                            LATEST   AGE     CONDITIONS   READY   REASON
hello   http://hello.default.10.100.121.103.sslip.io            2m10s   0 OK / 3     False   RevisionMissing : Configuration "hello" does not have any ready Revision.
```
```console
$ k get ksvc hello -o yaml
<snip>
status:
  conditions:
  - lastTransitionTime: "2022-06-22T18:14:58Z"
    message: 'Revision "hello-00001" failed with message: .'
    reason: RevisionFailed
    status: "False"
    type: ConfigurationsReady
  - lastTransitionTime: "2022-06-22T18:14:58Z"
    message: Configuration "hello" does not have any ready Revision.
    reason: RevisionMissing
    status: "False"
    type: Ready
  - lastTransitionTime: "2022-06-22T18:14:58Z"
    message: Configuration "hello" does not have any ready Revision.
    reason: RevisionMissing
    status: "False"
    type: RoutesReady
  latestCreatedRevisionName: hello-00001
  observedGeneration: 1
  url: http://hello.default.10.100.121.103.sslip.io
```
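As a quicker alternative to reading the full YAML, the failing condition can be pulled out with a jsonpath query (a sketch; assumes the service is named `hello`):

```shell
# Print just the Ready condition's status and message for the "hello" service
kubectl get ksvc hello \
  -o jsonpath='{range .status.conditions[?(@.type=="Ready")]}{.status}{": "}{.message}{"\n"}{end}'
```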
The ConfigurationsReady error message could be a little nicer, though... I think I did something similar for failing revisions, so it should be possible to piggyback off that...
@psschwei that is interesting... Do we know why the annotation makes the service never become ready, when it shouldn't affect the behavior?
Also, 10 minutes of waiting time to be ready sounds like a lot, but I am guessing that covers cases where downloading the container image might take a long time. Are there any other cases where we need to wait that long?
> 10 mins waiting time to be ready sounds like a lot
At one point in time we did have it much shorter, but it was changed to be in sync with the default Kubernetes value.
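For reference, that cluster-wide default (600s, matching Kubernetes' `progressDeadlineSeconds`) can also be lowered via the `config-deployment` ConfigMap in the `knative-serving` namespace; a sketch, with `120s` as an arbitrary example value (the per-revision annotation still overrides it):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-deployment
  namespace: knative-serving
data:
  # Cluster-wide default progress deadline; overridden per revision
  # by the serving.knative.dev/progress-deadline annotation
  progress-deadline: "120s"
```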
> Do we know why the annotation makes it never be ready when it shouldn't affect the behavior?
It looks like it's an issue with the deployment reconciliation... I see the following in the deployment events:
```console
$ k describe revision
<snip>
Events:
  Type     Reason         Age                From                 Message
  ----     ------         ----               ----                 -------
  Warning  InternalError  10m (x2 over 10m)  revision-controller  failed to update deployment "hello-00001-deployment": Operation cannot be fulfilled on deployments.apps "hello-00001-deployment": the object has been modified; please apply your changes to the latest version and try again
```
It also looks like the incorrect annotation is failing the SKS (ServerlessService) validation. From the webhook logs:
```json
{"severity":"ERROR","timestamp":"2022-06-23T20:43:41.248281768Z","logger":"webhook","caller":"validation/validation_admit.go:181","message":"Failed the resource specific validation","commit":"3573163","knative.dev/pod":"webhook-6fd4c9cbc4-rnv88","knative.dev/kind":"networking.internal.knative.dev/v1alpha1, Kind=ServerlessService","knative.dev/namespace":"default","knative.dev/name":"hello-00001","knative.dev/operation":"CREATE","knative.dev/resource":"networking.internal.knative.dev/v1alpha1, Resource=serverlessservices","knative.dev/subresource":"","knative.dev/userinfo":"{system:serviceaccount:knative-serving:controller 8a1a5e12-4059-46e3-b0d2-9f9ebb74aab1 [system:serviceaccounts system:serviceaccounts:knative-serving system:authenticated] map[authentication.kubernetes.io/pod-name:[autoscaler-56975b5bbb-4x625] authentication.kubernetes.io/pod-uid:[b98cf5b1-ee75-4217-b396-ef194ea82051]]}","stacktrace":"knative.dev/pkg/webhook/resourcesemantics/validation.validate\n\tknative.dev/pkg@v0.0.0-20220610014025-7d607d643ee2/webhook/resourcesemantics/validation/validation_admit.go:181\nknative.dev/pkg/webhook/resourcesemantics/validation.(reconciler).Admit\n\tknative.dev/pkg@v0.0.0-20220610014025-7d607d643ee2/webhook/resourcesemantics/validation/validation_admit.go:80\nknative.dev/pkg/webhook.admissionHandler.func1\n\tknative.dev/pkg@v0.0.0-20220610014025-7d607d643ee2/webhook/admission.go:117\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2084\nnet/http.(ServeMux).ServeHTTP\n\tnet/http/server.go:2462\nknative.dev/pkg/webhook.(Webhook).ServeHTTP\n\tknative.dev/pkg@v0.0.0-20220610014025-7d607d643ee2/webhook/webhook.go:262\nknative.dev/pkg/network/handlers.(Drainer).ServeHTTP\n\tknative.dev/pkg@v0.0.0-20220610014025-7d607d643ee2/network/handlers/drain.go:110\nnet/http.serverHandler.ServeHTTP\n\tnet/http/server.go:2916\nnet/http.(*conn).serve\n\tnet/http/server.go:1966"}
```
Note: the visibility label doesn't go on `spec.template.metadata.labels`, but on the `metadata` of the top-level Knative Service or Route. See: https://knative.dev/docs/serving/services/private-services
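In other words, a minimal sketch of the correct placement (the sample image is an illustrative choice, not from this issue):

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello
  labels:
    # Correct: visibility is a label on the top-level Service metadata,
    # not an annotation under spec.template.metadata
    networking.knative.dev/visibility: cluster-local
spec:
  template:
    spec:
      containers:
        - image: gcr.io/knative-samples/helloworld-go
```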
A few things to take away from this issue.
/triage accepted
When setting an annotation on a Knative Service that should actually be a label, the service never gets ready.
The revision for the new version of the Knative Service reports issues about not being observable.
/area networking /kind bug
**What version of Knative?**
1.6.x, with Kourier as ingress
**Expected Behavior**
Setting an unexpected annotation shouldn't break a Knative Service, and an error message should help us troubleshoot the issue.
**Actual Behavior**
The service breaks and never becomes ready.
**Steps to Reproduce the Problem**
Add the cluster-local visibility setting as an annotation instead of a label.
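For example, using the `kn` CLI and a sample image (both illustrative choices, not taken from the issue), this sketch reproduces the misplacement:

```shell
# Misplaced: visibility set as an annotation -- the service
# never becomes ready (reproduces this issue)
kn service create hello \
  --image gcr.io/knative-samples/helloworld-go \
  --annotation networking.knative.dev/visibility=cluster-local

# Correct form: visibility set as a label
kn service update hello \
  --label networking.knative.dev/visibility=cluster-local
```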