knative / serving

Kubernetes-based, scale-to-zero, request-driven compute
https://knative.dev/docs/serving/
Apache License 2.0

Add ability to set cluster wide defaults for qpoptions #13947

Closed davidhadas closed 1 year ago

davidhadas commented 1 year ago

Serving is extended with qpoptions (queue-proxy options) - see https://knative.dev/docs/serving/services/using-queue-extensions/. Support includes setting qpoptions when a service is created, e.g.:

apiVersion: serving.knative.dev/v1
kind: Service
...
spec:
  template:
    metadata:
      annotations:
        qpoption.knative.dev/testgate-activate: enable
        qpoption.knative.dev/testgate-config-response: CU
        qpoption.knative.dev/testgate-config-sender: Joe
...

Today, Security-Guard is configured using annotations such as:

        qpoption.knative.dev/guard-activate: enable

Cluster Admins need a way to set qpoptions cluster-wide to control the queue-proxy extensions (options).

It is suggested that Cluster Admins be able to set default qpoption values in config-defaults.

The suggested change will support default values for any qpoption: the config-defaults entry defines the default, while the service can still change that default and use an alternative value for the same configuration point.

For example:

# List of qpoptions used as the default for each deployment, for example  
#      qpoptions: "guard-activate: enable, guard-config-pod-monitor-interval: 2s"
qpoptions: ""
davidhadas commented 1 year ago

@dprotaso , @psschwei, @evankanderson, @rhuss
We have a request to allow admins to make security-guard active by default (when it is deployed).

To do that, we need Serving's help to give admins an easy way to set defaults for qpoptions. We also need to be able to set such defaults when installing security-guard using the operator.
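
For the operator path, the natural route would presumably be the existing KnativeServing spec.config overrides; a minimal sketch, assuming the proposed qpoptions key were accepted into config-defaults:

apiVersion: operator.knative.dev/v1beta1
kind: KnativeServing
metadata:
  name: knative-serving
  namespace: knative-serving
spec:
  config:
    defaults:
      # hypothetical entry - depends on the qpoptions proposal above being accepted
      qpoptions: "guard-activate: enable"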

psschwei commented 1 year ago

Would it be possible to, for example, add a webhook to security-guard to handle this? I.e., if the admin wants to set default options cluster-wide, they set the config in security-guard, which would then install a webhook to add it to the pod. Not sure how feasible that would be, but if possible that might make the most sense. The reason I'm suggesting this option is that one of the lessons we learned with the freezer is that putting these kinds of things in serving was less than ideal, both for the extension (dependencies on others / harder to get code merged) and for serving (more code to maintain).
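
A rough sketch of the shape such a guard-owned webhook registration could take; the webhook name, service, and path below are hypothetical, and only the standard Kubernetes admission API is assumed:

apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: defaults.guard.security.knative.dev   # hypothetical name
webhooks:
  - name: defaults.guard.security.knative.dev
    admissionReviewVersions: ["v1"]
    sideEffects: None
    failurePolicy: Ignore              # do not block pod creation if guard is down
    clientConfig:
      service:
        name: guard-webhook            # hypothetical service in the guard install
        namespace: knative-serving
        path: /default-qpoptions
    rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE"]
        resources: ["pods"]
    objectSelector:
      matchExpressions:
        - key: serving.knative.dev/revision   # only mutate Knative revision pods
          operator: Exists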

davidhadas commented 1 year ago

We do not add guard specifics to Serving.

We extend qpoptions support in a very natural way - it is really not a task for guard to add a webhook and manage serving specifics such as qpoptions. Qpoptions are a serving concept (not a guard one). It is a task for serving to send the right qpoptions, as desired, to the queue-proxy extensions.

When we created qpoptions, we added a generic method to extend queue-proxy with extensions. Then we added a mechanism for devops to activate and configure each extension. Serving is agnostic to which extensions are used and what their configuration means. Extensions are agnostic to how the decision was made to activate them or to configure them in this or that way.

This issue asks to let the admin control the defaults of the same generic qpoptions that serving already supports.

Sadly, freezer was added before we added support for qpoptions. Therefore, freezer added its own specifics to Serving, which indeed is not advisable for the reasons mentioned by @psschwei.

psschwei commented 1 year ago

My bad, for some reason I was thinking qpoptions was a guard thing not a serving thing.

(btw, we dropped the freezer bits in v1.10)

dprotaso commented 1 year ago

qpoptions was added because of guard, and I don't think it's the right mechanism because it's very visible to the user when it shouldn't be. Secondly, there's nothing preventing users from abusing qpoptions, since they can add them themselves.

I think @psschwei is on to something where Guard should probably have its own config map and then use a webhook to inject what it needs into the Knative pods.

With the webhook, guard could mount the pod info itself and we could even drop the feature gate queueproxy.mount-podinfo.
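
For reference, the feature gate mentioned here lives in the config-features ConfigMap; roughly (exact values may differ between releases):

apiVersion: v1
kind: ConfigMap
metadata:
  name: config-features
  namespace: knative-serving
data:
  # Existing gate controlling whether pod metadata may be mounted into queue-proxy.
  queueproxy.mount-podinfo: "allowed"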

davidhadas commented 1 year ago

> qpoptions was added because of guard, and I don't think it's the right mechanism because it's very visible to the user when it shouldn't be. Secondly, there's nothing preventing users from abusing qpoptions, since they can add them themselves.

Qpoptions was discussed and approved. A lot of effort went into creating it and into adjusting Guard to work with it. It is stable and working.

Qpoptions are there so that users can use them to configure the queue extensions. This is not a disadvantage. It is by design.

A mechanism to configure extensions should be visible to the user.

> I think @psschwei is on to something where Guard should probably have its own config map and then use a webhook to inject what it needs into the Knative pods.

> With the webhook, guard could mount the pod info itself and we could even drop the feature gate queueproxy.mount-podinfo.

This was considered in the past, and we went with a more integrated solution, which is helpful for admins and users. Queue proxy today does not use a config map. Guard extends queue proxy, and it makes no sense for guard to use a config map. I think there were other arguments against this approach - anyhow, we already implemented a generic mechanism to solve this and should not go backwards.

dprotaso commented 1 year ago

> Qpoptions are there so that users can use them to configure the queue extensions. This is not a disadvantage. It is by design. ... A mechanism to configure extensions should be visible to the user.

But there's nothing preventing users from disabling things that should be on by setting qpoption.knative.dev/testgate-activate: false

> Queue proxy today does not use a config map. Guard extends queue proxy, and it makes no sense for guard to use a config map.

The idea here is the config map would be the way to configure a guard webhook that would add the default annotations to the Pods
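
A sketch of the kind of config map being described - entirely hypothetical, owned by guard rather than Serving, and read only by the guard webhook:

apiVersion: v1
kind: ConfigMap
metadata:
  name: config-guard-defaults        # hypothetical; no such ConfigMap exists today
  namespace: knative-serving
data:
  # Annotations the guard webhook would add to Knative pods that do not set them.
  default-annotations: |
    qpoption.knative.dev/guard-activate: enable
    qpoption.knative.dev/guard-config-pod-monitor-interval: 2s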

> Anyhow, we already implemented a generic mechanism to solve this and should not go backwards.

This mechanism is very leaky and I think it was a useful experiment to help guard - but I think moving forward as we try to handle use cases for other personas (operators) it requires re-work.

Guard as an extension should probably have user-facing annotations with the prefix guard.knative.dev/* - the queue proxy is an internal detail

davidhadas commented 1 year ago

> But there's nothing preventing users from disabling things that should be on by setting qpoption.knative.dev/testgate-activate: false

The ask is to support cluster-wide defaults: the Admin defines the defaults, and DevOps may change them for specific needs on specific services. It is fine if DevOps deactivates an extension or changes its configuration. Knative uses a similar approach elsewhere, where the admin sets defaults that can be changed per service.
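
For instance, the autoscaler already works this way (an existing example, not part of this proposal): the admin sets a cluster-wide default in the config-autoscaler ConfigMap, and a service owner can override it per service with an annotation:

# Cluster-wide default set by the admin:
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-autoscaler
  namespace: knative-serving
data:
  container-concurrency-target-default: "100"
---
# Per-service override by the service owner (the service name is illustrative):
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: example
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/target: "50"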

> The idea here is the config map would be the way to configure a guard webhook that would add the default annotations to the Pods

Under Knative, Guard-gates are deployed as extensions to the queue-proxy container and get all their configuration from the queue-proxy main. Knative deploys a webhook for services and then has a controller for revisions and deployments, which in turn control pods. In this context, I can't see how an additional webhook for the queue-proxy extension is helpful.

> This mechanism is very leaky

Leaky in what way?
Leaky in indicating that Guard is an extension to queue-proxy? (If so, why is this a problem that needs fixing? In Knative, Guard extends queue-proxy - it adds security capabilities that queue-proxy does not have without it.)

> ...and I think it was a useful experiment to help guard

Allow me to strongly disagree with this statement - a year ago, we designed this extensibility as the right way to extend queue-proxy. And may I say that the Serving WG Lead was particular about the requirements of this extensibility :) and about not accepting any Guard-specific work - this was a very good thing, as it forced all of us to create a decent and generic design.

> ...but I think moving forward as we try to handle use cases for other personas (operators) it requires re-work.

We should not block this PR simply because we may want to do a re-work sometime in the future.

If there are some ideas on how to improve the current extensibility of queue-proxy, let's meet and discuss those ideas, create a proposal, open an issue, and at some point start working on the new design. This may take some time, since it will need to be prioritized with many other tasks we need to do in the Security WG, in the Serving WG, and in Guard. But if we really need to change and re-work the design, let's discuss it and initiate it.

But meanwhile, we really should not block this PR which continues on the qpoptions current design...

> Guard as an extension should probably have user-facing annotations with the prefix guard.knative.dev/* - the queue proxy is an internal detail

Using qpoption.knative.dev/guard-activate: true is a reasonable user-facing annotation that we agreed on and is presently implemented. Imho, changing this to guard.knative.dev/* should not be a priority.

evankanderson commented 1 year ago

A couple thoughts:

  1. It seems like it might be generically useful for admins to be able to set or default spec.template.metadata.annotations on the Deployment created by a Revision (or possibly metadata.annotations on the Revision itself). If that mechanism gets used to set annotations that start with qpoption.knative.dev/..., then that seems no different than setting cluster-autoscaler.kubernetes.io/safe-to-evict.
  2. Even more generically, it seems like it would generally be useful to have a mechanism to customize the resources created by a Revision, particularly the Deployment. I could also see that in some cases it might be useful to be able to create an additional resource (like a PodDisruptionBudget) alongside the Deployment, so we may not want to limit this customization to only a single object / YAML document. I'm going to plug ytt here (see the sketch after this list), but I think the capability is more important than the specific implementation.
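
A minimal ytt-style sketch of option (2), assuming a hook existed for Serving to pass the generated Deployment through user-supplied overlays (no such hook exists today; the overlay only illustrates the shape of the customization):

#@ load("@ytt:overlay", "overlay")

#@overlay/match by=overlay.subset({"kind": "Deployment"})
---
spec:
  template:
    metadata:
      #@overlay/match missing_ok=True
      annotations:
        #@overlay/match missing_ok=True
        qpoption.knative.dev/guard-activate: enable
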
davidhadas commented 1 year ago

As for option (1) - if there is a preference for a more generic annotation-defaulting mechanism instead of qpoption annotations, let's modify the PR to reflect this.

As for option (2) - this was discussed at https://github.com/knative/serving/issues/13629. Note that Serving reconciles all deployments often, and the overhead we add per reconcile cycle should not be significant.

davidhadas commented 1 year ago

Adding defaults to qpoptions was rejected. Adding generic defaults to Serving annotations was also rejected. Instead, to support defaults for security-guard, a webhook will be developed as part of security-guard.