solo-io / gloo

The Feature-rich, Kubernetes-native, Next-Generation API Gateway Built on Envoy
https://docs.solo.io/
Apache License 2.0
4.09k stars 441 forks source link

admission control webhook for invalid settings #2561

Closed bdecoste closed 4 months ago

bdecoste commented 4 years ago

Note unit: minutes below vs unit: MINUTE

This is with validation enabled.

apiVersion: v1
items:
- apiVersion: gloo.solo.io/v1
  kind: Settings
  metadata:
    annotations:
      helm.sh/hook: pre-install,pre-upgrade
      helm.sh/hook-delete-policy: before-hook-creation
    creationTimestamp: "2020-02-06T14:40:43Z"
    generation: 21
    labels:
      app: gloo
      gloo: settings
    name: default
    namespace: gloo-system
    resourceVersion: "13303446"
    selfLink: /apis/gloo.solo.io/v1/namespaces/gloo-system/settings/default
    uid: a79d3ddd-48ee-11ea-956b-42010a8a00c5
  spec:
    discovery:
      fdsMode: BLACKLIST
    discoveryNamespace: gloo-system
    extauth:
      extauthzServerRef:
        name: extauth
        namespace: gloo-system
    gateway:
      validation:
        alwaysAccept: true
        proxyValidationServerAddr: gloo:9988
    gloo:
      xdsBindAddr: 0.0.0.0:9977
    kubernetesArtifactSource: {}
    kubernetesConfigSource: {}
    kubernetesSecretSource: {}
    ratelimit:
      descriptors:
      - key: generic_key
        rateLimit:
          requestsPerUnit: 3
          unit: minutes
        value: bucket
      - descriptors:
        - key: plan
          rateLimit:
            requestsPerUnit: 1
            unit: MINUTE
          value: BASIC
        - key: plan
          rateLimit:
            requestsPerUnit: 5
            unit: MINUTE
          value: PLUS
        key: account_id
    ratelimitServer:
      ratelimit_server_ref:
        name: rate-limit
        namespace: gloo-system
    refreshRate: 60s
    watchNamespaces:
    - gloo-system
$ kubectl -n gloo-system get pods
NAME                                                  READY   STATUS             RESTARTS   AGE
api-server-7b55d5747-scl8f                            3/3     Running            0          2d5h
discovery-7b5964874b-jv4lh                            0/1     CrashLoopBackOff   4          2d5h
extauth-d667869b4-sz649                               1/1     Running            0          2d5h
gateway-f98697989-dqgpg                               0/1     CrashLoopBackOff   4          2d5h
gateway-proxy-6694b97f95-nxc66                        1/1     Running            0          4h50m
gloo-585875fcb4-kjz76                                 0/1     CrashLoopBackOff   4          2d5h
glooe-grafana-675675c476-fsc5k                        1/1     Running            0          5d23h
glooe-prometheus-kube-state-metrics-749578f55-94tk5   1/1     Running            0          5d23h
glooe-prometheus-server-84465b55f9-fb8jp              2/2     Running            0          5d23h
observability-65d4877d98-2zp96                        0/1     Error              3          38m
rate-limit-d7795d4bc-8mkkm                            1/1     Running            0          2d5h
redis-89db9d595-wc8dp                                 1/1     Running            0          5d23h
rickducott commented 4 years ago

I'm wary of doing anything that masks an error like this. In general, if there are invalid settings, the behavior of these pods is undefined, and crashing seems preferable to only logging an error (as with all config errors, if the gateway proxy was previously configured, it will retain it's last known good configuration until Gloo is back online).

Even though you could say it's safe in the case of bad rate limiting config to fall back on not-rate-limited behavior, I think we'll eventually regret masking such an error, and generally will make mistakes about what settings are allowed to be invalid without degrading functionality.

All that said, this should be something that can be prevented with an admission control webhook.

rickducott commented 4 years ago

An offline solution to validate a bundle of config would resolve this any many other issues related to invalid config.

lgadban commented 4 years ago

Related: https://github.com/solo-io/gloo/issues/2797

github-actions[bot] commented 10 months ago

This issue has been marked as stale because of no activity in the last 180 days. It will be closed in the next 180 days unless it is tagged "no stalebot" or other activity occurs.

github-actions[bot] commented 4 months ago

This issue has been closed due to no activity in the last 12 months.