kyma-project / api-gateway

Apache License 2.0
4 stars 27 forks source link

Introduce APIRule v1beta2 #996

Closed werdes72 closed 5 months ago

werdes72 commented 6 months ago

Description

Changes proposed in this pull request:

Related issues

986

triffer commented 5 months ago

The root cause for the failing UI tests is that the rules fields are not rendered. The UI renders the fields based on the Kubernetes OpenAPI response from /openapi/v2. The OpenAPI response is influenced by setting x-kubernetes-preserve-unknown-fields: true in the CRD. We can find more information about that in the Kubernetes CRD docs:

A structural schema is an OpenAPI v3.0 validation schema which:

specifies a non-empty type (via type in OpenAPI) for the root, for each specified field of an object node (via properties or >additionalProperties in OpenAPI) and for each item in an array node (via items in OpenAPI), with the exception of: a node with x-kubernetes-int-or-string: true a node with x-kubernetes-preserve-unknown-fields: true

For the old CRD the API response for rules looks like this:

"rules": {
          "description": "Represents the array of Oathkeeper access rules to be applied.",
          "type": "array",
          "minItems": 1,
          "items": {
            "description": "Rule .",
            "type": "object",
            "required": [
              "accessStrategies",
              "methods",
              "path"
            ],
            "properties": {
              "accessStrategies": {
                "description": "Specifies the list of access strategies. All strategies listed in [Oathkeeper documentation](https://www.ory.sh/docs/oathkeeper/pipeline/authn) are supported.",
                "type": "array",
                "minItems": 1,
                "items": {
                  "description": "Represents a handler that authenticates provided credentials. See the corresponding type in the oathkeeper-maester project.",
                  "type": "object",
                  "required": [
                    "handler"
                  ],
                  "properties": {
                    "config": {
                      "description": "Configures the handler. Configuration keys vary per handler.",
                      "x-kubernetes-preserve-unknown-fields": true
                    },
                    "handler": {
                      "description": "Specifies the name of the handler.",
                      "type": "string"
                    }
                  }
                }
              },
              "methods": {
                "description": "Represents the list of allowed HTTP request methods available for the **spec.rules.path**.",
                "type": "array",
                "minItems": 1,
                "items": {
                  "description": "HttpMethod specifies the HTTP request method. The list of supported methods is defined in RFC 9910: HTTP Semantics and RFC 5789: PATCH Method for HTTP.",
                  "type": "string",
                  "enum": [
                    "GET",
                    "HEAD",
                    "POST",
                    "PUT",
                    "DELETE",
                    "CONNECT",
                    "OPTIONS",
                    "TRACE",
                    "PATCH"
                  ]
                }
              },
              "mutators": {
                "description": "Specifies the list of [Ory Oathkeeper](https://www.ory.sh/docs/oathkeeper/pipeline/mutator) mutators.",
                "type": "array",
                "items": {
                  "description": "Mutator represents a handler that transforms the HTTP request before forwarding it. See the corresponding in the oathkeeper-maester project.",
                  "type": "object",
                  "required": [
                    "handler"
                  ],
                  "properties": {
                    "config": {
                      "description": "Configures the handler. Configuration keys vary per handler.",
                      "x-kubernetes-preserve-unknown-fields": true
                    },
                    "handler": {
                      "description": "Specifies the name of the handler.",
                      "type": "string"
                    }
                  }
                }
              },
              "path": {
                "description": "Specifies the path of the exposed service.",
                "type": "string",
                "pattern": "^([0-9a-zA-Z./*()?!\\\\_-]+)"
              },
              "service": {
                "description": "Describes the service to expose. Overwrites the **spec** level service if defined.",
                "type": "object",
                "required": [
                  "name",
                  "port"
                ],
                "properties": {
                  "external": {
                    "description": "Specifies if the service is internal (in cluster) or external.",
                    "type": "boolean"
                  },
                  "name": {
                    "description": "Specifies the name of the exposed service.",
                    "type": "string"
                  },
                  "namespace": {
                    "description": "Specifies the Namespace of the exposed service. If not defined, it defaults to the APIRule Namespace.",
                    "type": "string",
                    "pattern": "^[a-z0-9]([-a-z0-9]*[a-z0-9])?$"
                  },
                  "port": {
                    "description": "Specifies the communication port of the exposed service.",
                    "type": "integer",
                    "format": "int32",
                    "maximum": 65535,
                    "minimum": 1
                  }
                }
              },
              "timeout": {
                "description": "Timeout for HTTP requests in seconds. The timeout can be configured up to 3900 seconds (65 minutes).",
                "type": "integer",
                "maximum": 3900,
                "minimum": 1
              }
            }
          }
        },

For the new CRD with x-kubernetes-preserve-unknown-fields: true the response looks like this:

"rules": {
          "description": "Represents the array of Oathkeeper access rules to be applied.",
          "type": "array",
          "minItems": 1,
          "items": {
            "description": "Rule .",
            "required": [
              "accessStrategies",
              "methods",
              "path"
            ],
            "x-kubernetes-preserve-unknown-fields": true
          }
        },

As far as I know Busola uses /openapi/v2 endpoint to load the schema and parse it. You can reproduce this by checking the response of the request to this endpoint on initial rendering of Busola or directly query the Kubernetes API server using kubectl proxy --port=8080 & and curl http://localhost:8080/openapi/v2

triffer commented 5 months ago

We should also create an ADR for the decision to handle the certificate rotation in the controller instead of using a separate job.

videlov commented 5 months ago

The root cause for the failing UI tests is that the rules fields are not rendered. The UI renders the fields based on the Kubernetes OpenAPI response from /openapi/v2. The OpenAPI response is influenced by setting x-kubernetes-preserve-unknown-fields: true in the CRD. ...`

we decided we will remove the preserveUnknownFields and for extAuths we will introduce it as a new handler in v1beta1 accessStrategies so we have it fully convertable v1beta1 <-> v1beta2

triffer commented 5 months ago

As far as I can see there is no integration tests for the init container behaviour. Was this a decision or did we forget to evaluate this?

videlov commented 5 months ago

As far as I can see there is no integration tests for the init container behaviour. Was this a decision or did we forget to evaluate this?

there is real integration test that is validating that the conversion works, so it is checked. But will think of adding some envtest to check the outcome of the init-container work.

triffer commented 5 months ago

Maybe the following error that deletion is blocked is related to the issue reported in this comment. Applying the following APIRule with invalid jwt config will set the APIRule correctly in Error:

kubectl apply -f - <<EOF
apiVersion: gateway.kyma-project.io/v1beta2
kind: APIRule
metadata:
  name: httpbin-rule
  namespace: default
  labels:
    app: httpbin
spec:
  gateway: kyma-system/kyma-gateway
  hosts:
    - httpbin
  service:
    name: httpbin
    port: 8000
  rules:
    - path: /.*
      methods: [ "GET"]
      mutators: [ ]
      jwt: {}
EOF

Updating the APIRule by removing the jwt field and applying the APIRule again will keep the previous error:

kubectl apply -f - <<EOF
apiVersion: gateway.kyma-project.io/v1beta2
kind: APIRule
metadata:
  name: httpbin-rule
  namespace: default
  labels:
    app: httpbin
spec:
  gateway: kyma-system/kyma-gateway
  hosts:
    - httpbin
  service:
    name: httpbin
    port: 8000
  rules:
    - path: /.*
      methods: [ "GET"]
      mutators: [ ]
EOF

Then deleting the APIRule will not work and deletion is blocked and there are errors in the controller logs:

api-gateway-controller-manager-686799c6ff-mt775 manager 2024-05-14T08:53:37Z    INFO    apirule-controller  Starting reconciliation {"namespacedName": "default/httpbin-rule"}
api-gateway-controller-manager-686799c6ff-mt775 manager 2024-05-14T08:53:37Z    INFO    apirule-controller  Starting ApiRule reconciliation{"jwtHandler": "istio"}
api-gateway-controller-manager-686799c6ff-mt775 manager 2024-05-14T08:53:37Z    INFO    apirule-controller  ApiRule is converted from v1beta2
api-gateway-controller-manager-686799c6ff-mt775 manager 2024-05-14T08:53:37Z    INFO    apirule-controller  Reconciling ApiRule {"name": "httpbin-rule", "namespace": "default", "resource version": "373798"}
api-gateway-controller-manager-686799c6ff-mt775 manager 2024-05-14T08:53:37Z    ERROR   apirule-controller  Error happened during finalizer removal {"error": "APIRule.gateway.kyma-project.io \"httpbin-rule\" is invalid: spec.rules[0].accessStrategies: Required value"}
api-gateway-controller-manager-686799c6ff-mt775 manager github.com/kyma-project/api-gateway/controllers/gateway.(*APIRuleReconciler).Reconcile
api-gateway-controller-manager-686799c6ff-mt775 manager     /workspace/controllers/gateway/apirule_controller.go:201
api-gateway-controller-manager-686799c6ff-mt775 manager sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
api-gateway-controller-manager-686799c6ff-mt775 manager     /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.3/pkg/internal/controller/controller.go:119
api-gateway-controller-manager-686799c6ff-mt775 manager sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
api-gateway-controller-manager-686799c6ff-mt775 manager     /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.3/pkg/internal/controller/controller.go:316
api-gateway-controller-manager-686799c6ff-mt775 manager sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
api-gateway-controller-manager-686799c6ff-mt775 manager     /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.3/pkg/internal/controller/controller.go:266
api-gateway-controller-manager-686799c6ff-mt775 manager sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
api-gateway-controller-manager-686799c6ff-mt775 manager     /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.3/pkg/internal/controller/controller.go:227
api-gateway-controller-manager-686799c6ff-mt775 manager 2024/05/14 08:53:37 http: TLS handshake error from 10.42.0.0:47392: EOF
api-gateway-controller-manager-686799c6ff-mt775 manager 2024-05-14T08:53:38Z    INFO    apirule-controller  Starting reconciliation {"namespacedName": "default/httpbin-rule"}
api-gateway-controller-manager-686799c6ff-mt775 manager 2024-05-14T08:53:38Z    INFO    apirule-controller  Starting ApiRule reconciliation{"jwtHandler": "istio"}
api-gateway-controller-manager-686799c6ff-mt775 manager 2024-05-14T08:53:38Z    INFO    apirule-controller  ApiRule is converted from v1beta2
api-gateway-controller-manager-686799c6ff-mt775 manager 2024-05-14T08:53:38Z    INFO    apirule-controller  Reconciling ApiRule {"name": "httpbin-rule", "namespace": "default", "resource version": "373798"}
api-gateway-controller-manager-686799c6ff-mt775 manager 2024-05-14T08:53:38Z    ERROR   apirule-controller  Error happened during finalizer removal {"error": "APIRule.gateway.kyma-project.io \"httpbin-rule\" is invalid: spec.rules[0].accessStrategies: Required value"}
api-gateway-controller-manager-686799c6ff-mt775 manager github.com/kyma-project/api-gateway/controllers/gateway.(*APIRuleReconciler).Reconcile
api-gateway-controller-manager-686799c6ff-mt775 manager     /workspace/controllers/gateway/apirule_controller.go:201
api-gateway-controller-manager-686799c6ff-mt775 manager sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
api-gateway-controller-manager-686799c6ff-mt775 manager     /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.3/pkg/internal/controller/controller.go:119
api-gateway-controller-manager-686799c6ff-mt775 manager sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
api-gateway-controller-manager-686799c6ff-mt775 manager     /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.3/pkg/internal/controller/controller.go:316
api-gateway-controller-manager-686799c6ff-mt775 manager sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
api-gateway-controller-manager-686799c6ff-mt775 manager     /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.3/pkg/internal/controller/controller.go:266
api-gateway-controller-manager-686799c6ff-mt775 manager sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
api-gateway-controller-manager-686799c6ff-mt775 manager     /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.3/pkg/internal/controller/controller.go:227
videlov commented 5 months ago

@werdes72 for the mentioned example cases when we do not have either validation or case not support, please add / adapt unittests so we have this covered.

triffer commented 5 months ago

Accidentally (😏 ) deleting the secret api-gateway-webhook-certificate will cause a continuous failed reconciliation being logged by the manager. Shouldn't we implement some self-healing capabilities, because only a manual restart of the manager will fix this.

api-gateway-controller-manager-686799c6ff-mt775 manager 2024-05-14T09:23:32Z    INFO    certificate-controller  Received reconciliation request{"namespace": "kyma-system", "name": "api-gateway-webhook-certificate"}
api-gateway-controller-manager-686799c6ff-mt775 manager 2024-05-14T09:23:32Z    ERROR   Reconciler error    {"controller": "secret", "controllerGroup": "", "controllerKind": "Secret", "Secret": {"name":"api-gateway-webhook-certificate","namespace":"kyma-system"}, "namespace": "kyma-system", "name": "api-gateway-webhook-certificate", "reconcileID": "5b40d3d6-4e80-4df6-b55d-aec82729f2af", "error": "Secret \"api-gateway-webhook-certificate\" not found"}
api-gateway-controller-manager-686799c6ff-mt775 manager sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
api-gateway-controller-manager-686799c6ff-mt775 manager     /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.3/pkg/internal/controller/controller.go:329
api-gateway-controller-manager-686799c6ff-mt775 manager sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
api-gateway-controller-manager-686799c6ff-mt775 manager     /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.3/pkg/internal/controller/controller.go:266
api-gateway-controller-manager-686799c6ff-mt775 manager sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
api-gateway-controller-manager-686799c6ff-mt775 manager     /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.3/pkg/internal/controller/controller.go:227
videlov commented 5 months ago

Accidentally (😏 ) deleting the secret api-gateway-webhook-certificate will cause a continuous failed reconciliation being logged by the manager. Shouldn't we implement some self-healing capabilities, because only a manual restart of the manager will fix this.

api-gateway-controller-manager-686799c6ff-mt775 manager 2024-05-14T09:23:32Z  INFO    certificate-controller  Received reconciliation request{"namespace": "kyma-system", "name": "api-gateway-webhook-certificate"}
api-gateway-controller-manager-686799c6ff-mt775 manager 2024-05-14T09:23:32Z  ERROR   Reconciler error    {"controller": "secret", "controllerGroup": "", "controllerKind": "Secret", "Secret": {"name":"api-gateway-webhook-certificate","namespace":"kyma-system"}, "namespace": "kyma-system", "name": "api-gateway-webhook-certificate", "reconcileID": "5b40d3d6-4e80-4df6-b55d-aec82729f2af", "error": "Secret \"api-gateway-webhook-certificate\" not found"}
api-gateway-controller-manager-686799c6ff-mt775 manager sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
api-gateway-controller-manager-686799c6ff-mt775 manager   /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.3/pkg/internal/controller/controller.go:329
api-gateway-controller-manager-686799c6ff-mt775 manager sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
api-gateway-controller-manager-686799c6ff-mt775 manager   /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.3/pkg/internal/controller/controller.go:266
api-gateway-controller-manager-686799c6ff-mt775 manager sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
api-gateway-controller-manager-686799c6ff-mt775 manager   /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.3/pkg/internal/controller/controller.go:227

I not sure about this one, I was thinking about it and I am more of a fan to deal with the problem with the workaround (restarting the operator). Deleting Secrets is kind of harsh and this is kyma-system resource, so has to be handled with an incident. Not a fan in the self-healing in this case.

barchw commented 5 months ago

Accidentally (😏 ) deleting the secret api-gateway-webhook-certificate will cause a continuous failed reconciliation being logged by the manager. Shouldn't we implement some self-healing capabilities, because only a manual restart of the manager will fix this.

api-gateway-controller-manager-686799c6ff-mt775 manager 2024-05-14T09:23:32Z    INFO    certificate-controller  Received reconciliation request{"namespace": "kyma-system", "name": "api-gateway-webhook-certificate"}
api-gateway-controller-manager-686799c6ff-mt775 manager 2024-05-14T09:23:32Z    ERROR   Reconciler error    {"controller": "secret", "controllerGroup": "", "controllerKind": "Secret", "Secret": {"name":"api-gateway-webhook-certificate","namespace":"kyma-system"}, "namespace": "kyma-system", "name": "api-gateway-webhook-certificate", "reconcileID": "5b40d3d6-4e80-4df6-b55d-aec82729f2af", "error": "Secret \"api-gateway-webhook-certificate\" not found"}
api-gateway-controller-manager-686799c6ff-mt775 manager sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
api-gateway-controller-manager-686799c6ff-mt775 manager     /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.3/pkg/internal/controller/controller.go:329
api-gateway-controller-manager-686799c6ff-mt775 manager sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
api-gateway-controller-manager-686799c6ff-mt775 manager     /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.3/pkg/internal/controller/controller.go:266
api-gateway-controller-manager-686799c6ff-mt775 manager sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
api-gateway-controller-manager-686799c6ff-mt775 manager     /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.3/pkg/internal/controller/controller.go:227

I not sure about this one, I was thinking about it and I am more of a fan to deal with the problem with the workaround (restarting the operator). Deleting Secrets is kind of harsh and this is kyma-system resource, so has to be handled with an incident. Not a fan in the self-healing in this case.

Maybe we could secure it with a finalizer so it will not be "accidentally" deleted?

videlov commented 5 months ago

It is possible to create a v1beta2 rule without JWT and NoAuth (or noAuth set to false). This will result in an APIRule without a status. We should make sure that we have at least JWT configured or noAuth set to true. APIRule:

this cases including the deletion has to be solved now, could you please retest?

videlov commented 5 months ago

Maybe we could secure it with a finalizer so it will not be "accidentally" deleted?

https://github.com/kyma-project/api-gateway/pull/996/commits/806766c449fbc1b5d4f2507cd1b6f4607ab0890f

barchw commented 5 months ago

Maybe we could secure it with a finalizer so it will not be "accidentally" deleted?

806766c

Are we removing the secret as part of module deletion? If so it also needs to be cleared up

videlov commented 5 months ago

Are we removing the secret as part of module deletion? If so it also needs to be cleared up

how is this actually handled ? module deletion is kind of KLM ground, they delete static manifest that we provide ? this certificate secret is kind of bound the the operator lifecycle and not the APIGateway CR.

barchw commented 5 months ago

Are we removing the secret as part of module deletion? If so it also needs to be cleared up

how is this actually handled ? module deletion is kind of KLM ground, they delete static manifest that we provide ? this certificate secret is kind of bound the the operator lifecycle and not the APIGateway CR.

Yep, KLM will not delete it if it is not part of the manifest

barchw commented 5 months ago

Are we removing the secret as part of module deletion? If so it also needs to be cleared up

how is this actually handled ? module deletion is kind of KLM ground, they delete static manifest that we provide ? this certificate secret is kind of bound the the operator lifecycle and not the APIGateway CR.

Yep, KLM will not delete it if it is not part of the manifest

Using OwnerReference to api-gateway deployment to clean it up should work, than the secret would be deleted with the controller

videlov commented 5 months ago

Using OwnerReference to api-gateway deployment to clean it up should work, than the secret would be deleted with the controller

using OwnerReference will solve deleting it on operator removal, but having the Finalizer still blocks it (can not unblock it because operator controller that reconciles the Secret will be gone at this point). Decided to end up with OwnerReference to clean it on module removal. Accidental removal has to be handled via incident and solved with operator restart.

barchw commented 5 months ago

can not unblock it because operator controller that reconciles the Secret will be gone at this point

KLM will wait for operator to gracefully delete module resource, so it is possible to remove the finalizer as part of deletion of APIGateway CR. Athough I would leave resolving it for a follow up. @triffer @videlov wdyt?

videlov commented 5 months ago

KLM will wait for operator to gracefully delete module resource, so it is possible to remove the finalizer as part of deletion of APIGateway CR. Athough I would leave resolving it for a follow up. @triffer @videlov wdyt?

I would avoid coupling the Secret lifecycle with APIGateway CR.

triffer commented 5 months ago

Accidental removal has to be handled via incident and solved with operator restart.

@videlov Should we have a follow up to create already a troubleshooting guide for this? It's a scenario that will probably happen in some way in future 😅

videlov commented 5 months ago

@videlov Should we have a follow up to create already a troubleshooting guide for this? It's a scenario that will probably happen in some way in future 😅

did create a follow-up issue for this, but do not see it as a high prio task. lets see.

triffer commented 5 months ago

@videlov @werdes72 I just saw that we are missing release notes. Can you please go through the DoD and check it. I already saw that code coverage is lowered, but checking the coverage in the IDE shows that coverage can be considered fine.

werdes72 commented 5 months ago

@videlov @werdes72 I just saw that we are missing release notes. Can you please go through the DoD and check it. I already saw that code coverage is lowered, but checking the coverage in the IDE shows that coverage can be considered fine.

There is no milestone set for this issue. Added the release notes.