Closed werdes72 closed 5 months ago
The root cause for the failing UI tests is that the rules fields are not rendered. The UI renders the fields based on the Kubernetes OpenAPI response from /openapi/v2
. The OpenAPI response is influenced by setting x-kubernetes-preserve-unknown-fields: true
in the CRD. We can find more information about that in the Kubernetes CRD docs:
A structural schema is an OpenAPI v3.0 validation schema which:
specifies a non-empty type (via type in OpenAPI) for the root, for each specified field of an object node (via properties or >additionalProperties in OpenAPI) and for each item in an array node (via items in OpenAPI), with the exception of: a node with x-kubernetes-int-or-string: true a node with x-kubernetes-preserve-unknown-fields: true
For the old CRD the API response for rules looks like this:
"rules": {
"description": "Represents the array of Oathkeeper access rules to be applied.",
"type": "array",
"minItems": 1,
"items": {
"description": "Rule .",
"type": "object",
"required": [
"accessStrategies",
"methods",
"path"
],
"properties": {
"accessStrategies": {
"description": "Specifies the list of access strategies. All strategies listed in [Oathkeeper documentation](https://www.ory.sh/docs/oathkeeper/pipeline/authn) are supported.",
"type": "array",
"minItems": 1,
"items": {
"description": "Represents a handler that authenticates provided credentials. See the corresponding type in the oathkeeper-maester project.",
"type": "object",
"required": [
"handler"
],
"properties": {
"config": {
"description": "Configures the handler. Configuration keys vary per handler.",
"x-kubernetes-preserve-unknown-fields": true
},
"handler": {
"description": "Specifies the name of the handler.",
"type": "string"
}
}
}
},
"methods": {
"description": "Represents the list of allowed HTTP request methods available for the **spec.rules.path**.",
"type": "array",
"minItems": 1,
"items": {
"description": "HttpMethod specifies the HTTP request method. The list of supported methods is defined in RFC 9910: HTTP Semantics and RFC 5789: PATCH Method for HTTP.",
"type": "string",
"enum": [
"GET",
"HEAD",
"POST",
"PUT",
"DELETE",
"CONNECT",
"OPTIONS",
"TRACE",
"PATCH"
]
}
},
"mutators": {
"description": "Specifies the list of [Ory Oathkeeper](https://www.ory.sh/docs/oathkeeper/pipeline/mutator) mutators.",
"type": "array",
"items": {
"description": "Mutator represents a handler that transforms the HTTP request before forwarding it. See the corresponding in the oathkeeper-maester project.",
"type": "object",
"required": [
"handler"
],
"properties": {
"config": {
"description": "Configures the handler. Configuration keys vary per handler.",
"x-kubernetes-preserve-unknown-fields": true
},
"handler": {
"description": "Specifies the name of the handler.",
"type": "string"
}
}
}
},
"path": {
"description": "Specifies the path of the exposed service.",
"type": "string",
"pattern": "^([0-9a-zA-Z./*()?!\\\\_-]+)"
},
"service": {
"description": "Describes the service to expose. Overwrites the **spec** level service if defined.",
"type": "object",
"required": [
"name",
"port"
],
"properties": {
"external": {
"description": "Specifies if the service is internal (in cluster) or external.",
"type": "boolean"
},
"name": {
"description": "Specifies the name of the exposed service.",
"type": "string"
},
"namespace": {
"description": "Specifies the Namespace of the exposed service. If not defined, it defaults to the APIRule Namespace.",
"type": "string",
"pattern": "^[a-z0-9]([-a-z0-9]*[a-z0-9])?$"
},
"port": {
"description": "Specifies the communication port of the exposed service.",
"type": "integer",
"format": "int32",
"maximum": 65535,
"minimum": 1
}
}
},
"timeout": {
"description": "Timeout for HTTP requests in seconds. The timeout can be configured up to 3900 seconds (65 minutes).",
"type": "integer",
"maximum": 3900,
"minimum": 1
}
}
}
},
For the new CRD with x-kubernetes-preserve-unknown-fields: true
the response looks like this:
"rules": {
"description": "Represents the array of Oathkeeper access rules to be applied.",
"type": "array",
"minItems": 1,
"items": {
"description": "Rule .",
"required": [
"accessStrategies",
"methods",
"path"
],
"x-kubernetes-preserve-unknown-fields": true
}
},
As far as I know Busola uses /openapi/v2
endpoint to load the schema and parse it.
You can reproduce this by checking the response of the request to this endpoint on initial rendering of Busola or directly query the Kubernetes API server using
kubectl proxy --port=8080 &
and curl http://localhost:8080/openapi/v2
We should also create an ADR for the decision to handle the certificate rotation in the controller instead of using a separate job.
The root cause for the failing UI tests is that the rules fields are not rendered. The UI renders the fields based on the Kubernetes OpenAPI response from
/openapi/v2
. The OpenAPI response is influenced by settingx-kubernetes-preserve-unknown-fields: true
in the CRD. ...`
we decided we will remove the preserveUnknownFields
and for extAuths
we will introduce it as a new handler in v1beta1 accessStrategies
so we have it fully convertable v1beta1 <-> v1beta2
As far as I can see there is no integration tests for the init container behaviour. Was this a decision or did we forget to evaluate this?
As far as I can see there is no integration tests for the init container behaviour. Was this a decision or did we forget to evaluate this?
there is real integration test that is validating that the conversion works, so it is checked. But will think of adding some envtest to check the outcome of the init-container work.
Maybe the following error that deletion is blocked is related to the issue reported in this comment. Applying the following APIRule with invalid jwt config will set the APIRule correctly in Error:
kubectl apply -f - <<EOF
apiVersion: gateway.kyma-project.io/v1beta2
kind: APIRule
metadata:
name: httpbin-rule
namespace: default
labels:
app: httpbin
spec:
gateway: kyma-system/kyma-gateway
hosts:
- httpbin
service:
name: httpbin
port: 8000
rules:
- path: /.*
methods: [ "GET"]
mutators: [ ]
jwt: {}
EOF
Updating the APIRule by removing the jwt field and applying the APIRule again will keep the previous error:
kubectl apply -f - <<EOF
apiVersion: gateway.kyma-project.io/v1beta2
kind: APIRule
metadata:
name: httpbin-rule
namespace: default
labels:
app: httpbin
spec:
gateway: kyma-system/kyma-gateway
hosts:
- httpbin
service:
name: httpbin
port: 8000
rules:
- path: /.*
methods: [ "GET"]
mutators: [ ]
EOF
Then deleting the APIRule will not work and deletion is blocked and there are errors in the controller logs:
api-gateway-controller-manager-686799c6ff-mt775 manager 2024-05-14T08:53:37Z INFO apirule-controller Starting reconciliation {"namespacedName": "default/httpbin-rule"}
api-gateway-controller-manager-686799c6ff-mt775 manager 2024-05-14T08:53:37Z INFO apirule-controller Starting ApiRule reconciliation{"jwtHandler": "istio"}
api-gateway-controller-manager-686799c6ff-mt775 manager 2024-05-14T08:53:37Z INFO apirule-controller ApiRule is converted from v1beta2
api-gateway-controller-manager-686799c6ff-mt775 manager 2024-05-14T08:53:37Z INFO apirule-controller Reconciling ApiRule {"name": "httpbin-rule", "namespace": "default", "resource version": "373798"}
api-gateway-controller-manager-686799c6ff-mt775 manager 2024-05-14T08:53:37Z ERROR apirule-controller Error happened during finalizer removal {"error": "APIRule.gateway.kyma-project.io \"httpbin-rule\" is invalid: spec.rules[0].accessStrategies: Required value"}
api-gateway-controller-manager-686799c6ff-mt775 manager github.com/kyma-project/api-gateway/controllers/gateway.(*APIRuleReconciler).Reconcile
api-gateway-controller-manager-686799c6ff-mt775 manager /workspace/controllers/gateway/apirule_controller.go:201
api-gateway-controller-manager-686799c6ff-mt775 manager sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
api-gateway-controller-manager-686799c6ff-mt775 manager /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.3/pkg/internal/controller/controller.go:119
api-gateway-controller-manager-686799c6ff-mt775 manager sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
api-gateway-controller-manager-686799c6ff-mt775 manager /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.3/pkg/internal/controller/controller.go:316
api-gateway-controller-manager-686799c6ff-mt775 manager sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
api-gateway-controller-manager-686799c6ff-mt775 manager /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.3/pkg/internal/controller/controller.go:266
api-gateway-controller-manager-686799c6ff-mt775 manager sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
api-gateway-controller-manager-686799c6ff-mt775 manager /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.3/pkg/internal/controller/controller.go:227
api-gateway-controller-manager-686799c6ff-mt775 manager 2024/05/14 08:53:37 http: TLS handshake error from 10.42.0.0:47392: EOF
api-gateway-controller-manager-686799c6ff-mt775 manager 2024-05-14T08:53:38Z INFO apirule-controller Starting reconciliation {"namespacedName": "default/httpbin-rule"}
api-gateway-controller-manager-686799c6ff-mt775 manager 2024-05-14T08:53:38Z INFO apirule-controller Starting ApiRule reconciliation{"jwtHandler": "istio"}
api-gateway-controller-manager-686799c6ff-mt775 manager 2024-05-14T08:53:38Z INFO apirule-controller ApiRule is converted from v1beta2
api-gateway-controller-manager-686799c6ff-mt775 manager 2024-05-14T08:53:38Z INFO apirule-controller Reconciling ApiRule {"name": "httpbin-rule", "namespace": "default", "resource version": "373798"}
api-gateway-controller-manager-686799c6ff-mt775 manager 2024-05-14T08:53:38Z ERROR apirule-controller Error happened during finalizer removal {"error": "APIRule.gateway.kyma-project.io \"httpbin-rule\" is invalid: spec.rules[0].accessStrategies: Required value"}
api-gateway-controller-manager-686799c6ff-mt775 manager github.com/kyma-project/api-gateway/controllers/gateway.(*APIRuleReconciler).Reconcile
api-gateway-controller-manager-686799c6ff-mt775 manager /workspace/controllers/gateway/apirule_controller.go:201
api-gateway-controller-manager-686799c6ff-mt775 manager sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
api-gateway-controller-manager-686799c6ff-mt775 manager /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.3/pkg/internal/controller/controller.go:119
api-gateway-controller-manager-686799c6ff-mt775 manager sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
api-gateway-controller-manager-686799c6ff-mt775 manager /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.3/pkg/internal/controller/controller.go:316
api-gateway-controller-manager-686799c6ff-mt775 manager sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
api-gateway-controller-manager-686799c6ff-mt775 manager /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.3/pkg/internal/controller/controller.go:266
api-gateway-controller-manager-686799c6ff-mt775 manager sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
api-gateway-controller-manager-686799c6ff-mt775 manager /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.3/pkg/internal/controller/controller.go:227
@werdes72 for the mentioned example cases when we do not have either validation or case not support, please add / adapt unittests so we have this covered.
Accidentally (😏 ) deleting the secret api-gateway-webhook-certificate
will cause a continuous failed reconciliation being logged by the manager. Shouldn't we implement some self-healing capabilities, because only a manual restart of the manager will fix this.
api-gateway-controller-manager-686799c6ff-mt775 manager 2024-05-14T09:23:32Z INFO certificate-controller Received reconciliation request{"namespace": "kyma-system", "name": "api-gateway-webhook-certificate"}
api-gateway-controller-manager-686799c6ff-mt775 manager 2024-05-14T09:23:32Z ERROR Reconciler error {"controller": "secret", "controllerGroup": "", "controllerKind": "Secret", "Secret": {"name":"api-gateway-webhook-certificate","namespace":"kyma-system"}, "namespace": "kyma-system", "name": "api-gateway-webhook-certificate", "reconcileID": "5b40d3d6-4e80-4df6-b55d-aec82729f2af", "error": "Secret \"api-gateway-webhook-certificate\" not found"}
api-gateway-controller-manager-686799c6ff-mt775 manager sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
api-gateway-controller-manager-686799c6ff-mt775 manager /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.3/pkg/internal/controller/controller.go:329
api-gateway-controller-manager-686799c6ff-mt775 manager sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
api-gateway-controller-manager-686799c6ff-mt775 manager /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.3/pkg/internal/controller/controller.go:266
api-gateway-controller-manager-686799c6ff-mt775 manager sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
api-gateway-controller-manager-686799c6ff-mt775 manager /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.3/pkg/internal/controller/controller.go:227
Accidentally (😏 ) deleting the secret
api-gateway-webhook-certificate
will cause a continuous failed reconciliation being logged by the manager. Shouldn't we implement some self-healing capabilities, because only a manual restart of the manager will fix this.api-gateway-controller-manager-686799c6ff-mt775 manager 2024-05-14T09:23:32Z INFO certificate-controller Received reconciliation request{"namespace": "kyma-system", "name": "api-gateway-webhook-certificate"} api-gateway-controller-manager-686799c6ff-mt775 manager 2024-05-14T09:23:32Z ERROR Reconciler error {"controller": "secret", "controllerGroup": "", "controllerKind": "Secret", "Secret": {"name":"api-gateway-webhook-certificate","namespace":"kyma-system"}, "namespace": "kyma-system", "name": "api-gateway-webhook-certificate", "reconcileID": "5b40d3d6-4e80-4df6-b55d-aec82729f2af", "error": "Secret \"api-gateway-webhook-certificate\" not found"} api-gateway-controller-manager-686799c6ff-mt775 manager sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler api-gateway-controller-manager-686799c6ff-mt775 manager /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.3/pkg/internal/controller/controller.go:329 api-gateway-controller-manager-686799c6ff-mt775 manager sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem api-gateway-controller-manager-686799c6ff-mt775 manager /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.3/pkg/internal/controller/controller.go:266 api-gateway-controller-manager-686799c6ff-mt775 manager sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2 api-gateway-controller-manager-686799c6ff-mt775 manager /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.3/pkg/internal/controller/controller.go:227
I not sure about this one, I was thinking about it and I am more of a fan to deal with the problem with the workaround (restarting the operator). Deleting Secrets is kind of harsh and this is kyma-system resource, so has to be handled with an incident. Not a fan in the self-healing in this case.
Accidentally (😏 ) deleting the secret
api-gateway-webhook-certificate
will cause a continuous failed reconciliation being logged by the manager. Shouldn't we implement some self-healing capabilities, because only a manual restart of the manager will fix this.api-gateway-controller-manager-686799c6ff-mt775 manager 2024-05-14T09:23:32Z INFO certificate-controller Received reconciliation request{"namespace": "kyma-system", "name": "api-gateway-webhook-certificate"} api-gateway-controller-manager-686799c6ff-mt775 manager 2024-05-14T09:23:32Z ERROR Reconciler error {"controller": "secret", "controllerGroup": "", "controllerKind": "Secret", "Secret": {"name":"api-gateway-webhook-certificate","namespace":"kyma-system"}, "namespace": "kyma-system", "name": "api-gateway-webhook-certificate", "reconcileID": "5b40d3d6-4e80-4df6-b55d-aec82729f2af", "error": "Secret \"api-gateway-webhook-certificate\" not found"} api-gateway-controller-manager-686799c6ff-mt775 manager sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler api-gateway-controller-manager-686799c6ff-mt775 manager /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.3/pkg/internal/controller/controller.go:329 api-gateway-controller-manager-686799c6ff-mt775 manager sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem api-gateway-controller-manager-686799c6ff-mt775 manager /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.3/pkg/internal/controller/controller.go:266 api-gateway-controller-manager-686799c6ff-mt775 manager sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2 api-gateway-controller-manager-686799c6ff-mt775 manager /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.3/pkg/internal/controller/controller.go:227
I not sure about this one, I was thinking about it and I am more of a fan to deal with the problem with the workaround (restarting the operator). Deleting Secrets is kind of harsh and this is kyma-system resource, so has to be handled with an incident. Not a fan in the self-healing in this case.
Maybe we could secure it with a finalizer so it will not be "accidentally" deleted?
It is possible to create a v1beta2 rule without JWT and NoAuth (or noAuth set to
false
). This will result in an APIRule without a status. We should make sure that we have at least JWT configured or noAuth set to true. APIRule:
this cases including the deletion has to be solved now, could you please retest?
Maybe we could secure it with a finalizer so it will not be "accidentally" deleted?
Maybe we could secure it with a finalizer so it will not be "accidentally" deleted?
Are we removing the secret as part of module deletion? If so it also needs to be cleared up
Are we removing the secret as part of module deletion? If so it also needs to be cleared up
how is this actually handled ? module deletion is kind of KLM ground, they delete static manifest that we provide ? this certificate secret is kind of bound the the operator lifecycle and not the APIGateway CR.
Are we removing the secret as part of module deletion? If so it also needs to be cleared up
how is this actually handled ? module deletion is kind of KLM ground, they delete static manifest that we provide ? this certificate secret is kind of bound the the operator lifecycle and not the APIGateway CR.
Yep, KLM will not delete it if it is not part of the manifest
Are we removing the secret as part of module deletion? If so it also needs to be cleared up
how is this actually handled ? module deletion is kind of KLM ground, they delete static manifest that we provide ? this certificate secret is kind of bound the the operator lifecycle and not the APIGateway CR.
Yep, KLM will not delete it if it is not part of the manifest
Using OwnerReference
to api-gateway deployment to clean it up should work, than the secret would be deleted with the controller
Using
OwnerReference
to api-gateway deployment to clean it up should work, than the secret would be deleted with the controller
using OwnerReference
will solve deleting it on operator removal, but having the Finalizer
still blocks it (can not unblock it because operator controller that reconciles the Secret will be gone at this point). Decided to end up with OwnerReference
to clean it on module removal. Accidental removal has to be handled via incident and solved with operator restart.
can not unblock it because operator controller that reconciles the Secret will be gone at this point
KLM will wait for operator to gracefully delete module resource, so it is possible to remove the finalizer as part of deletion of APIGateway CR. Athough I would leave resolving it for a follow up. @triffer @videlov wdyt?
KLM will wait for operator to gracefully delete module resource, so it is possible to remove the finalizer as part of deletion of APIGateway CR. Athough I would leave resolving it for a follow up. @triffer @videlov wdyt?
I would avoid coupling the Secret lifecycle with APIGateway CR.
Accidental removal has to be handled via incident and solved with operator restart.
@videlov Should we have a follow up to create already a troubleshooting guide for this? It's a scenario that will probably happen in some way in future 😅
@videlov Should we have a follow up to create already a troubleshooting guide for this? It's a scenario that will probably happen in some way in future 😅
did create a follow-up issue for this, but do not see it as a high prio task. lets see.
@videlov @werdes72 I just saw that we are missing release notes. Can you please go through the DoD and check it. I already saw that code coverage is lowered, but checking the coverage in the IDE shows that coverage can be considered fine.
@videlov @werdes72 I just saw that we are missing release notes. Can you please go through the DoD and check it. I already saw that code coverage is lowered, but checking the coverage in the IDE shows that coverage can be considered fine.
There is no milestone set for this issue.
Added the release notes.
Description
Changes proposed in this pull request:
Related issues
986