Closed agardnerIT closed 6 months ago
@agardnerIT I'm not too familiar with Argo Rollout, but I'm a but suspicious that the use of the openfeature.dev/enabled
isn't working in that context (we'd typically see it in deployments). Specifically I'm wondering if it even makes it onto the pod (where it needs to be) from there.
I see in the Rollout docs that you can reference a workload (deployment) with workloadRef
instead of using a template directly. Can you try that with a deployment that has the annotation?
I guess the difference between the index creation here and the idex used when backfilling the permissions here might be the problem.
The index creation is happening with spec.template.metadata.annotations.openfeature.dev/openfeature.dev/allowkubernetessync
but instead the permission backilling function is searching for metadata.annotations.openfeature.dev/openfeature.dev/allowkubernetessync
. I would say these need to be aligned. Honestly not sure which one of them is the correct one, will try to search when this was changed. It might also be me, as I made bigger refactoring of the webhooks recently and this probably slipped through.
Ok, I found to problem and it was me introducing it, before the refactoring, the indexer creation was creating an index to metadata.annotations.openfeature.dev/openfeature.dev/allowkubernetessync
, currently it's creating an index for spec.template.metadata.annotations.openfeature.dev/openfeature.dev/allowkubernetessync
.
It makes sense as this annotation is injected to the pod and not to the deployment. Therefore the indexed field is the annotation in the pod and not the annotation in the pod template, which is part of the deployment.
I already have a fix and will submit a PR soon
Sorry about the problems
According to Adam's manifests, I think the enablement of OF should be ok. Also if the OF would not bet enabled, we wouldn't receive the problems with backfilling permissions -> we won't be able to reach this part of code at all.
Maybe Argo was creating a Pod in a completely different namespace at that time (completely unrelated to this Deployment)? Here it would make sense that the podMutator kicks in, checks if this Pod is annotated, finds out it's not, writes out the message and lets the pod to be bind to a node without any mutation? I think here we need to have more info about what was happening in the system generally... Just thinking out loud what the possibilities might be, do not have any proof for this theory.
@agardnerIT can you maybe provide us with some more insight? Thank you!
Hi I'm also facing the same error
I start playing with the killercoda scenario for the openfeature operator for a POC, but after following the instructions I noticed that the changes made to the CRDs had not effects in the demo app, so I further investigate why and I found that:
Note: I previously did the changes mentioned here to migrate the CRs for versions of the OpenFeature Operator above 0.5.0 because the example files are outdated
I also tried installing everything in my k8s cluster following the instructions provided in your docs
After not seeing the flagd sidecar being injected, I checked the logs of the pod running the operator and again I found the same error
2023-12-23T20:20:26Z ERROR setup podMutator backfill permissions error {"error": "Index with name field:metadata.annotations.openfeature.dev/allowkubernetessync does not exist"}
and also I have in the logs:
2023-12-23T20:20:39Z DEBUG controller-runtime.webhook.webhooks received request {"webhook": "/mutate-v1-pod", "UID": "d18b1152-6286-43df-bc5c-e2659363a1c5", "kind": "/v1, Kind=Pod", "resource": {"group":"","version":"v1","resource":"pods"}}
2023-12-23T20:20:39Z DEBUG controller-runtime.webhook.webhooks wrote response {"webhook": "/mutate-v1-pod", "code": 200, "reason": "OpenFeature is disabled", "UID": "d18b1152-6286-43df-bc5c-e2659363a1c5", "allowed": true}
so the issue is not related to Argo, also is worth to mention that I tried using the versions 0.5.1 and 0.5.0 of the OF operator but all they have the same bug, you can confirm this by installing those versions in the killercoda scenario I mentioned above
@edxz7 @agardnerIT I've released 0.5.3, which should (at the very least partially) resolve this issue. Please confirm when you have time.
Hello @toddbaert
I did a quick test in the killercoda scenario for the openfeature operator with the version you just released (0.5.3) and the error message: ERROR setup podMutator backfill permissions error
has gone but openfeature is still disabled, I got the same logs as before:
2023-12-30T06:32:09Z DEBUG controller-runtime.webhook.webhooks wrote response {"webhook": "/mutate-v1-pod", "code": 200, "reason": "OpenFeature is disabled", "UID": "fe49eaf0-2906-4f44-aac1-e2bb1203c873", "allowed": true}
and still the OFO doesn't inject the sidecar I don't know if this is the expected behavior (having openfeature disabled by default) and if it's the case, How can I enable it manually?
Btw, I also try to test it in my k8s cluster (which is running in eks) but the installation of the new version ends with an error, this are the logs of the installation:
clusterrolebinding.rbac.authorization.k8s.io/open-feature-operator-flagd-kubernetes-sync configured
clusterrolebinding.rbac.authorization.k8s.io/open-feature-operator-manager-rolebinding unchanged
clusterrolebinding.rbac.authorization.k8s.io/open-feature-operator-proxy-rolebinding unchanged
configmap/open-feature-operator-manager-config unchanged
service/open-feature-operator-controller-manager-metrics-service unchanged
service/open-feature-operator-webhook-service unchanged
deployment.apps/open-feature-operator-controller-manager unchanged
certificate.cert-manager.io/open-feature-operator-serving-cert unchanged
issuer.cert-manager.io/open-feature-operator-selfsigned-issuer unchanged
mutatingwebhookconfiguration.admissionregistration.k8s.io/open-feature-operator-mutating-webhook-configuration configured
error: no matching resources found
I don't know why that error happens, but the pods runs without issues and they throw the same logs as in the killercoda scenario, that is, the error message ERROR setup podMutator backfill permissions error
has gone but the message OpenFeature is disabled
is also present.
I was just looking at the killercoda demo and there may be other issues, but the main one is that demo hasn't been updated since our change to v1beta1
(migration path outlined here). It's using older CR definitions that are no longer supported in beta. I have created an issue to update the killercoda example, and to lock it to a specific OFO version to prevent this from happening again.
@edxz7 sorry about that!
I believe outside of killercoda, things are be OK. I just ran through https://openfeature.dev/docs/tutorials/ofo/ and it works well. If you can run kind
on your machine, I recommend going through that - the content is similar and up-to-date.
Hi gents, I think I've now updated the killercoda OpenFeature Operator demo for these fixes.
Please try: https://killercoda.com/agardnerit/scenario/testing-ground and LMK here. If all is well, I'll open a PR.
One thing I noticed is a typo in the docs tutorial. When the the welcome message flag is changed, the second text is: Fib3r: Math at the speed of the internet!
not Welcome to Fib3r: Fibonacci as a Service!
so the docs need to be updated. (https://github.com/open-feature/openfeature.dev/issues/321 created to track this fix)
Change Summary:
PORT
with 30002
and remove the beginning https://
(otherwise we end up with duplicates). Then set FLAGD_HOST_WEB
to the correct value + FLAGD_PORT_WEB
to 443
(if ommited, flagd will append :8013
which breaks things. Finally FLAGD_TLS: true
to re-add https://
otherwise we end up with http://https://KILLERCODA-URL...
Thanks @agardnerIT, I've updated the tutorial to include the correct banner text.
The tutorial works great but I there were a few minor issues I noticed.
Thanks!
Sorry it's taken so long. But I think the above changes have been fixed. You should see DEBUG_VERSION=6
at the top of the terminal output.
If you give me the thumbs up, I'll get the PR done.
@beeme1mr just chasing this up. Can we merge the tutorial into the main repo?
Yes, please open a PR when you have a moment.
@agardnerIT please update the Killercoda tutorials when you have a moment. I'm going to close this ticket because the underlying issue has been addressed.
In the same investigation as open-feature/open-feature-operator#575 and open-feature/flagd#1063
Also noticed:
OFO v0.5.2 deployed w/ Argo (Helm) + flagd proxy.
Application and Flag Config
kubectl version