Open ianb-mp opened 3 months ago
@ianb-mp you have to wait for all the pods under kubernetes-nmstate to be in the ready state before applying an NNCP.
I have waited for all pods to be in ready state, and the error still occurs:
$ kubectl wait --for=condition=ready pod -n nmstate --all
pod/nmstate-cert-manager-5788576df8-rkknl condition met
pod/nmstate-handler-kwlwx condition met
pod/nmstate-handler-qfxg7 condition met
pod/nmstate-metrics-6889dd975d-58br9 condition met
pod/nmstate-operator-685cc75cd8-xwcc2 condition met
pod/nmstate-webhook-65447bb9f-5fkwz condition met
$ kubectl create -f nmstate.yaml
Error from server (InternalError): error when creating "nmstate.yaml": Internal error occurred: failed calling webhook "nodenetworkconfigurationpolicies-mutate.nmstate.io": failed to call webhook: Post "https://nmstate-webhook.nmstate.svc:443/nodenetworkconfigurationpolicies-mutate?timeout=10s": tls: failed to verify certificate: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "nmstate")
If I wait ~60 seconds and try again, it works:
$ kubectl create -f nmstate.yaml
nodenetworkconfigurationpolicy.nmstate.io/sriovpf-bne-lab-srv-6 created
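The manual wait-and-retry above could be scripted as a workaround. This is only a minimal sketch: `create_with_retry` is a hypothetical helper name, the default attempt count and delay are arbitrary assumptions, and matching on the text "failed calling webhook" is based solely on the error message shown in this thread.

```shell
# create_with_retry is a hypothetical helper, not part of kubectl or nmstate.
# Usage: create_with_retry <manifest> [max_attempts] [delay_seconds]
create_with_retry() {
  manifest=$1
  max_attempts=${2:-12}   # assumed default: 12 tries
  delay=${3:-10}          # assumed default: 10 seconds between tries
  attempt=1
  while :; do
    # Capture stdout and stderr so the error text can be inspected.
    if output=$(kubectl create -f "$manifest" 2>&1); then
      printf '%s\n' "$output"
      return 0
    fi
    printf '%s\n' "$output" >&2
    # Retry only on the webhook failure seen in this thread; any other
    # error (for example an invalid manifest) fails immediately.
    case $output in
      *"failed calling webhook"*) ;;
      *) return 1 ;;
    esac
    if [ "$attempt" -ge "$max_attempts" ]; then
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}
```

It would be called as `create_with_retry nmstate.yaml`. Failing fast on non-webhook errors avoids waiting out the full retry budget on a problem that retrying cannot fix.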
I've also seen this issue as described above, and in a slightly different form where the handler pods are not able to resolve the webhook service unless I give the NMState a bit of time to stand up. Not tested yet, but if/when using ArgoCD to deploy, I don't think we'll have a way to control how soon the NNCP resources are applied after the install of the operator resources and the NMState.
The nmstate handler pods are certificate-aware in their readiness probe, so we have to wait for them to be ready before applying an NNCP.
The readiness probe should not report healthy until the certificate is ready. Then the user will know they can proceed. Am I misunderstanding?
Yep, that's it. Ideally we would implement this check in the operator and expose a Status on the NMState CR, but we are not there yet.
What happened:
When applying a NodeNetworkConfigurationPolicy immediately after installation, I sometimes see this error: certificate signed by unknown authority. If I wait a few minutes and try again, it works without error.
From the nmstate-webhook pod log, it looks like the webhook is still deploying. I tried adding a wait check before applying the policy. However, this isn't reliable: the error still occurs sometimes. It would be good to have a way to test whether the nmstate operator is fully ready before trying to apply policies.
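One possible way to approximate such a readiness test, until the operator exposes a status for it, is to probe the webhook with a server-side dry run, which sends the request through admission without persisting anything. This is a sketch under assumptions rather than a confirmed fix: `wait_for_webhook` is a hypothetical helper name, the timeout and interval defaults are arbitrary, and a dry run can also fail for reasons unrelated to the webhook (for example if the policy already exists).

```shell
# wait_for_webhook is a hypothetical helper, not part of kubectl or nmstate.
# Usage: wait_for_webhook <manifest> [timeout_seconds] [interval_seconds]
wait_for_webhook() {
  manifest=$1
  timeout=${2:-300}    # assumed default: give up after 5 minutes
  interval=${3:-5}     # assumed default: probe every 5 seconds
  elapsed=0
  # --dry-run=server submits the object to the API server, including the
  # admission webhooks, but does not store it.
  until kubectl create --dry-run=server -f "$manifest" >/dev/null 2>&1; do
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "webhook did not accept $manifest within ${timeout}s" >&2
      return 1
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
}
```

A deployment script could then run `wait_for_webhook nmstate.yaml && kubectl create -f nmstate.yaml`, so the real create is only attempted once the dry run has passed.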
What you expected to happen:
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
Environment:
NodeNetworkState on affected nodes (use kubectl get nodenetworkstate <node_name> -o yaml):
NodeNetworkConfigurationPolicy:
kubernetes-nmstate image (use kubectl get pods --all-namespaces -l app=kubernetes-nmstate -o jsonpath='{.items[0].spec.containers[0].image}'):
NetworkManager version (use nmcli --version): 1.46.0-8.el9_4
Kubernetes version (use kubectl version): v1.30.2+k0s