Closed astelmashenko closed 1 year ago
It looks similar to https://github.com/tektoncd/triggers/issues/875. A few things I'm trying to understand: caBundle is the same as ca-cert.pem in the secret nats-webhook-certs, so that is not the case here, then?
Another thing is from webhook.Options:
// SecretName is the name of k8s secret that contains the webhook
// server key/cert and corresponding CA cert that signed them. The
// server key/cert are used to serve the webhook and the CA cert
// is provided to k8s apiserver during admission controller
// registration.
// If no SecretName is provided, then the webhook serves without TLS.
SecretName string
and the webhook has the secret name hardcoded:
const (
// Component is the name of this component and is used in logging and leader-election
Component = "nats-webhook"
// SecretName must match the name of the Secret created in the configuration.
SecretName = "nats-webhook-certs"
)
We have mTLS and are using Istio, so we do not need webhook TLS at all, but we cannot change that. Do we need to make it optional? Like NameFromEnv(), maybe have SecretNameFromEnv()?
OK, we found a way to reproduce it:
{"level":"error","ts":"2022-12-27T13:24:11.453Z","logger":"nats-webhook.DefaultingWebhook","caller":"controller/controller.go:559","msg":"Reconcile error","knative.dev/traceid":"d2b98046-a51c-4912-aa27-c3ff928d9501","knative.dev/key":"defaulting.webhook.nats.messaging.knative.dev","duration":0.000133991,"error":"error retrieving webhook: mutatingwebhookconfiguration.admissionregistration.k8s.io \"defaulting.webhook.nats.messaging.knative.dev\" not found","stacktrace":"knative.dev/pkg/controller.(*Impl).handleErr\n\tknative.dev/pkg@v0.0.0-20220301181942-2fdd5f232e77/controller/controller.go:559\nknative.dev/pkg/controller.(*Impl).processNextWorkItem\n\tknative.dev/pkg@v0.0.0-20220301181942-2fdd5f232e77/controller/controller.go:536\nknative.dev/pkg/controller.(*Impl).RunContext.func3\n\tknative.dev/pkg@v0.0.0-20220301181942-2fdd5f232e77/controller/controller.go:484"}
2022/12/27 13:36:29 http: TLS handshake error from 127.0.0.6:49317: remote error: tls: bad certificate
After the above steps we cannot fix the bad certificate problem. We tried deleting pods and deployments and restarting the webhook and controller; nothing helps.
@zhaojizhuang, @lionelvillard, do you have any ideas how to fix that? Is there any caching of certificates somewhere?
Hi @astelmashenko, I encountered this error too. Deleting and recreating the validating and mutating webhook configurations resolved the problem.
This forces the webhook certificate to be recreated.
New findings, according to the logs:
{"level":"info","ts":"2022-12-28T16:32:44.280Z","logger":"nats-webhook","caller":"webhook/admission.go:90","msg":"Webhook ServeHTTP request=&http.Request{Method:\"POST\", URL:(*url.URL)(0xc000949830), Proto:\"HTTP/1.1\", ProtoMajor:1, ProtoMinor:1, Header:http.Header{\"Accept\":[]string{\"application/json, */*\"}, \"Accept-Encoding\":[]string{\"gzip\"}, \"Content-Length\":[]string{\"37445\"}, \"Content-Type\":[]string{\"application/json\"}, \"User-Agent\":[]string{\"kube-apiserver-admission\"}}, Body:(*http.body)(0xc0009ba980), GetBody:(func() (io.ReadCloser, error))(nil), ContentLength:37445, TransferEncoding:[]string(nil), Close:false, Host:\"nats-webhook.knative-eventing.svc:443\", Form:url.Values(nil), PostForm:url.Values(nil), MultipartForm:(*multipart.Form)(nil), Trailer:http.Header(nil), RemoteAddr:\"127.0.0.6:52401\", RequestURI:\"/defaulting?timeout=2s\", TLS:(*tls.ConnectionState)(0xc00083fc30), Cancel:(<-chan struct {})(nil), Response:(*http.Response)(nil), ctx:(*context.cancelCtx)(0xc0009ba9c0)}"}
{"level":"info","ts":"2022-12-28T16:32:44.297Z","logger":"nats-webhook","caller":"defaulting/defaulting.go:158","msg":"Kind: \"messaging.knative.dev/v1alpha1, Kind=NatsJetStreamChannel\" PatchBytes: null","knative.dev/kind":"messaging.knative.dev/v1alpha1, Kind=NatsJetStreamChannel","knative.dev/namespace":"viax","knative.dev/name":"internal-kne-trigger","knative.dev/operation":"UPDATE","knative.dev/resource":"messaging.knative.dev/v1alpha1, Resource=natsjetstreamchannels","knative.dev/subresource":"","knative.dev/userinfo":"{system:serviceaccount:knative-eventing:jetstream-ch-controller eabc9d9c-bfc6-4410-932f-3e37b5aa6b15 [system:serviceaccounts system:serviceaccounts:knative-eventing system:authenticated] map[authentication.kubernetes.io/pod-name:[jetstream-ch-controller-57c65d84fb-9h2c7] authentication.kubernetes.io/pod-uid:[d0e2ebd1-aa22-4700-823f-cea550500b29]]}"}
{"level":"info","ts":"2022-12-28T16:32:44.297Z","logger":"nats-webhook","caller":"webhook/admission.go:133","msg":"remote admission controller audit annotations=map[string]string(nil)","knative.dev/kind":"messaging.knative.dev/v1alpha1, Kind=NatsJetStreamChannel","knative.dev/namespace":"viax","knative.dev/name":"internal-kne-trigger","knative.dev/operation":"UPDATE","knative.dev/resource":"messaging.knative.dev/v1alpha1, Resource=natsjetstreamchannels","knative.dev/subresource":"","knative.dev/userinfo":"{system:serviceaccount:knative-eventing:jetstream-ch-controller eabc9d9c-bfc6-4410-932f-3e37b5aa6b15 [system:serviceaccounts system:serviceaccounts:knative-eventing system:authenticated] map[authentication.kubernetes.io/pod-name:[jetstream-ch-controller-57c65d84fb-9h2c7] authentication.kubernetes.io/pod-uid:[d0e2ebd1-aa22-4700-823f-cea550500b29]]}","admissionreview/uid":"61b4f1df-0e47-484c-98ce-c064e6cd9e68","admissionreview/allowed":true,"admissionreview/result":"nil"}
2022/12/28 16:32:44 http: TLS handshake error from 127.0.0.6:45837: remote error: tls: bad certificate
The webhook receives the request and the admissionHandler in admission.go is called. Does that mean the error happens when it tries to write the response back?
One more thing: I am able to reproduce it only on our working cluster, which is on version 1.21. It does not reproduce on a local minikube setup on 1.23.
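The message remote error: tls: bad certificate means the peer rejected the certificate the webhook presented during the handshake. One way to check whether a given CA bundle actually verifies a serving certificate is sketched below. This is an illustration under stated assumptions, not the webhook's own code: the certificates are throwaway ones generated in memory, the helper names verifyAgainstBundle and issue are hypothetical, and the host name is taken from the Host field in the log above. In a real check the two PEM inputs would come from the webhook configuration's caBundle and from the nats-webhook-certs secret.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"errors"
	"fmt"
	"math/big"
	"time"
)

// verifyAgainstBundle returns nil only if certPEM chains to a CA in
// caBundlePEM and is valid for dnsName.
func verifyAgainstBundle(caBundlePEM, certPEM []byte, dnsName string) error {
	roots := x509.NewCertPool()
	if !roots.AppendCertsFromPEM(caBundlePEM) {
		return errors.New("no CA certificates found in bundle")
	}
	block, _ := pem.Decode(certPEM)
	if block == nil {
		return errors.New("no PEM block found in serving certificate")
	}
	cert, err := x509.ParseCertificate(block.Bytes)
	if err != nil {
		return err
	}
	_, err = cert.Verify(x509.VerifyOptions{Roots: roots, DNSName: dnsName})
	return err
}

// issue creates a throwaway certificate for this demo. With a nil parent the
// result is a self-signed CA; otherwise it is a serving certificate for name,
// signed by the parent.
func issue(name string, parent *x509.Certificate, parentKey *ecdsa.PrivateKey) (*x509.Certificate, *ecdsa.PrivateKey, []byte) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		panic(err)
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(time.Now().UnixNano()),
		Subject:      pkix.Name{CommonName: name},
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(time.Hour),
	}
	if parent == nil {
		tmpl.IsCA, tmpl.BasicConstraintsValid = true, true
		tmpl.KeyUsage = x509.KeyUsageCertSign
		parent, parentKey = tmpl, key
	} else {
		tmpl.DNSNames = []string{name}
		tmpl.KeyUsage = x509.KeyUsageDigitalSignature
		tmpl.ExtKeyUsage = []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth}
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, parent, &key.PublicKey, parentKey)
	if err != nil {
		panic(err)
	}
	cert, err := x509.ParseCertificate(der)
	if err != nil {
		panic(err)
	}
	return cert, key, pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der})
}

func main() {
	const host = "nats-webhook.knative-eventing.svc"
	ca, caKey, currentBundle := issue("current-ca", nil, nil)
	_, _, staleBundle := issue("stale-ca", nil, nil)
	_, _, servingCert := issue(host, ca, caKey)

	// The bundle that signed the serving certificate verifies it; a bundle
	// from a different CA does not.
	fmt.Println("current bundle:", verifyAgainstBundle(currentBundle, servingCert, host))
	fmt.Println("stale bundle:  ", verifyAgainstBundle(staleBundle, servingCert, host))
}
```

If every webhook configuration pointing at the service carries a bundle that verifies the serving certificate, the handshake failure has to come from somewhere else, for example from a configuration object that nobody is looking at.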
Oh god, I found the issue. There was a MutatingWebhookConfiguration left over from a previous installation of eventing-natss.yaml. Its name was webhook.nats.messaging.knative.dev, and it was later renamed to defaulting.webhook.nats.messaging.knative.dev.
Describe the bug After deleting a broker, the jetstream controller cannot finalize the underlying channel because of a communication error with nats-webhook.
Expected behavior Broker/channel deletion works.
Knative release version 1.3.2
Additional context eventing-natss version is 1.3.5
I created a broker and then deleted it, then observed that the channel had not been deleted, and saw error logs. jetstream-channel-controller:
and nats-webhook logs:
Is it really a certificate problem? One strange thing is this message:
AdmissionReview patch={ type: JSONPatch, body: null }
from the last debug log before the error log: remote error: tls: bad certificate
Any thoughts?
cc @dan-j @lionelvillard @zhaojizhuang