Open marandalucas opened 5 months ago
Hello @marandalucas 👋
I just had a look at it and I've tried re-creating the issue and it seems to be working fine on my side.
Just so I understand:
clusterDomain
metrics-server pods throw the error you shared?
I'm wondering: what version of the chart are you using, and are you using the certificates created by the chart (certificates.certManager.enabled: true)?
Also, could you share the content of your keda-operator-tls-certificates certificate?
Hello @lucchmielowski 👍
If you want to recreate the issue you have to:
ERROR:
W0314 15:03:14.706154 1 logging.go:59] [core] [Channel #1 SubChannel #2] grpc: addrConn.createTransport failed to connect to {Addr: "keda-operator.keda.svc.gcp-prod-pv-na1-a.company.cluster.local:9666", ServerName: "keda-operator.keda.svc.gcp-prod-pv-na1-a.company.cluster.local:9666", }.
We'd like to avoid installing cert-manager for the following reasons:
Error: [resource mapping not found for name: "keda-operator-tls-certificates" namespace: "keda" from "": no matches for kind "Certificate" in version "cert-manager.io/v1" │ ensure CRDs are installed first, resource mapping not found for name: "keda-operator-ca" namespace: "keda" from "": no matches for kind "Certificate" in version "cert-manager.io/v1"
Is there another way to fix this through parametrizing metrics-service-address or something like that?
Thank you so much for this project
Hi @marandalucas, sorry, but I won't really have time to test on GKE in the next few days. Both issues you shared look to be linked to a mismatch between the cluster domain of your cluster and your configuration, not to an issue with the chart itself (I might have misunderstood something, though).
What makes me think that is this part of the log you shared earlier:
certificate is valid for keda-operator, keda-operator, keda-operator.keda, keda-operator.keda.svc, keda-operator.keda.svc.cluster.local ... not keda-operator.keda.svc.gcp-prod-pv-na1-a.company.cluster.local
as well as the
addrConn.createTransport failed to connect to {Addr: "keda-operator.keda.svc.gcp-prod-pv-na1-a.company.cluster.local:9666...
That does not seem related to a cert issue but more of an addressing issue
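The mismatch in the two log excerpts above can be reproduced with a simplified check. This is only a sketch: the helper name san_covers is hypothetical, the SAN list is copied from the shortened log line quoted above, and real TLS verification also handles wildcards, which this exact-match check ignores.

```shell
# Returns 0 if hostname $1 appears in the comma-separated SAN list $2.
# Hypothetical helper for illustration; exact matches only, no wildcards.
san_covers() {
  host="$1"
  sans="$2"
  # Split on commas, trim surrounding spaces, then look for an exact line match.
  printf '%s\n' "$sans" | tr ',' '\n' | sed 's/^ *//; s/ *$//' | grep -Fxq -- "$host"
}

# SAN list as quoted (shortened) in the log excerpt above.
SANS="keda-operator, keda-operator, keda-operator.keda, keda-operator.keda.svc, keda-operator.keda.svc.cluster.local"

san_covers "keda-operator.keda.svc.cluster.local" "$SANS" && echo "covered"
san_covers "keda-operator.keda.svc.gcp-prod-pv-na1-a.company.cluster.local" "$SANS" || echo "not covered"
```

The first call prints "covered" and the second prints "not covered": the address being dialed uses the custom domain, while the certificate only lists names under cluster.local.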
Could it be possible that your GKE cluster is using the default svc.cluster.local FQDN? (In which case you wouldn't need to set up a clusterDomain.)
One way to check the correct value to use is to run the following command, which creates a pod and does an nslookup:
kubectl run -it --image=ubuntu --restart=Never shell -- \
sh -c 'apt-get update > /dev/null && apt-get install -y dnsutils > /dev/null && \
nslookup kubernetes.default | grep Name | sed "s/Name:\skubernetes.default//"'
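nslookup reports a fully qualified name such as kubernetes.default.svc.cluster.local. As a sketch, the part after "svc." can be extracted like this, assuming that is the value clusterDomain expects (which matches the keda-operator.keda.svc.&lt;domain&gt; names in the certificate error in this thread). The function name cluster_domain_from_fqdn is hypothetical.

```shell
# Given the FQDN that nslookup reports for kubernetes.default,
# print the cluster domain, i.e. everything after "kubernetes.default.svc.".
# Hypothetical helper for illustration only.
cluster_domain_from_fqdn() {
  # POSIX parameter expansion: remove the fixed prefix from the argument.
  printf '%s\n' "${1#kubernetes.default.svc.}"
}

cluster_domain_from_fqdn "kubernetes.default.svc.cluster.local"   # prints cluster.local
```

If the printed value is cluster.local, the cluster is using the default domain.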
Also, I understand that you don't want to set up cert-manager. By default, the chart enables the operator to create a kedaorg-certs secret, which is used for TLS communication between KEDA's components.
Also, feel free to message me on the Kubernetes slack directly if you find it easier to have a "live" discussion about the issue.
Hello @marandalucas ,
You don't need cert-manager, but you need to update the internal cert system too (you can use cert-manager or the self-generated certs).
You have to add an extra arg to the operator, k8s-cluster-domain: your-domain. This will take your domain into account for certificate generation.
extraArgs:
  # -- Additional KEDA Operator container arguments
  keda:
    k8s-cluster-domain: your-domain
clusterDomain: your-domain
I guess that we could automatically set the arg from the clusterDomain value? 🤔 @lucchmielowski WDYT?
In any case, by setting both you will be able to use KEDA without cert-manager.
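Since both settings must carry the same domain, one way to keep them in sync is to derive them from a single value. The sketch below only prints a helm command rather than running it. The release name keda, the chart reference kedacore/keda, and the keda namespace are assumptions about the local setup, and keda_helm_cmd is a hypothetical helper.

```shell
# Print (do not run) a helm command that sets both values from one domain.
# Release name, chart reference and namespace are assumptions; adjust as needed.
keda_helm_cmd() {
  domain="$1"
  printf 'helm upgrade --install keda kedacore/keda --namespace keda --set clusterDomain=%s --set extraArgs.keda.k8s-cluster-domain=%s\n' "$domain" "$domain"
}

keda_helm_cmd "your-domain"
```

Review the printed command before running it by hand; the same two keys can equally be placed in a values file.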
@lucchmielowski Hi! Thank you so much for this fix. https://github.com/kedacore/charts/pull/399
Unfortunately, it doesn't work for us.
HELM CONFIG
clusterDomain: gcp-prod-pv-na1-a.company.cluster.local
ERROR:
W0314 15:03:14.706154 1 logging.go:59] [core] [Channel #1 SubChannel #2] grpc: addrConn.createTransport failed to connect to {Addr: "keda-operator.keda.svc.gcp-prod-pv-na1-a.company.cluster.local:9666", ServerName: "keda-operator.keda.svc.gcp-prod-pv-na1-a.company.cluster.local:9666", }. Err: connection error: desc = "transport: authentication handshake failed: tls: failed to verify certificate: x509: certificate is valid for keda-operator, keda-operator, keda-operator.keda, keda-operator.keda.svc, keda-operator.keda.svc.cluster.local, keda-admission-webhooks, keda-admission-webhooks.keda, keda-admission-webhooks.keda.svc, keda-admission-webhooks.keda.svc.cluster.local, keda-operator-metrics-apiserver, keda-operator-metrics-apiserver.keda, keda-operator-metrics-apiserver.keda.svc, keda-operator-metrics-apiserver.keda.svc.cluster.local, not keda-operator.keda.svc.gcp-prod-pv-na1-a.company.cluster.local"
We wonder if you could fix it. We don't need cert-manager in our clusters.
Thanks in advance