jonathon2nd opened 6 months ago
So, previously I had to add the following to cluster.yaml for RKE:
```yaml
kube-controller:
  extra_args:
    cluster-signing-cert-file: "/etc/kubernetes/ssl/kube-ca.pem"
    cluster-signing-key-file: "/etc/kubernetes/ssl/kube-ca-key.pem"
```
Now I am on RKE2, and things are different. I did not find any reference in the docs, and the only one in the repo is this issue. Which of these certs do you think would work best? The cert setup seems completely different.
```console
[root@ovbh-vtest-k8s03-master02 user]# ls /var/lib/rancher/rke2/server/tls
client-admin.crt client-ca.crt client-controller.key client-kube-proxy.crt client-rke2-controller.crt client-supervisor.crt kube-controller-manager server-ca.crt service.key temporary-certs
client-admin.key client-ca.key client-kube-apiserver.crt client-kube-proxy.key client-rke2-controller.key client-supervisor.key kube-scheduler server-ca.key serving-kube-apiserver.crt
client-auth-proxy.crt client-ca.nochain.crt client-kube-apiserver.key client-rke2-cloud-controller.crt client-scheduler.crt dynamic-cert.json request-header-ca.crt server-ca.nochain.crt serving-kube-apiserver.key
client-auth-proxy.key client-controller.crt client-kubelet.key client-rke2-cloud-controller.key client-scheduler.key etcd request-header-ca.key service.current.key serving-kubelet.key
```
Eh, adding the following:
```yaml
kube-controller:
  extra_args:
    cluster-signing-cert-file: "/var/lib/rancher/rke2/server/tls/server-ca.crt"
    cluster-signing-key-file: "/var/lib/rancher/rke2/server/tls/server-ca.key"
```
results in this error:

```
E0401 22:51:00.828155 1 run.go:74] "command failed" err="cannot specify --cluster-signing-{cert,key}-file and other --cluster-signing-*-file flags at the same time"
```
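If I read that error right, RKE2 already starts the controller manager with the per-signer `--cluster-signing-*-file` flags, which are mutually exclusive with the generic pair. A sketch of what might work instead, overriding only the signer for API server client certs (untested; `kube-controller-manager-arg` is the RKE2 config key, the flag names are the upstream per-signer kube-controller-manager flags, and the cert choice is an assumption):

```yaml
# /etc/rancher/rke2/config.yaml on each server node -- untested sketch;
# paths assume the default RKE2 data dir shown in the listing above
kube-controller-manager-arg:
  - "cluster-signing-kube-apiserver-client-cert-file=/var/lib/rancher/rke2/server/tls/client-ca.crt"
  - "cluster-signing-kube-apiserver-client-key-file=/var/lib/rancher/rke2/server/tls/client-ca.key"
```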
Here are the kube-controller-manager logs without the modification:
```
I0401 22:53:15.812803 1 controllermanager.go:187] "Starting" version="v1.27.11+rke2r1"
2024-04-01T15:53:15.813124034-07:00 I0401 22:53:15.812929 1 controllermanager.go:189] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK=""
2024-04-01T15:53:15.817730506-07:00 I0401 22:53:15.817595 1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
2024-04-01T15:53:15.817746036-07:00 I0401 22:53:15.817673 1 shared_informer.go:311] Waiting for caches to sync for RequestHeaderAuthRequestController
2024-04-01T15:53:15.817777777-07:00 I0401 22:53:15.817667 1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
I0401 22:53:15.817719 1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file"
I0401 22:53:15.817754 1 shared_informer.go:311] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0401 22:53:15.817758 1 shared_informer.go:311] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
2024-04-01T15:53:15.818800235-07:00 I0401 22:53:15.818752 1 secure_serving.go:213] Serving securely on 127.0.0.1:10257
2024-04-01T15:53:15.819215433-07:00 I0401 22:53:15.819116 1 leaderelection.go:245] attempting to acquire leader lease kube-system/kube-controller-manager...
2024-04-01T15:53:15.819508638-07:00 I0401 22:53:15.819458 1 dynamic_serving_content.go:132] "Starting controller" name="serving-cert::/var/lib/rancher/rke2/server/tls/kube-controller-manager/kube-controller-manager.crt::/var/lib/rancher/rke2/server/tls/kube-controller-manager/kube-controller-manager.key"
I0401 22:53:15.819738 1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
2024-04-01T15:53:15.918025203-07:00 I0401 22:53:15.917877 1 shared_informer.go:318] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
2024-04-01T15:53:15.918245270-07:00 I0401 22:53:15.917906 1 shared_informer.go:318] Caches are synced for RequestHeaderAuthRequestController
2024-04-01T15:53:15.921282637-07:00 I0401 22:53:15.920067 1 shared_informer.go:318] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
```
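For what it's worth, you can confirm which signing flags the controller manager was actually started with by inspecting the running process on a control plane node; a quick, hedged check:

```bash
# Print every --cluster-signing-* argument of the running controller manager;
# the [k] trick keeps grep from matching its own process
ps -ef | grep '[k]ube-controller-manager' | tr ' ' '\n' | grep -- '--cluster-signing'
```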
I am also unable to peer with an unmodified OVH k8s cluster (Kubernetes v1.28.3).
```console
jonathon@jonathon-framework:~/liqo-test-ovh$ liqoctl version
Client version: v0.10.2
Server version: v0.10.2
jonathon@jonathon-framework:~/liqo-test-ovh$ liqoctl status
┌─ Namespace existence check ──────────────────────────────────────────────────────┐
| INFO ✔ liqo control plane namespace liqo exists                                  |
└──────────────────────────────────────────────────────────────────────────────────┘
┌─ Control plane check ────────────────────────────────────────────────────────────┐
| Deployment                                                                       |
|   liqo-controller-manager: Desired: 2, Ready: 2/2, Available: 2/2                |
|   liqo-crd-replicator:     Desired: 1, Ready: 1/1, Available: 1/1                |
|   liqo-metric-agent:       Desired: 1, Ready: 1/1, Available: 1/1                |
|   liqo-auth:               Desired: 1, Ready: 1/1, Available: 1/1                |
|   liqo-proxy:              Desired: 1, Ready: 1/1, Available: 1/1                |
|   liqo-network-manager:    Desired: 1, Ready: 1/1, Available: 1/1                |
|   liqo-gateway:            Desired: 2, Ready: 2/2, Available: 2/2                |
| DaemonSet                                                                        |
|   liqo-route:              Desired: 3, Ready: 3/3, Available: 3/3                |
└──────────────────────────────────────────────────────────────────────────────────┘
┌─ Local cluster information ──────────────────────────────────────────────────────┐
| Cluster identity                                                                 |
|   Cluster ID:   7829a681-9dd5-4840-b6e8-f3a1f1a19d53                             |
|   Cluster name: frosty-wave                                                      |
|   Cluster labels                                                                 |
|     liqo.io/provider: k3s                                                        |
| Configuration                                                                    |
|   Version: v0.10.2                                                               |
| Network                                                                          |
|   Pod CIDR:      10.2.0.0/16                                                     |
|   Service CIDR:  10.3.0.0/16                                                     |
|   External CIDR: 10.4.0.0/16                                                     |
|   Reserved Subnets                                                               |
|     • 10.1.0.0/24                                                                |
|     • 10.1.1.0/24                                                                |
|     • 10.1.3.0/28                                                                |
|     • 10.1.8.0/22                                                                |
|     • 10.0.1.0/24                                                                |
|     • 10.0.2.0/24                                                                |
| Endpoints                                                                        |
|   Network gateway:       udp://10.1.10.40:32110                                  |
|   Authentication:        https://redacted:31343                                  |
|   Kubernetes API server: https://redacted.c1.bhs5.k8s.ovh.net                    |
└──────────────────────────────────────────────────────────────────────────────────┘
```
```console
jonathon@jonathon-framework:~/liqo-test-ovh$ liqoctl install k3s -n liqo-test-ovh --only-output-values --pod-cidr 10.2.0.0/16 --service-cidr 10.3.0.0/16 --enable-ha --verbose --api-server-url=https://redacted.c1.bhs5.k8s.ovh.net --reserved-subnets 10.1.0.0/24,10.1.1.0/24,10.1.3.0/28,10.1.8.0/22,10.0.1.0/24,10.0.2.0/24
INFO Using chart from "liqo/liqo"
INFO Installer initialized
INFO Cluster name: damp-mountain
INFO Kubernetes API Server: https://redacted.c1.bhs5.k8s.ovh.net
INFO Pod CIDR: 10.2.0.0/16
INFO Service CIDR: 10.3.0.0/16
INFO Cluster configuration correctly retrieved
INFO Installation parameters correctly generated
INFO All Set! Chart values written to "./values.yaml"
```
```console
jonathon@jonathon-framework:~/liqo-test-ovh$ liqoctl peer out-of-band nameless-silence --auth-url https://redacted:31223 --cluster-id f441ba85-1b35-42e3-b58c-0da2303b03f8 --auth-token EEE
ERRO Failed peering clusters: Error from server (InternalError): Internal error occurred: failed calling webhook "fc.mutate.liqo.io": failed to call webhook: Post "https://liqo-controller-manager.liqo.svc:9443/mutate/foreign-cluster?timeout=10s": tls: failed to verify certificate: x509: certificate signed by unknown authority
```
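Since the failure is the API server rejecting the webhook's serving certificate, one way to debug is to decode the CA bundle registered in the webhook configuration and compare it with what is actually served. A hedged sketch (the webhook configuration name is a placeholder, and the jsonpath assumes the first webhook entry):

```bash
# Find the Liqo mutating webhook configuration, then inspect the CA bundle
# the API server uses to verify the webhook's serving certificate
kubectl get mutatingwebhookconfigurations
kubectl get mutatingwebhookconfiguration <liqo-webhook-name> \
  -o jsonpath='{.webhooks[0].clientConfig.caBundle}' \
  | base64 -d | openssl x509 -noout -subject -issuer -enddate
```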
It expects that the CA found in the connection parameters inside the pod (the pod kubeconfig, for simplicity; you can find it at something like /var/run/secrets/kubernetes.io/serviceaccount) and the CA signing remote user certificates are the same.
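For comparison, printing the CA the pod actually trusts is straightforward; a minimal sketch, assuming the standard in-cluster service account mount and the deployment name from the status output above:

```bash
# Print subject/issuer of the CA mounted into the liqo-controller-manager pod
kubectl exec -n liqo deploy/liqo-controller-manager -- \
  cat /var/run/secrets/kubernetes.io/serviceaccount/ca.crt \
  | openssl x509 -noout -subject -issuer
```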
We can consider adding the possibility for the user to override this value in the new auth module https://github.com/liqotech/liqo/issues/2382
Here's what we do when we peer an RKE2 cluster (from an EKS cluster).

Prerequisites:
- For `apiServer.address` (in helm values), use ACE, e.g. `https://40.52.12.84:6443`. Alternatively, if a load balancer is properly set up, use that instead.

Peering:
1. Run `liqoctl generate peer-command` on the RKE2 cluster to generate the peering command.
2. Run the generated `liqoctl peer out-of-band` command. Expect this to fail due to a missing CA and/or lack of permission (we haven't had a chance to verify what exactly was the cause).
3. After the `liqoctl peer` command, a secret (with a `liqo-identity-` prefixed name) is created under the tenant namespace (e.g. `liqo-tenant-test-cluster-c37ac5`). We need to update its certificate and CA. The secret has five data fields: `apiServerCa`, `apiServerUrl`, `certificate`, `namespace` and `private-key` (see the sketch after this list):
   - `apiServerCa`: use the CA from `/etc/rancher/rke2/rke2.yaml` (find it on a control plane node).
   - `certificate`: use the cert `client-kube-apiserver.crt` from `/var/lib/rancher/rke2/server/tls` (find it on a control plane node).
   - `private-key`: use the key `client-kube-apiserver.key` from `/var/lib/rancher/rke2/server/tls` (find it on a control plane node).
   - `apiServerUrl`: should use ACE.

We haven't had a chance to verify whether we have to update the `certificate` and `private-key` data. Once the peering was there, we didn't bother to touch it.
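A hedged sketch of that secret update (the secret name suffix and tenant namespace are placeholders taken from the example above; `base64 -w0` assumes GNU coreutils):

```bash
# Base64-encode the materials gathered from an RKE2 control plane node
CA=$(base64 -w0 rke2-ca.crt)                    # CA extracted from /etc/rancher/rke2/rke2.yaml
CRT=$(base64 -w0 client-kube-apiserver.crt)     # from /var/lib/rancher/rke2/server/tls
KEY=$(base64 -w0 client-kube-apiserver.key)     # from /var/lib/rancher/rke2/server/tls

# Patch the liqo-identity-* secret in the tenant namespace (placeholder names)
kubectl -n liqo-tenant-test-cluster-c37ac5 patch secret liqo-identity-xxxxx \
  --type merge \
  -p "{\"data\":{\"apiServerCa\":\"$CA\",\"certificate\":\"$CRT\",\"private-key\":\"$KEY\"}}"
```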
Hello :D It has been a while :wave: I have reviewed the docs and updates, but if I have missed something basic please let me know. I feel like I have, but my searches have not turned up anything.
What happened:
During the setup of Liqo peering using liqoctl, I encountered a TLS certificate verification error. The specific error message was:
```
ERRO Failed peering clusters: Error from server (InternalError): Internal error occurred: failed calling webhook "fc.mutate.liqo.io": failed to call webhook: Post "https://liqo-controller-manager.liqo.svc:9443/mutate/foreign-cluster?timeout=10s": tls: failed to verify certificate: x509: certificate signed by unknown authority
```

What you expected to happen:
I expected the Liqo peering process to complete successfully without any TLS certificate errors.
How to reproduce it (as minimally and precisely as possible):
1. Set up two Liqo clusters.
2. Run `liqoctl generate peer-command` on the first cluster to generate a peering command.
3. Execute the generated peering command on the second cluster using `liqoctl peer out-of-band` (see the command sketch below).
4. Observe the TLS certificate verification error.
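In command form, the repro is roughly (a sketch; names, URLs and tokens are placeholders matching the commands shown earlier in the thread):

```bash
# On the first (provider) cluster: print the peering command
liqoctl generate peer-command

# On the second (consumer) cluster: run the printed command, e.g.
liqoctl peer out-of-band <cluster-name> \
  --auth-url https://<redacted>:<port> \
  --cluster-id <cluster-id> \
  --auth-token <token>
# -> fails with: x509: certificate signed by unknown authority
```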
Anything else we need to know?:
- One cluster is on-prem: k8s v1.27.11+rke2r1, Rocky 9.3 VMs, Calico v3.27.0
- The remote cluster is OVH: k8s v1.28.3, Canal (registry.kubernatine.ovh/public/flannel:v0.21.3, registry.kubernatine.ovh/public/calico-node:v3.26.1-amd64)
Both have Liqo installed with helm via argo-cd, with values generated and modified by:

```bash
liqoctl install k3s -n cluster1 --only-output-values
```
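For reference, the generated values.yaml can also be fed to helm directly; a hedged sketch of the equivalent manual install (the chart reference matches the "liqo/liqo" chart used by the installer above, and the repo URL is an assumption):

```bash
# Add the Liqo chart repo and install with the generated values
helm repo add liqo https://helm.liqo.io/
helm repo update
helm install liqo liqo/liqo -n liqo --create-namespace -f values.yaml
```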
Environment:
- Kubernetes version (use `kubectl version`): see above
- On-prem cluster
- Remote cluster