vcharlet opened 11 months ago
After more digging, I found a lot of errors in the fleet-controller logs.
2023-12-26T23:18:31.027085701Z time="2023-12-26T23:18:31Z" level=error msg="error syncing 'fleet-default/test-12': handler import-cluster: host must be a URL or a host:port pair: \"https://2b31:3440:c00:1b::9be3/k8s/clusters/c-m-lh7qt27d\", requeuing"
It seems to be the problem.
@vcharlet For future reference, IPv6 URLs have to be enclosed in brackets (like this: https://[2b31:3440:c00:1b::9be3]), so your Rancher server-url is not a valid URL, which will prevent the cluster agents from dialing back to Rancher, as it cannot be resolved. Once that issue is alleviated, feel free to reach back out, as I want to make sure it's working.
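The bracket requirement is easy to confirm with Go's standard library alone (a minimal sketch using the exact host from the log above; this is not Rancher code):

```go
package main

import (
	"fmt"
	"net/url"
)

func main() {
	// Bare IPv6 host: net/url treats everything after the last colon
	// as a port, and "9be3" is not numeric, so parsing fails.
	_, err := url.Parse("https://2b31:3440:c00:1b::9be3/k8s/clusters/c-m-lh7qt27d")
	fmt.Println(err) // ... invalid port ":9be3" after host

	// Bracketed IPv6 host: parses cleanly.
	u, err := url.Parse("https://[2b31:3440:c00:1b::9be3]/k8s/clusters/c-m-lh7qt27d")
	fmt.Println(u.Host, err) // [2b31:3440:c00:1b::9be3] <nil>
}
```

client-go performs a similar parse when it builds a REST client from a kubeconfig, which is consistent with the "host must be a URL or a host:port pair" wording in the fleet-controller log.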
@vcharlet Were you able to alleviate the issue?
@jakefhyde Sorry for the delay.
I couldn't solve the problem. I know IPv6 addresses have to be enclosed in brackets, and my server URL is fine; it's a domain name.
I'm trying to deploy a custom RKE2 cluster from the dashboard with the registration command.
The provisioned cluster itself seems fine: all pods are OK, all probes are OK, I can access it with kubectl, etc. The provisioning is just stuck at the end, and I can't access the cluster in the dashboard.
The only error I can find is in the fleet-controller logs. Maybe this issue is related: https://github.com/rancher/rancher/issues/42722 https://github.com/rancher/rancher/blob/4cf3b4a6e94f99b8ef78bf8f254d9e62fdf400cc/cmd/agent/main.go#L367
Does serverURL.Host return an IPv6 address without brackets there? It's not the same error as mine; mine is with /k8s/clusters/**.
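For reference, Go's net/url keeps the brackets in u.Host but strips them in u.Hostname(), so any code that rebuilds a URL from Hostname() by plain string concatenation silently drops the brackets. Whether that is what happens at the line linked above I can't say, but the general behavior is easy to check (standard library only):

```go
package main

import (
	"fmt"
	"net/url"
)

func main() {
	u, err := url.Parse("https://[2b31:3440:c00:1b::9be3]:443")
	if err != nil {
		panic(err)
	}
	fmt.Println(u.Host)       // [2b31:3440:c00:1b::9be3]:443 (brackets kept)
	fmt.Println(u.Hostname()) // 2b31:3440:c00:1b::9be3 (brackets stripped)

	// Rebuilding a URL like this reintroduces the malformed host:
	bad := "https://" + u.Hostname() + "/k8s/clusters/c-m-lh7qt27d"
	fmt.Println(bad)
}
```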
I've not set this IP anywhere manually and it's not my server-url. It's the IP of the Rancher service on the local cluster.
I can do more tests, but I don't know where to look.
Thanks for helping
I think there is a problem here: service.Spec.ClusterIP is an IPv6 address, and the resulting URL is malformed because the brackets are missing.
Could this be the problem?
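If it is, the usual fix is to pass the raw ClusterIP through net.JoinHostPort, which brackets IPv6 literals and leaves IPv4 addresses and hostnames untouched. A hypothetical sketch (buildServerURL is illustrative, not an actual Rancher helper):

```go
package main

import (
	"fmt"
	"net"
)

// buildServerURL brackets IPv6 literals via net.JoinHostPort;
// IPv4 addresses and hostnames pass through unchanged.
func buildServerURL(clusterIP, port string) string {
	return "https://" + net.JoinHostPort(clusterIP, port)
}

func main() {
	fmt.Println(buildServerURL("10.43.0.1", "443"))
	// https://10.43.0.1:443
	fmt.Println(buildServerURL("2b31:3440:c00:1b::9be3", "443"))
	// https://[2b31:3440:c00:1b::9be3]:443
}
```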
@vcharlet That may very well be the issue; it makes sense, since Fleet consumes that as part of the kubeconfig secret.
@jakefhyde Yes, I've done quite a few tests and I'm pretty sure that's the problem.
This has to do with the fact that the local cluster where Rancher is installed is IPv6-only.
The service.Spec.ClusterIP of the Rancher service is an IPv6 address.
It's easy to reproduce.
The error is also present in the dashboard, in the "Clusters" tab, for all clusters (local cluster included).
They show 0 Nodes Ready, despite the fact that the clusters work well.
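That would be consistent: anything that consumes the malformed kubeconfig host hits the same validation. The "host:port pair" half of the error message reflects the underlying ambiguity; without brackets, an IPv6 literal with a port cannot be split at all (a standard-library sketch):

```go
package main

import (
	"fmt"
	"net"
)

func main() {
	// Unbracketed IPv6 plus port: every colon could be the separator.
	_, _, err := net.SplitHostPort("2b31:3440:c00:1b::9be3:443")
	fmt.Println(err) // too many colons in address

	// Bracketed form splits unambiguously.
	host, port, err := net.SplitHostPort("[2b31:3440:c00:1b::9be3]:443")
	fmt.Println(host, port, err) // 2b31:3440:c00:1b::9be3 443 <nil>
}
```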
@olblak I'm assigning this over to the fleet team, since this is related to the kubeconfig secret that we generate for the local cluster.
This should be fixed in the current main branch.
So I guess we are targeting Rancher 2.10. Is this something we need to backport to 2.9.3/2.8.7?
@Jono-SUSE-Rancher Do you have any opinion?
Hi @olblak - If you want to target it for v2.10.0, we should move it into that milestone. In terms of whether or not we should backport it, I would think we definitely want to fix it in v2.9.3. I would check with Cam to see if we need it in v2.8.x.
QA testing considerations and tests in general are documented in my PR at the top.
@kkaempf - Can we close this? I am going to close the milestone since we released on Monday.
Rancher Server Setup
Describe the bug
All nodes are IPv6-only (local and downstream).
On a fresh Rancher v2.8.0, I tried to create a custom RKE2 cluster with 1 node. The cluster is stuck on "waiting for cluster agent to connect".
Additional information
The cluster is IPv6-only, so it was configured with additional options to avoid getting stuck on probes: https://github.com/rancher/rancher/issues/42411
Example:
Screenshots
Machine status:
The downstream cluster seems fine:
The cluster-agent on the downstream cluster seems fine too; there are no errors in its logs. I tried restarting the cluster-agent, without success.
rancher.lan is the Rancher hostname.
I tried to open a shell in the cluster-agent pod to do some tests:
Full logs:
Thanks for helping.