rancher / k3os

Purpose-built OS for Kubernetes, fully managed by Kubernetes.
https://k3os.io
Apache License 2.0

K3OS cluster cannot register with Rancher instance in cloud #748

Open smorgan912 opened 2 years ago

smorgan912 commented 2 years ago

Version (k3OS / kernel)

k3os version v0.20.7-k3s1r0

5.4.0-73-generic #82 SMP Thu Jun 3 02:29:43 UTC 2021

Architecture

x86_64

Describe the bug

I have a Rancher cluster running in Azure and a local K3OS cluster running on Intel NUCs. I’m trying to register the local cluster with the Azure-based cluster and am getting the following error in the cattle-cluster-agent pod:

time="2021-09-17T17:30:52Z" level=info msg="Connecting to wss://rancher.xxxxxxxx.nip.io/v3/connect/register with token "
time="2021-09-17T17:30:52Z" level=info msg="Connecting to proxy" url="wss://rancher.xxxxxxxxxx.nip.io/v3/connect/register"
time="2021-09-17T17:31:02Z" level=error msg="Failed to connect to proxy. Empty dialer response" error="dial tcp: i/o timeout"
time="2021-09-17T17:31:02Z" level=error msg="Remotedialer proxy error" error="dial tcp: i/o timeout"

I don’t have a proxy configured. Note that no IP or host is listed in the error. Also, I am able to control this local cluster using fleet management, so I know it can talk to the Azure-based cluster.

To Reproduce

Ran the cluster registration command:

curl --insecure -sfL https://rancher.xxxxxxxxx.nip.io/v3/import/.yaml | kubectl apply -f -

Expected behavior

The cluster should be registered in the cloud Rancher instance.

Actual behavior

The connection to register failed.

dweomer commented 2 years ago

A dial timeout tells me that you likely have a security group between your NUCs and Azure instances helpfully dropping packets.
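
One way to narrow this down is to separate name resolution from the TCP connection itself. The following is a minimal sketch, not something taken from the Rancher agent: the function name `check_endpoint`, the status strings, and the 10-second default timeout (chosen only to mirror the gap between the log timestamps above) are all assumptions made for illustration.

```python
import socket


def check_endpoint(host, port=443, timeout=10.0):
    """Classify a connection attempt to host:port.

    Hypothetical helper for illustration. Returns one of:
    'dns-failure', 'timeout', 'refused', 'error', 'ok'.
    """
    # Step 1: resolve the name on its own, so a DNS problem is not
    # mistaken for a dropped connection.
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns-failure"

    # Step 2: attempt a plain TCP connect to the first resolved address.
    family, socktype, proto, _, sockaddr = infos[0]
    sock = socket.socket(family, socktype, proto)
    sock.settimeout(timeout)
    try:
        sock.connect(sockaddr)
    except socket.timeout:
        # No answer at all: consistent with packets being silently dropped.
        return "timeout"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on that port.
        return "refused"
    except OSError:
        return "error"
    finally:
        sock.close()
    return "ok"
```

Calling `check_endpoint` with the Rancher hostname from the registration URL, ideally from a pod on the local cluster rather than from the node itself, should show whether the failure is in DNS ("dns-failure") or in the path to port 443 ("timeout").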

smorgan912 commented 2 years ago

I've been able to register other clusters (GKE) to this cluster with no issues.

smorgan912 commented 2 years ago

I also created a Rancher cluster in GCP and am having this same issue trying to register a K3OS cluster with this new GCP cluster. Any ideas?

dweomer commented 2 years ago

time="2021-09-17T17:31:02Z" level=error msg="Failed to connect to proxy. Empty dialer response" error="dial tcp: i/o timeout"

If you are still seeing errors such as the one above, then you likely have something between your local cluster and your cloud cluster that is dropping packets. Please review the following: