konstructio / kubefirst

The Kubefirst Open Source Platform
https://kubefirst.konstruct.io/docs
MIT License

Cluster on Civo via CLI: `no such host` / `cluster not found` #1903

Open marc0olo opened 12 months ago

marc0olo commented 12 months ago

Which version of kubefirst are you using?

v2.3.5

Which cloud provider?

Civo

Which DNS?

Cloud ones (default)

Which installation type?

CLI

Which distributed Git provider?

GitHub

Which Operating System?

Windows

What is the issue?

We have discussed this already on Slack. I have tested multiple scenarios. In this specific case I used my own custom gitops-template which is based on the latest commit of the main branch. I don't think that the gitops-template causes the error.

My current assumption is that this is related to my local setup. I am running on Windows 10 and Docker is running with WSL2 (Ubuntu).

For whatever reason, the DNS names of the running kubefirst cluster cannot be resolved.

Logs

...
...
...
2023-11-08T10:27 INF pkg/shell.go:36 > OUT: 
2023-11-08T10:27 INF pkg/shell.go:37 > Command: /home/marc0olo/.k1/kubefirst-console/tools/mkcert
2023-11-08T10:27 INF internal/launch/cmd.go:544 > Created Kubernetes Secret for certificate
2023-11-08T10:32 INF internal/cluster/cluster.go:137 > error Get "https://console.kubefirst.dev/api/proxy?url=/cluster/soon-market": dial tcp: lookup console.kubefirst.dev on 172.19.64.1:53: no such host
2023-11-08T10:32 INF internal/provision/provision.go:26 > cluster not found
2023-11-08T10:32 INF internal/cluster/cluster.go:57 > error Post "https://console.kubefirst.dev/api/proxy": dial tcp: lookup console.kubefirst.dev on 172.19.64.1:53: no such host

Possible solution

I just saw that other users have faced this issue, and that @johndietz posted the following solution (https://kubefirst.slack.com/archives/C03U34WJ7FW/p1699204416688779?thread_ts=1699009556.005929&cid=C03U34WJ7FW):

yeah everything internal to the cluster should be good and healthy and ready to go out of the box, the problem with your setup is that your host macbook is unable to resolve the cluster’s ingressed hosts at kubefirst.dev. we have a public dns on aws route53 that establishes that all subdomains of *.kubefirst.dev will resolve to 127.0.0.1. in order for your host to resolve the argocd.kubefirst.dev hostname, you just need to be using a nameserver that can find the public record. usually default configs are adequate, but in your case, likely because of your internet provider or your company if using a corporate laptop, you aren’t finding it. you can see how your host is configured to resolve dns by running cat /etc/resolv.conf.

resolution option 1: change nameservers (preferred)
if you were to change your nameserver to 8.8.8.8 or 8.8.4.4 you would change it from whatever it is to instead use google dns. google dns will definitely find the record without the need to adjust your hosts file.

resolution option 2: override in your hosts file (a little gross, but fine for this local use case)
the /etc/hosts file is a mechanism that allows you to override how your machine resolves any hostname. it doesn’t use dns, it’s just hardcoded mappings of hostnames to ip addresses. whatever you have in this file will always win, no matter what your dns settings or nameservers are set up to accommodate. real dns is only leveraged when the hostname being requested is not in your /etc/hosts file.
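
To make the two options above easier to check, here is a minimal Python sketch (not part of kubefirst). It reports which hostnames do not resolve to 127.0.0.1 and prints the hosts-file lines that option 2 would need. The hostnames are the ones mentioned in this thread; the function names `resolve`, `unresolved`, and `hosts_file_lines` are made up for illustration.

```python
import socket

# Hostnames mentioned in this thread; adjust to whatever your cluster exposes.
HOSTS = ["console.kubefirst.dev", "argocd.kubefirst.dev", "kubefirst.kubefirst.dev"]


def resolve(hostname, lookup=socket.gethostbyname):
    """Return the IPv4 address for hostname, or None if the lookup fails."""
    try:
        return lookup(hostname)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None


def unresolved(hostnames, lookup=socket.gethostbyname):
    """Return the hostnames that do not resolve to 127.0.0.1."""
    return [name for name in hostnames if resolve(name, lookup) != "127.0.0.1"]


def hosts_file_lines(hostnames, address="127.0.0.1"):
    """Build hosts-file override lines (option 2) for the given hostnames."""
    return [f"{address} {name}" for name in hostnames]
```

Calling `unresolved(HOSTS)` on the affected machine should return an empty list once either option works; otherwise `hosts_file_lines(unresolved(HOSTS))` gives the lines to add to the hosts file. Under WSL2, /etc/resolv.conf is normally generated by WSL itself, so a manual nameserver change there may be overwritten unless that generation is turned off in /etc/wsl.conf.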

marc0olo commented 12 months ago

the problem is definitely related to WSL2 and Docker on Windows. if somebody managed to solve it please let me know. I am sticking with the marketplace installer for now instead of CLI. wasted too much time on this "little issue" 😅

fharper commented 11 months ago

@marc0olo: neither a colleague nor I was able to reproduce this issue on Windows 11 with WSL2, using Civo with v2.3.5 and GitHub/GitLab. It may be something specific to Windows 10, though I doubt it.

I assume John's solution didn't work for you?

What is the result of running `dig kubefirst.kubefirst.dev`?

marc0olo commented 11 months ago

mhh 🤔 right now I am avoiding testing this again by using the Civo marketplace installer instead. will check again if I find some time. but I saw another person posting on slack about this problem, maybe it is the same or related.

John's solution is clear, but there is some strange config to be considered when running the whole stack on Windows via WSL2. so I wasn't able to solve it easily.

alechp commented 2 months ago

Having an identical issue with Civo. Will be testing with DigitalOcean to check whether this is exclusive to Civo:

Logs for context:

{"level":"debug","time":"2024-09-02T10:59:21-07:00","message":"unable to reach \"https://console.kubefirst.dev/api/proxyHealth\" (57/60)"}
{"level":"debug","time":"2024-09-02T10:59:26-07:00","message":"unable to reach \"https://console.kubefirst.dev/api/proxyHealth\" (58/60)"}
{"level":"debug","time":"2024-09-02T10:59:31-07:00","message":"unable to reach \"https://console.kubefirst.dev/api/proxyHealth\" (59/60)"}
{"level":"debug","time":"2024-09-02T10:59:36-07:00","message":"unable to reach \"https://console.kubefirst.dev/api/proxyHealth\" (60/60)"}
{"level":"info","time":"2024-09-02T10:59:41-07:00","message":"error Get \"https://console.kubefirst.dev/api/proxy?url=/cluster/civo-phx1-cscloud\": dial tcp 127.0.0.1:443: connect: connection refused"}
{"level":"info","time":"2024-09-02T10:59:41-07:00","message":"cluster not found"}
{"level":"info","time":"2024-09-02T10:59:41-07:00","message":"error Post \"https://console.kubefirst.dev/api/proxy\": dial tcp 127.0.0.1:443: connect: connection refused"}

alechp commented 2 months ago

Update: was having this issue when testing with DigitalOcean as well. However, managed to get it working.

More context on what was done to resolve it here in this comment: https://github.com/ssotops/k1space/issues/10#issuecomment-2325330556

fharper commented 2 months ago

@alechp: I think your issue isn't related to what @marc0olo was experiencing since it is specific to the move we are doing.
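
For reference, the two log excerpts in this thread fail at different stages: the first never resolves the name (`no such host`), while the second resolves it to 127.0.0.1 and is then refused on port 443. A minimal Python sketch of that distinction, using the error strings quoted above (the `classify` helper is hypothetical, not something kubefirst ships):

```python
def classify(message):
    """Map an error string from the logs to the stage at which the request failed."""
    if "no such host" in message:
        return "dns"  # the hostname never resolved
    if "connection refused" in message:
        return "connect"  # the hostname resolved, but nothing accepted the connection
    return "unknown"


# Error strings copied from the two log excerpts in this thread.
FIRST_REPORT = (
    'error Get "https://console.kubefirst.dev/api/proxy?url=/cluster/soon-market": '
    "dial tcp: lookup console.kubefirst.dev on 172.19.64.1:53: no such host"
)
SECOND_REPORT = (
    'error Post "https://console.kubefirst.dev/api/proxy": '
    "dial tcp 127.0.0.1:443: connect: connection refused"
)

for report in (FIRST_REPORT, SECOND_REPORT):
    print(classify(report))
```

A `dns` result points at nameserver or hosts-file configuration on the host, while a `connect` result suggests DNS is fine and nothing is listening on the local ingress port.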

As for @marc0olo, I wasn't able to replicate it the last time I tried, so we'll need to debug this together once you have the time. We can also consider this issue closed if, as you mentioned, you are only using the marketplace installer now and won't try to replicate it.