Open Alilat-imad opened 3 months ago
Please edit and remove your token asap if that's a valid token :) I'll read your message and reply shortly
You have disabled the public IPv4 interface. Are you executing these commands from a server in the same private network as the cluster? If not, that's what you need to do, or the computer from which you run the commands will not be able to reach the nodes. Also, when you disable the public interface you need some additional setup to be able to access the Internet from the nodes and download packages etc.: you need to set up a NAT gateway. This kind of setup is not yet described in the docs since it's less common, but for the time being you can refer to the post create commands in https://github.com/vitobotta/hetzner-k3s/discussions/385#discussioncomment-10168998 as an example. You also need to configure the NAT gateway itself.
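For illustration only, here is a rough sketch of what the node-side part of such a setup could look like (the gateway IP, resolver IP, and commands below are assumptions on my part, not copied from the linked discussion; the NAT server itself additionally needs IP forwarding, an iptables MASQUERADE rule, and a 0.0.0.0/0 network route in the Hetzner project pointing at it):

post_create_commands:
  # Assumption: 10.0.0.1 is the gateway IP of the Hetzner private network
  - ip route add default via 10.0.0.1
  # Hetzner's recursive resolver, reachable from private-only nodes via the NAT gateway
  - echo "nameserver 185.12.64.1" > /etc/resolv.conf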
I previously created a server on Hetzner to install PostgreSQL. Since IPv4 came at an additional cost, I opted for IPv6 only and use it for SSH access.
So do I need to have IPv4 enabled even for the worker nodes?
Anyway, once I made the changes below, everything works well.
Now I have another issue where I need your precious assistance.
networking:
  public_network:
    ipv4: false
    ipv6: true
I'm trying to use my Postgres instance as the k3s datastore instead of the default etcd. To do that, I created a private network that includes my Postgres server, then created the k3s cluster using that same network. But it seems that the property external_datastore_endpoint is being ignored.
At first I thought I had gotten the URL wrong by writing it like this:
datastore:
  external_datastore_endpoint: postgres://user:password@PRIVATE_IP:5432/k3s_cluster_db
Then I fixed the URL to this:
datastore:
  external_datastore_endpoint: postgresql://user:password@POSTGRES_PRIVATE_IP:5432/k3s_cluster_db
But it didn't work (etcd is the one being used).
The next obvious verification was to SSH into the master node and ping POSTGRES_PRIVATE_IP, which worked fine.
The last check I did was:
sudo apt-get install postgresql-client
psql postgresql://user:password@PRIVATE_IP:5432/k3s_cluster_db
That also worked.
You can disable the public IPs if you prefer, but in order to access the cluster with hetzner-k3s you need to run it from a server in the same private network. I haven't tested with ipv6 only to be honest.
You forgot to set the mode of the datastore to external:
datastore:
  mode: external
  external_datastore_endpoint: postgres://....
I was missing the mode property set to external. My new config looks like this:
datastore:
  mode: external # etcd (default) or external
  external_datastore_endpoint: postgresql://user:password@PRIVATE_IP:5432/k3s_cluster_db
And I am getting the error below:
[Control plane] Generating the kubeconfig file to /Users/USER/.kube/config... error: no context exists with the name: "k3s-cluster-master1"
But once I switch the mode to etcd, everything works well.
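For context (this is general k3s behaviour, not something confirmed in this thread): with mode set to external, the endpoint is what k3s itself uses as its datastore, conceptually the same as starting the server with k3s's --datastore-endpoint flag, roughly like this (values taken from the config above for illustration):

k3s server --datastore-endpoint='postgresql://user:password@PRIVATE_IP:5432/k3s_cluster_db'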
By "Switch" do you mean that you changed datastore type on an existing cluster? If yes, that's not supported. The datastore choice is permanent for the life of the cluster.
Indeed, I changed the datastore type before the creation of the cluster. I appreciate your guidance on this matter.
I've done multiple creations/deletions and the result is the same; the configuration below doesn't work:
datastore:
  mode: external
  external_datastore_endpoint: postgresql://user:password@PRIVATE_IP:5432/k3s_cluster_db
Here is the error I got:
[Instance k3s-cluster-master1] [INFO] Downloading binary https://github.com/k3s-io/k3s/releases/download/v1.30.3+k3s1/k3s
[Instance k3s-cluster-master1] [INFO] Verifying binary download
[Instance k3s-cluster-master1] [INFO] Installing k3s to /usr/local/bin/k3s
[Instance k3s-cluster-master1] [INFO] Skipping installation of SELinux RPM
[Instance k3s-cluster-master1] [INFO] Creating /usr/local/bin/kubectl symlink to k3s
[Instance k3s-cluster-master1] [INFO] Creating /usr/local/bin/crictl symlink to k3s
[Instance k3s-cluster-master1] [INFO] Creating /usr/local/bin/ctr symlink to k3s
[Instance k3s-cluster-master1] [INFO] Creating killall script /usr/local/bin/k3s-killall.sh
[Instance k3s-cluster-master1] [INFO] Creating uninstall script /usr/local/bin/k3s-uninstall.sh
[Instance k3s-cluster-master1] [INFO] env: Creating environment file /etc/systemd/system/k3s.service.env
[Instance k3s-cluster-master1] [INFO] systemd: Creating service file /etc/systemd/system/k3s.service
[Instance k3s-cluster-master1] [INFO] systemd: Enabling k3s unit
[Instance k3s-cluster-master1] [INFO] systemd: Starting k3s
[Instance k3s-cluster-master1] Waiting for the control plane to be ready...
[Control plane] Generating the kubeconfig file to /Users/username/.kube/config...
error: no context exists with the name: "k3s-cluster-master1"
[Control plane] ...kubeconfig file generated as /Users/username/.kube/config.
Unhandled exception in spawn: timeout after 00:00:30 (Tasker::Timeout)
from /opt/homebrew/Cellar/hetzner_k3s/2.0.2/bin/hetzner-k3s in 'raise<Tasker::Timeout>:NoReturn'
from /opt/homebrew/Cellar/hetzner_k3s/2.0.2/bin/hetzner-k3s in 'Tasker@Tasker::Methods::timeout<Time::Span, &Proc(Nil)>:Nil'
from /opt/homebrew/Cellar/hetzner_k3s/2.0.2/bin/hetzner-k3s in '~procProc(Nil)@src/cluster/create.cr:75'
from /opt/homebrew/Cellar/hetzner_k3s/2.0.2/bin/hetzner-k3s in 'Fiber#run:(IO::FileDescriptor | Nil)'
Then I thought maybe there is an issue with kubeconfig_path: "~/.kube/config",
so I switched it to kubeconfig_path: "./kubeconfig",
but I get the same error:
error: no context exists with the name: "k3s-cluster-master1"
[Control plane] ...kubeconfig file generated as /Users/username/Projects/side-projects/infra/kubeconfig.
Unhandled exception in spawn: timeout after 00:00:30 (Tasker::Timeout)
from /opt/homebrew/Cellar/hetzner_k3s/2.0.2/bin/hetzner-k3s in 'raise<Tasker::Timeout>:NoReturn'
from /opt/homebrew/Cellar/hetzner_k3s/2.0.2/bin/hetzner-k3s in 'Tasker@Tasker::Methods::timeout<Time::Span, &Proc(Nil)>:Nil'
from /opt/homebrew/Cellar/hetzner_k3s/2.0.2/bin/hetzner-k3s in '~procProc(Nil)@src/cluster/create.cr:75'
from /opt/homebrew/Cellar/hetzner_k3s/2.0.2/bin/hetzner-k3s in 'Fiber#run:(IO::FileDescriptor | Nil)'
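As a hedged debugging step (not something suggested in the thread so far): the generated kubeconfig can be inspected directly to see which contexts it actually contains, which helps tell whether the file is empty or just has a differently named context:

kubectl config get-contexts --kubeconfig ~/.kube/config
kubectl config view --kubeconfig ~/.kube/config -o jsonpath='{.contexts[*].name}'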
I've tried that (running hetzner-k3s from a server in the same private network), but it gets stuck during creation:
[Configuration] Validating configuration...
[Configuration] ...configuration seems valid.
[SSH key] SSH key already exists, skipping create
[Placement groups] Creating placement group k3s-cluster-masters...
[Placement groups] ...placement group k3s-cluster-masters created
[Instance k3s-cluster-master1] Creating instance k3s-cluster-master1 (attempt 1)...
[Instance k3s-cluster-master1] Instance status: off
[Instance k3s-cluster-master1] Powering on instance (attempt 1)
[Instance k3s-cluster-master1] Waiting for instance to be powered on...
[Instance k3s-cluster-master1] Instance status: running
[Instance k3s-cluster-master1] Waiting for successful ssh connectivity with instance k3s-cluster-master1...
[Instance k3s-cluster-master1] Instance k3s-cluster-master1 already exists, skipping create
[Instance k3s-cluster-master1] Instance status: running
[Instance k3s-cluster-master1] Waiting for successful ssh connectivity with instance k3s-cluster-master1...
[Instance k3s-cluster-master1] Instance k3s-cluster-master1 already exists, skipping create
[Instance k3s-cluster-master1] Instance status: running
[Instance k3s-cluster-master1] Waiting for successful ssh connectivity with instance k3s-cluster-master1...
Error creating instance: timeout after 00:01:00
Instance creation for k3s-cluster-master1 failed. Try rerunning the create command.
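A hedged sanity check for this situation (the IP and key path below are placeholders I made up): since the nodes have no public IPs, the machine running hetzner-k3s must be able to reach the master's private IP on port 22, which can be verified manually before re-running the create command:

# 10.0.0.3 stands in for the master's private IP
nc -zv 10.0.0.3 22
ssh -i ~/.ssh/id_ed25519 root@10.0.0.3 hostname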
Do you see any data in the pg database?
I've just checked in pgAdmin; the k3s_cluster db has no tables in it.
And you can connect with psql to postgresql://user:password@PRIVATE_IP:5432/k3s_cluster_db?
Yes, I confirm it works.
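One hedged way to check whether k3s is actually writing to the external datastore (this relies on how k3s's kine layer normally behaves, not on anything confirmed in this thread): when the Postgres endpoint is really in use, a table named kine is created in the database, so its presence and row count can be checked with psql:

psql postgresql://user:password@PRIVATE_IP:5432/k3s_cluster_db -c '\dt'
psql postgresql://user:password@PRIVATE_IP:5432/k3s_cluster_db -c 'SELECT count(*) FROM kine;'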
I was planning to do a test cluster with a postgres db tonight but I just finished working and it's 1AM, so I'll have to postpone this to tomorrow night.
Hi, I could try now to figure out the problem you're having, but it would save me some time if you could describe step by step what you have done, from how you have set up the existing network, Internet access from the servers, and the Postgres server. Any detail you can provide may help me with the investigation, because I don't see any problems with a regular setup using a Postgres database, so it's perhaps something I am missing from your setup.
I had the same issue with version v1.26.9+k3s1.
Thank you for this thread; I could solve it by adding this to the config:
embedded_registry_mirror:
  enabled: false
The next failure on master updates was:
[System Upgrade Controller] deployment.apps/system-upgrade-controller configured
The ClusterRoleBinding "system-upgrade" is invalid: roleRef: Invalid value: rbac.RoleRef{APIGroup:"rbac.authorization.k8s.io", Kind:"ClusterRole", Name:"system-upgrade-controller"}: cannot change roleRef
[System Upgrade Controller] : The ClusterRoleBinding "system-upgrade" is invalid: roleRef: Invalid value: rbac.RoleRef{APIGroup:"rbac.authorization.k8s.io", Kind:"ClusterRole", Name:"system-upgrade-controller"}: cannot change roleRef
But I could fix it with:
kubectl delete clusterrolebinding system-upgrade
Thank you so much for the great work. Consider linking the upgrade notes in the readme. It took me a while to find them and I nearly destroyed my cluster :D
Unfortunately, new servers now seem unable to connect to the network, with errors like:
(combined from similar events): Could not create route b5d2cae7-4afb-486a-b9a3-d35f12bd2a1a 10.244.4.0/24 for node cl11-pool-cpx41-worker1 after 287.960811ms: hcloud/CreateRoute: hcops/AllServersCache.ByName: cl11-pool-cpx41-worker1 hcops/AllServersCache.getCache: not found
I'll add an "Upgrading page" when I have a bit of time (or you could make a PR? :)) But the upgrade instructions are defined in the 2.0.0 release notes and linked to in the following minor releases so who is upgrading should see them easily.
Please open a separate issue with the details including your config file.
Thank you, I got it fixed: the hostname and the name of the server in the Hetzner console did not match. But this was likely my own error of setting include_instance_type_in_instance_name too late.
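A hedged way to spot this kind of mismatch (the commands are my suggestion, not from the thread): compare the hostname each node reports with the server names shown for the project, for example with the hcloud CLI:

# On the node:
hostname
# From a machine with the hcloud CLI configured for the project:
hcloud server list -o columns=name,status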
I guess we can close this issue then? :)
Scenario to reproduce:
After executing
hetzner-k3s create --config hetzner-k3s-config.yml | tee create.log
The output:
And then nothing happens: no more attempts, no more output.