@vinnytwice Please delete your hcloud_token and create a new one. You've exposed it in your kube.tf file.
I adjusted your comment, but it will still be available in the history.
@M4t7e thanks for pointing it out, of course I'll rotate it.
OK, I started fresh with new Hetzner and kube-hetzner projects so I could run some tests and see what leads to what results. Generally speaking, setting initial_k3s = "stable" causes the error above.
In all other cases Terraform hangs while destroying the cluster, either on the control-plane subnet or the agent subnet, depending on how the load balancer is set up.
Here is the step-by-step tweaking I made and the results:
Setup:
terraform init --upgrade
Initializing the backend...
Upgrading modules...
Downloading registry.terraform.io/kube-hetzner/kube-hetzner/hcloud 2.3.2 for kube-hetzner...
- kube-hetzner in .terraform/modules/kube-hetzner
- kube-hetzner.agents in .terraform/modules/kube-hetzner/modules/host
- kube-hetzner.control_planes in .terraform/modules/kube-hetzner/modules/host
Initializing provider plugins...
- Finding latest version of hashicorp/random...
- Finding latest version of hashicorp/cloudinit...
- Finding hetznercloud/hcloud versions matching ">= 1.41.0"...
- Finding hashicorp/local versions matching ">= 2.0.0"...
- Finding tenstad/remote versions matching ">= 0.0.23"...
- Finding integrations/github versions matching ">= 4.0.0"...
- Finding latest version of hashicorp/null...
- Installing hashicorp/cloudinit v2.3.2...
- Installed hashicorp/cloudinit v2.3.2 (signed by HashiCorp)
- Installing hetznercloud/hcloud v1.42.0...
- Installed hetznercloud/hcloud v1.42.0 (signed by a HashiCorp partner, key ID 5219EACB3A77198B)
- Installing hashicorp/local v2.4.0...
- Installed hashicorp/local v2.4.0 (signed by HashiCorp)
- Installing tenstad/remote v0.1.2...
- Installed tenstad/remote v0.1.2 (self-signed, key ID 0696D656FC3AC5FA)
- Installing integrations/github v5.32.0...
- Installed integrations/github v5.32.0 (signed by a HashiCorp partner, key ID 38027F80D7FD5FB2)
- Installing hashicorp/null v3.2.1...
- Installed hashicorp/null v3.2.1 (signed by HashiCorp)
- Installing hashicorp/random v3.5.1...
- Installed hashicorp/random v3.5.1 (signed by HashiCorp)
Configuration tests
Updated the default kube.tf file to create just 1 cax11 control plane + 1 cax11 agent node.
Apply: success in 3’41”. Destroy: hangs on
module.kube-hetzner.hcloud_network_subnet.agent[0]
Network and Load Balancer are not destroyed.
Added:
@L419 automatically_upgrade_k3s = false
@L423 automatically_upgrade_os = false
@L441 initial_k3s = "stable"
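For context, those flags sit directly in the kube-hetzner module block of kube.tf; a minimal sketch, with the variable names copied verbatim from the lines above (double-check them against the kube.tf.example of your module version):

```hcl
module "kube-hetzner" {
  # ...source, hcloud_token, node pools, etc. as in the default kube.tf...

  automatically_upgrade_k3s = false    # ~line 419 of the default kube.tf
  automatically_upgrade_os  = false    # ~line 423
  initial_k3s               = "stable" # ~line 441
}
```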
Apply: Error in 3’25”
│ Error: remote-exec provisioner error
│
│ with module.kube-hetzner.null_resource.kustomization,
│ on .terraform/modules/kube-hetzner/init.tf line 285, in resource "null_resource" "kustomization":
│ 285: provisioner "remote-exec" {
│
│ error executing "/tmp/terraform_815743714.sh": Process exited with status 1
Destroy: success
Removed: @L441 initial_k3s = "stable"
Apply: success in 3’30”. Destroy: hangs on
module.kube-hetzner.hcloud_network_subnet.agent[0]
Network and Load Balancer are not destroyed.
Added: @L370 ingress_controller = "nginx"
Apply: success in 7’30” (takes way longer!). Destroy: hangs on
module.kube-hetzner.hcloud_network_subnet.control_plane[0]: Still destroying... [id=3156572-10.255.0.0/16, 9m40s elapsed]
Network and Load Balancer are not destroyed.
Added:
@L204 control_planes_custom_config = {
  etcd-expose-metrics         = true,
  kube-controller-manager-arg = "bind-address=0.0.0.0",
  kube-proxy-arg              = "metrics-bind-address=0.0.0.0",
  kube-scheduler-arg          = "bind-address=0.0.0.0",
}
Apply: success in 4’08”. Destroy: hangs on
module.kube-hetzner.hcloud_network_subnet.agent[0]: Still destroying... [id=3156635-10.0.0.0/16, 6m0s elapsed]
Network and Load Balancer are not destroyed.
Added:
@L541 use_control_plane_lb = true
Apply: success in 7’30”
Destroy: hangs on
module.kube-hetzner.hcloud_network_subnet.control_plane[0]: Still destroying... [id=3156672-10.255.0.0/16, 9m20s elapsed]
Added:
nginx_values = <<EOT
controller:
  watchIngressWithoutClass: "true"
  kind: "DaemonSet"
  config:
    "use-forwarded-headers": "true"
    "compute-full-forwarded-for": "true"
    "use-proxy-protocol": "true"
  service:
    annotations:
      "load-balancer.hetzner.cloud/name": "k3s"
      "load-balancer.hetzner.cloud/use-private-ip": "true"
      "load-balancer.hetzner.cloud/disable-private-ingress": "true"
      "load-balancer.hetzner.cloud/location": "fsn1"
      "load-balancer.hetzner.cloud/type": "lb11"
      "load-balancer.hetzner.cloud/uses-proxyprotocol": "true"
  extraArgs:
    default-ssl-certificate: "default/tls-secret" # Only difference from sample config
EOT
Apply: success
Destroy: hangs on
module.kube-hetzner.hcloud_network_subnet.control_plane[0]: Still destroying... [id=3156860-10.255.0.0/16, 3m30s elapsed]
I can't seem to find a stable configuration. I'm basically trying to set up the cluster as:
1 control plane, 3 agents (Node.js server, MongoDB, Neo4j), and the Nginx ingress controller with some TCP port mappings to services (mainly to expose the Neo4j Browser).
Right now HA is not a priority; I'm just testing Hetzner Cloud, as I want to move away from Azure.
What is the correct configuration for this cluster?
@vinnytwice See the readme on how to destroy. We've created a cleanupkh command to help you out. Basically, as soon as the destroy reaches the subnets, you need to run it in a separate terminal tab or window. It will delete the LB and the autoscaled nodes that are not part of Terraform (they are what causes the hang on destroy).
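A minimal sketch of that flow, assuming the cleanupkh alias from the readme is set up:

```sh
# Terminal 1: run the destroy as usual; it will hang on the subnet resources
terraform destroy

# Terminal 2: once the destroy is stuck on the subnets, run the cleanup helper.
# Per the comment above, it removes the load balancer and the autoscaled nodes
# that Terraform does not manage, which unblocks the destroy.
cleanupkh
```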
So basically you do that and you should have it stable! :)
@mysticaltech Oh I see, thanks. I'll check this solution for the hanging part.
By the way, while we're at it, there is a part of the kube.tf file which I don't fully get. Sorry, I'm still quite inexperienced in setting up Kubernetes clusters and infrastructure, and could use some help here.
The kube.tf comment says the value matters for pods that "communicate to the cluster from inside the cluster itself, in which case it is important to set this value, as it will configure the hostname". Is setting this parameter related to external communications using a DNS record instead?
AFAIK Kubernetes internal cluster communication is performed using Service objects, but the DNS A record "hetzner_cloud" that I set on my domain "mydomain.com", pointing to the load balancer IP address, should give me a "hetzner_cloud.mydomain.com" URL to access the cluster.
As I understand it, if I set lb_hostname = "hetzner_cloud.mydomain.com", I am then able to use "hetzner_cloud.mydomain.com" as a URL to access the cluster; if I leave it unset, then I have to use the load balancer IP address directly to access the cluster.
Or is this parameter used instead of, or in combination with, declaring the TLS hosts in the Ingress manifest, as follows?
kind: Ingress
metadata:
  name: echo-ingress
  annotations:
    kubernetes.io/ingress.class: nginx
    certmanager.k8s.io/cluster-issuer: letsencrypt-staging
spec:
  tls:
    - hosts:
        - hetzner_cloud.mydomain.com
Thank you very much again
@vinnytwice Apologies for the late reply, just saw this now. I understand this can be confusing. Basically, it's only necessary if you have internal pods that are going to use full-blown domain names, instead of service names, to talk to other in-cluster pods.
Let's say you have pod A with an associated service A, exposed as a.mycluster.domain.com. If, instead of using service A, pods B or C try to reach A at a.mycluster.domain.com, this can cause slowdowns when the Hetzner LB that service A uses is not associated with the domain name in question.
So basically, if you are in this scenario, it's best to set lb_hostname to mycluster.domain.com, create an A record pointing that name to your LB IP, and CNAME a.mycluster.domain.com to mycluster.domain.com. That way, when an internal pod talks to A via a.mycluster.domain.com, there are no slowdowns.
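In zone-file terms, the above would look something like this (203.0.113.10 standing in for your LB IP):

```
mycluster.domain.com.    IN A      203.0.113.10
a.mycluster.domain.com.  IN CNAME  mycluster.domain.com.
```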
lb_hostname just sets load-balancer.hetzner.cloud/hostname in the Hetzner LB definition, whether you use Nginx or Traefik.
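Concretely, in terms of the nginx_values block earlier in this thread, setting it would amount to one more annotation on the controller service; a sketch with a placeholder hostname:

```yaml
controller:
  service:
    annotations:
      "load-balancer.hetzner.cloud/hostname": "mycluster.domain.com"
```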
Honestly, I do not use it, because I just have internal services use the other services' names. If a service is in another namespace, you just prefix its name with the namespace, like my-service.my-namespace (or, in full, my-service.my-namespace.svc.cluster.local).
More context here: https://github.com/kube-hetzner/terraform-hcloud-kube-hetzner/discussions/478
Hope this clarifies it enough. This section in the docs is definitely confusing. PR welcome!
@vinnytwice Please check the kube.tf.example again, I clarified the section. Hopefully it's even clearer than the above.
Description
I'm getting inconsistent results so I can't pinpoint the causes of the issues I'm facing.
I was trying to set up a multi-node cluster (1 control plane, 3 agents, and an autoscaler) on a working cluster that was using a single-node ARM server. The first issue I faced is that adding labels to the autoscaler nodes throws an error:
so I tried not including them. After a few tries I started getting the "image has incompatible architecture" error for all servers (I had pinned version = "2.0.9"; without it, 2.3.4 is used). I destroyed the infra and started again.
Not specifying a version number fixed the incompatible-architecture issue, and the infra started deploying. I reduced the number of servers created to 2, so only the control plane and one agent, both cax21, in case it was a quota problem since I'm using free credits, but (after creating all resources, as I can see in the cloud console) Terraform gets stuck until it throws an error:
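For reference, the version pin mentioned above sits on the module block; a minimal sketch (source taken from the init log earlier in this thread):

```hcl
module "kube-hetzner" {
  source  = "kube-hetzner/kube-hetzner/hcloud"
  version = "2.0.9" # pinning this old version triggered the ARM "incompatible architecture" error; omitting it pulls the latest release
  # ...token, node pools, etc...
}
```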
I then destroyed the infra once again, set create_kustomization = false, and deployed the infra again. No change, still the same errors:
What can I check to solve this? Many thanks
Kube.tf file
Screenshots
No response
Platform
Mac