Closed carstenblt closed 11 months ago
@carstenblt The best way forward here is to SSH into the server when that error happens (see the debug section of the readme) and run the script directly. In your previous attempt, that would have meant executing /tmp/terraform_1586559340.sh from your terminal. That will show the error message. Please share it, along with the content of that file (without sensitive values).
@mysticaltech Thank you for your reply. The two scripts that fail:
#!/bin/sh
set -ex
sed -i 's/^- |[0-9]\+$/- |/g' /var/post_install/kustomization.yaml
timeout 360 bash <<EOF
until [[ "\$(kubectl get --raw='/readyz' 2> /dev/null)" == "ok" ]]; do
echo "Waiting for the cluster to become ready..."
sleep 2
done
EOF
kubectl apply -k /var/post_install
echo 'Waiting for the system-upgrade-controller deployment to become available...'
kubectl -n system-upgrade wait --for=condition=available --timeout=360s deployment/system-upgrade-controller
sleep 7
kubectl -n system-upgrade apply -f /var/post_install/plans.yaml
fails with
+ sed -i 's/^- |[0-9]\+$/- |/g' /var/post_install/kustomization.yaml
sed: can't read /var/post_install/kustomization.yaml: Not a directory
and
#!/bin/sh
rm -f /var/user_kustomize/*.yaml.tpl
echo 'Deploying manifests from /var/user_kustomize/:' && ls -alh /var/user_kustomize
kubectl kustomize /var/user_kustomize/ | kubectl apply --wait=true -f -
fails with
Deploying manifests from /var/user_kustomize/:
-rw-r--r--. 1 root root 334 Dec 22 11:50 /var/user_kustomize
error: must build at directory: '/var/user_kustomize/': file is not directory
error: no objects passed to apply
Indeed, /var/user_kustomize is not a directory but rather my extra-manifests/external-secrets.yaml.tpl file, and /var/post_install is a kured DaemonSet definition.
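A quick way to confirm this state on a node is to check what actually sits at each path. The following is a minimal sketch, not part of the module; the helper name `describe_path` is hypothetical, and the two paths are the ones named in the failing scripts above.

```shell
#!/bin/sh
# describe_path PATH: print whether PATH is a directory, a regular file,
# some other kind of file, or missing. (Hypothetical helper, for diagnosis only.)
describe_path() {
    if [ -d "$1" ]; then
        echo "directory"
    elif [ -f "$1" ]; then
        echo "regular file"
    elif [ -e "$1" ]; then
        echo "other"
    else
        echo "missing"
    fi
}

# The two paths that the failing scripts expect to be directories.
for p in /var/post_install /var/user_kustomize; do
    printf '%s: %s\n' "$p" "$(describe_path "$p")"
done
```

On a node in the broken state described here, both paths would be reported as regular files rather than directories.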
@carstenblt Very interesting, I will try to reproduce and look into it ASAP.
@mysticaltech Have you had time to look at this? I am experiencing the same issue.
@libracoder Sorry not yet, will try tonight.
Oh great, Thank you!
I think I found the issue, @mysticaltech.
There is a weird "" character appended to the private keys each time they are created, so the local-exec is unable to find the SSH key.
Strange discovery.
This works:
command = "install -b -m 600 /dev/null /tmp/${random_string.identity_file.id} && echo ${file("~/.ssh/id_ed25519")} > /tmp/${random_string.identity_file.id}"
This doesn't:
provisioner "local-exec" {
command = <<-EOT
install -b -m 600 /dev/null /tmp/${random_string.identity_file.id}
echo "${local.ssh_client_identity}" > /tmp/${random_string.identity_file.id}
EOT
}
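One way to see whether a stray character really ends up at the end of the written key file is to dump its last few bytes in a visible form. This is only a sketch under assumptions: the helper name `show_tail` is hypothetical, and which character is being appended is not established in this thread.

```shell
#!/bin/sh
# show_tail FILE: print the last 8 bytes of FILE with non-printing
# characters made visible (od -c renders a carriage return as \r
# and a newline as \n).
show_tail() {
    tail -c 8 "$1" | od -c
}
```

For example, `show_tail ~/.ssh/id_ed25519` and the same call on the generated file under /tmp can be compared side by side; any difference in the trailing bytes would point at how the key is being written.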
@libracoder Your issue may be different from @carstenblt's one. Which platform are you on? If on Windows, please make sure to use WSL.
Yeah, I think so. I am having a myriad of issues. I am on Windows using WSL. After fixing the issue with the file names, I now have these errors:
module.kube-hetzner.module.agents["0-0-agent-small"].hcloud_server.server (local-exec): Executing: ["/bin/sh" "-c" "until ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o IdentitiesOnly=yes -o PubkeyAuthentication=yes -i /tmp/3rgtcngppy7cskz6vtq8 -o ConnectTimeout=2 -p 22 root@128.140.58.213 true 2> /dev/null\r\ndo\r\n echo \"Waiting for MicroOS to become available...\"\r\n sleep 3\r\ndone\r\n"]
module.kube-hetzner.module.agents["0-0-agent-small"].hcloud_server.server (local-exec): /bin/sh: 6: Syntax error: end of file unexpected (expecting "do")
╷
│ Error: local-exec provisioner error
│
│ with module.kube-hetzner.module.control_planes["0-0-control-plane-fsn1"].hcloud_server.server,
│ on .terraform/modules/kube-hetzner/modules/host/main.tf line 60, in resource "hcloud_server" "server":
│ 60: provisioner "local-exec" {
│
│ Error running command 'until ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o IdentitiesOnly=yes -o PubkeyAuthentication=yes -i
│ /tmp/4tc950ymt3qc42lez113 -o ConnectTimeout=2 -p 22 root@49.12.185.102 true 2> /dev/null
│ do
│ echo "Waiting for MicroOS to become available..."
│ sleep 3
│ done
│ ': exit status 2. Output: /bin/sh: 6: Syntax error: end of file unexpected (expecting "do")
│
╵
╷
│ Error: local-exec provisioner error
│
│ with module.kube-hetzner.module.agents["0-0-agent-small"].hcloud_server.server,
│ on .terraform/modules/kube-hetzner/modules/host/main.tf line 60, in resource "hcloud_server" "server":
│ 60: provisioner "local-exec" {
│
│ Error running command 'until ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o IdentitiesOnly=yes -o PubkeyAuthentication=yes -i
│ /tmp/3rgtcngppy7cskz6vtq8 -o ConnectTimeout=2 -p 22 root@128.140.58.213 true 2> /dev/null
│ do
│ echo "Waiting for MicroOS to become available..."
│ sleep 3
│ done
│ ': exit status 2. Output: /bin/sh: 6: Syntax error: end of file unexpected (expecting "do")
│
╵
libracoder@Libracoder-Surface8-Pro:/var/www/ht-cloud$
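The `\r\n` sequences visible in the logged command above suggest that the script reached /bin/sh with Windows (CRLF) line endings, which would explain the `expecting "do"` syntax error. Below is a minimal sketch for detecting and stripping carriage returns from a file. The helper names `has_crlf` and `strip_crlf` are hypothetical, and which file on disk carries the CRLF endings is an assumption to verify, not something confirmed in this thread.

```shell
#!/bin/sh
# has_crlf FILE: succeed if FILE contains at least one carriage return.
has_crlf() {
    cr=$(printf '\r')
    grep -q "$cr" "$1"
}

# strip_crlf FILE: rewrite FILE in place with all carriage returns removed.
# Writing back with cat keeps the original file's permissions.
strip_crlf() {
    tmp=$(mktemp) || return 1
    if tr -d '\r' < "$1" > "$tmp"; then
        cat "$tmp" > "$1"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return "$status"
}
```

A possible use is `has_crlf .terraform/modules/kube-hetzner/modules/host/main.tf && echo "CRLF found"`, using the module path shown in the error output. If CRLF endings are present, the Git line-ending settings of the checkout are a likely cause to look into.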
OK, @libracoder, please search this repo: others have successfully deployed via WSL 2, so it definitely works. You may also want to look at this short but important SSH guide; your keys need to be created in that manner from WSL too. https://github.com/kube-hetzner/terraform-hcloud-kube-hetzner/blob/master/docs/ssh.md
Thank you so much for your help. I will go through the docs you shared.
@libracoder That's one issue that just showed up on my radar with a solution, and there are others too. https://github.com/kube-hetzner/terraform-hcloud-kube-hetzner/issues/1140
@carstenblt FYI, just pushed the fix in v2.11.2, and tested with your own example, it worked like a charm. Good luck!
+ sed -i 's/^- |[0-9]\+$/- |/g' /var/post_install/kustomization.yaml
sed: can't read /var/post_install/kustomization.yaml: Not a directory
is still the same. This is the kured plan.
The other changed:
/tmp/terraform_1365493471.sh: line 3: unexpected EOF while looking for matching `''
This is because of a typo; the file looks like this:
#!/bin/sh
rm -f /var/user_kustomize/*.yaml.tpl
echo 'Applying user kustomization...
kubectl apply -k /var/user_kustomize/ --wait=true
Sorry I'm not of much help; I don't know any Terraform. But I believe the problem might be that the directories are only created in the resource "null_resource" "first_control_plane" section, when it should be done on all control planes? The file provisioner then fails because, apparently, folders are not created when using SSH: https://developer.hashicorp.com/terraform/language/resources/provisioners/file
It might be that the problem arose after moving one control node to a new group because I wanted it to move to another location.
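The directory-creation point above can be sketched as a guard that runs on a node before any files are uploaded. This is an illustration only, not the change made in the module; the helper name `ensure_dir` is hypothetical, and where such a step would be wired into the Terraform resources is an assumption.

```shell
#!/bin/sh
# ensure_dir PATH: create PATH as a directory, refusing to proceed if a
# non-directory already occupies that name (the state seen in this thread).
ensure_dir() {
    if [ -e "$1" ] && [ ! -d "$1" ]; then
        echo "error: $1 exists but is not a directory" >&2
        return 1
    fi
    mkdir -p "$1"
}
```

Calling `ensure_dir /var/post_install && ensure_dir /var/user_kustomize` on every control plane before the upload step would surface the problem early, instead of failing later in sed or kubectl.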
@carstenblt Run terraform destroy, make sure you are on a proper Linux shell like WSL, and try again. If it worked for me with this very same setup, it should work for you.
@carstenblt Thanks for your PR #1143 , it was merged and deployed in v2.11.3.
@mysticaltech I believe this is what breaks it:
I'll try to check later if this reproduces it.
@carstenblt Of course, that's not supposed to happen. See the readme and kube.tf.example: you can only scale nodepools up and down, and it should be done very carefully, with draining, proper node deletion, etc. Once HA, you cannot scale down to non-HA in my experience. What you could do is create a new nodepool with a count of 1, bringing the total control planes to 4, then drain the last node of the first nodepool, run kubectl delete node, and scale the count down to 2.
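The drain-then-delete step described above could look roughly like the sketch below. The function names `run` and `retire_node` are hypothetical, and the drain flags `--ignore-daemonsets` and `--delete-emptydir-data` are a common choice rather than something specified in this thread; adjust them to the workloads on the node. Setting `DRY_RUN=1` only prints the commands.

```shell
#!/bin/sh
# run CMD...: execute CMD, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# retire_node NODE: drain NODE, then remove it from the cluster.
# The delete only runs if the drain succeeded.
retire_node() {
    run kubectl drain "$1" --ignore-daemonsets --delete-emptydir-data &&
    run kubectl delete node "$1"
}
```

For example, `DRY_RUN=1; retire_node <node-name>` shows what would be executed. After the node is gone, the nodepool count in kube.tf would be reduced and applied with Terraform, as described in the comment above.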
It seems that something is wrong. Please see the details:
module.infra.null_resource.kustomization (remote-exec): Connecting to remote host via SSH...
module.infra.null_resource.kustomization (remote-exec): Host: xxxx
module.infra.null_resource.kustomization (remote-exec): User: root
module.infra.null_resource.kustomization (remote-exec): Password: false
module.infra.null_resource.kustomization (remote-exec): Private key: false
module.infra.null_resource.kustomization (remote-exec): Certificate: false
module.infra.null_resource.kustomization (remote-exec): SSH Agent: true
module.infra.null_resource.kustomization (remote-exec): Checking Host Key: false
module.infra.null_resource.kustomization (remote-exec): Target Platform: unix
module.infra.null_resource.kustomization (remote-exec): Connected!
module.infra.null_resource.kustomization: Still creating... [40s elapsed]
module.infra.null_resource.kustomization (remote-exec): + sed -i 's/^- |[0-9]\+$/- |/g' /var/post_install/kustomization.yaml
module.infra.null_resource.kustomization (remote-exec): sed: can't read /var/post_install/kustomization.yaml: Not a directory
╷
│ Error: remote-exec provisioner error
│
│ with module.infra.null_resource.kustomization,
│ on .terraform/modules/infra/init.tf line 288, in resource "null_resource" "kustomization":
│ 288: provisioner "remote-exec" {
│
│ error executing "/tmp/terraform_148135017.sh": Process exited with status 2
╵
% ssh root@xxxx -p 60022
infra-control-plane-vnc:~ # less /var/post_install
infra-control-plane-vnc:~ # ls -l /var/post_install
-rw-r--r--. 1 root root 642 Jan 2 18:16 /var/post_install
infra-control-plane-vnc:~ #
The file /var/post_install should have been a directory instead.
@otavio Please open a new issue with reproducible steps, and please try to debug it yourself too.
Description
Applying fails with
As this might have something to do with extra-manifests, here is my extra-manifests/kustomization.yaml.tpl:
and my extra-manifests/external-secrets.yaml.tpl:
Kube.tf file
Screenshots
No response
Platform
MacOS