rancher / terraform-aws-rke2

Terraform module to manage an RKE2 node from AWS
GNU General Public License v3.0

Not working on AMI with Ubuntu 22.04 #153

Closed: mateuszkwiatkowski closed this issue 2 months ago

mateuszkwiatkowski commented 4 months ago

I set image_type = ubuntu-22 and install_method = tar.

I'm getting this error:

module.InitialServer.module.install.null_resource.install: Creating...
module.InitialServer.module.install.null_resource.install: Provisioning with 'file'...
module.InitialServer.module.install.null_resource.install: Provisioning with 'remote-exec'...
module.InitialServer.module.install.null_resource.install (remote-exec): Connecting to remote host via SSH...
module.InitialServer.module.install.null_resource.install (remote-exec):   Host: [CUT]
module.InitialServer.module.install.null_resource.install (remote-exec):   User: tf-rke2
module.InitialServer.module.install.null_resource.install (remote-exec):   Password: false
module.InitialServer.module.install.null_resource.install (remote-exec):   Private key: false
module.InitialServer.module.install.null_resource.install (remote-exec):   Certificate: false
module.InitialServer.module.install.null_resource.install (remote-exec):   SSH Agent: true
module.InitialServer.module.install.null_resource.install (remote-exec):   Checking Host Key: false
module.InitialServer.module.install.null_resource.install (remote-exec):   Target Platform: unix
module.InitialServer.module.install.null_resource.install (remote-exec): Connected!
module.InitialServer.module.install.null_resource.install (remote-exec): + set -e
module.InitialServer.module.install.null_resource.install (remote-exec): + sudo chmod +x /home/tf-rke2/install.sh
module.InitialServer.module.install.null_resource.install (remote-exec): + sudo /home/tf-rke2/install.sh server /home/tf-rke2/rke2_artifacts v1.29.3+rke2r1 tar
module.InitialServer.module.install.null_resource.install (remote-exec): + set -e
module.InitialServer.module.install.null_resource.install (remote-exec): + ROLE=server
module.InitialServer.module.install.null_resource.install (remote-exec): + REMOTE_PATH=/home/tf-rke2/rke2_artifacts
module.InitialServer.module.install.null_resource.install (remote-exec): + RELEASE=v1.29.3+rke2r1
module.InitialServer.module.install.null_resource.install (remote-exec): + INSTALL_METHOD=tar
module.InitialServer.module.install.null_resource.install (remote-exec): + CHANNEL=
module.InitialServer.module.install.null_resource.install (remote-exec): + systemctl is-active rke2-server.service
module.InitialServer.module.install.null_resource.install (remote-exec): + [ inactive = active ]
module.InitialServer.module.install.null_resource.install (remote-exec): + unset INSTALL_RKE2_CHANNEL
module.InitialServer.module.install.null_resource.install (remote-exec): + unset INSTALL_RKE2_VERSION
module.InitialServer.module.install.null_resource.install (remote-exec): + [ v1.29.3+rke2r1 == latest ]
module.InitialServer.module.install.null_resource.install (remote-exec): /home/tf-rke2/install.sh: 17: [: v1.29.3+rke2r1: unexpected operator
module.InitialServer.module.install.null_resource.install (remote-exec): + [ v1.29.3+rke2r1 == stable ]
module.InitialServer.module.install.null_resource.install (remote-exec): /home/tf-rke2/install.sh: 19: [: v1.29.3+rke2r1: unexpected operator
module.InitialServer.module.install.null_resource.install (remote-exec): + export INSTALL_RKE2_VERSION=v1.29.3+rke2r1
module.InitialServer.module.install.null_resource.install (remote-exec): + [  !=  ]
module.InitialServer.module.install.null_resource.install (remote-exec): + export INSTALL_RKE2_METHOD=tar
module.InitialServer.module.install.null_resource.install (remote-exec): + export INSTALL_RKE2_TYPE=server
module.InitialServer.module.install.null_resource.install (remote-exec): + [ tar = rpm ]
module.InitialServer.module.install.null_resource.install (remote-exec): + export INSTALL_RKE2_ARTIFACT_PATH=/home/tf-rke2/rke2_artifacts
module.InitialServer.module.install.null_resource.install (remote-exec): + [ ! -f /home/tf-rke2/rke2_artifacts/install.sh ]
module.InitialServer.module.install.null_resource.install (remote-exec): + curl -sfL https://get.rke2.io -o /home/tf-rke2/rke2_artifacts/install.sh
module.InitialServer.module.install.null_resource.install (remote-exec): + chmod +x /home/tf-rke2/rke2_artifacts/install.sh
module.InitialServer.module.install.null_resource.install (remote-exec): + /home/tf-rke2/rke2_artifacts/install.sh
module.InitialServer.module.install.null_resource.install (remote-exec): [INFO]  staging local checksums from /home/tf-rke2/rke2_artifacts/sha256sum-amd64.txt
module.InitialServer.module.install.null_resource.install (remote-exec): cp: cannot stat '/home/tf-rke2/rke2_artifacts/sha256sum-amd64.txt': No such file or directory
╷
│ Error: remote-exec provisioner error
│
│   with module.InitialServer.module.install.null_resource.install,
│   on .terraform/modules/InitialServer.install/main.tf line 92, in resource "null_resource" "install":
│   92:   provisioner "remote-exec" {
│
│ error executing "/home/tf-rke2/rke2_install_terraform": Process exited with status 1

It looks like the script is not executed with bash, even though it uses bash syntax.
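For context: in POSIX sh (dash on Ubuntu), the `[` builtin only defines `=` for string equality; `==` is a bash extension, which is why dash reports "unexpected operator" on lines 17 and 19. A minimal sketch of a portable version of that check follows. It assumes the script's intent, judging from the trace, is to treat `latest` and `stable` as channel names and any other value as a pinned version; the function name `classify_release` is hypothetical and not part of the module.

```shell
#!/bin/sh
# Hypothetical helper (not from the module): a portable replacement for
# `[ "$RELEASE" == latest ]`. POSIX `[` only defines `=` for strings;
# `==` works in bash but makes dash fail with "unexpected operator".
set -e

classify_release() {
  release="$1"
  if [ "$release" = "latest" ] || [ "$release" = "stable" ]; then
    # Assumption: these values are meant as an install channel.
    echo "channel:$release"
  else
    # Anything else is treated as a pinned version, e.g. v1.29.3+rke2r1.
    echo "version:$release"
  fi
}
```

Called as `classify_release v1.29.3+rke2r1`, it prints `version:v1.29.3+rke2r1` under both dash and bash, so the script would not depend on where /bin/sh points.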

mateuszkwiatkowski commented 4 months ago

Changing the /bin/sh symlink to point to bash instead of dash (with dpkg-reconfigure dash) fixes the "unexpected operator" errors. The cp error is still there.

mateuszkwiatkowski commented 4 months ago

The bug happens when the last argument to the /home/tf-rke2/install.sh script is empty, as in this Terraform log:

+ sudo /home/tf-rke2/install.sh server /home/tf-rke2/rke2_artifacts v1.29.3+rke2r1 tar ''

The fix here is either to provide RKE2's .tar.gz and checksum files before the install.sh script is launched, or to let the installer download them itself (this is controlled with the INSTALL_RKE2_ARTIFACT_PATH variable).
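One way to get a clearer failure than the bare `cp: cannot stat` is to check the artifact directory before calling the RKE2 installer. The sketch below is hypothetical (the module does not ship a `check_artifacts` function); it looks only for the checksum file and the `rke2.linux-<arch>.tar.gz` tarball named in RKE2's tarball air-gap instructions, and assumes amd64 unless told otherwise. The air-gap images archive is left out because whether it is needed depends on your registry setup.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of the module): verify that the
# directory to be used as INSTALL_RKE2_ARTIFACT_PATH contains the files the
# RKE2 installer needs for a tarball install, before running it.
set -e

check_artifacts() {
  dir="$1"
  arch="${2:-amd64}"   # assumption: amd64 unless another arch is given
  missing=0
  for f in "sha256sum-${arch}.txt" "rke2.linux-${arch}.tar.gz"; do
    if [ ! -f "${dir}/${f}" ]; then
      echo "missing: ${dir}/${f}" >&2
      missing=1
    fi
  done
  # Non-zero status means at least one artifact is absent.
  return "$missing"
}
```

Used as a guard, for example `check_artifacts /home/tf-rke2/rke2_artifacts && /home/tf-rke2/rke2_artifacts/install.sh`, the run stops with a list of the missing files instead of failing inside the installer.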

matttrach commented 4 months ago

OK, I am looking into this, thank you! I should have a new major release for this repo sometime this week (hopefully): https://github.com/rancher/terraform-aws-rke2/pull/152

matttrach commented 4 months ago

We aren't letting the installer download the tarball files because, with install method tar, we are targeting air-gapped environments.

What version of the module are you using?

mateuszkwiatkowski commented 3 months ago

Hey @matttrach, it's version 0.1.22. It wasn't obvious to me that the target is an air-gapped environment; I understand that now after reading the source code a bit. At first I set up a cluster with RHEL 9 and the rpm installation method, and it just worked. After a while I decided I prefer running it on Ubuntu, so I switched just these two settings to Ubuntu 22.04 and tar, and it didn't work. Maybe it just lacks documentation on how to populate the artifacts on these servers? I don't mind downloading them to my workstation and letting Terraform upload them to the servers.
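For the workstation-download approach, a rough sketch that prints the GitHub release URLs of the RKE2 tarball install artifacts for a given version. The helper name `rke2_artifact_urls` is hypothetical. It assumes the artifacts are published under `https://github.com/rancher/rke2/releases/download/<tag>/` with the `+` in the tag URL-encoded as `%2B`, and that these three files are the ones you want. Whether a given module version expects exactly these names in its local file path (for example the default ./rke2) is also an assumption to verify.

```shell
#!/bin/sh
# Hypothetical helper: print download URLs for the RKE2 tarball install
# artifacts of one release, so they can be fetched on a workstation and
# placed in the module's local file path before Terraform uploads them.
set -e

rke2_artifact_urls() {
  version="$1"
  arch="${2:-amd64}"   # assumption: amd64 unless another arch is given
  # Release tags contain '+', which is URL-encoded as %2B in download URLs.
  tag=$(printf '%s' "$version" | sed 's/+/%2B/g')
  base="https://github.com/rancher/rke2/releases/download/${tag}"
  for f in "rke2.linux-${arch}.tar.gz" "sha256sum-${arch}.txt" "rke2-images.linux-${arch}.tar.zst"; do
    echo "${base}/${f}"
  done
}
```

To fetch them into the current directory: `rke2_artifact_urls v1.29.3+rke2r1 | while read -r url; do curl -sfLO "$url"; done`.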

matttrach commented 3 months ago

I am so sorry that this happened, and thank you for reaching out. I want to make sure I validate the options you are choosing; this is what I understand so far:

This is the test id I am using to represent this use case: "ubuntu-22-canal-stable-one-tar-ipv4-nginx"
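The test id appears to encode one value per option. The sketch below composes an id from those values; the helper `make_test_id` is hypothetical, and the field order (OS, CNI, RKE2 version, example, install method, IP family, ingress) is inferred from this single id, so it may not match the test suite in every case.

```shell
#!/bin/sh
# Hypothetical helper: compose a test id from the chosen options.
# Field order is inferred from "ubuntu-22-canal-stable-one-tar-ipv4-nginx"
# and may not match the real test suite exactly.
set -e

make_test_id() {
  os="$1"; cni="$2"; version="$3"; example="$4"
  install="$5"; ip_family="$6"; ingress="$7"
  printf '%s-%s-%s-%s-%s-%s-%s\n' \
    "$os" "$cni" "$version" "$example" "$install" "$ip_family" "$ingress"
}
```

With the options from this issue, `make_test_id ubuntu-22 canal stable one tar ipv4 nginx` prints `ubuntu-22-canal-stable-one-tar-ipv4-nginx`.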

mateuszkwiatkowski commented 3 months ago

Yes, I copied the devcluster example and changed these two variables: OS = ubuntu-22.04, installation_method = tar. It's single node, yes.

matttrach commented 3 months ago

Excellent! If you don't mind upgrading, you can use either the "one" example or the "simple" example to accomplish this.

I manually tested this diff on the simple example:

diff --git a/examples/simple/main.tf b/examples/simple/main.tf
index 3b98736..c64cc3f 100644
--- a/examples/simple/main.tf
+++ b/examples/simple/main.tf
@@ -40,4 +40,7 @@ module "this" {
   }
   local_file_path      = local.local_file_path
   install_rke2_version = local.rke2_version
+
+  server_image_type    = "ubuntu-22"
+  install_use_strategy = "tar"
 }

For the "one" example I set these parameters:

"key_name" = <my key name in aws>
"key" = <my public key>
"identifier" = <a random string>
"zone" = <my route53 zone>
"rke2_version" = "v1.29.5+rke2r1"
"os" = "ubuntu-22"
"install_method" = "tar"

"file_path" = "" // using the default ./rke2
"cni" = "canal" // default
"ip_family" = "ipv4" // default
"ingress_controller" = "nginx" // default

You can also skip DNS provisioning by setting project_domain_use_strategy = "skip". I have not yet implemented ipv6 or dualstack for ip_family, or alternative ingresses; those are coming soon.

You can see the test id here: https://github.com/rancher/terraform-aws-rke2/blob/main/tests/test/ready_test.go#L41. This is where I set the combinations that we verify before release; please let me know if there are any others you would like to try out.