Open twz123 opened 2 years ago
I was able to reproduce it (the "tee" error) without the upscale, i.e. running k0sctl again on a single node that has already been provisioned via a prior run of k0sctl.
I noticed that k0sctl wants to upgrade even if the target host is already running the correct version.
WARN [ssh] 10.83.134.135:22: k0s will be upgraded
I noticed that k0sctl wants to upgrade even if the target host is already running the correct version.
Yes, this always happens when k0sBinaryPath or files: is used because k0sctl didn't know if the file was changed. Now that k0sctl can detect local vs remote file changes, it should probably take this into consideration when deciding if the upgrade workflow should be chosen or not.
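To make the suggestion above concrete, here is a minimal sketch of how the upgrade decision could take the file-change detection into account. This is not k0sctl's actual code: the hostState type, its fields, the needsUpgrade function and the version string are all hypothetical names made up for illustration.

```go
package main

import "fmt"

// hostState is a hypothetical summary of what was detected on a host.
type hostState struct {
	RunningVersion string // version of k0s on the host, "" if not installed
	BinaryChanged  bool   // true if the local binary differs from the remote one
}

// needsUpgrade reports whether the upgrade workflow should be chosen.
// customBinary stands for "k0sBinaryPath or files: is in use".
func needsUpgrade(h hostState, targetVersion string, customBinary bool) bool {
	if h.RunningVersion == "" {
		// Nothing installed yet: this is a fresh install, not an upgrade.
		return false
	}
	if h.RunningVersion != targetVersion {
		return true
	}
	// Same version: only force an upgrade if a custom binary is used
	// and it actually differs from what is already on the host.
	return customBinary && h.BinaryChanged
}

func main() {
	// Illustrative version string, not taken from the logs in this issue.
	h := hostState{RunningVersion: "v1.23.5+k0s.0", BinaryChanged: false}
	fmt.Println("upgrade:", needsUpgrade(h, "v1.23.5+k0s.0", true))
}
```

With a check like this, a rerun against an unchanged host would skip the upgrade workflow even when a custom binary path is configured.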
A second run of k0sctl also fails because it tries to join new controllers by requesting a token from the wrong node (the newly created one which hasn't been joined)
I wonder how this happens. The K0sLeader() should always pick a host that has k0s running.
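For reference, the expected behaviour could be sketched roughly like this. It is only an illustration of "pick a host that has k0s running", not the real K0sLeader() implementation: the host type, its fields, pickLeader and the addresses are assumptions made up for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// host is a hypothetical, stripped-down view of a cluster host.
type host struct {
	Address    string
	Controller bool
	K0sRunning bool
}

var errNoLeader = errors.New("no controller with k0s running")

// pickLeader returns the first controller that already has k0s running.
// Newly added controllers that have not joined yet are skipped.
func pickLeader(hosts []host) (host, error) {
	for _, h := range hosts {
		if h.Controller && h.K0sRunning {
			return h, nil
		}
	}
	return host{}, errNoLeader
}

func main() {
	hosts := []host{
		{Address: "10.0.0.2", Controller: true, K0sRunning: false}, // new, not joined
		{Address: "10.0.0.1", Controller: true, K0sRunning: true},  // existing node
	}
	leader, err := pickLeader(hosts)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("request join token from", leader.Address)
}
```

If the selection worked like this, the token request could never land on the newly created node, which suggests the "running" state is being detected incorrectly somewhere.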
tee: /usr/local/bin/k0s: Text file busy
The only possible explanation for this is that k0s is still running when trying to replace the binary.
I had the exact same problem. The first update failed with "tee: /usr/local/bin/k0s: Text file busy" and the k0s binary was removed from the node where the error occurred. I then tried to update again using k0sctl, but it failed when trying to generate a token and join. However, this run put the k0s binary back on the node, so after starting the service again with systemctl on the node, I ran the update with k0sctl once more, and the process ended successfully.
This is definitely some timing issue. There's the check if k0s is still running, but maybe this check just races when the actual process is about to terminate but not quite terminated. When rerunning k0sctl apply again (after some seconds), the binary can be uploaded again, but will fail later on when trying to invoke k0s install (#362).
I see multiple ways of fixing this:
time.Sleep(10 * time.Second)
The red army knife for timing issues :eyes: Well, there must be a better way than this ...
Hmmm, this is a forced upgrade because of the presence of files. The "upload binaries" phase should be skipped because k0s is going to be upgraded. There's some error in the host selection logic in that phase.
Reopening as this is not yet resolved.
Upgrading a cluster from one node to three nodes failed with the following log line:
Target OS: Alpine 3.15
k0sctl version: 0.13.0-rc.1-1-gaf2f60b (af2f60b896c1b4ba4f1e6016fe445d9cfa7fe247)
k0sctl.log
A second run of k0sctl also fails because it tries to join new controllers by requesting a token from the wrong node (the newly created one which hasn't been joined):
Logs from the second run: k0sctl_2.log
Config used: