k3s-io / k3s

Lightweight Kubernetes
https://k3s.io
Apache License 2.0

starting kubernetes: preparing server: json: cannot unmarshal string into Go value of type bootstrap.File #4644

Closed larmic closed 2 years ago

larmic commented 2 years ago

Environmental Info: K3s Version: v1.22.4+k3s1 (bec170bc)

Node(s) CPU architecture, OS, and Version: Linux pi4-rack-1.local 5.10.82-v8+ #1497 SMP PREEMPT Fri Dec 3 16:30:35 GMT 2021 aarch64 GNU/Linux

Cluster Configuration: 2 Servers, 1 Agent

Describe the bug: After upgrading from Buster to Bullseye I got a cgroup error, so I upgraded k3s from 1.18.x to v1.22.4+k3s1. After starting k3s I get: starting kubernetes: preparing server: json: cannot unmarshal string into Go value of type bootstrap.File

apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: xxx
    server: https://127.0.0.1:6443
  name: default
contexts:
- context:
    cluster: default
    user: default
  name: default
current-context: default
kind: Config
preferences: {}
users:
- name: default
  user:
    password: xxx
    username: xxx

Steps To Reproduce:

Expected behavior: Server should start :)

Actual behavior: Server does not start :)

Additional context / logs:

Dez 06 11:39:12 pi4-rack-1.local k3s[15556]: time="2021-12-06T11:39:12Z" level=info msg="Starting k3s v1.22.4+k3s1 (bec170bc)"
Dez 06 11:39:12 pi4-rack-1.local k3s[15556]: time="2021-12-06T11:39:12Z" level=info msg="Configuring sqlite3 database connection pooling: maxIdleConns=2, maxOpenConns=0, connMaxLifetime=0s"
Dez 06 11:39:12 pi4-rack-1.local k3s[15556]: time="2021-12-06T11:39:12Z" level=info msg="Configuring database table schema and indexes, this may take a moment..."
Dez 06 11:39:12 pi4-rack-1.local k3s[15556]: time="2021-12-06T11:39:12Z" level=info msg="Database tables and indexes are up to date"
Dez 06 11:39:12 pi4-rack-1.local k3s[15556]: time="2021-12-06T11:39:12Z" level=info msg="Kine available at unix://kine.sock"
Dez 06 11:39:12 pi4-rack-1.local k3s[15556]: time="2021-12-06T11:39:12Z" level=info msg="Reconciling bootstrap data between datastore and disk"
Dez 06 11:39:12 pi4-rack-1.local k3s[15556]: time="2021-12-06T11:39:12Z" level=fatal msg="starting kubernetes: preparing server: json: cannot unmarshal string into Go value of type bootstrap.File"
Dez 06 11:39:12 pi4-rack-1.local systemd[1]: k3s.service: Main process exited, code=exited, status=1/FAILURE
Dez 06 11:39:12 pi4-rack-1.local systemd[1]: k3s.service: Failed with result 'exit-code'.
Dez 06 11:39:12 pi4-rack-1.local systemd[1]: Failed to start Lightweight Kubernetes.
brandond commented 2 years ago

I'm not sure that it's safe to jump forward 4 minor versions, from 1.18.x to v1.22.4. In general we follow the Kubernetes version skew policy, which requires that server components be within 1 minor revision of each other during upgrades. Can you try upgrading 1.18 -> 1.19 -> 1.20 -> 1.21 -> 1.22 and see if you get better results?

larmic commented 2 years ago

Hi @brandond, thanks for your answer. I thought it was OK to jump forward 4 minor versions. In SemVer, minor versions should not break anything (imho).

brandond commented 2 years ago

Kubernetes doesn't adhere to semver in that respect. Take a look through the version skew policy doc.

I believe our QA team in particular adheres to that policy, and only tests upgrades between minor versions - so if you're jumping across multiple minor versions you're likely to run into scenarios we've not tested. I don't believe any Kubernetes managed service providers will let you skip minors when upgrading either.

ghost commented 2 years ago

Please be aware that it might be necessary to upgrade to the latest patch version first. For me, the upgrade path

1.20.7 -> 1.21.7 -> 1.22.4

failed on the first upgrade with the same log messages you posted above. Applying the following path worked seamlessly:

1.20.7 -> 1.20.13 -> 1.21.7 -> 1.22.4

larmic commented 2 years ago

Ok, a complete reinstall fixed it. Thx.

knweiss commented 2 years ago

FWIW: We saw the same error today upgrading our three node k3s (stable channel; embedded etcd) from v1.21.5+k3s2 to v1.21.7+k3s1.

brandond commented 2 years ago

cc @briandowns any ideas?

briandowns commented 2 years ago

This is probably due to the change in bootstrap file data storage format in the database. In recent versions there's a timestamp (UNIX epoch) that's stored along with the file name and content. I can take a look.
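
To make the failure mode concrete, here is a minimal, self-contained Go sketch that reproduces the same kind of error. The File struct and its field names are illustrative assumptions, not the actual k3s definitions: old-format bootstrap data maps each file name to a plain content string, so decoding it with a type that expects a timestamp plus content fails just like the log line above.

package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// File approximates the newer bootstrap entry: a timestamp plus the content.
// The field names here are assumptions for illustration only.
type File struct {
	Timestamp time.Time `json:"timestamp"`
	Content   []byte    `json:"content"`
}

func main() {
	// Old-format payload: file name mapped directly to a content string.
	oldFormat := []byte(`{"ServerCA": "LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0t"}`)

	var files map[string]File
	if err := json.Unmarshal(oldFormat, &files); err != nil {
		// Prints: json: cannot unmarshal string into Go value of type main.File
		fmt.Println(err)
	}
}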

brandond commented 2 years ago

I thought we'd handled that by up-converting the bootstrap data in-memory with dummy timestamps, but it sounds like that's not working here for some reason?

~~@knweiss are you using managed etcd, or SQL, for your HA datastore?~~ Never mind, I see that you're using etcd.

briandowns commented 2 years ago

We have code that handles the migration of the bootstrap data that doesn't have the timestamp. I'd probably need some more info to see where in this process it might be going sideways.
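
For illustration, a hedged sketch of that kind of up-conversion (detecting old-format data and stamping it with a dummy timestamp in memory, as described above) might look like the following. This is not the actual k3s migration code; the File type and field names are the same illustrative assumptions used in the previous sketch.

package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// File mirrors the illustrative struct from the previous sketch.
type File struct {
	Timestamp time.Time `json:"timestamp"`
	Content   []byte    `json:"content"`
}

// migrate tries the new format first and, if that fails, re-reads the payload
// as the old "name -> content string" map, stamping each entry with a dummy time.
func migrate(raw []byte) (map[string]File, error) {
	newFormat := map[string]File{}
	if err := json.Unmarshal(raw, &newFormat); err == nil {
		return newFormat, nil // already in the new format
	}

	oldFormat := map[string]string{}
	if err := json.Unmarshal(raw, &oldFormat); err != nil {
		return nil, err
	}

	migrated := make(map[string]File, len(oldFormat))
	for name, content := range oldFormat {
		migrated[name] = File{Timestamp: time.Unix(0, 0), Content: []byte(content)}
	}
	return migrated, nil
}

func main() {
	files, err := migrate([]byte(`{"ServerCA": "dummy-pem-content"}`))
	fmt.Println(files, err)
}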

ohlol commented 2 years ago

@briandowns I can consistently reproduce this. What information would be helpful?

brandond commented 2 years ago

Reopening so that we can find the root cause of this.

@ohlol can you provide information on the versions of the nodes in your cluster - both the nodes already in the cluster, and the ones you're attempting to join?

myoung34 commented 2 years ago

I started hitting this also.

I have an 8-node k3s cluster with 2 masters using MySQL. I had to rebuild one of my masters.

I tore it down, ran k delete node cluster22, and verified there were no lingering password secrets with k -n kube-system get secrets | grep cluster22.

It was originally 1.21.5+k3s2, but I pulled in 1.21.7+k3s1 and it never completes:

$ k get nodes | sort
NAME        STATUS   ROLES                  AGE     VERSION
cluster11   Ready    <none>                 52d     v1.21.5+k3s2
cluster12   Ready    control-plane,master   21d     v1.21.5+k3s2
cluster13   Ready    <none>                 49m     v1.21.7+k3s1
cluster14   Ready    <none>                 6d22h   v1.21.7+k3s1
cluster21   Ready    <none>                 52d     v1.21.5+k3s2
cluster23   Ready    <none>                 52d     v1.21.5+k3s2
cluster24   Ready    <none>                 48d     v1.21.5+k3s2

Reinstalled with

myoung@cluster22:~$ curl -sfL https://get.k3s.io | K3S_DATASTORE_ENDPOINT='mysql://k3s:redact@tcp(192.168.3.2:3306)/k3s' \ 
  K3S_URL=https://192.168.1.21:6443 K3S_TOKEN=redact::server:redact \
  INSTALL_K3S_VERSION="v1.21.7+k3s1" \
  INSTALL_K3S_EXEC="server --no-deploy traefik" sh -
[INFO]  Using v1.21.7+k3s1 as release
[INFO]  Downloading hash https://github.com/k3s-io/k3s/releases/download/v1.21.7+k3s1/sha256sum-arm64.txt
[INFO]  Downloading binary https://github.com/k3s-io/k3s/releases/download/v1.21.7+k3s1/k3s-arm64
[INFO]  Verifying binary download
[INFO]  Installing k3s to /usr/local/bin/k3s
[INFO]  Skipping installation of SELinux RPM
[INFO]  Creating /usr/local/bin/kubectl symlink to k3s
[INFO]  Creating /usr/local/bin/crictl symlink to k3s
[INFO]  Creating /usr/local/bin/ctr symlink to k3s
[INFO]  Creating killall script /usr/local/bin/k3s-killall.sh
[INFO]  Creating uninstall script /usr/local/bin/k3s-uninstall.sh
[INFO]  env: Creating environment file /etc/systemd/system/k3s.service.env
[INFO]  systemd: Creating service file /etc/systemd/system/k3s.service
[INFO]  systemd: Enabling k3s unit
Created symlink /etc/systemd/system/multi-user.target.wants/k3s.service → /etc/systemd/system/k3s.service.
[INFO]  systemd: Starting k3s
Job for k3s.service failed because the control process exited with error code.
See "systemctl status k3s.service" and "journalctl -xe" for details.

Logs

-- A start job for unit k3s.service has begun execution.
--
-- The job identifier is 90580.
Dec 12 22:09:09 cluster22 sh[101296]: + /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service
Dec 12 22:09:09 cluster22 sh[101302]: Failed to get unit file state for nm-cloud-setup.service: No such file or directory
Dec 12 22:09:09 cluster22 k3s[101307]: time="2021-12-12T22:09:09.968691139Z" level=info msg="Starting k3s v1.21.7+k3s1 (ac705709)"
Dec 12 22:09:09 cluster22 k3s[101307]: time="2021-12-12T22:09:09.974703658Z" level=info msg="Configuring mysql database connection pooling: maxIdleConns=2, maxOpenConns=0, connMaxLifetime=0s"
Dec 12 22:09:09 cluster22 k3s[101307]: time="2021-12-12T22:09:09.974814489Z" level=info msg="Configuring database table schema and indexes, this may take a moment..."
Dec 12 22:09:09 cluster22 k3s[101307]: time="2021-12-12T22:09:09.978136013Z" level=info msg="Database tables and indexes are up to date"
Dec 12 22:09:12 cluster22 k3s[101307]: time="2021-12-12T22:09:12.836233117Z" level=info msg="Kine listening on unix://kine.sock"
Dec 12 22:09:12 cluster22 k3s[101307]: time="2021-12-12T22:09:12.899644999Z" level=info msg="Reconciling bootstrap data between datastore and disk"
Dec 12 22:09:12 cluster22 k3s[101307]: time="2021-12-12T22:09:12.900682483Z" level=fatal msg="starting kubernetes: preparing server: json: cannot unmarshal string into Go value of type bootstrap.File"
Dec 12 22:09:12 cluster22 systemd[1]: k3s.service: Main process exited, code=exited, status=1/FAILURE
-- Subject: Unit process exited
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- An ExecStart= process belonging to unit k3s.service has exited.
--
-- The process' exit code is 'exited' and its exit status is 1.
Dec 12 22:09:12 cluster22 systemd[1]: k3s.service: Failed with result 'exit-code'.
-- Subject: Unit failed
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- The unit k3s.service has entered the 'failed' state with result 'exit-code'.
Dec 12 22:09:12 cluster22 systemd[1]: Failed to start Lightweight Kubernetes.
-- Subject: A start job for unit k3s.service has failed
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- A start job for unit k3s.service has finished with a failure.
--

Downgrading back to 1.21.5+k3s2 (which matches the other master) works fine:

myoung@cluster22:~$ curl -sfL https://get.k3s.io | K3S_DATASTORE_ENDPOINT='mysql://redact@tcp(192.168.3.2:3306)/k3s' \
  K3S_URL=https://192.168.1.21:6443 \
  K3S_TOKEN=redact::server:redact \
  INSTALL_K3S_VERSION="v1.21.5+k3s2" \
  INSTALL_K3S_EXEC="server --no-deploy traefik" sh -

[INFO]  Using v1.21.5+k3s2 as release
[INFO]  Downloading hash https://github.com/k3s-io/k3s/releases/download/v1.21.5+k3s2/sha256sum-arm64.txt
[INFO]  Downloading binary https://github.com/k3s-io/k3s/releases/download/v1.21.5+k3s2/k3s-arm64
[INFO]  Verifying binary download
[INFO]  Installing k3s to /usr/local/bin/k3s
[INFO]  Skipping installation of SELinux RPM
[INFO]  Creating /usr/local/bin/kubectl symlink to k3s
[INFO]  Creating /usr/local/bin/crictl symlink to k3s
[INFO]  Creating /usr/local/bin/ctr symlink to k3s
[INFO]  Creating killall script /usr/local/bin/k3s-killall.sh
[INFO]  Creating uninstall script /usr/local/bin/k3s-uninstall.sh
[INFO]  env: Creating environment file /etc/systemd/system/k3s.service.env
[INFO]  systemd: Creating service file /etc/systemd/system/k3s.service
[INFO]  systemd: Enabling k3s unit
Created symlink /etc/systemd/system/multi-user.target.wants/k3s.service → /etc/systemd/system/k3s.service.
[INFO]  systemd: Starting k3s

myoung@cluster22:~$ sudo systemctl status k3s.service
● k3s.service - Lightweight Kubernetes
     Loaded: loaded (/etc/systemd/system/k3s.service; enabled; vendor preset: enabled)
     Active: active (running) since Sun 2021-12-12 22:16:11 UTC; 24s ago
       Docs: https://k3s.io
    Process: 102010 ExecStartPre=/bin/sh -xc ! /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service (code=exited, status=0/SUCCESS)
    Process: 102012 ExecStartPre=/sbin/modprobe br_netfilter (code=exited, status=0/SUCCESS)
    Process: 102013 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
   Main PID: 102024 (k3s-server)
      Tasks: 27
     Memory: 755.2M
     CGroup: /system.slice/k3s.service
             ├─102024 /usr/local/bin/k3s server
             └─102062 containerd

Verified:

$ k get nodes | sort
NAME        STATUS   ROLES                  AGE     VERSION
cluster11   Ready    <none>                 52d     v1.21.5+k3s2
cluster12   Ready    control-plane,master   21d     v1.21.5+k3s2
cluster13   Ready    <none>                 58m     v1.21.7+k3s1
cluster14   Ready    <none>                 6d23h   v1.21.7+k3s1
cluster21   Ready    <none>                 52d     v1.21.5+k3s2
cluster22   Ready    control-plane,master   80s     v1.21.5+k3s2
cluster23   Ready    <none>                 52d     v1.21.5+k3s2
cluster24   Ready    <none>                 48d     v1.21.5+k3s2
brandond commented 2 years ago

It is not technically supported to have agents running a newer version of Kubernetes than your servers, so having v1.21.7+k3s1 agents while your servers are still on v1.21.5+k3s2 is not a supported configuration.

Servers should be compatible with other patch versions within the same minor version though. It looks like the issue is caused by trying to join v1.21.7+k3s1 servers to a cluster where other servers are still running v1.21.5+k3s2.

In this case the workaround should be to upgrade the existing servers to v1.21.7+k3s1 before adding more v1.21.7+k3s1 servers, but we should handle down-revision servers at join time properly since this is a valid configuration as per the Kubernetes version skew policy. cc @briandowns

ffly90 commented 2 years ago

@brandond: I'm working with @knweiss and we have a three server HA setup with embedded etcd. The scenario is to do a rolling update. But when trying to lift one of the servers to v1.21.7+k3s1, the error occurs. From my understanding the server tries to rejoin the cluster after the upgrade, which fails due to the described problem. At this stage we can only get the server back into a running state by downgrading the affected server to match the version of the other two.

Oats87 commented 2 years ago

@knweiss @firefly-serenity @ohlol

When you are experiencing this issue, are you attempting to upgrade from v1.21.5+k3s2 to v1.21.7+k3s1 on a "fresh" node, i.e. a rolling upgrade where a new node is added to the cluster?

I can consistently reproduce this when attempting to join a v1.21.7+k3s1 server to a cluster of v1.21.5+k3s2 servers.

In my case, this is due to the fact that https://github.com/k3s-io/k3s/blob/53ef842a9885fbf371a11089b7f95979c255b0a4/pkg/bootstrap/bootstrap.go#L70 does not appear to migrate bootstrap data.

If you attempt an in-place upgrade from v1.21.5+k3s2 to v1.21.7+k3s1, it should work as the certificates should already be on disk.

briandowns commented 2 years ago

PR #4730 has been added to help address this situation. After this is merged, I'll be backporting to the 1.21 branch.

knweiss commented 2 years ago

@Oats87 We have three k3s servers running v1.21.5+k3s2 and wanted to upgrade them to v1.21.7+k3s1 - one after the other.

However, during the upgrade of the first server we already saw the error. The k3s.service wouldn't start successfully because of the unmarshaling error and we had to go back to v1.21.5+k3s2.

I.e. an in-place upgrade.

Oats87 commented 2 years ago

@Oats87 We have three k3s servers running v1.21.5+k3s2 and wanted to upgrade them to v1.21.7+k3s1 - one after the other.

However, during the upgrade of the first server we already saw the error. The k3s.service wouldn't start successfully because of the unmarshaling error and we had to go back to v1.21.5+k3s2.

I.e. an in-place upgrade.

OK, thanks for the information.

Are you running the upgrade on the "init" node of your cluster, i.e. the one that does not have K3S_URL?

Edit: Does your init node get restarted with --cluster-init?

I'm attempting to work with @briandowns to isolate the exact logic path that you are encountering that gets you into this state. The certificate issue we identified above (and are remediating) is most definitely an issue on its own; however, I'm wondering if there's another scenario I'm just missing.

ohlol commented 2 years ago

I'll have detailed steps describing my process and observations later today, but briefly:

architectural context/details:

  1. I take a snapshot (to S3) of etcd state
  2. Scale the ASG to zero, update Launch Template, scale ASG to 1
  3. SSH to new leader, stop rke2-server, and perform upgrade as described in snapshot backup/restore doc

As an aside, prior to 1.21.7 I was able to do this by just manually cycling ASG instances one by one.

Anyway, hope that helps clarify my process at least, for what it's worth. I'll get some actual log output etc later today.

Oats87 commented 2 years ago

@ohlol if you are replacing your nodes with new ones using an ASG, then the fix @briandowns has in #4730 will fix your issue most likely.

I'm now attempting to hone in on what issue is being hit on an in-place upgrade

brandond commented 2 years ago

  1. I take a snapshot (to S3) of etcd state
  2. Scale the ASG to zero, update Launch Template, scale ASG to 1
  3. SSH to new leader, stop rke2-server, and perform upgrade as described in snapshot backup/restore doc

This suggests that we need to also test restoring a v1.21.5-rke2r2 backup to a clean v1.21.7-rke2r1 node. This would be different from the down-version HTTP bootstrap case that @briandowns' current fix resolves.

Oats87 commented 2 years ago

  1. I take a snapshot (to S3) of etcd state
  2. Scale the ASG to zero, update Launch Template, scale ASG to 1
  3. SSH to new leader, stop rke2-server, and perform upgrade as described in snapshot backup/restore doc

This suggests that we need to also test restoring a v1.21.5-rke2r2 backup to a clean v1.21.7-rke2r1 node. This would be different from the down-version HTTP bootstrap case that @briandowns' current fix resolves.

To clarify, I still believe that the fix @briandowns merged this morning should resolve the issue. We can explicitly test this case though.

knweiss commented 2 years ago

@Oats87 Yes, we started on the init node.

I'm away from the system right now. I will provide the information regarding --cluster-init tomorrow because I'm not 100% sure off the top of my head.

knweiss commented 2 years ago

@Oats87 So, on the init node we use what boils down to this:

common_options="--etcd-snapshot-retention 10 --selinux"
INSTALL_K3S_EXEC="--write-kubeconfig-mode 644 --tls-san $VIP --cluster-init $common_options --node-external-ip $VIP"
curl -sfL https://get.k3s.io | \
  INSTALL_K3S_CHANNEL="stable" \
  K3S_TOKEN="$K3S_TOKEN" \
  INSTALL_K3S_EXEC="$INSTALL_K3S_EXEC" sh -

So, yes, there's --cluster-init.

(FWIW: We use kube-vip on this system.)

myoung34 commented 2 years ago

It is not technically supported to have agents running a newer version of Kubernetes than your servers, so having v1.21.7+k3s1 agents while your servers are still on v1.21.5+k3s2 is not a supported configuration.

I'm in the same situation as @knweiss.

I know I shouldn't mix and match long term, but that shouldn't be true for in-place upgrades within a minor version, which is why I was attempting to do a rolling upgrade.

When the rolling upgrade failed I removed and re-bootstrapped it from scratch to 1.21.7 and it did the same thing.

It basically made it impossible to upgrade from 1.21.5.

Oats87 commented 2 years ago

@Oats87 So, on the init node we use what boils down to this:

common_options="--etcd-snapshot-retention 10 --selinux"
INSTALL_K3S_EXEC="--write-kubeconfig-mode 644 --tls-san $VIP --cluster-init $common_options --node-external-ip $VIP"
curl -sfL https://get.k3s.io | \
  INSTALL_K3S_CHANNEL="stable" \
  K3S_TOKEN="$K3S_TOKEN" \
  INSTALL_K3S_EXEC="$INSTALL_K3S_EXEC" sh -

So, yes, there's --cluster-init.

(FWIW: We use kube-vip on this system.)

Great, thank you. I want to do some more testing with this. Out of curiosity, what are you running on your other controlplane/server nodes?

I'm in the same situation as @knweiss.

I know I shouldn't mix and match long term, but that shouldn't be true for in-place upgrades within a minor version, which is why I was attempting to do a rolling upgrade.

When the rolling upgrade failed I removed and re-bootstrapped it from scratch to 1.21.7 and it did the same thing.

It basically made it impossible to upgrade from 1.21.5.

Until @briandowns' and @galal-hussein's work lands in a release, you will not be able to use an etcd snapshot from v1.21.5+k3s2 on a fresh/new v1.21.7+k3s1 install, and will also not be able to add a v1.21.7+k3s1 server node to an existing cluster containing v1.21.5+k3s2 server nodes.

Are you saying you were unable to perform an in-place upgrade (i.e. upgrade the version of K3s on the same node in place) from v1.21.5+k3s2 to v1.21.7+k3s1?

The root of the issue @briandowns is addressing is that K3s attempts to place certificates on disk from bootstrap data, but is not using the migrated/converted bootstrap data (it is still using the old format), which is why the JSON decoder is failing.
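
A sketch of the ordering implied here (hypothetical code, not the k3s source): certificate placement works from the already up-converted bootstrap map, so the raw datastore bytes are never decoded directly with the new type. The File type, function name, and paths below are assumptions for illustration.

package main

import (
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// File is the same illustrative struct used in the earlier sketches.
type File struct {
	Timestamp time.Time
	Content   []byte
}

// writeBootstrapFiles writes each entry of an already up-converted bootstrap
// map to disk under dataDir.
func writeBootstrapFiles(dataDir string, files map[string]File) error {
	for name, f := range files {
		path := filepath.Join(dataDir, name)
		if err := os.MkdirAll(filepath.Dir(path), 0o700); err != nil {
			return err
		}
		if err := os.WriteFile(path, f.Content, 0o600); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	err := writeBootstrapFiles("/tmp/k3s-bootstrap-example", map[string]File{
		"server-ca.crt": {Timestamp: time.Now(), Content: []byte("dummy content")},
	})
	fmt.Println(err)
}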

@galal-hussein is working on an issue that deals with etcd snapshot restore, where the procedure never succeeds due to some logic that was recently added to block starting certain components until the agent is ready (but during an etcd restore, we don't start the agent)

briandowns commented 2 years ago

The fixes for these 2 issues have been merged into the 3 active release branches and will be included in the coming releases.

myoung34 commented 2 years ago

Are you saying you were unable to perform an in-place upgrade (i.e. upgrade the version of K3s on the same node in place) from v1.21.5+k3s2 to v1.21.7+k3s1?

I have an 8-node home cluster with 2 masters. One of my non-master agents (from a typo in the k3s system upgrade controller) went to 1.21.7, so I tried to upgrade one master (cluster22 in my original response) to 1.21.7 in order to roll everything to 1.21.7. As soon as it upgraded, the masters could no longer cooperate (hence the logs).

brandond commented 2 years ago

I have an 8-node home cluster with 2 masters.

Are you using an external SQL datastore, or etcd? A two-node etcd cluster will not have quorum; if either node goes down the other will be non-functional.

myoung34 commented 2 years ago

MySQL for HA

ShylajaDevadiga commented 2 years ago

To reproduce the issue,

  1. Created a 2 server, 1 worker node cluster with v1.21.5+k3s2
  2. Tried to join a 3rd node with v1.21.7+k3s1. The node failed to join the cluster with the error mentioned above:
    
    Dec 15 03:52:43 ip-172-31-7-32.us-east-2.compute.internal k3s[6844]: time="2021-12-15T03:52:43.960777643Z" level=fatal msg="starting kubernetes: preparing server: json: cannot unmarshal string into Go value of type bootstrap.File"

$ kubectl get nodes
NAME                                         STATUS   ROLES                       AGE    VERSION
ip-172-31-3-168.us-east-2.compute.internal   Ready    control-plane,etcd,master   11m    v1.21.5+k3s2
ip-172-31-4-235.us-east-2.compute.internal   Ready    control-plane,etcd,master   5m2s   v1.21.5+k3s2
ip-172-31-5-1.us-east-2.compute.internal     Ready    <none>                      14s    v1.21.5+k3s2

Validated the fix using v1.21.7-rc2+k3s2.
Joined a node running v1.21.7-rc2+k3s2 to the above cluster successfully.

$ kubectl get nodes
NAME                                         STATUS   ROLES                       AGE   VERSION
ip-172-31-3-168.us-east-2.compute.internal   Ready    control-plane,etcd,master   30m   v1.21.5+k3s2
ip-172-31-4-235.us-east-2.compute.internal   Ready    control-plane,etcd,master   24m   v1.21.5+k3s2
ip-172-31-5-1.us-east-2.compute.internal     Ready    <none>                      19m   v1.21.5+k3s2
ip-172-31-7-32.us-east-2.compute.internal    Ready    control-plane,etcd,master   96s   v1.21.7-rc2+k3s2

Repeated on a 2 node setup
Node1:

curl -sfL https://get.k3s.io |INSTALL_K3S_VERSION=v1.21.5+k3s2 INSTALL_K3S_TYPE='server' sh -s - server --cluster-init --token

Node2

curl -sfL https://get.k3s.io |INSTALL_K3S_VERSION=v1.21.7-rc2+k3s2 INSTALL_K3S_TYPE='server' sh -s - server --server https://:6443 --token

Node 2 joined successfully

$ kubectl get nodes
NAME                                          STATUS   ROLES                       AGE   VERSION
ip-172-31-13-136.us-east-2.compute.internal   Ready    control-plane,etcd,master   11m   v1.21.7-rc2+k3s2
ip-172-31-14-229.us-east-2.compute.internal   Ready    control-plane,etcd,master   14m   v1.21.5+k3s2

$ kubectl get pods -A
NAMESPACE     NAME                                      READY   STATUS      RESTARTS   AGE
kube-system   coredns-7448499f4d-5wgk8                  1/1     Running     0          14m
kube-system   helm-install-traefik-crd-qr4zq            0/1     Completed   0          11m
kube-system   helm-install-traefik-xk9bz                0/1     Completed   1          11m
kube-system   local-path-provisioner-5ff76fc89d-hrkt9   1/1     Running     0          14m
kube-system   metrics-server-86cbb8457f-gb2nh           1/1     Running     0          14m
kube-system   svclb-traefik-m6q9k                       2/2     Running     0          14m
kube-system   svclb-traefik-qsv8j                       2/2     Running     0          11m
kube-system   traefik-6b84f7cbc-ks27t                   1/1     Running     0          11m


Tried to reproduce the upgrade scenario using the following setup. The issue was not reproduced; the upgrade was successful.
1. Created a 3 node cluster with v1.21.5+k3s2
2. Skipped upgrading node1. Started by upgrading node2 to v1.21.7+k3s1. The upgrade was successful
3. Similarly upgraded node3 and then node1 to v1.21.7+k3s1 without issues

Note: In all scenarios the token was passed on every node.

knweiss commented 2 years ago

@Oats87

Great, thank you. I want to some more testing with this. Out of curiosity, what are you running on your other controlplane/server nodes?

The other server nodes would be started like this (as mentioned we did not reach this point with v1.21.7+k3s1 as the init node did not start successfully because of the unmarshaling issue):

common_options="--etcd-snapshot-retention 10 --selinux"
INSTALL_K3S_EXEC="--server=https://INITNODE:6443 $common_options --node-external-ip $VIP"
curl -sfL https://get.k3s.io | \
  INSTALL_K3S_CHANNEL="stable" \
  K3S_TOKEN="$K3S_TOKEN" \
  INSTALL_K3S_EXEC="$INSTALL_K3S_EXEC" sh -

(Gonna move --node-external-ip into the common_options variable...)

Oats87 commented 2 years ago

@knweiss

Out of curiosity, why are you specifying --node-external-ip $VIP on all nodes? This doesn't seem like a good idea.

I'm still unable to reproduce the error following your steps.

Common variables are:

K3S_TOKEN=<token>
INITNODE=172.16.133.139 # IP of my node1
VIP=172.16.128.35 # IP for VIP
common_options="--etcd-snapshot-retention 10 --selinux"

I ran

kube-vip manifest daemonset --interface ens192 --address $VIP --inCluster --taint --controlplane --services --arp --leaderElection

to generate a manifest for kube-vip, and populated it into /var/lib/rancher/k3s/server/manifests along with the kube-vip RBAC.

Notably, I'm not able to get my init node to come up with --node-external-ip $VIP so I first have to:

INSTALL_K3S_EXEC="--write-kubeconfig-mode 644 --tls-san $VIP --cluster-init $common_options"
curl -sfL https://get.k3s.io | \
  INSTALL_K3S_VERSION="v1.21.5+k3s2" \
  K3S_TOKEN="$K3S_TOKEN" \
  INSTALL_K3S_EXEC="$INSTALL_K3S_EXEC" sh -

then, once the node is healthy and kube-vip properly advertises my VIP, I then run

INSTALL_K3S_EXEC="--write-kubeconfig-mode 644 --tls-san $VIP --cluster-init $common_options --node-external-ip $VIP"
curl -sfL https://get.k3s.io | \
  INSTALL_K3S_VERSION="v1.21.5+k3s2" \
  K3S_TOKEN="$K3S_TOKEN" \
  INSTALL_K3S_EXEC="$INSTALL_K3S_EXEC" sh -

on the init node.

On nodes 2 and 3, I run

INSTALL_K3S_EXEC="--server=https://$INITNODE:6443 $common_options --node-external-ip $VIP"
curl -sfL https://get.k3s.io | \
  INSTALL_K3S_VERSION="v1.21.5+k3s2" \
  K3S_TOKEN="$K3S_TOKEN" \
  INSTALL_K3S_EXEC="$INSTALL_K3S_EXEC" sh -

Once these nodes are all healthy, I go back to node 1 and run:

INSTALL_K3S_EXEC="--write-kubeconfig-mode 644 --tls-san $VIP --cluster-init $common_options --node-external-ip $VIP"
curl -sfL https://get.k3s.io | \
  INSTALL_K3S_VERSION="v1.21.7+k3s1" \
  K3S_TOKEN="$K3S_TOKEN" \
  INSTALL_K3S_EXEC="$INSTALL_K3S_EXEC" sh -

Notably, the node is able to successfully upgrade to v1.21.7+k3s1, although things break because of the --node-external-ip $VIP: kube-vip loses leader election due to it (because kube-proxy can't talk to the VIP, etc.).

knweiss commented 2 years ago

@Oats87 Today, we repeated the stable channel upgrade from v1.21.5+k3s2 to v1.21.7+k3s1 and much to our surprise this time it succeeded on all three nodes (with active --node-external-ip on all three nodes). Unfortunately, we don't know what's different this time. :-/ (We may have done the last test with --disable servicelb in common_options, but we're not 100% sure anymore.)

Regarding the --node-external-ip $VIP: The idea is to use the kube-vip VIP as a LoadBalancer IP for Traefik and not only to access the control plane. We have a wildcard DNS *.domain.local that points to this VIP. Traefik is the IngressController for our services and cert-manager provides TLS certs for all DNS names (e.g. svc1.domain.local).

In the default k3s configuration with three k3s servers (also used as workers) Traefik will use the three node-local IP addresses as Traefik's LoadBalancer IPs. This works. However, if DNS resolution for external service names points to one of those three node-local IPs the service would not be available during maintenance of this (server) node. To prevent this situation we came up with the --node-external-ip $VIP solution. Do you think this is a bad idea?

The --node-external-ip setting is a very recent change in our setup (we have not done much testing yet). The only issue we noticed so far is that the helm-install-traefik* pods had problems starting while the VIP was not on their node.

NAME   STATUS   ROLES                       AGE   VERSION        INTERNAL-IP   EXTERNAL-IP     OS-IMAGE                           KERNEL-VERSION                CONTAINER-RUNTIME
node0   Ready    control-plane,etcd,master   47d   v1.21.7+k3s1   x.y.142.241   x.y.142.232   Rocky Linux 8.5 (Green Obsidian)   4.18.0-348.2.1.el8_5.x86_64   containerd://1.4.12-k3s1
node1   Ready    control-plane,etcd,master   47d   v1.21.7+k3s1   x.y.142.240   x.y.142.232   Rocky Linux 8.5 (Green Obsidian)   4.18.0-348.2.1.el8_5.x86_64   containerd://1.4.12-k3s1
node2   Ready    control-plane,etcd,master   48d   v1.21.7+k3s1   x.y.142.239   x.y.142.232   Rocky Linux 8.5 (Green Obsidian)   4.18.0-348.2.1.el8_5.x86_64   containerd://1.4.12-k3s1

(Notice, the external IP (VIP) is shown on all three (server) nodes but is only active on one at a time.)

Oats87 commented 2 years ago

@Oats87 Today, we repeated the stable channel upgrade from v1.21.5+k3s2 to v1.21.7+k3s1 and much to our surprise this time it succeeded on all three nodes (with active --node-external-ip on all three nodes). Unfortunately, we don't know what's different this time. :-/ (We may have done the last test with --disable servicelb in common_options, but we're not 100% sure anymore.)

Regarding the --node-external-ip $VIP: The idea is to use the kube-vip VIP as a LoadBalancer IP for Traefik and not only to access the control plane. We have a wildcard DNS *.domain.local that points to this VIP. Traefik is the IngressController for our services and cert-manager provides TLS certs for all DNS names (e.g. svc1.domain.local).

In the default k3s configuration with three k3s servers (also used as workers) Traefik will use the three node-local IP addresses as Traefik's LoadBalancer IPs. This works. However, if DNS resolution for external service names points to one of those three node-local IPs the service would not be available during maintenance of this (server) node. To prevent this situation we came up with the --node-external-ip $VIP solution. Do you think this is a bad idea?

The --node-external-ip setting is a very recent change in our setup (we have not done much testing yet). The only issue we noticed so far is that the helm-install-traefik* pods had problems starting while the VIP was not on their node.

NAME   STATUS   ROLES                       AGE   VERSION        INTERNAL-IP   EXTERNAL-IP     OS-IMAGE                           KERNEL-VERSION                CONTAINER-RUNTIME
node0   Ready    control-plane,etcd,master   47d   v1.21.7+k3s1   x.y.142.241   x.y.142.232   Rocky Linux 8.5 (Green Obsidian)   4.18.0-348.2.1.el8_5.x86_64   containerd://1.4.12-k3s1
node1   Ready    control-plane,etcd,master   47d   v1.21.7+k3s1   x.y.142.240   x.y.142.232   Rocky Linux 8.5 (Green Obsidian)   4.18.0-348.2.1.el8_5.x86_64   containerd://1.4.12-k3s1
node2   Ready    control-plane,etcd,master   48d   v1.21.7+k3s1   x.y.142.239   x.y.142.232   Rocky Linux 8.5 (Green Obsidian)   4.18.0-348.2.1.el8_5.x86_64   containerd://1.4.12-k3s1

(Notice, the external IP (VIP) is shown on all three (server) nodes but is only active on one at a time.)

I think you may be trying to do things a bit backwards here. Setting --node-external-ip to the same value for all of your nodes has big implications as that affects core K8s/K3s behavior and can lead to unexpected behavior.

What I would recommend in this case is to disable servicelb and then deploy kube-vip configured to fulfill services with type LoadBalancer.

Regardless, if you hit this issue again (or want any clarification on this), please open a new issue and be sure to mention me on it -- I will let the K3s QA team close out this issue when they finish validating the edge case we identified above. /cc @k3s-io/k3s-testing

ShylajaDevadiga commented 2 years ago

Thanks for the update @knweiss.