/kind bug

I see two problems here :-(
- the rollback procedure failing
Rollback seems to be OK (it takes 3-5 minutes to bring things back to steady-state); I've gone through this 4 times and haven't gotten into a non-working state. Note that this is not a zero-downtime process: a number of production-facing impacts do occur during the attempt to roll forward and back.
The static pod hash is not changed for 5 minutes.
Static pod: kube-apiserver-borg0.ci.net hash: 9160e7ddf2ec811c44ee54195ce49d0d
Static pod: kube-apiserver-borg0.ci.net hash: 9160e7ddf2ec811c44ee54195ce49d0d
I think you should check the kubelet log to see why. Also, check the apiserver YAML in /etc/kubernetes/manifests/ to see whether it has been changed to the new version.
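Something along these lines should work for that check on a standard kubeadm host (the systemd unit name and paths are assumptions based on a default install):

# tail the kubelet log around the upgrade window
journalctl -u kubelet --since "15 min ago" --no-pager | grep -i apiserver
# confirm which image version the static pod manifest currently points at
grep 'image:' /etc/kubernetes/manifests/kube-apiserver.yaml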
Here is the info requested by @pacoxu as I reproduce the scenario a 5th time:
Contents of /etc/kubernetes/manifests/apiserver.yaml prior to upgrade:
kind: Pod
metadata:
  annotations:
    kubeadm.kubernetes.io/kube-apiserver.advertise-address.endpoint: 192.168.2.72:6443
  creationTimestamp: null
  labels:
    component: kube-apiserver
    tier: control-plane
  name: kube-apiserver
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-apiserver
    - --advertise-address=192.168.2.72
    - --allow-privileged=true
    - --authorization-mode=Node,RBAC
    - --client-ca-file=/etc/kubernetes/pki/ca.crt
    - --enable-admission-plugins=NodeRestriction
    - --enable-bootstrap-token-auth=true
    - --etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt
    - --etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt
    - --etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key
    - --etcd-servers=https://127.0.0.1:2379
    - --encryption-provider-config=/etc/kubernetes/pki/secrets.conf
    - --insecure-port=0
    - --kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt
    - --kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key
    - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
    - --proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt
    - --proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key
    - --requestheader-allowed-names=front-proxy-client
    - --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt
    - --requestheader-extra-headers-prefix=X-Remote-Extra-
    - --requestheader-group-headers=X-Remote-Group
    - --requestheader-username-headers=X-Remote-User
    - --secure-port=6443
    - --service-account-key-file=/etc/kubernetes/pki/sa.pub
    - --service-cluster-ip-range=10.96.0.0/12
    - --tls-cert-file=/etc/kubernetes/pki/apiserver.crt
    - --tls-private-key-file=/etc/kubernetes/pki/apiserver.key
    image: k8s.gcr.io/kube-apiserver:v1.19.4
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 192.168.2.72
        path: /livez
        port: 6443
        scheme: HTTPS
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    name: kube-apiserver
    readinessProbe:
      failureThreshold: 3
      httpGet:
        host: 192.168.2.72
        path: /readyz
        port: 6443
        scheme: HTTPS
      periodSeconds: 1
      timeoutSeconds: 15
    resources:
      requests:
        cpu: 250m
    startupProbe:
      failureThreshold: 24
      httpGet:
        host: 192.168.2.72
        path: /livez
        port: 6443
        scheme: HTTPS
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    volumeMounts:
    - mountPath: /etc/ssl/certs
      name: ca-certs
      readOnly: true
    - mountPath: /etc/ca-certificates
      name: etc-ca-certificates
      readOnly: true
    - mountPath: /etc/pki
      name: etc-pki
      readOnly: true
    - mountPath: /etc/kubernetes/pki
      name: k8s-certs
      readOnly: true
    - mountPath: /usr/local/share/ca-certificates
      name: usr-local-share-ca-certificates
      readOnly: true
    - mountPath: /usr/share/ca-certificates
      name: usr-share-ca-certificates
      readOnly: true
  hostNetwork: true
  priorityClassName: system-node-critical
  volumes:
  - hostPath:
      path: /etc/ssl/certs
      type: DirectoryOrCreate
    name: ca-certs
  - hostPath:
      path: /etc/ca-certificates
      type: DirectoryOrCreate
    name: etc-ca-certificates
  - hostPath:
      path: /etc/pki
      type: DirectoryOrCreate
    name: etc-pki
  - hostPath:
      path: /etc/kubernetes/pki
      type: DirectoryOrCreate
    name: k8s-certs
  - hostPath:
      path: /usr/local/share/ca-certificates
      type: DirectoryOrCreate
    name: usr-local-share-ca-certificates
  - hostPath:
      path: /usr/share/ca-certificates
      type: DirectoryOrCreate
    name: usr-share-ca-certificates
status: {}
During the upgrade attempt, I see these diffs in the file:
26d25
< - --encryption-provider-config=/etc/kubernetes/pki/secrets.conf
38a38
> - --service-account-issuer=https://kubernetes.default.svc.cluster.local
39a40
> - --service-account-signing-key-file=/etc/kubernetes/pki/sa.key
43c44
< image: k8s.gcr.io/kube-apiserver:v1.19.4
---
> image: k8s.gcr.io/kube-apiserver:v1.20.6
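(For reference, a diff like the one above can be captured with something along these lines; the backup path here is an assumption, and kubeadm also keeps its own pre-upgrade copies under /etc/kubernetes/tmp:)

cp /etc/kubernetes/manifests/kube-apiserver.yaml /root/kube-apiserver.pre-upgrade.yaml
# run "kubeadm upgrade apply v1.20.6" in another shell, then:
diff /root/kube-apiserver.pre-upgrade.yaml /etc/kubernetes/manifests/kube-apiserver.yaml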
I cleared syslog beforehand, and am attaching the full kubelet syslog here (it reports timeouts, which I think are caused by the auth failure noted in my original report, but perhaps something in there will shed light on what's happening in my environment). The latter part shows the rollback to 1.19.4:
Thanks!
/sig node

Since the kubelet tried to delete the pod at 19:10:27 and again at 19:15:43, the static pod update appears to have hung. (I think. Correct me if I am wrong.)
May 9 19:10:27 borg0 kubelet[15813]: I0509 19:10:27.121965 15813 kubelet.go:1559] Trying to delete pod kube-apiserver-borg0.ci.net_kube-system e55fb50d-6206-423b-8428-6f8ccbe99771
May 9 19:10:27 borg0 kubelet[15813]: W0509 19:10:27.146356 15813 kubelet.go:1563] Deleted mirror pod "kube-apiserver-borg0.ci.net_kube-system(e55fb50d-6206-423b-8428-6f8ccbe99771)" because it is outdated
133: May 9 19:10:29 borg0 kubelet[15813]: E0509 19:10:29.691463 15813 event.go:273] Unable to write event: 'Patch "https://192.168.2.72:6443/api/v1/namespaces/kube-system/events/kube-apiserver-borg0.ci.net.167b960bc37b722c": dial tcp 192.168.2.72:6443: connect: connection refused' (may retry after sleeping)
152: May 9 19:10:32 borg0 kubelet[15813]: W0509 19:10:32.762824 15813 status_manager.go:550] Failed to get status for pod "kube-apiserver-borg0.ci.net_kube-system(1348e04d121b128a9ca5e64dd5ddc5fd)": Get "https://192.168.2.72:6443/api/v1/namespaces/kube-system/pods/kube-apiserver-borg0.ci.net": dial tcp 192.168.2.72:6443: connect: connection refused
420: May 9 19:15:43 borg0 kubelet[15813]: W0509 19:15:43.110936 15813 status_manager.go:550] Failed to get status for pod "kube-apiserver-borg0.ci.net_kube-system(1348e04d121b128a9ca5e64dd5ddc5fd)": an error on the server ("") has prevented the request from succeeding (get pods kube-apiserver-borg0.ci.net)
426: May 9 19:15:44 borg0 kubelet[15813]: I0509 19:15:44.641980 15813 kubelet.go:1559] Trying to delete pod kube-apiserver-borg0.ci.net_kube-system 732f8223-55c4-46a7-8671-c3e4e842b7c7
441: May 9 19:15:44 borg0 kubelet[15813]: I0509 19:15:44.797132 15813 reconciler.go:224] operationExecutor.VerifyControllerAttachedVolume started for volume "etc-ca-certificates" (UniqueName: "kubernetes.io/host-path/9160e7ddf2ec811c44ee54195ce49d0d-etc-ca-certificates") pod "kube-apiserver-borg0.ci.net" (UID: "9160e7ddf2ec811c44ee54195ce49d0d")
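(The numeric prefixes are line numbers within the attached syslog; an excerpt like this can be pulled with something similar to the following, assuming kubelet logs land in /var/log/syslog as on a default Ubuntu install:)

grep -nE 'Trying to delete pod|Deleted mirror pod|status_manager' /var/log/syslog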
@instantlinux
In the hopes of being able to use kubeadm upgrade as a routine low-risk update process, just 5 months ago I built a new cluster and laboriously spent a week getting dozens of services running on this 1.19.4 instance. As far as I know, steps to reproduce are: install 1.19.4, run normal workloads, then invoke kubeadm upgrade apply. It's a vanilla cluster with single master and three workers.
we are not seeing this problem in our CI. are you still seeing this consistently?
Rollback seems to be OK (it takes 3-5 minutes to bring things back to steady-state); I've gone through this 4 times and haven't gotten into a non-working state. Note that this is not a zero-downtime process: a number of production-facing impacts do occur during the attempt to roll forward and back.
then there is no kubeadm problem per se. the pod-restart is the responsibility of the kubelet.
Contents of /etc/kubernetes/manifests/apiserver.yaml prior to upgrade:
questions:
we might have to open an issue and report the kubelet problems in kubernetes/kubernetes.
During the upgrade attempt, I see these diffs in the file: 26d25
- --encryption-provider-config=/etc/kubernetes/pki/secrets.conf ...
these changes are part of https://github.com/kubernetes/kubernetes/commit/ff641f6eb229e9d48a439bd98bcb057403838951. AFAIK the token request (+projection) functionality is also e2e / upgrade tested. cc @zshihang
Here's my /etc/kubernetes/kubeadm-config.yaml:
---
apiVersion: kubeadm.k8s.io/v1beta2
kind: InitConfiguration
bootstrapTokens:
- token: 3dk<redact>wvzt
  ttl: 1h0m0s
  usages:
  - signing
  - authentication
  groups:
  - system:bootstrappers:kubeadm:default-node-token
localAPIEndpoint:
  advertiseAddress: 192.168.2.72
nodeRegistration:
  criSocket: /var/run/dockershim.sock
  name: borg0.ci.net
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
---
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
# TODO cluster will not bootstrap, has to be added after
# apiServerExtraArgs:
#   enable-admission-plugins: PodSecurityPolicy
apiServer:
  certSans:
  - 192.168.2.72
certificatesDir: /etc/kubernetes/pki
controllerManager:
  extraArgs:
    address: 0.0.0.0
kubernetesVersion: v1.19.4
networking:
  podSubnet: 10.244.0.0/16
  serviceSubnet: 10.96.0.0/12
scheduler:
  extraArgs:
    address: 0.0.0.0
And kubelet.conf (nothing of interest):
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: LS0tLS1CR<redact>Cg==
    server: https://192.168.2.72:6443
  name: kubernetes
contexts:
- context:
    cluster: kubernetes
    user: system:node:borg0.ci.net
  name: system:node:borg0.ci.net@kubernetes
current-context: system:node:borg0.ci.net@kubernetes
kind: Config
preferences: {}
users:
- name: system:node:borg0.ci.net
  user:
    client-certificate: /var/lib/kubelet/pki/kubelet-client-current.pem
    client-key: /var/lib/kubelet/pki/kubelet-client-current.pem
Output of kubectl get nodes:
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
aconcagua.ci.net Ready <none> 164d v1.19.4
borg0.ci.net Ready master 171d v1.19.4
elbrus.ci.net Ready <none> 65d v1.19.4
mckinley.ci.net Ready <none> 164d v1.19.4
montblanc.ci.net Ready <none> 164d v1.19.4
Output from describe node for one of the workers:
$ kubectl describe node elbrus.ci.net
Name: elbrus.ci.net
Roles: <none>
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/arch=amd64
kubernetes.io/hostname=elbrus.ci.net
kubernetes.io/os=linux
service.data-sync=allow
service.dovecot=allow
service.gitlab=allow
service.gitlab-runner=allow
service.haproxy-keepalived=allow
service.mt-daapd=allow
service.mysqldump=allow
service.nexus=allow
service.splunk=allow
Annotations: flannel.alpha.coreos.com/backend-data: null
flannel.alpha.coreos.com/backend-type: host-gw
flannel.alpha.coreos.com/kube-subnet-manager: true
flannel.alpha.coreos.com/public-ip: 192.168.2.86
kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Fri, 05 Mar 2021 20:40:19 -0800
Taints: <none>
Unschedulable: false
Lease:
HolderIdentity: elbrus.ci.net
AcquireTime: <unset>
RenewTime: Mon, 10 May 2021 09:31:03 -0700
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
NetworkUnavailable False Thu, 06 May 2021 07:12:34 -0700 Thu, 06 May 2021 07:12:34 -0700 FlannelIsUp Flannel is running on this node
MemoryPressure False Mon, 10 May 2021 09:31:00 -0700 Sun, 09 May 2021 12:16:01 -0700 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Mon, 10 May 2021 09:31:00 -0700 Sun, 09 May 2021 12:16:01 -0700 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Mon, 10 May 2021 09:31:00 -0700 Sun, 09 May 2021 12:16:01 -0700 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Mon, 10 May 2021 09:31:00 -0700 Sun, 09 May 2021 12:16:01 -0700 KubeletReady kubelet is posting ready status. AppArmor enabled
Addresses:
InternalIP: 192.168.2.86
Hostname: elbrus.ci.net
Capacity:
cpu: 4
ephemeral-storage: 489051Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 32694092Ki
pods: 110
Allocatable:
cpu: 4
ephemeral-storage: 450709401
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 32591692Ki
pods: 110
System Info:
Machine ID: ff452fa4f3004d4ea90b134849baca60
System UUID: 03d502e0-045e-05c9-0506-3e0700080009
Boot ID: 50017a43-1c1a-46ad-9b90-ffc14b6127d9
Kernel Version: 5.4.0-72-generic
OS Image: Ubuntu 20.04.2 LTS
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://19.3.14
Kubelet Version: v1.19.4
Kube-Proxy Version: v1.19.4
PodCIDR: 10.244.1.0/24
PodCIDRs: 10.244.1.0/24
Non-terminated Pods: (31 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
--------- ---- ------------ ---------- --------------- ------------- ---
conclave-dev conclave-prometheus-0 50m (1%) 500m (12%) 64Mi (0%) 256Mi (0%) 7d13h
conclave-dev conclave-redis-0 50m (1%) 500m (12%) 64Mi (0%) 256Mi (0%) 7d13h
...
instantlinux splunk-0 200m (5%) 500m (12%) 384Mi (1%) 4Gi (12%) 7d18h
kube-system kube-flannel-ds-rxx2v 100m (2%) 100m (2%) 50Mi (0%) 50Mi (0%) 65d
kube-system kube-proxy-qc4vv 0 (0%) 0 (0%) 0 (0%) 0 (0%) 65d
kube-system logspout-l8smr 50m (1%) 500m (12%) 32Mi (0%) 64Mi (0%) 65d
pgo dbgitlab-7c5f467f85-kskjs 50m (1%) 500m (12%) 128Mi (0%) 256Mi (0%) 47d
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 2500m (62%) 18100m (452%)
memory 7190Mi (22%) 39666Mi (124%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-1Gi 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
Events: <none>
i don't see anything suspicious.
here is our e2e test job where the vanilla 1.19 -> 1.20 kubeadm upgrade (using "kubeadm upgrade apply/node") passes: https://k8s-testgrid.appspot.com/sig-cluster-lifecycle-kubeadm#kubeadm-kinder-upgrade-1-19-1-20
# TODO cluster will not bootstrap, has to be added after
# apiServerExtraArgs:
#   enable-admission-plugins: PodSecurityPolicy
is PSP enabled before upgrade?
EDIT: looks like not. above you showed:
- --enable-admission-plugins=NodeRestriction
I think it is:
$ kubectl get psp --context=sudo
NAME PRIV CAPS SELINUX RUNASUSER FSGROUP SUPGROUP READONLYROOTFS VOLUMES
approved false NET_ADMIN RunAsAny RunAsAny RunAsAny RunAsAny false configMap,downwardAPI,emptyDir,hostPath,persistentVolumeClaim,projected,secret
default false RunAsAny RunAsAny RunAsAny RunAsAny false configMap,downwardAPI,emptyDir,hostPath,persistentVolumeClaim,projected,secret
dockersock false RunAsAny RunAsAny MustRunAs RunAsAny false configMap,downwardAPI,emptyDir,hostPath,persistentVolumeClaim,projected,secret
privileged true * RunAsAny RunAsAny RunAsAny RunAsAny false *
psp.flannel.unprivileged false NET_ADMIN,NET_RAW RunAsAny RunAsAny RunAsAny RunAsAny false configMap,secret,emptyDir,hostPath
But I'm not sure what command to use to fully answer your question.
If the apiserver manifest file does not have the psp admission plugin, then it is not enabled. If you do not wish to use psp, you can try to remove all psp objects and bindings and try upgrading again.
It is not clear to me why the new service account flags are tripping the upgrade and whether psp is related.
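A rough sketch of how one might check and clean that up (the policy name is a placeholder, not something from this cluster, and the grep over bindings is only a heuristic):

# is the PSP admission plugin actually enabled in the running apiserver?
grep -- '--enable-admission-plugins' /etc/kubernetes/manifests/kube-apiserver.yaml

# list the PSP objects and any RBAC bindings that appear to reference them
kubectl get psp
kubectl get clusterrolebinding,rolebinding --all-namespaces -o wide | grep -i psp

# remove a policy only if nothing still depends on it
kubectl delete psp <policy-name>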
It looks like PodSecurityPolicy is a setting that I experimented with a couple years ago and forgot all about. I have this commented-out note in the ClusterConfiguration of my kubeadm startup script:
kind: ClusterConfiguration
# TODO cluster will not bootstrap, has to be added after
# apiServerExtraArgs:
#   enable-admission-plugins: PodSecurityPolicy
Anything else here deserving of the priority/awaiting-more-evidence label, or have I given you everything you need?
We have not seen reports by others about this and our CI is green. This must be something specific to your setup. Something related to bad tokens could be tripping the new version pods.
I do not see a problem on the kubeadm side so I suggest that you log a kubernetes/kubernetes issue titled with one of the apiserver errors.
You should include all the detail from here including before and after /etc/kubernetes/manifests dump.
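(One possible way to bundle that material up for the new issue; paths and the time window are assumptions:)

mkdir -p /tmp/upgrade-report
cp -r /etc/kubernetes/manifests /tmp/upgrade-report/manifests-after
journalctl -u kubelet --since "2 hours ago" > /tmp/upgrade-report/kubelet.log
tar czf /tmp/upgrade-report.tgz -C /tmp upgrade-report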
That's unfortunate. Getting attention in kubernetes/kubernetes for issues like this is almost impossible; they get so many issues filed. Thanks for taking my report seriously and taking a look. I assume that I will have to reinstall yet again from scratch and spend a week migrating workload next time I upgrade, so it will be a year before I try again.
If you tag it with "/sig auth" it should be visible in the next issue triage from the token owning group.
It does seem like something went wrong in that cluster, but there isn't much we can do on the kubeadm side even if we know more details. Our manifest for the upgraded version is valid.
/close
@neolit123: Closing this issue.
What keywords did you search in kubeadm issues before filing this one?
fatal
timeout
invalid bearer token
waiting to restart
Is this a BUG REPORT or FEATURE REQUEST?
BUG REPORT
Versions
kubeadm version (use kubeadm version): 1.20.6
{Major:"1", Minor:"20", GitVersion:"v1.20.6", GitCommit:"8a62859e515889f07e3e3be6a1080413f17cf2c3", GitTreeState:"clean", BuildDate:"2021-04-15T03:26:21Z", GoVersion:"go1.15.10", Compiler:"gc", Platform:"linux/amd64"}

Environment:
Kubernetes version (use kubectl version): 1.19.4 / 1.20.6 (upgrading from 1.19.4; on the master this is what shows upon failure)
Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.5", GitCommit:"6b1d87acf3c8253c123756b9e61dac642678305f", GitTreeState:"clean", BuildDate:"2021-03-18T01:10:43Z", GoVersion:"go1.15.8", Compiler:"gc", Platform:"linux/amd64"}
Cloud provider or hardware configuration: Bare-metal plus VMs
OS (e.g. from /etc/os-release): Ubuntu 20.04.2 LTS
What happened?
The kubeadm upgrade apply v1.20.6 command will not get past this on my master:

What you expected to happen?
Success, or at least a diagnostic error message telling me what I might want to look at more closely.
How to reproduce it (as minimally and precisely as possible)?
In the hopes of being able to use kubeadm upgrade as a routine low-risk update process, just 5 months ago I built a new cluster and laboriously spent a week getting dozens of services running on this 1.19.4 instance. As far as I know, steps to reproduce are: install 1.19.4, run normal workloads, then invoke kubeadm upgrade apply. It's a vanilla cluster with a single master and three workers.

Anything else we need to know?
On the fourth attempt, I ran docker logs -f on the three containers it spun up. The one that seemed to give the best hint as to what the problem is: apiserver. It was generating 10 to 20 of these types of errors per second during the 5-minute wait for timeout:

The stderr output from kubeadm itself, at verbosity 5, is attached here:
kubeadm.log