Closed pacoxu closed 1 year ago
We can verify and cherry-pick patches to 1.28.
The latest diff result (https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-e2e-kubeadm-kinder-upgrade-1-28-latest/1701751770478284800/build-log.txt) shows that the default values are no longer injected, which is what we expect.
I0913 00:25:24.641339 3894 staticpods.go:225] Pod manifest files diff:
@@ -46 +45,0 @@
- successThreshold: 1
@@ -62 +60,0 @@
- successThreshold: 1
@@ -64,2 +61,0 @@
- terminationMessagePath: /dev/termination-log
- terminationMessagePolicy: File
@@ -71,2 +66,0 @@
- dnsPolicy: ClusterFirst
- enableServiceLinks: true
@@ -76,2 +69,0 @@
- restartPolicy: Always
- schedulerName: default-scheduler
@@ -81 +72,0 @@
- terminationGracePeriodSeconds: 30
Now we should fix the v1.28 version, and then the CI will be green. Waiting for https://github.com/kubernetes/kubernetes/pull/120605 to be merged.
I think after we remove the reference to k8s.io/kubernetes/pkg/apis/core/v1 from the v1.28 branch, everything will be back to normal.
Because the Pod defaulter is registered into Scheme by:
https://github.com/kubernetes/kubernetes/blob/160fe010f32fd1896917fecad680769ad0e40ca0/pkg/apis/core/v1/register.go#L29-L34
func init() {
	// We only register manually written functions here. The registration of the
	// generated functions takes place in the generated files. The separation
	// makes the code compile even when the generated files are missing.
	localSchemeBuilder.Register(addDefaultingFuncs, addConversionFuncs)
}
That's why the default values are injected...
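To illustrate the mechanism described above, here is a minimal, self-contained sketch of the init()-style registration pattern. It does not use the real apimachinery types: `toyScheme`, `registerDefaulter`, `toyPod` and `setPodDefaults` are hypothetical stand-ins. The only point is that defaults are applied once a registration has run, which in the real code happens as a side effect of importing the package.

```go
package main

import "fmt"

// toyPod is a hypothetical stand-in for a Pod spec; only two fields are modeled.
type toyPod struct {
	RestartPolicy string
	DNSPolicy     string
}

// toyScheme is a hypothetical stand-in for a scheme: it holds the
// defaulting functions that have been registered so far.
type toyScheme struct {
	defaulters []func(*toyPod)
}

// registerDefaulter adds a defaulting function to the scheme.
func (s *toyScheme) registerDefaulter(f func(*toyPod)) {
	s.defaulters = append(s.defaulters, f)
}

// applyDefaults runs every registered defaulter; with none registered it is a no-op.
func (s *toyScheme) applyDefaults(p *toyPod) {
	for _, f := range s.defaulters {
		f(p)
	}
}

// setPodDefaults mimics a defaulter that fills in empty fields.
func setPodDefaults(p *toyPod) {
	if p.RestartPolicy == "" {
		p.RestartPolicy = "Always"
	}
	if p.DNSPolicy == "" {
		p.DNSPolicy = "ClusterFirst"
	}
}

func main() {
	// No registration (the defaulting package is not linked in): fields stay empty.
	bare := &toyScheme{}
	p1 := &toyPod{}
	bare.applyDefaults(p1)
	fmt.Printf("without defaulter: %+v\n", *p1)

	// With registration (what an init() in an imported package would do).
	withDefaults := &toyScheme{}
	withDefaults.registerDefaulter(setPodDefaults)
	p2 := &toyPod{}
	withDefaults.applyDefaults(p2)
	fmt.Printf("with defaulter: %+v\n", *p2)
}
```

In this toy model the serialized manifest would differ between the two cases only in the defaulted fields, which matches the shape of the diff in the CI log.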
https://github.com/kubernetes/kubernetes/pull/120605 is open. i will ping some folks on slack to try to merge it faster.
EDIT: https://kubernetes.slack.com/archives/CJH2GBF7Y/p1694575171227399
but if currently, the 1.28 kubeadm binary is producing a different manifest during upgrade due to the internal defaulters, that would mean the 1.27->1.28 upgrade must be failing as well? instead, it's currently green.
edit: etcd version is the same: https://github.com/kubernetes/kubernetes/blob/d8e9fb8b7f244536325100e332faefbae01cfd7b/cmd/kubeadm/app/constants/constants.go#L462
edit2: but an etcd upgrade is performed and is successful:
[upgrade/staticpods] Component "etcd" upgraded successfully!
some testing from me confirms this is really go version related.
with golang 1.20 the defaults are always generated, so whether or not k8s.io/kubernetes/pkg/apis/core/v1 is imported, the diff is always empty because both manifests are generated with defaults.
with golang 1.21 the defaults are not generated by default, and I suspect kind updated golang recently, which is why this issue is hit.
shows that the default values are no longer injected, this is what we expect.
some testing in kinder shows the new manifests do not have the defaults generated, but the old manifest for each of the pods has the defaults.
but if currently, the 1.28 kubeadm binary is producing a different manifest during upgrade due to the internal defaulters, that would mean the 1.27->1.28 upgrade must be failing as well? instead, it's currently green.
just tested this workflow with kinder and there is a diff:
--- etcd.27.yaml 2023-09-13 18:03:01.839883253 +0300
+++ etcd.28.yaml 2023-09-13 17:55:56.203458105 +0300
@@ -32,7 +32,7 @@
     - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
     - --snapshot-count=10000
     - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
-    image: registry.k8s.io/etcd:3.5.7-0
+    image: registry.k8s.io/etcd:3.5.9-0
     imagePullPolicy: IfNotPresent
     livenessProbe:
       failureThreshold: 8
at the time of testing kubeadm 1.27 still has the 3.5.7 etcd: https://github.com/kubernetes/kubernetes/blob/release-1.27/cmd/kubeadm/app/constants/constants.go#L486
there is a pending backport for 3.5.9: https://github.com/kubernetes/kubernetes/pull/118079
but both the .27 and .28 etcd manifests do not have the defaults!
~/go/src/k8s.io/kubeadm/kinder$ cat etcd.28.yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubeadm.kubernetes.io/etcd.advertise-client-urls: https://172.17.0.2:2379
  creationTimestamp: null
  labels:
    component: etcd
    tier: control-plane
  name: etcd
  namespace: kube-system
spec:
  containers:
  - command:
    - etcd
    - --advertise-client-urls=https://172.17.0.2:2379
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --client-cert-auth=true
    - --data-dir=/var/lib/etcd
    - --experimental-initial-corrupt-check=true
    - --experimental-watch-progress-notify-interval=5s
    - --initial-advertise-peer-urls=https://172.17.0.2:2380
    - --initial-cluster=kinder-upgrade-control-plane-1=https://172.17.0.2:2380
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --listen-client-urls=https://127.0.0.1:2379,https://172.17.0.2:2379
    - --listen-metrics-urls=http://127.0.0.1:2381
    - --listen-peer-urls=https://172.17.0.2:2380
    - --name=kinder-upgrade-control-plane-1
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --peer-client-cert-auth=true
    - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --snapshot-count=10000
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    image: registry.k8s.io/etcd:3.5.9-0
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 127.0.0.1
        path: /health?exclude=NOSPACE&serializable=true
        port: 2381
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    name: etcd
    resources:
      requests:
        cpu: 100m
        memory: 100Mi
    startupProbe:
      failureThreshold: 24
      httpGet:
        host: 127.0.0.1
        path: /health?serializable=false
        port: 2381
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    volumeMounts:
    - mountPath: /var/lib/etcd
      name: etcd-data
    - mountPath: /etc/kubernetes/pki/etcd
      name: etcd-certs
  hostNetwork: true
  priority: 2000001000
  priorityClassName: system-node-critical
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  volumes:
  - hostPath:
      path: /etc/kubernetes/pki/etcd
      type: DirectoryOrCreate
    name: etcd-certs
  - hostPath:
      path: /var/lib/etcd
      type: DirectoryOrCreate
    name: etcd-data
status: {}
unclear to me if https://github.com/kubernetes/kubernetes/pull/120605 will fix anything.
i saw go version diff as well at some point. go 1.21 was OK, go 1.20 generated defaults for the TestFunc in this ticket.
but kubeadm at the 1.28 branch is still built with 1.20:
$ docker exec kinder-regular-control-plane-1 kinder/upgrade/v1.28.1-59+d8e9fb8b7f2445/kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"28+", GitVersion:"v1.28.1-59+d8e9fb8b7f2445", GitCommit:"d8e9fb8b7f244536325100e332faefbae01cfd7b", GitTreeState:"clean", BuildDate:"2023-09-08T19:46:29Z", GoVersion:"go1.20.8", Compiler:"gc", Platform:"linux/amd64"}
and it's not generating defaults as i showed above. so somehow i think the bugs are still at k/k master for: https://testgrid.k8s.io/sig-cluster-lifecycle-kubeadm#kubeadm-kinder-upgrade-1-28-latest (i.e. latest)
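Since the discussion depends on which Go toolchain built each kubeadm binary, here is a small sketch that pulls the GoVersion field out of `kubeadm version` output like the line above. The helper name `goVersionFrom` is hypothetical, the sample string is abbreviated from the output in this comment, and the regular expression assumes the `GoVersion:"..."` formatting shown there.

```go
package main

import (
	"fmt"
	"regexp"
)

// goVersionRe matches the GoVersion:"..." field as printed by the
// version.Info struct dump shown in this thread.
var goVersionRe = regexp.MustCompile(`GoVersion:"([^"]+)"`)

// goVersionFrom returns the Go toolchain version found in the given
// version output, or an empty string if the field is not present.
func goVersionFrom(versionOutput string) string {
	m := goVersionRe.FindStringSubmatch(versionOutput)
	if m == nil {
		return ""
	}
	return m[1]
}

func main() {
	// Abbreviated copy of the output quoted above.
	out := `kubeadm version: &version.Info{Major:"1", Minor:"28+", GitVersion:"v1.28.1-59+d8e9fb8b7f2445", GoVersion:"go1.20.8", Compiler:"gc", Platform:"linux/amd64"}`
	fmt.Println(goVersionFrom(out))
}
```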
https://github.com/kubernetes/release/issues/3076 is in progress.
The default values will only be injected when kubeadm init is performed. This is why the v1.27->1.28 upgrade works but the v1.28 -> latest upgrade fails.
hmm, how so? aren't both init and upgrade using the same "create manifest" logic?
The default values will be injected when v1.28 kubeadm init is performed.
in kinder? IIRC, the defaults are not injected when I init my cluster on CentOS directly.
You can try the old v1.28 version:
wget https://storage.googleapis.com/k8s-release-dev/ci/v1.28.2-1+a68748c7cd04f2/bin/linux/amd64/kubeadm
chmod +x kubeadm
kubeadm init ...
BTW, https://github.com/kubernetes/kubernetes/pull/120605 is merged.
/close
https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/ci-kubernetes-e2e-kubeadm-kinder-upgrade-1-28-latest/1702296617098416128 failed for a recent flake
@pacoxu: Closing this issue.
i don't see defaults after @chendave's cherry pick merged: https://github.com/kubernetes/kubernetes/commit/728862ded5e9d8fc3db1555499a66c5569ad8db6
it's 728862ded5e9d8, found in the kinder output below:
You can try the old v1.28 version:
but i did not see defaults with the old 1.28 binary either:
@neolit123 @pacoxu You can try to reproduce it by:
docker pull kindest/base:v20221102-76f15095
kinder build node-image-variant --base-image=kindest/base:v20221102-76f15095 --image=kindest/node:test --with-init-artifacts=v1.28.2-1+a68748c7cd04f2 --with-upgrade-artifacts=v1.29.0-alpha.0.802+a68093a3ffb552 --loglevel=debug
kinder create cluster --name=kinder-upgrade --image=kindest/node:test --control-plane-nodes=1 --worker-nodes=1 --loglevel=debug
kinder do kubeadm-init --name=kinder-upgrade --copy-certs=auto --loglevel=debug --kubeadm-verbosity=6
echo "---------------------------------------------"
echo "----Old v1.28 kubeadm generated etcd.yaml----"
echo "---------------------------------------------"
docker exec kinder-upgrade-control-plane-1 cat /etc/kubernetes/manifests/etcd.yaml
kinder delete cluster --name=kinder-upgrade
kinder build node-image-variant --base-image=kindest/base:v20221102-76f15095 --image=kindest/node:test --with-init-artifacts=v1.28.2-7+728862ded5e9d8 --with-upgrade-artifacts=v1.29.0-alpha.0.806+4abf29c5c86349 --loglevel=debug
kinder create cluster --name=kinder-upgrade --image=kindest/node:test --control-plane-nodes=1 --worker-nodes=1 --loglevel=debug
kinder do kubeadm-init --name=kinder-upgrade --copy-certs=auto --loglevel=debug --kubeadm-verbosity=6
echo "---------------------------------------------"
echo "----New v1.28 kubeadm generated etcd.yaml----"
echo "---------------------------------------------"
docker exec kinder-upgrade-control-plane-1 cat /etc/kubernetes/manifests/etcd.yaml
kinder delete cluster --name=kinder-upgrade
The output is as follows:
...
---------------------------------------------
----Old v1.28 kubeadm generated etcd.yaml----
---------------------------------------------
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubeadm.kubernetes.io/etcd.advertise-client-urls: https://172.17.0.3:2379/
  creationTimestamp: null
  labels:
    component: etcd
    tier: control-plane
  name: etcd
  namespace: kube-system
spec:
  containers:
  - command:
    - etcd
    - --advertise-client-urls=https://172.17.0.3:2379/
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --client-cert-auth=true
    - --data-dir=/var/lib/etcd
    - --experimental-initial-corrupt-check=true
    - --experimental-watch-progress-notify-interval=5s
    - --initial-advertise-peer-urls=https://172.17.0.3:2380/
    - --initial-cluster=kinder-upgrade-control-plane-1=https://172.17.0.3:2380/
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --listen-client-urls=https://127.0.0.1:2379/,https://172.17.0.3:2379/
    - --listen-metrics-urls=http://127.0.0.1:2381/
    - --listen-peer-urls=https://172.17.0.3:2380/
    - --name=kinder-upgrade-control-plane-1
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --peer-client-cert-auth=true
    - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --snapshot-count=10000
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    image: registry.k8s.io/etcd:3.5.9-0
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 127.0.0.1
        path: /health?exclude=NOSPACE&serializable=true
        port: 2381
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 15
    name: etcd
    resources:
      requests:
        cpu: 100m
        memory: 100Mi
    startupProbe:
      failureThreshold: 24
      httpGet:
        host: 127.0.0.1
        path: /health?serializable=false
        port: 2381
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 15
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/lib/etcd
      name: etcd-data
    - mountPath: /etc/kubernetes/pki/etcd
      name: etcd-certs
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  hostNetwork: true
  priority: 2000001000
  priorityClassName: system-node-critical
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  terminationGracePeriodSeconds: 30
  volumes:
  - hostPath:
      path: /etc/kubernetes/pki/etcd
      type: DirectoryOrCreate
    name: etcd-certs
  - hostPath:
      path: /var/lib/etcd
      type: DirectoryOrCreate
    name: etcd-data
status: {}
...
---------------------------------------------
----New v1.28 kubeadm generated etcd.yaml----
---------------------------------------------
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubeadm.kubernetes.io/etcd.advertise-client-urls: https://172.17.0.4:2379/
  creationTimestamp: null
  labels:
    component: etcd
    tier: control-plane
  name: etcd
  namespace: kube-system
spec:
  containers:
  - command:
    - etcd
    - --advertise-client-urls=https://172.17.0.4:2379/
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --client-cert-auth=true
    - --data-dir=/var/lib/etcd
    - --experimental-initial-corrupt-check=true
    - --experimental-watch-progress-notify-interval=5s
    - --initial-advertise-peer-urls=https://172.17.0.4:2380/
    - --initial-cluster=kinder-upgrade-control-plane-1=https://172.17.0.4:2380/
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --listen-client-urls=https://127.0.0.1:2379/,https://172.17.0.4:2379/
    - --listen-metrics-urls=http://127.0.0.1:2381/
    - --listen-peer-urls=https://172.17.0.4:2380/
    - --name=kinder-upgrade-control-plane-1
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --peer-client-cert-auth=true
    - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --snapshot-count=10000
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    image: registry.k8s.io/etcd:3.5.9-0
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 127.0.0.1
        path: /health?exclude=NOSPACE&serializable=true
        port: 2381
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    name: etcd
    resources:
      requests:
        cpu: 100m
        memory: 100Mi
    startupProbe:
      failureThreshold: 24
      httpGet:
        host: 127.0.0.1
        path: /health?serializable=false
        port: 2381
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    volumeMounts:
    - mountPath: /var/lib/etcd
      name: etcd-data
    - mountPath: /etc/kubernetes/pki/etcd
      name: etcd-certs
  hostNetwork: true
  priority: 2000001000
  priorityClassName: system-node-critical
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  volumes:
  - hostPath:
      path: /etc/kubernetes/pki/etcd
      type: DirectoryOrCreate
    name: etcd-certs
  - hostPath:
      path: /var/lib/etcd
      type: DirectoryOrCreate
    name: etcd-data
status: {}
Therefore, if a user initializes the cluster with old v1.28 kubeadm (<1.28.2), they may encounter problems when upgrading to v1.29. However, if we can bump the etcd version number in v1.29, this will not be a problem.
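To make the difference between the two etcd.yaml outputs above easier to spot, here is a rough sketch that lists the lines present in the old manifest but missing from the new one. It is a naive line-set comparison, not a YAML-aware or unified diff, and it is not how kubeadm compares manifests; `onlyInOld` is a hypothetical helper and the sample strings are abbreviated excerpts of the manifests.

```go
package main

import (
	"fmt"
	"strings"
)

// onlyInOld returns the trimmed, non-empty lines that appear in oldManifest
// but not in newManifest, in their original order. Indentation is ignored,
// so this is only a rough indicator of extra (for example defaulted) fields.
func onlyInOld(oldManifest, newManifest string) []string {
	seen := make(map[string]bool)
	for _, line := range strings.Split(newManifest, "\n") {
		seen[strings.TrimSpace(line)] = true
	}
	var extra []string
	for _, line := range strings.Split(oldManifest, "\n") {
		trimmed := strings.TrimSpace(line)
		if trimmed != "" && !seen[trimmed] {
			extra = append(extra, trimmed)
		}
	}
	return extra
}

func main() {
	// Abbreviated excerpts of the old and new etcd.yaml shown above.
	oldManifest := `  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  hostNetwork: true
  restartPolicy: Always
  schedulerName: default-scheduler
  terminationGracePeriodSeconds: 30`
	newManifest := `  hostNetwork: true`

	for _, line := range onlyInOld(oldManifest, newManifest) {
		fmt.Println("- " + line)
	}
}
```

Run against the full manifests, a comparison like this should surface the same kind of fields as the CI log diff at the top of the thread, such as successThreshold and terminationMessagePath.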
strange, in my test here: https://github.com/kubernetes/kubeadm/issues/2927#issuecomment-1719430589
i ran against the older v1.28.2-1+a68748c7cd04f2 and did not get defaults.
--with-init-artifacts=v1.28.2-1+a68748c7cd04f2
you are running the same "old" version but getting a different etcd.yaml.
@neolit123 Emm... I don't really understand.
./kubeadm init phase etcd local will NOT get defaults.
But ./kubeadm init --ignore-preflight-errors=Swap,SystemVerification,FileContent--proc-sys-net-bridge-bridge-nf-call-iptables --config=/kind/kubeadm.conf --v=6 --upload-certs will get defaults.
kubeadm version:
# ./kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"28+", GitVersion:"v1.28.2-1+a68748c7cd04f2", GitCommit:"a68748c7cd04f2462352afb05ba31f06fc799595", GitTreeState:"clean", BuildDate:"2023-09-13T09:54:55Z", GoVersion:"go1.20.8", Compiler:"gc", Platform:"linux/amd64"}
root@kinder-upgrade-control-plane-1:/#
Perhaps related to the execution path?
no idea. init is technically calling the same phase.
https://testgrid.k8s.io/sig-cluster-lifecycle-kubeadm#kubeadm-kinder-upgrade-1-28-latest and https://testgrid.k8s.io/sig-cluster-lifecycle-kubeadm#kubeadm-kinder-upgrade-addons-before-controlplane-1-28-latest keep failing after https://github.com/kubernetes/release/pull/3254.
/assign
ref https://github.com/kubernetes/kubeadm/issues/2925
See https://github.com/kubernetes/kubeadm/issues/2927#issuecomment-1713870411 for the conclusion.
Option 1: if we have to revert (the kubernetes/pkg import is not allowed in kubeadm!)
- revert https://github.com/kubernetes/kubernetes/pull/120554 in v1.29 (master)
- cherry-pick to v1.28 if needed