Closed: karty-s closed this issue 1 month ago.
This issue is currently awaiting triage.
If cloud-provider-aws contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.
The triage/accepted label can be added by org members by writing /triage accepted in a comment.
This:
I0516 08:13:24.811572 1 node_lifecycle_controller.go:164] deleting node since it is no longer present in cloud provider: ip-10-230-13-35.ec2.internal
That isn't an error; it's expected behavior when a Node becomes NotReady and the corresponding EC2 instance is terminated (or doesn't exist). Are you sure the EC2 instances for your old 1.26 control plane nodes have been terminated? They wouldn't have a Ready status if the kubelet stopped heartbeating.
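One rough way to cross-check is to compare what the cluster records against EC2 (a sketch, assuming the AWS CLI is configured for the cluster's account; the instance ID below is a placeholder taken from the last path segment of the node's .spec.providerID):

kubectl get nodes -o custom-columns=NAME:.metadata.name,PROVIDER_ID:.spec.providerID
aws ec2 describe-instances --instance-ids i-0123456789abcdef0 \
  --query 'Reservations[].Instances[].State.Name' --output text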
@cartermckinnon We have followed the steps below on the existing 1.26 cluster to make it ready for the 1.27 upgrade.
On the existing 1.26 version:
1. Add the tag kubernetes.io/cluster/<cluster-name>: owned to each node
2. k edit cm kubeadm-config -n kube-system to update cloud-provider=external (sketched just below this list)
3. Update the existing master kube-controller-manager and kube-apiserver manifests to use cloud-provider=external
4. Get the aws-cloud-controller-manager running
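For illustration, the kubeadm-config edit in step 2 boils down to flipping the cloud-provider value in the extraArgs, roughly like this (a minimal sketch against the ClusterConfiguration shared further down in this thread; only the controller-manager side is shown):

apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
controllerManager:
  extraArgs:
    cloud-provider: external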
Now when upgrading the cluster to 1.27, below are the issues we are facing:
Are you passing --cloud-provider=external to kubelet as well?
CCM should fill in the provider ID if it's missing, but it's generally preferable to just pass it to kubelet to avoid extra API calls in CCM. The EKS AMI uses this helper script to set it: https://github.com/awslabs/amazon-eks-ami/blob/f5111dd100ebd94d9fbfbb1fe2f43b75fd1a6703/templates/al2/runtime/bin/provider-id
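For context, the provider ID that helper produces has the form aws:///<availability-zone>/<instance-id>; a rough illustration of deriving it from instance metadata (a sketch assuming IMDSv2, not a copy of the linked script):

TOKEN=$(curl -s -X PUT http://169.254.169.254/latest/api/token \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 300")
AZ=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/placement/availability-zone)
INSTANCE_ID=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/instance-id)
echo "aws:///${AZ}/${INSTANCE_ID}"   # value that can be passed to kubelet via --provider-id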
@cartermckinnon Let me share the 10-kubeadm.conf and kubeadm-config we currently have on 1.26, where the in-tree support is in place:
10-kubeadm.conf
# Note: This dropin only works with kubeadm and kubelet v1.11+
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
Environment="KUBELET_EXTRA_ARGS=--cloud-provider=aws --node-labels=node.kubernetes.io/role=${kind},instance-group=${group_name},${extra_labels} --register-with-taints=${taints} --cert-dir=/etc/kubernetes/pki --cgroup-driver=systemd"
# Environment="KUBELET_KUBEADM_ARGS=--feature-gates=RotateKubeletClientCertificate=true,RotateKubeletServerCertificate=true --rotate-certificates"
# This is a file that "kubeadm init" and "kubeadm join" generates at runtime, populating the KUBELET_KUBEADM_ARGS variable dynamically
EnvironmentFile=-/var/lib/kubelet/kubeadm-flags.env
# This is a file that the user can use for overrides of the kubelet args as a last resort. Preferably, the user should use
# the .NodeRegistration.KubeletExtraArgs object in the configuration files instead. KUBELET_EXTRA_ARGS should be sourced from this file.
EnvironmentFile=-/etc/default/kubelet
ExecStart=
ExecStart=/opt/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS
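For comparison, a sketch of how the KUBELET_EXTRA_ARGS line above would look after moving to the external provider (illustrative only; a per-node --provider-id=aws:///<az>/<instance-id> can optionally be appended, for example from a helper like the one linked earlier):

Environment="KUBELET_EXTRA_ARGS=--cloud-provider=external --node-labels=node.kubernetes.io/role=${kind},instance-group=${group_name},${extra_labels} --register-with-taints=${taints} --cert-dir=/etc/kubernetes/pki --cgroup-driver=systemd"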
kubeadm-config
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
apiServer:
  certSANs:
  - "api-int.${cluster_fqdn}"
  - "api.${cluster_fqdn}"
  extraArgs:
    anonymous-auth: "true"
    audit-log-maxage: "7"
    audit-log-maxbackup: "50"
    audit-log-maxsize: "100"
    audit-log-path: /var/log/kube-apiserver-audit.log
    audit-policy-file: /etc/kubernetes/files/audit-log-policy.yaml
    authorization-mode: Node,RBAC
    cloud-provider: aws
    max-mutating-requests-inflight: "400"
    max-requests-inflight: "800"
    oidc-client-id: "${dex_oidc_client_id}"
    oidc-groups-claim: "${dex_oidc_groups_claim}"
    oidc-issuer-url: "${dex_oidc_issuer_url}"
    oidc-username-claim: "${dex_oidc_username_claim}"
    profiling: "false"
    request-timeout: 30m0s
    service-account-lookup: "true"
    tls-cipher-suites: TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_128_GCM_SHA256
  extraVolumes:
  - hostPath: /etc/kubernetes/files
    mountPath: /etc/kubernetes/files
    name: cloud-config
    readOnly: true
  - hostPath: /var/log
    mountPath: /var/log
    name: var-log
    readOnly: false
  timeoutForControlPlane: 10m0s
certificatesDir: /etc/kubernetes/pki
clusterName: "${cluster_fqdn}"
controlPlaneEndpoint: "${api_endpoint}:${api_port}"
controllerManager:
  extraArgs:
    cluster-signing-cert-file: /etc/kubernetes/pki/ca.crt
    cluster-signing-key-file: /etc/kubernetes/pki/ca.key
    feature-gates: RotateKubeletServerCertificate=true
    profiling: "false"
    terminated-pod-gc-threshold: "12500"
    configure-cloud-routes: "false"
    cluster-name: "${cluster_fqdn}"
    attach-detach-reconcile-sync-period: "1m0s"
    cloud-provider: "aws"
    {{- if contains "1.15" .Kubernetes.Version | not }}
    flex-volume-plugin-dir: "/var/lib/kubelet/volumeplugins/"
    {{- end }}
dns:
  type: CoreDNS
etcd:
  ${etcd_type}:
    endpoints:
      ${endpoints}
    caFile: ${etcd_cafile}
    certFile: "/etc/kubernetes/pki/apiserver-etcd-client.crt"
    keyFile: "/etc/kubernetes/pki/apiserver-etcd-client.key"
imageRepository: registry.k8s.io
kubernetesVersion: "${k8s_version}"
networking:
  dnsDomain: cluster.local
  podSubnet: "${pod_subnet}"
  serviceSubnet: "100.64.0.0/13"
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
bootstrapTokens:
- token: "${kubeadm_token}"
  description: "kubeadm bootstrap token"
  ttl: "43800h"
nodeRegistration:
  criSocket: "unix:///var/run/containerd/containerd.sock"
  kubeletExtraArgs:
    container-runtime: remote
    container-runtime-endpoint: unix:///run/containerd/containerd.sock
  ignorePreflightErrors:
  - IsPrivilegedUser
localAPIEndpoint:
  bindPort: 443
---
kind: KubeletConfiguration
apiVersion: kubelet.config.k8s.io/v1beta1
cgroupDriver: systemd
Now we are planning to opt for the out-of-tree AWS cloud controller manager. Could you please guide us on what changes we need to make to migrate from in-tree to out-of-tree? Currently we have deployed the aws-cloud-controller-manager daemonset and it is running, but kube-controller-manager is also still running with the above configuration.
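For reference, with the external CCM daemonset running, the corresponding kube-controller-manager setting is cloud-provider=external (which disables its cloud-specific loops); a minimal sketch of the resulting static pod command under that setting, assuming the other extraArgs shown above stay as they are:

- command:
  - kube-controller-manager
  - --cloud-provider=external
  - --cluster-name=${cluster_fqdn}
  - --configure-cloud-routes=false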
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After a period of inactivity, the lifecycle/stale label is applied
- After further inactivity once lifecycle/stale was applied, the lifecycle/rotten label is applied
- After further inactivity once lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After a period of inactivity, the lifecycle/stale label is applied
- After further inactivity once lifecycle/stale was applied, the lifecycle/rotten label is applied
- After further inactivity once lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After a period of inactivity, the lifecycle/stale label is applied
- After further inactivity once lifecycle/stale was applied, the lifecycle/rotten label is applied
- After further inactivity once lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
What happened: We are running a Kubernetes 1.26 cluster built with kubeadm on AWS resources. We wanted to upgrade our clusters to 1.28 (1.26 -> 1.27 -> 1.28), and per the upgrade notes we tried to move from the in-tree AWS cloud provider to the external AWS cloud provider. As part of the upgrade process we deployed the new 1.27 nodes along with the AWS cloud controller manager in the cluster, after which we scaled down the 1.26 nodes.
What you expected to happen: The issue we face is that the 1.26 etcd and worker nodes that were scaled down get removed from the cluster, but the control plane nodes still show up in the cluster even after their EC2 instances are removed. e.g. -
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?: We are seeing this error in the cloud controller manager pod logs -
We have set the hostname according to the prerequisites, but we still get this.
Environment: kubeadm
- Kubernetes version (use kubectl version):
- Cloud provider or hardware configuration: aws
- OS (e.g. from /etc/os-release):
- Kernel (e.g. uname -a):
- Install tools:
- Others:
/kind bug