maxime202400 opened 2 months ago
/kind support
Hi, I am facing this exact problem.

kops version: Client version 1.29.2 (git-v1.29.2)
Kubernetes version: 1.24.16

The error that I am seeing in `/var/log/syslog` on the master node is this:
```
Aug 8 19:26:20 ip-a-b-c-d kubelet[3103]: I0808 19:26:20.157295 3103 csi_plugin.go:1021] Failed to contact API server when waiting for CSINode publishing: Get "https://127.0.0.1/apis/storage.k8s.io/v1/csinodes/i-xxxx": dial tcp 127.0.0.1:443: connect: connection refused
Aug 8 19:26:20 ip-a-b-c-d kubelet[3103]: E0808 19:26:20.220919 3103 kubelet.go:2427] "Error getting node" err="node \"i-xxxx\" not found"
Aug 8 19:26:20 ip-a-b-c-d kubelet[3103]: I0808 19:26:20.228123 3103 kubelet_node_status.go:352] "Setting node annotation to enable volume controller attach/detach"
Aug 8 19:26:20 ip-a-b-c-d kubelet[3103]: I0808 19:26:20.234155 3103 kubelet_node_status.go:563] "Recording event message for node" node="i-xxxx" event="NodeHasSufficientMemory"
Aug 8 19:26:20 ip-a-b-c-d kubelet[3103]: I0808 19:26:20.234212 3103 kubelet_node_status.go:563] "Recording event message for node" node="i-xxxx" event="NodeHasNoDiskPressure"
Aug 8 19:26:20 ip-a-b-c-d kubelet[3103]: I0808 19:26:20.234233 3103 kubelet_node_status.go:563] "Recording event message for node" node="i-xxxx" event="NodeHasSufficientPID"
Aug 8 19:26:20 ip-a-b-c-d kubelet[3103]: I0808 19:26:20.234266 3103 kubelet_node_status.go:70] "Attempting to register node" node="i-xxxx"
Aug 8 19:26:20 ip-a-b-c-d kubelet[3103]: E0808 19:26:20.235047 3103 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://127.0.0.1/api/v1/nodes\": dial tcp 127.0.0.1:443: connect: connection refused" node="i-xxxx"
Aug 8 19:26:20 ip-a-b-c-d kubelet[3103]: E0808 19:26:20.321805 3103 kubelet.go:2427] "Error getting node" err="node \"i-xxxx\" not found"
Aug 8 19:26:20 ip-a-b-c-d kubelet[3103]: E0808 19:26:20.422894 3103 kubelet.go:2427] "Error getting node" err="node \"i-xxxx\" not found"
Aug 8 19:26:20 ip-a-b-c-d kubelet[3103]: E0808 19:26:20.523918 3103 kubelet.go:2427] "Error getting node" err="node \"i-xxxx\" not found"
Aug 8 19:26:20 ip-a-b-c-d kubelet[3103]: E0808 19:26:20.624965 3103 kubelet.go:2427] "Error getting node" err="node \"i-xxxx\" not found"
Aug 8 19:26:20 ip-a-b-c-d kubelet[3103]: E0808 19:26:20.725940 3103 kubelet.go:2427] "Error getting node" err="node \"i-xxxx\" not found"
Aug 8 19:26:20 ip-a-b-c-d kubelet[3103]: E0808 19:26:20.826678 3103 kubelet.go:2427] "Error getting node" err="node \"i-xxxx\" not found"
Aug 8 19:26:20 ip-a-b-c-d kubelet[3103]: E0808 19:26:20.927813 3103 kubelet.go:2427] "Error getting node" err="node \"i-xxxx\" not found"
Aug 8 19:26:21 ip-a-b-c-d kubelet[3103]: E0808 19:26:21.028995 3103 kubelet.go:2427] "Error getting node" err="node \"i-xxxx\" not found"
```
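Every failure in that excerpt is the kubelet retrying against the local API endpoint. A quick sanity check is to confirm that the refused endpoint really is `127.0.0.1:443` (the local apiserver) rather than the load balancer. A minimal sketch, with the sample line copied from the log above and the `sed` pattern purely illustrative:

```shell
# Sample kubelet line copied from the syslog excerpt above.
line='csi_plugin.go:1021] Failed to contact API server when waiting for CSINode publishing: Get "https://127.0.0.1/apis/storage.k8s.io/v1/csinodes/i-xxxx": dial tcp 127.0.0.1:443: connect: connection refused'

# Extract the endpoint the kubelet could not reach.
endpoint=$(printf '%s\n' "$line" | sed -n 's/.*dial tcp \([0-9.]*:[0-9]*\):.*/\1/p')
echo "$endpoint"   # -> 127.0.0.1:443
```

`127.0.0.1:443` means kube-apiserver never came up on the node itself, so the next place to look is the apiserver's own log rather than networking between nodes.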
I am using kops version 1.29.2 because I need the wildcard namespace feature for IRSA. The cluster spec is here:
```yaml
apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: null
  generation: 1
  name: k8s-124.foo.bar.com
spec:
  additionalPolicies:
    master: |
      [
        {
          "Effect": "Allow",
          "Action": ["ec2:ModifyInstanceAttribute"],
          "Resource": ["*"]
        }
      ]
  api:
    loadBalancer:
      class: Network
      type: Public
  authorization:
    rbac: {}
  certManager:
    enabled: true
  channel: stable
  cloudLabels:
    App: k8s-124
    Env: foo
    Region: eu-west-1
  cloudProvider: aws
  clusterAutoscaler:
    awsUseStaticInstanceList: false
    balanceSimilarNodeGroups: false
    cpuRequest: 100m
    enabled: true
    expander: least-waste
    memoryRequest: 300Mi
    newPodScaleUpDelay: 0s
    scaleDownDelayAfterAdd: 10m0s
    scaleDownUnneededTime: 5m0s
    scaleDownUnreadyTime: 10m0s
    scaleDownUtilizationThreshold: "0.6"
    skipNodesWithLocalStorage: true
    skipNodesWithSystemPods: true
  configBase: s3://my-bucket/prefix
  dnsZone: xxxx
  etcdClusters:
  - cpuRequest: 200m
    etcdMembers:
    - encryptedVolume: true
      instanceGroup: master-eu-west-1a
      name: a
    memoryRequest: 100Mi
    name: main
  - cpuRequest: 100m
    etcdMembers:
    - encryptedVolume: true
      instanceGroup: master-eu-west-1a
      name: a
    memoryRequest: 100Mi
    name: events
  externalPolicies:
    master:
    - arn:aws:iam::aws:policy/AmazonEC2ReadOnlyAccess
    node:
    - arn:aws:iam::aws:policy/AmazonEC2ReadOnlyAccess
  fileAssets:
  - content: |
      apiVersion: audit.k8s.io/v1
      kind: Policy
      rules:
      - level: Metadata
    name: audit-policy-config
    path: /srv/kubernetes/kube-apiserver/audit/policy-config.yaml
    roles:
    - Master
  - content: |
      apiVersion: v1
      kind: Config
      clusters:
      - name: bar
        cluster:
          server: https://audit-logs-receiver-endpoint/some-token
      contexts:
      - context:
          cluster: bar
          user: ""
        name: default-context
      current-context: default-context
      preferences: {}
      users: []
    name: audit-webhook-config
    path: /var/log/audit/webhook-config.yaml
    roles:
    - Master
  iam:
    allowContainerRegistry: true
    legacy: false
    serviceAccountExternalPermissions:
    - aws:
        inlinePolicy: |-
          [
            {
              "Effect": "Allow",
              "Action": [
                "S3:*"
              ],
              "Resource": [
                "*"
              ]
            }
          ]
      name: s3perm
      namespace: '*'
  kubeAPIServer:
    auditLogMaxAge: 10
    auditLogMaxBackups: 1
    auditLogMaxSize: 100
    auditLogPath: /var/log/kube-apiserver-audit.log
    auditPolicyFile: /srv/kubernetes/kube-apiserver/audit/policy-config.yaml
    auditWebhookBatchMaxWait: 5s
    auditWebhookConfigFile: /srv/kubernetes/kube-apiserver/audit/webhook-config.yaml
  kubeDNS:
    provider: CoreDNS
  kubelet:
    anonymousAuth: false
    authenticationTokenWebhook: true
    authorizationMode: Webhook
    maxPods: 150
    shutdownGracePeriod: 1m0s
    shutdownGracePeriodCriticalPods: 30s
  kubernetesApiAccess:
  - 0.0.0.0/0
  kubernetesVersion: 1.24.16
  masterPublicName: api.k8s-124.foo.bar.com
  networkCIDR: 10.8.0.0/16
  networkID: vpc-xxxx
  networking:
    cilium:
      hubble:
        enabled: true
  nonMasqueradeCIDR: 100.64.0.0/10
  podIdentityWebhook:
    enabled: true
  rollingUpdate:
    maxSurge: 4
  serviceAccountIssuerDiscovery:
    discoveryStore: s3://oidc-bucket/k8s-1-24-2
    enableAWSOIDCProvider: true
  sshAccess:
  - 0.0.0.0/0
  sshKeyName: kops
  subnets:
  - cidr: 1.2.3.4/19
    id: subnet-xx
    name: eu-west-1a
    type: Private
    zone: eu-west-1a
  - cidr: 4.3.2.1/22
    id: subnet-yy
    name: utility-eu-west-1a
    type: Utility
    zone: eu-west-1a
  topology:
    dns:
      type: Private
---
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2024-08-08T19:50:22Z"
  labels:
    kops.k8s.io/cluster: k8s-124.foo.bar.com
  name: master-eu-west-1a
spec:
  image: ubuntu/ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-20240411
  instanceMetadata:
    httpPutResponseHopLimit: 2
    httpTokens: required
  machineType: t3a.medium
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-eu-west-1a
  role: Master
  rootVolumeEncryption: true
  rootVolumeSize: 30
  subnets:
  - eu-west-1a
---
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2024-08-08T19:50:23Z"
  labels:
    kops.k8s.io/cluster: k8s-124.foo.bar.com
  name: nodes-eu-west-1a
spec:
  additionalUserData:
  - content: |
      apt-get update
      apt-get install -y qemu-user-static
    name: 0prereqs.sh
    type: text/x-shellscript
  cloudLabels:
    k8s.io/cluster-autoscaler/enabled: ""
    k8s.io/cluster-autoscaler/k8s-124.foo.bar.com: ""
  image: ubuntu/ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-20240411
  instanceMetadata:
    httpPutResponseHopLimit: 2
    httpTokens: required
  machineType: t3a.medium
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: nodes-eu-west-1a
  role: Node
  rootVolumeEncryption: true
  rootVolumeSize: 200
  subnets:
  - eu-west-1a
```
@SohamChakraborty could you check the kube-apiserver.log file for hints on the issue?
Hi @hakman, I have identified my issue: there was a problem with my audit policy and audit webhook config files.
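For anyone landing here with the same symptom: one thing that stands out in the spec posted earlier is that the webhook-config fileAsset is written to `/var/log/audit/webhook-config.yaml`, while `kubeAPIServer.auditWebhookConfigFile` points at `/srv/kubernetes/kube-apiserver/audit/webhook-config.yaml`. I can't confirm this was the exact failure, but if it was, aligning the two paths would look like this (content unchanged from the spec above):

```yaml
# Hypothetical fix: write the webhook config where kube-apiserver is told to read it.
fileAssets:
- name: audit-webhook-config
  path: /srv/kubernetes/kube-apiserver/audit/webhook-config.yaml  # was /var/log/audit/webhook-config.yaml
  roles:
  - Master
  content: |
    # ... same kubeconfig content as in the spec above ...
kubeAPIServer:
  auditWebhookConfigFile: /srv/kubernetes/kube-apiserver/audit/webhook-config.yaml
```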
I recently wanted to upgrade my cluster from version 1.28.10, and during the upgrade some of the nodes are not joining the cluster. This is the error I am seeing when I run `kops validate`:
```
Validating cluster

INSTANCE GROUPS
NAME               ROLE          MACHINETYPE  MIN  MAX  SUBNETS
master-us-east-2a  ControlPlane  m7a.large    1    1    us-east-2a
master-us-east-2b  ControlPlane  m7a.large    1    1    us-east-2b
master-us-east-2c  ControlPlane  m7a.large    1    1    us-east-2c
nodes              Node          m6a.large    3    18   us-east-2a,us-east-2b,us-east-2c

NODE STATUS
NAME  ROLE  READY
node        True
node        True
node        True

VALIDATION ERRORS
KIND     NAME  MESSAGE
Machine        machine "" has not yet joined cluster
Machine        machine "" has not yet joined cluster
Machine        machine "" has not yet joined cluster

Validation Failed
Error: validation failed: cluster not yet healthy
```
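Side note: when the validate output gets long, a quick way to count the stragglers is to grep a saved copy of it. A small sketch, with the sample lines copied from the output above:

```shell
# Save `kops validate cluster` output to a file, then count machines
# that have not yet joined (sample lines copied from above).
cat > /tmp/validate.out <<'EOF'
Machine        machine "" has not yet joined cluster
Machine        machine "" has not yet joined cluster
Machine        machine "" has not yet joined cluster
EOF
grep -c 'has not yet joined cluster' /tmp/validate.out   # -> 3
```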
And when I check the kubelet log on the problematic node, I see:

```
Failed to contact API server when waiting for CSINode publishing: Get "https://127.0.0.1/apis/storage.k8s.io/v1/csinodes/i-": dial tcp 127.0.0.1:443: connect: connection refused
```