kubernetes / kops

Kubernetes Operations (kOps) - Production Grade k8s Installation, Upgrades and Management
https://kops.sigs.k8s.io/
Apache License 2.0

Cannot create cluster (following the getting_started guide) #15852

Closed. Opened by wo9999999999; closed 7 months ago.

wo9999999999 commented 1 year ago

/kind bug

1. What kops version are you running? The command `kops version` will display this information.
1.27.0

2. What Kubernetes version are you running? `kubectl version` will print the version if a cluster is running or provide the Kubernetes version specified as a kops flag.
Client Version: v1.28.1
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3

3. What cloud provider are you using?
aws

4. What commands did you run? What is the simplest way to reproduce this issue?
kops create cluster
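
For context, a minimal sketch of the getting-started flow that leads to the state reported below; the bucket and cluster names are taken from the manifest in item 7, and the exact flags the reporter used are an assumption:

# Assumed reconstruction of the getting-started steps, not the reporter's exact commands.
export KOPS_STATE_STORE=s3://testwokops5-example-com-state-store
export NAME=woleung.k8s.local
# --discovery-store is inferred from spec.serviceAccountIssuerDiscovery in the manifest below.
kops create cluster --name=${NAME} --cloud=aws --zones=us-east-1a \
  --discovery-store=s3://testwokops4-example-com-oidc-store/${NAME}/discovery
kops update cluster --name=${NAME} --yes --admin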

5. What happened after the commands executed?
Cluster validation keeps failing:

NAME                ROLE        MACHINETYPE MIN MAX SUBNETS
control-plane-us-east-1a    ControlPlane    t3.medium   1   1   us-east-1a
nodes-us-east-1a        Node        t3.medium   1   1   us-east-1a

NODE STATUS
NAME            ROLE        READY
i-0be13550669d0b732 control-plane   True
i-0c78304642cc52ef8 node        True

VALIDATION ERRORS
KIND    NAME                        MESSAGE
Pod kube-system/ebs-csi-controller-75fc64d98f-4dbzk system-cluster-critical pod "ebs-csi-controller-75fc64d98f-4dbzk" is pending

Validation Failed
W0901 16:32:30.328628   45600 validate_cluster.go:232] (will retry): cluster not yet healthy
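
The retry loop above is printed by kops itself; the same check can be re-run on demand. A minimal sketch, assuming kops has already exported the kubeconfig for this cluster:

# Re-run validation and keep polling for up to 10 minutes.
kops validate cluster --name woleung.k8s.local --wait 10m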

6. What did you expect to happen?
The cluster to start successfully.

7. Please provide your cluster manifest. Execute `kops get --name my.example.com -o yaml` to display your cluster manifest. You may want to remove your cluster name and other sensitive information.

apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: "2023-09-01T08:22:42Z"
  name: woleung.k8s.local
spec:
  api:
    loadBalancer:
      class: Network
      type: Public
  authorization:
    rbac: {}
  channel: stable
  cloudProvider: aws
  configBase: s3://testwokops5-example-com-state-store/woleung.k8s.local
  etcdClusters:
  - cpuRequest: 200m
    etcdMembers:
    - encryptedVolume: true
      instanceGroup: control-plane-us-east-1a
      name: a
    manager:
      backupRetentionDays: 90
    memoryRequest: 100Mi
    name: main
  - cpuRequest: 100m
    etcdMembers:
    - encryptedVolume: true
      instanceGroup: control-plane-us-east-1a
      name: a
    manager:
      backupRetentionDays: 90
    memoryRequest: 100Mi
    name: events
  iam:
    allowContainerRegistry: true
    legacy: false
    useServiceAccountExternalPermissions: true
  kubeProxy:
    enabled: false
  kubelet:
    anonymousAuth: false
  kubernetesApiAccess:
  - 0.0.0.0/0
  - ::/0
  kubernetesVersion: 1.27.5
  networkCIDR: 172.20.0.0/16
  networking:
    cilium:
      enableNodePort: true
  nonMasqueradeCIDR: 100.64.0.0/10
  serviceAccountIssuerDiscovery:
    discoveryStore: s3://testwokops4-example-com-oidc-store/woleung.k8s.local/discovery/woleung.k8s.local
    enableAWSOIDCProvider: true
  sshAccess:
  - 0.0.0.0/0
  - ::/0
  subnets:
  - cidr: 172.20.32.0/19
    name: us-east-1a
    type: Public
    zone: us-east-1a
  topology:
    dns:
      type: Private
    masters: public
    nodes: public

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2023-09-01T08:22:43Z"
  labels:
    kops.k8s.io/cluster: woleung.k8s.local
  name: control-plane-us-east-1a
spec:
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-20230728
  machineType: t3.medium
  maxSize: 1
  minSize: 1
  role: Master
  subnets:
  - us-east-1a

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2023-09-01T08:22:44Z"
  labels:
    kops.k8s.io/cluster: woleung.k8s.local
  name: nodes-us-east-1a
spec:
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-20230728
  machineType: t3.medium
  maxSize: 1
  minSize: 1
  role: Node
  subnets:
  - us-east-1a

8. Please run the commands with the most verbose logging by adding the `-v 10` flag. Paste the logs into this report, or in a gist and provide the gist link here.
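
For reference, the requested invocation would look something like this sketch (cluster name taken from the manifest above):

# klog verbosity 10 on the failing step, e.g. validation.
kops validate cluster --name woleung.k8s.local -v 10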

9. Anything else we need to know?
Describe output for the pending pod:
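
The output below was presumably produced with a command along these lines (pod name taken from the validation error in item 5):

kubectl describe pod ebs-csi-controller-75fc64d98f-4dbzk -n kube-system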

Name:                 ebs-csi-controller-75fc64d98f-4dbzk
Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Service Account:      ebs-csi-controller-sa
Node:                 <none>
Labels:               app=ebs-csi-controller
                      app.kubernetes.io/instance=aws-ebs-csi-driver
                      app.kubernetes.io/name=aws-ebs-csi-driver
                      app.kubernetes.io/version=v1.14.1
                      kops.k8s.io/managed-by=kops
                      pod-template-hash=75fc64d98f
Annotations:          <none>
Status:               Pending
IP:                   
IPs:                  <none>
Controlled By:        ReplicaSet/ebs-csi-controller-75fc64d98f
Containers:
  ebs-plugin:
    Image:       registry.k8s.io/provider-aws/aws-ebs-csi-driver:v1.14.1@sha256:f0c5de192d832e7c1daa6580d4a62e8fa6fc8eabc0917ae4cb7ed4d15e95b59e
    Ports:       9808/TCP, 3301/TCP
    Host Ports:  0/TCP, 0/TCP
    Args:
      controller
      --endpoint=$(CSI_ENDPOINT)
      --logtostderr
      --k8s-tag-cluster-id=woleung.k8s.local
      --extra-tags=KubernetesCluster=woleung.k8s.local
      --http-endpoint=0.0.0.0:3301
      --v=5
    Liveness:   http-get http://:healthz/healthz delay=10s timeout=3s period=10s #success=1 #failure=5
    Readiness:  http-get http://:healthz/healthz delay=10s timeout=3s period=10s #success=1 #failure=5
    Environment:
      CSI_ENDPOINT:                 unix:///var/lib/csi/sockets/pluginproxy/csi.sock
      CSI_NODE_NAME:                 (v1:spec.nodeName)
      AWS_ACCESS_KEY_ID:            <set to the key 'key_id' in secret 'aws-secret'>      Optional: true
      AWS_SECRET_ACCESS_KEY:        <set to the key 'access_key' in secret 'aws-secret'>  Optional: true
      AWS_ROLE_ARN:                 arn:aws:iam::861611878732:role/ebs-csi-controller-sa.kube-system.sa.woleung.k8s.local
      AWS_WEB_IDENTITY_TOKEN_FILE:  /var/run/secrets/amazonaws.com/token
    Mounts:
      /var/lib/csi/sockets/pluginproxy/ from socket-dir (rw)
      /var/run/secrets/amazonaws.com/ from token-amazonaws-com (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-pz7x2 (ro)
  csi-provisioner:
    Image:      registry.k8s.io/sig-storage/csi-provisioner:v3.1.0@sha256:122bfb8c1edabb3c0edd63f06523e6940d958d19b3957dc7b1d6f81e9f1f6119
    Port:       <none>
    Host Port:  <none>
    Args:
      --csi-address=$(ADDRESS)
      --v=5
      --feature-gates=Topology=true
      --extra-create-metadata
      --leader-election=true
      --default-fstype=ext4
    Environment:
      ADDRESS:                      /var/lib/csi/sockets/pluginproxy/csi.sock
      AWS_ROLE_ARN:                 arn:aws:iam::861611878732:role/ebs-csi-controller-sa.kube-system.sa.woleung.k8s.local
      AWS_WEB_IDENTITY_TOKEN_FILE:  /var/run/secrets/amazonaws.com/token
    Mounts:
      /var/lib/csi/sockets/pluginproxy/ from socket-dir (rw)
      /var/run/secrets/amazonaws.com/ from token-amazonaws-com (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-pz7x2 (ro)
  csi-attacher:
    Image:      registry.k8s.io/sig-storage/csi-attacher:v3.4.0@sha256:8b9c313c05f54fb04f8d430896f5f5904b6cb157df261501b29adc04d2b2dc7b
    Port:       <none>
    Host Port:  <none>
    Args:
      --csi-address=$(ADDRESS)
      --v=5
      --leader-election=true
    Environment:
      ADDRESS:                      /var/lib/csi/sockets/pluginproxy/csi.sock
      AWS_ROLE_ARN:                 arn:aws:iam::861611878732:role/ebs-csi-controller-sa.kube-system.sa.woleung.k8s.local
      AWS_WEB_IDENTITY_TOKEN_FILE:  /var/run/secrets/amazonaws.com/token
    Mounts:
      /var/lib/csi/sockets/pluginproxy/ from socket-dir (rw)
      /var/run/secrets/amazonaws.com/ from token-amazonaws-com (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-pz7x2 (ro)
  csi-resizer:
    Image:      registry.k8s.io/sig-storage/csi-resizer:v1.4.0@sha256:9ebbf9f023e7b41ccee3d52afe39a89e3ddacdbb69269d583abfc25847cfd9e4
    Port:       <none>
    Host Port:  <none>
    Args:
      --csi-address=$(ADDRESS)
      --v=5
    Environment:
      ADDRESS:                      /var/lib/csi/sockets/pluginproxy/csi.sock
      AWS_ROLE_ARN:                 arn:aws:iam::861611878732:role/ebs-csi-controller-sa.kube-system.sa.woleung.k8s.local
      AWS_WEB_IDENTITY_TOKEN_FILE:  /var/run/secrets/amazonaws.com/token
    Mounts:
      /var/lib/csi/sockets/pluginproxy/ from socket-dir (rw)
      /var/run/secrets/amazonaws.com/ from token-amazonaws-com (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-pz7x2 (ro)
  liveness-probe:
    Image:      registry.k8s.io/sig-storage/livenessprobe:v2.6.0@sha256:406f59599991916d2942d8d02f076d957ed71b541ee19f09fc01723a6e6f5932
    Port:       <none>
    Host Port:  <none>
    Args:
      --csi-address=/csi/csi.sock
    Environment:
      AWS_ROLE_ARN:                 arn:aws:iam::861611878732:role/ebs-csi-controller-sa.kube-system.sa.woleung.k8s.local
      AWS_WEB_IDENTITY_TOKEN_FILE:  /var/run/secrets/amazonaws.com/token
    Mounts:
      /csi from socket-dir (rw)
      /var/run/secrets/amazonaws.com/ from token-amazonaws-com (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-pz7x2 (ro)
Conditions:
  Type           Status
  PodScheduled   False 
Volumes:
  socket-dir:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  token-amazonaws-com:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  86400
  kube-api-access-pz7x2:
    Type:                     Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:   3607
    ConfigMapName:            kube-root-ca.crt
    ConfigMapOptional:        <nil>
    DownwardAPI:              true
QoS Class:                    BestEffort
Node-Selectors:               <none>
Tolerations:                  node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                              node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Topology Spread Constraints:  kubernetes.io/hostname:DoNotSchedule when max skew 1 is exceeded for selector app=ebs-csi-controller,app.kubernetes.io/instance=aws-ebs-csi-driver,app.kubernetes.io/name=aws-ebs-csi-driver
                              topology.kubernetes.io/zone:ScheduleAnyway when max skew 1 is exceeded for selector app=ebs-csi-controller,app.kubernetes.io/instance=aws-ebs-csi-driver,app.kubernetes.io/name=aws-ebs-csi-driver
Events:
  Type     Reason            Age                    From               Message
  ----     ------            ----                   ----               -------
  Warning  FailedScheduling  5m                     default-scheduler  0/1 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling..
  Warning  FailedScheduling  3m54s (x3 over 4m24s)  default-scheduler  0/2 nodes are available: 1 node(s) didn't match pod topology spread constraints, 1 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }. preemption: 0/2 nodes are available: 1 No preemption victims found for incoming pod, 1 Preemption is not helpful for scheduling..
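
Read together, the two events show the deadlock: the controller Deployment wants two replicas under a DoNotSchedule spread on kubernetes.io/hostname (see the constraints above), the control-plane node is tainted, and the single worker can host only one replica. A sketch of commands to confirm this, assuming the Deployment is named ebs-csi-controller:

# Replica count of the controller Deployment (expected: 2).
kubectl -n kube-system get deployment ebs-csi-controller -o jsonpath='{.spec.replicas}'
# Taints on each node; the control plane carries node-role.kubernetes.io/control-plane.
kubectl describe nodes | grep -A1 Taints
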
johngmyers commented 1 year ago

Please see https://kops.sigs.k8s.io/operations/troubleshoot/

justinsb commented 1 year ago

/assign

I was able to reproduce this. The issue is that we're trying to bring up two CSI pods, but we only have two nodes (one control plane, one node), and one of them is tainted.
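
One possible workaround on the affected version (a sketch, not the upstream fix) is to give the scheduler a second untainted node so both replicas can satisfy the hostname spread constraint:

# Scale the worker InstanceGroup to two instances, apply, and re-validate.
kops edit ig nodes-us-east-1a --name woleung.k8s.local   # set minSize: 2 and maxSize: 2
kops update cluster --name woleung.k8s.local --yes
kops validate cluster --name woleung.k8s.local --wait 10m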

mmadrid commented 1 year ago

I was having the same issue. It looks like it has been fixed and is part of the latest alpha release, v1.29.0-alpha.1. I was able to get a 2-node (1 master, 1 worker) cluster up using that version.
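
A sketch of switching to that build, assuming a Linux amd64 workstation; the URL follows the usual kops release naming convention, so verify it against the releases page:

# Download the pre-built binary for the alpha release and put it on PATH.
curl -Lo kops https://github.com/kubernetes/kops/releases/download/v1.29.0-alpha.1/kops-linux-amd64
chmod +x kops
sudo mv kops /usr/local/bin/kops
kops version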

k8s-triage-robot commented 9 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue as fresh with `/remove-lifecycle stale`
- Close this issue with `/close`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 8 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue as fresh with `/remove-lifecycle rotten`
- Close this issue with `/close`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented 7 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Reopen this issue with `/reopen`
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-ci-robot commented 7 months ago

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to [this](https://github.com/kubernetes/kops/issues/15852#issuecomment-2026325785):

> The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
>
> This bot triages issues according to the following rules:
>
> - After 90d of inactivity, `lifecycle/stale` is applied
> - After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
> - After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
>
> You can:
>
> - Reopen this issue with `/reopen`
> - Mark this issue as fresh with `/remove-lifecycle rotten`
> - Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)
>
> Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
>
> /close not-planned

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.