kubermatic / kubeone

Kubermatic KubeOne automates cluster operations on all your cloud, on-prem, edge, and IoT environments.
https://kubeone.io
Apache License 2.0

kubeone apply stops with SIGSEGV #2048

Closed · DerPate closed 11 months ago

DerPate commented 2 years ago

What happened?

After starting `kubeone -v apply -m kubeone/kubeone.yaml`, the program runs until this happens:

```
INFO[14:14:29 UTC] Running cluster probes...
E0519 14:14:29.961918 10934 runtime.go:78] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 1 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x195ec80?, 0x2ce98e0})
    k8s.io/apimachinery@v0.22.2/pkg/util/runtime/runtime.go:74 +0x99
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc000002340?})
    k8s.io/apimachinery@v0.22.2/pkg/util/runtime/runtime.go:48 +0x75
panic({0x195ec80, 0x2ce98e0})
    runtime/panic.go:838 +0x207
sigs.k8s.io/controller-runtime/pkg/client.(*client).List(0x2?, {0x1f75580?, 0xc00003c058?}, {0x1f7af50?, 0xc0002f24d0?}, {0xc0004a2340?, 0xc00051de80?, 0x9?})
    sigs.k8s.io/controller-runtime@v0.10.2/pkg/client/client.go:287 +0x3c9
k8c.io/kubeone/pkg/tasks.investigateCluster(0xc0003260f0)
    k8c.io/kubeone/pkg/tasks/probes.go:344 +0x7ee
k8c.io/kubeone/pkg/tasks.runProbes(0xc0003260f0)
    k8c.io/kubeone/pkg/tasks/probes.go:159 +0x494
k8c.io/kubeone/pkg/tasks.(*Task).Run.func1()
    k8c.io/kubeone/pkg/tasks/task.go:60 +0x94
k8s.io/apimachinery/pkg/util/wait.ConditionFunc.WithContext.func1({0x19cf480, 0x436d01})
    k8s.io/apimachinery@v0.22.2/pkg/util/wait/wait.go:217 +0x1b
k8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtectionWithContext({0x1f75580?, 0xc00003c060?}, 0xc000118230?)
    k8s.io/apimachinery@v0.22.2/pkg/util/wait/wait.go:230 +0x57
k8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtection(0x12a05f200?)
    k8s.io/apimachinery@v0.22.2/pkg/util/wait/wait.go:223 +0x39
k8s.io/apimachinery/pkg/util/wait.ExponentialBackoff({0x2540be400, 0x4000000000000000, 0x0, 0x9, 0x0}, 0x2d18c40?)
    k8s.io/apimachinery@v0.22.2/pkg/util/wait/wait.go:418 +0x5f
k8c.io/kubeone/pkg/tasks.(*Task).Run(0xc000420980, 0xc0003260f0)
    k8c.io/kubeone/pkg/tasks/task.go:55 +0x11c
k8c.io/kubeone/pkg/tasks.Tasks.Run({0xc0001a2460, 0x4, 0x0?}, 0xc0003ff970?)
    k8c.io/kubeone/pkg/tasks/tasks.go:41 +0xfd
k8c.io/kubeone/pkg/cmd.runApply(0xc00007ea80)
    k8c.io/kubeone/pkg/cmd/apply.go:183 +0x207
k8c.io/kubeone/pkg/cmd.applyCmd.func1(0xc000654a00?, {0x1bbe2e7?, 0x3?, 0x3?})
    k8c.io/kubeone/pkg/cmd/apply.go:111 +0x7e
github.com/spf13/cobra.(*Command).execute(0xc000654a00, {0xc0002ed350, 0x3, 0x3})
    github.com/spf13/cobra@v1.1.3/command.go:852 +0x67c
github.com/spf13/cobra.(*Command).ExecuteC(0xc000654280)
    github.com/spf13/cobra@v1.1.3/command.go:960 +0x39c
github.com/spf13/cobra.(*Command).Execute(...)
    github.com/spf13/cobra@v1.1.3/command.go:897
k8c.io/kubeone/pkg/cmd.Execute()
    k8c.io/kubeone/pkg/cmd/root.go:52 +0x89
main.main()
    k8c.io/kubeone/main.go:24 +0x17
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
    panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x11d8c09]

goroutine 1 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc000002340?})
    k8s.io/apimachinery@v0.22.2/pkg/util/runtime/runtime.go:55 +0xd8
panic({0x195ec80, 0x2ce98e0})
    runtime/panic.go:838 +0x207
sigs.k8s.io/controller-runtime/pkg/client.(*client).List(0x2?, {0x1f75580?, 0xc00003c058?}, {0x1f7af50?, 0xc0002f24d0?}, {0xc0004a2340?, 0xc00051de80?, 0x9?})
    sigs.k8s.io/controller-runtime@v0.10.2/pkg/client/client.go:287 +0x3c9
k8c.io/kubeone/pkg/tasks.investigateCluster(0xc0003260f0)
    k8c.io/kubeone/pkg/tasks/probes.go:344 +0x7ee
k8c.io/kubeone/pkg/tasks.runProbes(0xc0003260f0)
    k8c.io/kubeone/pkg/tasks/probes.go:159 +0x494
k8c.io/kubeone/pkg/tasks.(*Task).Run.func1()
    k8c.io/kubeone/pkg/tasks/task.go:60 +0x94
k8s.io/apimachinery/pkg/util/wait.ConditionFunc.WithContext.func1({0x19cf480, 0x436d01})
    k8s.io/apimachinery@v0.22.2/pkg/util/wait/wait.go:217 +0x1b
k8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtectionWithContext({0x1f75580?, 0xc00003c060?}, 0xc000118230?)
    k8s.io/apimachinery@v0.22.2/pkg/util/wait/wait.go:230 +0x57
k8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtection(0x12a05f200?)
    k8s.io/apimachinery@v0.22.2/pkg/util/wait/wait.go:223 +0x39
k8s.io/apimachinery/pkg/util/wait.ExponentialBackoff({0x2540be400, 0x4000000000000000, 0x0, 0x9, 0x0}, 0x2d18c40?)
    k8s.io/apimachinery@v0.22.2/pkg/util/wait/wait.go:418 +0x5f
k8c.io/kubeone/pkg/tasks.(*Task).Run(0xc000420980, 0xc0003260f0)
    k8c.io/kubeone/pkg/tasks/task.go:55 +0x11c
k8c.io/kubeone/pkg/tasks.Tasks.Run({0xc0001a2460, 0x4, 0x0?}, 0xc0003ff970?)
    k8c.io/kubeone/pkg/tasks/tasks.go:41 +0xfd
k8c.io/kubeone/pkg/cmd.runApply(0xc00007ea80)
    k8c.io/kubeone/pkg/cmd/apply.go:183 +0x207
k8c.io/kubeone/pkg/cmd.applyCmd.func1(0xc000654a00?, {0x1bbe2e7?, 0x3?, 0x3?})
    k8c.io/kubeone/pkg/cmd/apply.go:111 +0x7e
github.com/spf13/cobra.(*Command).execute(0xc000654a00, {0xc0002ed350, 0x3, 0x3})
    github.com/spf13/cobra@v1.1.3/command.go:852 +0x67c
github.com/spf13/cobra.(*Command).ExecuteC(0xc000654280)
    github.com/spf13/cobra@v1.1.3/command.go:960 +0x39c
github.com/spf13/cobra.(*Command).Execute(...)
    github.com/spf13/cobra@v1.1.3/command.go:897
k8c.io/kubeone/pkg/cmd.Execute()
    k8c.io/kubeone/pkg/cmd/root.go:52 +0x89
main.main()
    k8c.io/kubeone/main.go:24 +0x17
```
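The trace shows the panic firing inside controller-runtime's `(*client).List` while `investigateCluster` is probing the cluster, i.e. before anything has been provisioned. As a hedged illustration only (this is not KubeOne's actual code), the minimal Go sketch below shows how calling `List` through a client that was never initialized produces exactly this class of nil-pointer SIGSEGV:

```go
package main

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

func main() {
	// Hypothetical stand-in for the probe's dynamic client; assume it was
	// never initialized because no API server was reachable yet.
	var c client.Client // nil interface

	var nodes corev1.NodeList
	// Calling a method through the nil interface dereferences a nil pointer:
	// "runtime error: invalid memory address or nil pointer dereference",
	// which the Go runtime reports as SIGSEGV, as in the trace above.
	_ = c.List(context.Background(), &nodes)
}
```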

Expected behavior

The program should not crash with SIGSEGV and should terminate with a clean cluster install.
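In other words, an unreachable or not-yet-provisioned cluster should surface as an ordinary error from the probe rather than a panic. A minimal sketch of such a guard, assuming a hypothetical `investigate` helper standing in for the failing probe (illustrative only, not the actual fix):

```go
package probe

import (
	"context"
	"errors"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// investigate is a hypothetical stand-in for the failing cluster probe: it
// reports an uninitialized client as a regular error instead of letting the
// nil dereference crash the whole run with SIGSEGV.
func investigate(ctx context.Context, c client.Client) error {
	if c == nil {
		return errors.New("cluster client not initialized; is the API server reachable?")
	}

	var nodes corev1.NodeList
	return c.List(ctx, &nodes)
}
```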

How to reproduce the issue?

Install KubeOne on Ubuntu 20.04 via the link in the Getting Started guide, and try to install a cluster.

What KubeOne version are you using?

```console
$ kubeone version
{
  "kubeone": {
    "major": "1",
    "minor": "4",
    "gitVersion": "1.4.3",
    "gitCommit": "717787f2287964e5793d80ec8ca2c2169936b0ac",
    "gitTreeState": "",
    "buildDate": "2022-05-11T14:18:03Z",
    "goVersion": "go1.18.1",
    "compiler": "gc",
    "platform": "linux/amd64"
  },
  "machine_controller": {
    "major": "1",
    "minor": "43",
    "gitVersion": "v1.43.2",
    "gitCommit": "",
    "gitTreeState": "",
    "buildDate": "",
    "goVersion": "",
    "compiler": "",
    "platform": "linux/amd64"
  }
}
```

Provide your KubeOneCluster manifest here (if applicable)

```yaml
apiVersion: kubeone.k8c.io/v1beta2
kind: KubeOneCluster
name: demo-cluster

versions:
  kubernetes: "1.22.3"

clusterNetwork:
  # the subnet used for pods (default: 10.244.0.0/16)
  podSubnet: ""
  # the subnet used for services (default: 10.96.0.0/12)
  serviceSubnet: ""
  # the domain name used for services (default: cluster.local)
  serviceDomainName: ""
  # a nodePort range to reserve for services (default: 30000-32767)
  nodePortRange: ""
  # kube-proxy configurations
  kubeProxy:
    # skipInstallation will skip the installation of kube-proxy
    skipInstallation: false
    # if this set, kube-proxy mode will be set to ipvs
    ipvs:
      # different schedulers can be configured:
      # * rr: round-robin
      # * lc: least connection (smallest number of open connections)
      # * dh: destination hashing
      # * sh: source hashing
      # * sed: shortest expected delay
      # * nq: never queue
      scheduler: rr
      #strictArp: false
      tcpTimeout: "0"
      tcpFinTimeout: "0"
      udpTimeout: "0"
      excludeCIDRs: []
    # if mode is by default
    # iptables: {}
  # CNI plugin of choice. CNI can not be changed later at upgrade time.
  cni:
    # Only one CNI plugin can be defined at the same time
    # Supported CNI plugins:
    # * canal
    # * weave-net
    # * cilium
    # * external - The CNI plugin can be installed as an addon or manually
    canal:
      # MTU represents the maximum transmission unit.
      # Default MTU value depends on the specified provider:
      # * AWS - 8951 (9001 AWS Jumbo Frame - 50 VXLAN bytes)
      # * GCE - 1410 (GCE specific 1460 bytes - 50 VXLAN bytes)
      # * Hetzner - 1400 (Hetzner specific 1450 bytes - 50 VXLAN bytes)
      # * OpenStack - 1400 (OpenStack specific 1450 bytes - 50 VXLAN bytes)
      # * Default - 1450
      mtu: 1450
    # cilium:
    #   # enableHubble to deploy Hubble relay and UI
    #   enableHubble: true
    #   # kubeProxyReplacement defines weather cilium relies on underlying Kernel support to replace kube-proxy functionality by eBPF (strict),
    #   # or disables a subset of those features so cilium does not bail out if the kernel support is missing (disabled).
    #   kubeProxyReplacement: "disabled"
    # weaveNet:
    #   # When true is set, secret will be automatically generated and
    #   # referenced in appropriate manifests. Currently only weave-net
    #   # supports encryption.
    #   encrypted: true
    # external: {}

cloudProvider:
  # Only one cloud provider can be defined at the same time.
  # Possible values:
  # aws: {}
  # azure: {}
  # digitalocean: {}
  # gce: {}
  # hetzner:
  #   networkID: ""
  # openstack: {}
  # equinixmetal: {}
  # vsphere: {}
  none: {}
  # aws: {}
  # Set the kubelet flag '--cloud-provider=external' and deploy the external CCM for supported providers
  #external: false
  # Path to file that will be uploaded and used as custom '--cloud-config' file.
  #cloudConfig: ""
  # CSIConfig is configuration passed to the CSI driver.
  # This is currently used only for vSphere clusters.
  #csiConfig: ""

# Controls which container runtime will be installed on instances.
# By default:
# * Docker will be installed for Kubernetes clusters up to 1.20
# * containerd will be installed for Kubernetes clusters 1.21+
# Currently, it's not possible to migrate existing clusters from one to another
# container runtime, however, migration from Docker to containerd is planned
# for one of the upcoming KubeOne releases.
# Only one container runtime can be present at the time.
#
# Note: Kubernetes has announced deprecation of Docker (dockershim) support.
# It's expected that the Docker support will be removed in Kubernetes 1.24.
# It's highly advised to use containerd for all newly created clusters.
containerRuntime:
  # Installs containerd container runtime.
  # Default for 1.21+ Kubernetes clusters.
  # containerd:
  #   registries:
  #     k8s.gcr.io:
  #       mirrors:
  #       - https://self-signed.pull-through.cache.tld
  #       tlsConfig:
  #         insecureSkipVerify: true
  #     docker.io:
  #       mirrors:
  #       - http://plain-text2.tld
  #       auth:
  #         # all of the following fields are optional
  #         username: "u5er"
  #         password: "myc00lp455w0rd"
  #         auth: "base64(user:password)"
  #         identityToken: ""
  #     "*":
  #       mirrors:
  #       - https://secure.tld
  # Installs Docker container runtime.
  # Default for Kubernetes clusters up to 1.20.
  # This option will be removed once Kubernetes 1.23 reaches EOL.
  # docker: {}

features:
  # Enable the PodNodeSelector admission plugin in API server.
  # More info: https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/#podnodeselector
  podNodeSelector:
    enable: false
    config:
      # configFilePath is a path on a local file system to the podNodeSelector
      # plugin config, which defines default and allowed node selectors.
      # configFilePath is is a required field.
      # More info: https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/#configuration-file-format-1
      configFilePath: ""
  # Enables PodSecurityPolicy admission plugin in API server, as well as creates
  # default 'privileged' PodSecurityPolicy, plus RBAC rules to authorize
  # 'kube-system' namespace pods to 'use' it.
  podSecurityPolicy:
    enable: false
  # Enables and configures audit log backend.
  # More info: https://kubernetes.io/docs/tasks/debug-application-cluster/audit/#log-backend
  staticAuditLog:
    enable: false
    config:
      # PolicyFilePath is a path on local file system to the audit policy manifest
      # which defines what events should be recorded and what data they should include.
      # PolicyFilePath is a required field.
      # More info: https://kubernetes.io/docs/tasks/debug-application-cluster/audit/#audit-policy
      policyFilePath: ""
      # LogPath is path on control plane instances where audit log files are stored
      logPath: "/var/log/kubernetes/audit.log"
      # LogMaxAge is maximum number of days to retain old audit log files
      logMaxAge: 30
      # LogMaxBackup is maximum number of audit log files to retain
      logMaxBackup: 3
      # LogMaxSize is maximum size in megabytes of audit log file before it gets rotated
      logMaxSize: 100
  # Enables dynamic audit logs.
  # After enablig this, operator should create auditregistration.k8s.io/v1alpha1
  # AuditSink object.
  # More info: https://kubernetes.io/docs/tasks/debug-application-cluster/audit/#dynamic-backend
  dynamicAuditLog:
    enable: false
  # Opt-out from deploying metrics-server
  # more info: https://github.com/kubernetes-incubator/metrics-server
  metricsServer:
    # enabled by default
    enable: true
  # Enable OpenID-Connect support in API server
  # More info: https://kubernetes.io/docs/reference/access-authn-authz/authentication/#openid-connect-tokens
  openidConnect:
    enable: false
    config:
      # The URL of the OpenID issuer, only HTTPS scheme will be accepted. If
      # set, it will be used to verify the OIDC JSON Web Token (JWT).
      issuerUrl: ""
      # The client ID for the OpenID Connect client, must be set if
      # issuer_url is set.
      clientId: "kubernetes"
      # The OpenID claim to use as the user name. Note that claims other than
      # the default ('sub') is not guaranteed to be unique and immutable. This
      # flag is experimental in kubernetes, please see the kubernetes
      # authentication documentation for further details.
      usernameClaim: "sub"
      # If provided, all usernames will be prefixed with this value. If not
      # provided, username claims other than 'email' are prefixed by the issuer
      # URL to avoid clashes. To skip any prefixing, provide the value '-'.
      usernamePrefix: "oidc:"
      # If provided, the name of a custom OpenID Connect claim for specifying
      # user groups. The claim value is expected to be a string or array of
      # strings. This flag is experimental in kubernetes, please see the
      # kubernetes authentication documentation for further details.
      groupsClaim: "groups"
      # If provided, all groups will be prefixed with this value to prevent
      # conflicts with other authentication strategies.
      groupsPrefix: "oidc:"
      # Comma-separated list of allowed JOSE asymmetric signing algorithms. JWTs
      # with a 'alg' header value not in this list will be rejected. Values are
      # defined by RFC 7518 https://tools.ietf.org/html/rfc7518#section-3.1.
      signingAlgs: "RS256"
      # A key=value pair that describes a required claim in the ID Token. If
      # set, the claim is verified to be present in the ID Token with a matching
      # value. Only single pair is currently supported.
      requiredClaim: ""
      # If set, the OpenID server's certificate will be verified by one of the
      # authorities in the oidc-ca-file, otherwise the host's root CA set will
      # be used.
      caFile: ""
  # Enable Kubernetes Encryption Providers
  # For more information: https://kubernetes.io/docs/tasks/administer-cluster/encrypt-data/
  encryptionProviders:
    # disabled by default
    enable: false
    # inline string
    customEncryptionConfiguration: ""

## Bundle of Root CA Certificates extracted from Mozilla
## can be found here: https://curl.se/ca/cacert.pem
## caBundle should be empty for default root CAs to be used
caBundle: ""

systemPackages:
  # will add Docker and Kubernetes repositories to OS package manager
  configureRepositories: true # it's true by default

# registryConfiguration controls how images used for components deployed by
# KubeOne and kubeadm are pulled from an image registry
registryConfiguration:
  # overwriteRegistry specifies a custom Docker registry which will be used
  # for all images required for KubeOne and kubeadm. This also applies to
  # addons deployed by KubeOne.
  # This field doesn't modify the user/organization part of the image. For example,
  # if overwriteRegistry is set to 127.0.0.1:5000/example, image called
  # calico/cni would translate to 127.0.0.1:5000/example/calico/cni.
  overwriteRegistry: ""
  # InsecureRegistry configures Docker to threat the registry specified
  # in OverwriteRegistry as an insecure registry. This is also propagated
  # to the worker nodes managed by machine-controller and/or KubeOne.
  insecureRegistry: false

# Addons are Kubernetes manifests to be deployed after provisioning the cluster
addons:
  enable: false
  # In case when the relative path is provided, the path is relative
  # to the KubeOne configuration file.
  # This path is required only if you want to provide custom addons or override
  # embedded addons.
  path: "./addons"
  # globalParams is a key-value map of values passed to the addons templating engine,
  # to be used in the addons' manifests. The values defined here are passed to all
  # addons.
  globalParams:
    key: value
  # addons is used to enable addons embedded in the KubeOne binary.
  # Currently backups-restic, default-storage-class, and unattended-upgrades are
  # available addons.
  # Check out the documentation to find more information about what are embedded
  # addons and how to use them:
  # https://docs.kubermatic.com/kubeone/v1.4/guides/addons/
  addons:
    # name of the addon to be enabled/deployed (e.g. backups-restic)
    - name: ""
      # delete triggers deletion of the deployed addon
      delete: false
      # params is a key-value map of values passed to the addons templating engine,
      # to be used in the addon's manifests. Values defined here override the values
      # defined in globalParams.
      params:
        key: value

# The list of nodes can be overwritten by providing Terraform output.
# You are strongly encouraged to provide an odd number of nodes and
# have at least three of them.
# Remember to only specify your *master* nodes.
controlPlane:
  hosts:
  - publicAddress: ''
    privateAddress: '10.XX.XX.61'
    # bastion: '4.3.2.1'
    # bastionPort: 22 # can be left out if using the default (22)
    # bastionUser: 'root' # can be left out if using the default ('root')
    sshPort: 22 # can be left out if using the default (22)
    sshUsername:
    # # You usually want to configure either a private key OR an
    # # agent socket, but never both. The socket value can be
    # # prefixed with "env:" to refer to an environment variable.
    # sshPrivateKeyFile: '~/.ssh/id_rsa'
    sshAgentSocket: '/tmp/ssh-XXXXXX5BZFMa/agent.8277'
    # # Taints is used to apply taints to the node.
    # # If not provided defaults to TaintEffectNoSchedule, with key
    # # node-role.kubernetes.io/master for control plane nodes.
    # # Explicitly empty (i.e. taints: {}) means no taints will be applied.
    operatingSystem: ubuntu
    taints:
    - key: "node-role.kubernetes.io/master"
      effect: "NoSchedule"
  - publicAddress: '10.XX.XX.62'
    privateAddress: '10.XX.XX.62'
    # bastion: '4.3.2.1'
    # bastionPort: 22 # can be left out if using the default (22)
    # bastionUser: 'root' # can be left out if using the default ('root')
    sshPort: 22 # can be left out if using the default (22)
    sshUsername:
    # # You usually want to configure either a private key OR an
    # # agent socket, but never both. The socket value can be
    # # prefixed with "env:" to refer to an environment variable.
    # sshPrivateKeyFile: '/home//.ssh/id_rsa'
    sshAgentSocket: '/tmp/ssh-XXXXXX5BZFMa/agent.8277'
    # # Taints is used to apply taints to the node.
    # # If not provided defaults to TaintEffectNoSchedule, with key
    # # node-role.kubernetes.io/master for control plane nodes.
    # # Explicitly empty (i.e. taints: {}) means no taints will be applied.
    operatingSystem: ubuntu
    taints:
    - key: "node-role.kubernetes.io/master"
      effect: "NoSchedule"
  - publicAddress: '10.XX.XX.63'
    privateAddress: '10.XX.XX.63'
    # bastion: '4.3.2.1'
    # bastionPort: 22 # can be left out if using the default (22)
    # bastionUser: 'root' # can be left out if using the default ('root')
    sshPort: 22 # can be left out if using the default (22)
    sshUsername:
    # # You usually want to configure either a private key OR an
    # # agent socket, but never both. The socket value can be
    # # prefixed with "env:" to refer to an environment variable.
    # sshPrivateKeyFile: '/home//.ssh/id_rsa'
    sshAgentSocket: '/tmp/ssh-XXXXXX5BZFMa/agent.8277'
    # # Taints is used to apply taints to the node.
    # # If not provided defaults to TaintEffectNoSchedule, with key
    # # node-role.kubernetes.io/master for control plane nodes.
    # # Explicitly empty (i.e. taints: {}) means no taints will be applied.
    operatingSystem: ubuntu
    taints:
    - key: "node-role.kubernetes.io/master"
      effect: "NoSchedule"

# A list of static workers, not managed by MachineController.
# The list of nodes can be overwritten by providing Terraform output.
staticWorkers:
  hosts:
  - publicAddress: ''
    privateAddress: '10.XX.XX.71'
    # bastion: '4.3.2.1'
    # bastionPort: 22 # can be left out if using the default (22)
    # bastionUser: 'root' # can be left out if using the default ('root')
    sshPort: 22 # can be left out if using the default (22)
    sshUsername:
    # # You usually want to configure either a private key OR an
    # # agent socket, but never both. The socket value can be
    # # prefixed with "env:" to refer to an environment variable.
    # sshPrivateKeyFile: '/home//.ssh/id_rsa'
    sshAgentSocket: 'env:SSH_AUTH_SOCK'
    # # Taints is used to apply taints to the node.
    # # Explicitly empty (i.e. taints: {}) means no taints will be applied.
    # # taints:
    # # - key: ""
    # #   effect: ""
    operatingSystem: ubuntu
  - publicAddress: ''
    privateAddress: '10.XX.XX.72'
    # bastion: '4.3.2.1'
    # bastionPort: 22 # can be left out if using the default (22)
    # bastionUser: 'root' # can be left out if using the default ('root')
    sshPort: 22 # can be left out if using the default (22)
    sshUsername:
    # # You usually want to configure either a private key OR an
    # # agent socket, but never both. The socket value can be
    # # prefixed with "env:" to refer to an environment variable.
    # sshPrivateKeyFile: '/home//.ssh/id_rsa'
    sshAgentSocket: 'env:SSH_AUTH_SOCK'
    # # Taints is used to apply taints to the node.
    # # Explicitly empty (i.e. taints: {}) means no taints will be applied.
    # # taints:
    # # - key: ""
    # #   effect: ""
    operatingSystem: ubuntu
  - publicAddress: ''
    privateAddress: '10.XX.XX.73'
    # bastion: '4.3.2.1'
    # bastionPort: 22 # can be left out if using the default (22)
    # bastionUser: 'root' # can be left out if using the default ('root')
    sshPort: 22 # can be left out if using the default (22)
    sshUsername:
    # # You usually want to configure either a private key OR an
    # # agent socket, but never both. The socket value can be
    # # prefixed with "env:" to refer to an environment variable.
    # sshPrivateKeyFile: '/home//.ssh/id_rsa'
    sshAgentSocket: 'env:SSH_AUTH_SOCK'
    # # Taints is used to apply taints to the node.
    # # Explicitly empty (i.e. taints: {}) means no taints will be applied.
    # # taints:
    # # - key: ""
    # #   effect: ""
    operatingSystem: ubuntu
  - publicAddress: ''
    privateAddress: '10.XX.XX.74'
    # bastion: '4.3.2.1'
    # bastionPort: 22 # can be left out if using the default (22)
    # bastionUser: 'root' # can be left out if using the default ('root')
    sshPort: 22 # can be left out if using the default (22)
    sshUsername:
    # # You usually want to configure either a private key OR an
    # # agent socket, but never both. The socket value can be
    # # prefixed with "env:" to refer to an environment variable.
    # sshPrivateKeyFile: '/home//.ssh/id_rsa'
    sshAgentSocket: 'env:SSH_AUTH_SOCK'
    # # Taints is used to apply taints to the node.
    # # Explicitly empty (i.e. taints: {}) means no taints will be applied.
    # # taints:
    # # - key: ""
    # #   effect: ""
    operatingSystem: ubuntu
  - publicAddress: ''
    privateAddress: '10.XX.XX.75'
    # bastion: '4.3.2.1'
    # bastionPort: 22 # can be left out if using the default (22)
    # bastionUser: 'root' # can be left out if using the default ('root')
    sshPort: 22 # can be left out if using the default (22)
    sshUsername:
    # # You usually want to configure either a private key OR an
    # # agent socket, but never both. The socket value can be
    # # prefixed with "env:" to refer to an environment variable.
    # sshPrivateKeyFile: '/home//.ssh/id_rsa'
    sshAgentSocket: 'env:SSH_AUTH_SOCK'
    # # Taints is used to apply taints to the node.
    # # Explicitly empty (i.e. taints: {}) means no taints will be applied.
    # # taints:
    # # - key: ""
    # #   effect: ""
    operatingSystem: ubuntu
  - publicAddress: ''
    privateAddress: '10.XX.XX.76'
    # bastion: '4.3.2.1'
    # bastionPort: 22 # can be left out if using the default (22)
    # bastionUser: 'root' # can be left out if using the default ('root')
    sshPort: 22 # can be left out if using the default (22)
    sshUsername:
    # # You usually want to configure either a private key OR an
    # # agent socket, but never both. The socket value can be
    # # prefixed with "env:" to refer to an environment variable.
    # sshPrivateKeyFile: '/home//.ssh/id_rsa'
    sshAgentSocket: 'env:SSH_AUTH_SOCK'
    # # Taints is used to apply taints to the node.
    # # Explicitly empty (i.e. taints: {}) means no taints will be applied.
    # # taints:
    # # - key: ""
    # #   effect: ""
    operatingSystem: ubuntu

# The API server can also be overwritten by Terraform. Provide the
# external address of your load balancer or the public addresses of
# the first control plane nodes.
apiEndpoint:
  host: '10.XX.XX.66'
  port: 6443
  # alternativeNames: []

# If the cluster runs on bare metal or an unsupported cloud provider,
# you can disable the machine-controller deployment entirely. In this
# case, anything you configure in your "workers" sections is ignored.
machineController:
  deploy: false

# Proxy is used to configure HTTP_PROXY, HTTPS_PROXY and NO_PROXY
# for Docker daemon and kubelet, and to be used when provisioning cluster
# (e.g. for curl, apt-get..).
# Also worker nodes managed by machine-controller will be configred according to
# proxy settings here. The caveat is that only proxy.http and proxy.noProxy will
# be used on worker machines.
# proxy:
#   http: ''
#   https: ''
#   noProxy: ''

# KubeOne can automatically create MachineDeployments to create
# worker nodes in your cluster. Each element in this "workers"
# list is a single deployment and must have a unique name.
# dynamicWorkers:
# - name: fra1-a
#   replicas: 1
#   providerSpec:
#     labels:
#       mylabel: 'fra1-a'
#     # SSH keys can be inferred from Terraform if this list is empty
#     # and your tf output contains a "ssh_public_keys" field.
#     # sshPublicKeys:
#     # - 'ssh-rsa ......'
#     # cloudProviderSpec corresponds 'provider.name' config
#     cloudProviderSpec:
#       ### the following params could be inferred by kubeone from terraform
#       ### output JSON:
#       # ami: 'ami-0332a5c40cf835528',
#       # availabilityZone: 'eu-central-1a',
#       # instanceProfile: 'mycool-profile',
#       # region: 'eu-central-1',
#       # securityGroupIDs: ['sg-01f34ffd8447e70c0']
#       # subnetId: 'subnet-2bff4f43',
#       # vpcId: 'vpc-819f62e9'
#       ### end of terraform inferred kubeone params
#       instanceType: 't3.medium'
#       diskSize: 50
#       diskType: 'gp2'
#     operatingSystem: 'ubuntu'
#     operatingSystemSpec:
#       distUpgradeOnBoot: true
# - name: fra1-b
#   replicas: 1
#   providerSpec:
#     labels:
#       mylabel: 'fra1-b'
#     cloudProviderSpec:
#       instanceType: 't3.medium'
#       diskSize: 50
#       diskType: 'gp2'
#     operatingSystem: 'ubuntu'
#     operatingSystemSpec:
#       distUpgradeOnBoot: true
# - name: fra1-c
#   replicas: 1
#   providerSpec:
#     labels:
#       mylabel: 'fra1-c'
#     cloudProviderSpec:
#       instanceType: 't3.medium'
#       diskSize: 50
#       diskType: 'gp2'
#     operatingSystem: 'ubuntu'
#     operatingSystemSpec:
#       distUpgradeOnBoot: true

loggingConfig:
  containerLogMaxSize: "100Mi"
  containerLogMaxFiles: 5
```

What cloud provider are you running on?

Running on bare metal, without Terraform.

What operating system are you running in your cluster?

Ubuntu 20.04

kron4eg commented 2 years ago

@DerPate please paste the full log.

DerPate commented 2 years ago

There is not much more to it really, but I will paste it tomorrow. Worked around it by using version 1.3.

DerPate commented 2 years ago

```
INFO[05:15:02 UTC] Determine hostname...
INFO[05:15:04 UTC] Determine operating system...
```

These two lines are the only ones missing from the log I already posted. @kron4eg

xmudrii commented 2 years ago

@DerPate Is your API server working at all? Can you do anything with kubectl, like `kubectl get nodes`?

DerPate commented 2 years ago

No, not really. The cluster wouldn't start or anything; it crashes hard when using 1.4. It works when using 1.3.

kron4eg commented 2 years ago

@DerPate how does "crashes hard" manifest itself? Some logs from crashing pods would be good to see.

DerPate commented 2 years ago

@kron4eg The Go process KubeOne spawns to configure the cluster exits with signal 11 (SIGSEGV); the crash log for this "hard crash", as I called it, is in the first post. KubeOne is not able to spin up the cluster because it crashes before even starting to build the first node, so I'm not able to provide any pod or node logs, as there is no active cluster. I hope this makes it clearer.

kron4eg commented 2 years ago

But can you SSH into the control plane instance and take a look directly at the container logs?

MiroslavRepka commented 2 years ago

I have encountered the same panic on an already built cluster. I can access the cluster via kubectl, but there is nothing out of the ordinary that would stand out. All pods are running; some had restarts, but I cannot confirm that the restarts happened during the panic. The nodes are also healthy, in the Ready state.

KubeOne version 1.4.3, deployed via `kubeone apply` without Terraform.

kron4eg commented 2 years ago

@MiroslavRepka a way to reproduce would help to debug this case.

MiroslavRepka commented 2 years ago

I deployed a cluster on VMs I created in Hetzner Cloud. The cluster was composed of 3 master/2 worker nodes. Here is the kubeone.yaml:

```yaml
apiVersion: kubeone.k8c.io/v1beta2
kind: KubeOneCluster
name: cluster

versions:
  kubernetes: 'v1.21.0'

clusterNetwork:
  cni:
    external: {}

cloudProvider:
  none: {}
  external: false

addons:
  enable: true
  path: "../addons"

apiEndpoint:
  host: 'my.fancy.url'
  port: 6443

controlPlane:
  hosts:
  - publicAddress: 'x.x.x.x'
    privateAddress: 'x.x.x.x'
    sshUsername: root
    sshPrivateKeyFile: './private.pem'
    taints:
    - key: "node-role.kubernetes.io/master"
      effect: "NoSchedule"
  - publicAddress: 'x.x.x.x'
    privateAddress: 'x.x.x.x'
    sshUsername: root
    sshPrivateKeyFile: './private.pem'
    taints:
    - key: "node-role.kubernetes.io/master"
      effect: "NoSchedule"
  - publicAddress: 'x.x.x.x'
    privateAddress: 'x.x.x.x'
    sshUsername: root
    sshPrivateKeyFile: './private.pem'
    taints:
    - key: "node-role.kubernetes.io/master"
      effect: "NoSchedule"

staticWorkers:
  hosts:
  - publicAddress: 'x.x.x.x'
    privateAddress: 'x.x.x.x'
    sshUsername: root
    sshPrivateKeyFile: './private.pem'
  - publicAddress: 'x.x.x.x'
    privateAddress: 'x.x.x.x'
    sshUsername: root
    sshPrivateKeyFile: './private.pem'

machineController:
  deploy: false
```

The addons define the Calico CNI.

The cluster was created via `kubeone apply -m kubeone.yaml` with no errors. After some small changes to the external load balancer, I ran `kubeone apply` again, and it threw a panic.

The cluster was still accessible with kubectl, as I mentioned before.

The panic KubeOne threw was identical to the one reported by the OP.

kubermatic-bot commented 2 years ago

Issues go stale after 90d of inactivity. After a further 30 days, they will turn rotten. Mark the issue as fresh with `/remove-lifecycle stale`.

If this issue is safe to close now please do so with /close.

/lifecycle stale

xmudrii commented 2 years ago

/remove-lifecycle stale

kubermatic-bot commented 1 year ago

Issues go stale after 90d of inactivity. After a further 30 days, they will turn rotten. Mark the issue as fresh with `/remove-lifecycle stale`.

If this issue is safe to close now please do so with /close.

/lifecycle stale

kubermatic-bot commented 1 year ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

/lifecycle rotten

kron4eg commented 1 year ago

@MiroslavRepka can I ask why `cloudProvider.none` is used?

kron4eg commented 1 year ago

@DerPate We've upgraded the Kubernetes dependencies, including controller-runtime, a few times since this issue was initially opened. Please try to reproduce again.

MiroslavRepka commented 1 year ago

> can I ask why `cloudProvider.none` is used?

If I remember correctly, the machines used in the cluster were from multiple cloud providers.

kron4eg commented 1 year ago

> machines used in the cluster were from multiple cloud providers.

This is definitely neither recommended nor supported, and is a perfect recipe for breaking the cluster.

MiroslavRepka commented 1 year ago

> This is definitely neither recommended nor supported, and is a perfect recipe for breaking the cluster.

I agree that a multi-cloud cluster that depends solely on KubeOne would probably fail. However, a few adjustments, like installing a VPN before running KubeOne and installing and configuring a block storage solution like Longhorn, should do the trick.

kubermatic-bot commented 1 year ago

Issues go stale after 90d of inactivity. After a further 30 days, they will turn rotten. Mark the issue as fresh with `/remove-lifecycle stale`.

If this issue is safe to close now please do so with /close.

/lifecycle stale

xmudrii commented 1 year ago

/remove-lifecycle stale

kubermatic-bot commented 1 year ago

Issues go stale after 90d of inactivity. After a further 30 days, they will turn rotten. Mark the issue as fresh with `/remove-lifecycle stale`.

If this issue is safe to close now please do so with /close.

/lifecycle stale

kubermatic-bot commented 12 months ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

/lifecycle rotten

kubermatic-bot commented 11 months ago

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

/close

kubermatic-bot commented 11 months ago

@kubermatic-bot: Closing this issue.

In response to [this](https://github.com/kubermatic/kubeone/issues/2048#issuecomment-1817831238):

> Rotten issues close after 30d of inactivity.
> Reopen the issue with `/reopen`.
> Mark the issue as fresh with `/remove-lifecycle rotten`.
>
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.