kubermatic / kubeone

Kubermatic KubeOne automates cluster operations on all your cloud, on-prem, edge, and IoT environments.
https://kubeone.io
Apache License 2.0

kubeone on bare-metal fails with error: ssh: popen Process exited with status 1 #3047

Closed waqarkhan88 closed 8 months ago

waqarkhan88 commented 8 months ago

What happened?

When running kubeone apply on bare metal I get the following error. I tried Rocky Linux as well as Ubuntu and get the same error, and I tried both KubeOne 1.7.0 and 1.7.2.

khw@FW56:~$ kubeone apply -m /mnt/d/kubeone.yml -dv
INFO[12:40:16 PKT] Determine hostname...
[192.168.207.134] + export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/sbin:/usr/local/bin:/opt/bin
[192.168.207.134] + PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/sbin:/usr/local/bin:/opt/bin
[192.168.207.134] ++ hostname -f
[192.168.207.134] + fqdn=master0.k8c-master-cluster0.local
[192.168.207.134] + '[' master0.k8c-master-cluster0.local = localhost ']'
[192.168.207.134] + echo -n master0.k8c-master-cluster0.local
[192.168.207.134] master0.k8c-master-cluster0.local
DEBU[12:40:17 PKT] Hostname is detected: "master0.k8c-master-cluster0.local"  node=192.168.207.134
[192.168.207.133] + export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/sbin:/usr/local/bin:/opt/bin
[192.168.207.133] + PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/sbin:/usr/local/bin:/opt/bin
[192.168.207.133] ++ hostname -f
[192.168.207.133] + fqdn=worker0.k8c-master-cluster0.local
[192.168.207.133] + '[' worker0.k8c-master-cluster0.local = localhost ']'
[192.168.207.133] + echo -n worker0.k8c-master-cluster0.local
[192.168.207.133] worker0.k8c-master-cluster0.local
DEBU[12:40:17 PKT] Hostname is detected: "worker0.k8c-master-cluster0.local"  node=192.168.207.133
INFO[12:40:17 PKT] Determine operating system...
ERRO[12:40:17 PKT] ssh: popen
Process exited with status 1       node=192.168.207.134
WARN[12:40:17 PKT] Task failed, error was: runtime: running task on "192.168.207.134"
ssh: popen
Process exited with status 1

Expected behavior

Cluster creation successful

How to reproduce the issue?

I created 2 VMs using Ubuntu 22.04.4 LTS x86_64, configured and tested SSH connectivity, installed KubeOne on WSL running the same version of Ubuntu, and ran the kubeone apply command with the manifest file.
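For reference, the kind of connectivity this setup relies on can be sanity-checked with something like the following (a sketch; the addresses and usernames are the ones from the manifest below, and agent-based key auth plus passwordless sudo are assumed):

```console
# run from the machine where kubeone apply is executed (WSL in this case)
eval "$(ssh-agent -s)" && ssh-add     # make sure SSH_AUTH_SOCK points at a loaded agent
ssh master0@192.168.207.134 sudo id   # control plane node: should print uid=0(root) with no password prompt
ssh worker0@192.168.207.133 sudo id   # static worker node: same expectation
```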

What KubeOne version are you using?

```console
khw@FW56:~$ kubeone version
{
  "kubeone": {
    "major": "1",
    "minor": "7",
    "gitVersion": "1.7.2",
    "gitCommit": "00fd09d91da76e307f016afb3b4f42ad6281eb2c",
    "gitTreeState": "",
    "buildDate": "2024-01-05T15:30:12Z",
    "goVersion": "go1.21.3",
    "compiler": "gc",
    "platform": "linux/amd64"
  },
  "machine_controller": {
    "major": "1",
    "minor": "57",
    "gitVersion": "v1.57.4",
    "gitCommit": "",
    "gitTreeState": "",
    "buildDate": "",
    "goVersion": "",
    "compiler": "",
    "platform": "linux/amd64"
  }
}
```

Provide your KubeOneCluster manifest here (if applicable)

```yaml
apiVersion: kubeone.k8c.io/v1beta2
kind: KubeOneCluster
name: demo-cluster

versions:
  kubernetes: "v1.27.9"

clusterNetwork:
  # the subnet used for pods (default: 10.244.0.0/16)
  podSubnet: "10.244.0.0/16"
  # the subnet used for services (default: 10.96.0.0/12)
  serviceSubnet: "10.96.0.0/12"
  # the domain name used for services (default: cluster.local)
  serviceDomainName: "k8c-master-cluster0.local"
  # a nodePort range to reserve for services (default: 30000-32767)
  nodePortRange: "30000-32767"
  # kube-proxy configurations
  # kubeProxy:
  #   # skipInstallation will skip the installation of kube-proxy
  #   # skipInstallation: true
  #   # if this is set, kube-proxy mode will be set to ipvs
  #   ipvs:
  #     # different schedulers can be configured:
  #     # * rr: round-robin
  #     # * lc: least connection (smallest number of open connections)
  #     # * dh: destination hashing
  #     # * sh: source hashing
  #     # * sed: shortest expected delay
  #     # * nq: never queue
  #     scheduler: rr
  #     strictArp: false
  #     tcpTimeout: "0"
  #     tcpFinTimeout: "0"
  #     udpTimeout: "0"
  #     excludeCIDRs: []
  #   # if mode is by default
  #   iptables: {}
  # CNI plugin of choice. CNI can not be changed later at upgrade time.
  # cni:
  #   # Only one CNI plugin can be defined at the same time
  #   # Supported CNI plugins:
  #   # * canal
  #   # * weave-net
  #   # * cilium
  #   # * external - The CNI plugin can be installed as an addon or manually
  #   canal:
  #     # MTU represents the maximum transmission unit.
  #     # Default MTU value depends on the specified provider:
  #     # * AWS - 8951 (9001 AWS Jumbo Frame - 50 VXLAN bytes)
  #     # * GCE - 1410 (GCE specific 1460 bytes - 50 VXLAN bytes)
  #     # * Hetzner - 1400 (Hetzner specific 1450 bytes - 50 VXLAN bytes)
  #     # * OpenStack - 1400 (OpenStack specific 1450 bytes - 50 VXLAN bytes)
  #     # * Default - 1450
  #     mtu: 1450
  #   # cilium:
  #   #   # enableHubble to deploy Hubble relay and UI
  #   #   enableHubble: true
  #   #   # kubeProxyReplacement defines whether cilium relies on underlying Kernel support to replace kube-proxy functionality by eBPF (strict),
  #   #   # or disables a subset of those features so cilium does not bail out if the kernel support is missing (disabled).
  #   #   kubeProxyReplacement: "disabled"
  #   # weaveNet:
  #   #   # When true is set, secret will be automatically generated and
  #   #   # referenced in appropriate manifests. Currently only weave-net
  #   #   # supports encryption.
  #   #   encrypted: true
  #   # external: {}

cloudProvider:
  # Only one cloud provider can be defined at the same time.
  # Possible values:
  # aws: {}
  # azure: {}
  # digitalocean: {}
  # gce: {}
  # hetzner:
  #   networkID: ""
  # openstack: {}
  # equinixmetal: {}
  # vsphere: {}
  none: {}
  # aws: {}
  # Set the kubelet flag '--cloud-provider=external' and deploy the external CCM for supported providers
  # external: false
  # Path to file that will be uploaded and used as custom '--cloud-config' file.
  # cloudConfig: ""
  # CSIConfig is configuration passed to the CSI driver.
  # This is currently used only for vSphere clusters.
  # csiConfig: ""

# Controls which container runtime will be installed on instances.
# By default:
# * Docker will be installed for Kubernetes clusters up to 1.20
# * containerd will be installed for Kubernetes clusters 1.21+
# Currently, it's not possible to migrate existing clusters from one to another
# container runtime, however, migration from Docker to containerd is planned
# for one of the upcoming KubeOne releases.
# Only one container runtime can be present at a time.
#
# Note: Kubernetes has announced deprecation of Docker (dockershim) support.
# It's expected that the Docker support will be removed in Kubernetes 1.24.
# It's highly advised to use containerd for all newly created clusters.
containerRuntime:
  # Installs containerd container runtime.
  # Default for 1.21+ Kubernetes clusters.
  # containerd:
  #   registries:
  #     registry.k8s.io:
  #       mirrors:
  #       - https://self-signed.pull-through.cache.tld
  #       tlsConfig:
  #         insecureSkipVerify: true
  #     docker.io:
  #       mirrors:
  #       - http://plain-text2.tld
  #       auth:
  #         # all of the following fields are optional
  #         username: "u5er"
  #         password: "myc00lp455w0rd"
  #         auth: "base64(user:password)"
  #         identityToken: ""
  #     "*":
  #       mirrors:
  #       - https://secure.tld
  # Installs Docker container runtime.
  # Default for Kubernetes clusters up to 1.20.
  # This option will be removed once Kubernetes 1.23 reaches EOL.
  # docker: {}

features:
  # Configure the CoreDNS deployment
  coreDNS:
    replicas: 2
    deployPodDisruptionBudget: true
    # imageRepository allows users to specify the image registry to be used
    # for CoreDNS. Kubeadm automatically appends /coredns at the end, so it's
    # not necessary to specify it.
    # By default it's empty, which means it'll be defaulted based on kubeadm
    # defaults and if overwriteRegistry feature is used.
    # imageRepository has the highest priority, meaning that it'll override
    # overwriteRegistry if specified.
    # imageRepository: ""
  # nodeLocalDNS allows disabling deployment of node local DNS
  nodeLocalDNS:
    deploy: true
  # Enable the PodNodeSelector admission plugin in API server.
  # More info: https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/#podnodeselector
  podNodeSelector:
    enable: false
    config:
      # configFilePath is a path on a local file system to the podNodeSelector
      # plugin config, which defines default and allowed node selectors.
      # configFilePath is a required field.
      # More info: https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/#configuration-file-format-1
      # configFilePath: ""
  # Enables PodSecurityPolicy admission plugin in API server, as well as creates
  # default 'privileged' PodSecurityPolicy, plus RBAC rules to authorize
  # 'kube-system' namespace pods to 'use' it.
  podSecurityPolicy:
    enable: false
  # Enables and configures audit log backend.
  # More info: https://kubernetes.io/docs/tasks/debug-application-cluster/audit/#log-backend
  staticAuditLog:
    enable: false
    config:
      # PolicyFilePath is a path on local file system to the audit policy manifest
      # which defines what events should be recorded and what data they should include.
      # PolicyFilePath is a required field.
      # More info: https://kubernetes.io/docs/tasks/debug-application-cluster/audit/#audit-policy
      policyFilePath: ""
      # LogPath is path on control plane instances where audit log files are stored
      logPath: "/var/log/kubernetes/audit.log"
      # LogMaxAge is maximum number of days to retain old audit log files
      logMaxAge: 30
      # LogMaxBackup is maximum number of audit log files to retain
      logMaxBackup: 3
      # LogMaxSize is maximum size in megabytes of audit log file before it gets rotated
      logMaxSize: 100
  # Enables dynamic audit logs.
  # After enabling this, operator should create auditregistration.k8s.io/v1alpha1
  # AuditSink object.
  # More info: https://kubernetes.io/docs/tasks/debug-application-cluster/audit/#dynamic-backend
  dynamicAuditLog:
    enable: false
  # Opt-out from deploying metrics-server
  # more info: https://github.com/kubernetes-incubator/metrics-server
  metricsServer:
    # enabled by default
    enable: true
  # Enable OpenID-Connect support in API server
  # More info: https://kubernetes.io/docs/reference/access-authn-authz/authentication/#openid-connect-tokens
  openidConnect:
    enable: false
    config:
      # The URL of the OpenID issuer, only HTTPS scheme will be accepted. If
      # set, it will be used to verify the OIDC JSON Web Token (JWT).
      # issuerUrl: ""
      # The client ID for the OpenID Connect client, must be set if
      # issuer_url is set.
      # clientId: "kubernetes"
      # The OpenID claim to use as the user name. Note that claims other than
      # the default ('sub') is not guaranteed to be unique and immutable. This
      # flag is experimental in kubernetes, please see the kubernetes
      # authentication documentation for further details.
      # usernameClaim: "sub"
      # If provided, all usernames will be prefixed with this value. If not
      # provided, username claims other than 'email' are prefixed by the issuer
      # URL to avoid clashes. To skip any prefixing, provide the value '-'.
      # usernamePrefix: "oidc:"
      # If provided, the name of a custom OpenID Connect claim for specifying
      # user groups. The claim value is expected to be a string or array of
      # strings. This flag is experimental in kubernetes, please see the
      # kubernetes authentication documentation for further details.
      # groupsClaim: "groups"
      # If provided, all groups will be prefixed with this value to prevent
      # conflicts with other authentication strategies.
      # groupsPrefix: "oidc:"
      # Comma-separated list of allowed JOSE asymmetric signing algorithms. JWTs
      # with a 'alg' header value not in this list will be rejected. Values are
      # defined by RFC 7518 https://tools.ietf.org/html/rfc7518#section-3.1.
      # signingAlgs: "RS256"
      # A key=value pair that describes a required claim in the ID Token. If
      # set, the claim is verified to be present in the ID Token with a matching
      # value. Only single pair is currently supported.
      # requiredClaim: ""
      # If set, the OpenID server's certificate will be verified by one of the
      # authorities in the oidc-ca-file, otherwise the host's root CA set will
      # be used.
      # caFile: ""
  # Enable Kubernetes Encryption Providers
  # For more information: https://kubernetes.io/docs/tasks/administer-cluster/encrypt-data/
  encryptionProviders:
    # disabled by default
    enable: false
    # inline string
    # customEncryptionConfiguration: ""

## Bundle of Root CA Certificates extracted from Mozilla
## can be found here: https://curl.se/ca/cacert.pem
## caBundle should be empty for default root CAs to be used
caBundle: ""

systemPackages:
  # will add Docker and Kubernetes repositories to OS package manager
  configureRepositories: true # it's true by default

# registryConfiguration controls how images used for components deployed by
# KubeOne and kubeadm are pulled from an image registry
# registryConfiguration:
#   # overwriteRegistry specifies a custom Docker registry which will be used
#   # for all images required for KubeOne and kubeadm. This also applies to
#   # addons deployed by KubeOne.
#   # This field doesn't modify the user/organization part of the image. For example,
#   # if overwriteRegistry is set to 127.0.0.1:5000/example, image called
#   # calico/cni would translate to 127.0.0.1:5000/example/calico/cni.
#   overwriteRegistry: ""
#   # InsecureRegistry configures Docker to treat the registry specified
#   # in OverwriteRegistry as an insecure registry. This is also propagated
#   # to the worker nodes managed by machine-controller and/or KubeOne.
#   insecureRegistry: false

# Addons are Kubernetes manifests to be deployed after provisioning the cluster
# addons:
#   enable: false
#   # In case when the relative path is provided, the path is relative
#   # to the KubeOne configuration file.
#   # This path is required only if you want to provide custom addons or override
#   # embedded addons.
#   path: "./addons"
#   # globalParams is a key-value map of values passed to the addons templating engine,
#   # to be used in the addons' manifests. The values defined here are passed to all
#   # addons.
#   globalParams:
#     key: value
#   # addons is used to enable addons embedded in the KubeOne binary.
#   # Currently backups-restic, default-storage-class, and unattended-upgrades are
#   # available addons.
#   # Check out the documentation to find more information about what are embedded
#   # addons and how to use them:
#   # https://docs.kubermatic.com/kubeone/v1.7/guides/addons/
#   addons:
#     # name of the addon to be enabled/deployed (e.g. backups-restic)
#     - name: ""
#       # delete triggers deletion of the deployed addon
#       delete: false
#       # params is a key-value map of values passed to the addons templating engine,
#       # to be used in the addon's manifests. Values defined here override the values
#       # defined in globalParams.
#       params:
#         key: value

# The list of nodes can be overwritten by providing Terraform output.
# You are strongly encouraged to provide an odd number of nodes and
# have at least three of them.
# Remember to only specify your *master* nodes.
controlPlane:
  hosts:
  - publicAddress: '192.168.207.134'
    privateAddress: '10.0.0.1'
    # bastion: '4.3.2.1'
    # bastionPort: 22 # can be left out if using the default (22)
    # bastionUser: 'root' # can be left out if using the default ('root')
    # Optional ssh host public key for verification of the connection to the bastion host
    # bastionHostPublicKey: "AAAAC3NzaC1lZDI1NTE5AAAAIGpmWkI5dl7GB3E1hB9LDuju87x9hX5Umw9fih+xXNU+"
    # sshPort: 22 # can be left out if using the default (22)
    sshUsername: master0
    # You usually want to configure either a private key OR an
    # agent socket, but never both. The socket value can be
    # prefixed with "env:" to refer to an environment variable.
    # sshPrivateKeyFile: '/home/me/.ssh/id_rsa'
    sshAgentSocket: 'env:SSH_AUTH_SOCK'
    # Optional ssh host public key for verification of the connection to the control plane host
    # sshHostPublicKey: "AAAAC3NzaC1lZDI1NTE5AAAAIPwEDvXiKfvXrysf86VW5dJTKDlQ09e2tV0+T3KeFKmI"
    # Taints are taints applied to nodes. If not provided (i.e. nil) for control plane nodes,
    # it defaults to:
    # * For Kubernetes 1.23 and older: TaintEffectNoSchedule with key node-role.kubernetes.io/master
    # * For Kubernetes 1.24 and newer: TaintEffectNoSchedule with keys
    #   node-role.kubernetes.io/control-plane and node-role.kubernetes.io/master
    # Explicitly empty (i.e. []corev1.Taint{}) means no taints will be applied (this is default for worker nodes).
    # taints:
    # - key: "node-role.kubernetes.io/master"
    #   effect: "NoSchedule"
    # labels:
    #   # to add new custom label
    #   "new-custom-label": "custom-value"
    #   # to delete existing label (use minus symbol with empty value)
    #   "node.kubernetes.io/exclude-from-external-load-balancers-": ""
    # kubelet is used to control kubelet configuration
    # uncomment the following to set those kubelet parameters. More info at:
    # https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/#
    # kubelet:
    #   systemReserved:
    #     cpu: 200m
    #     memory: 200Mi
    #   kubeReserved:
    #     cpu: 200m
    #     memory: 300Mi
    #   evictionHard: {}
    #   maxPods: 110

# A list of static workers, not managed by MachineController.
# The list of nodes can be overwritten by providing Terraform output.
staticWorkers:
  hosts:
  - publicAddress: '192.168.207.133'
    privateAddress: '10.0.1.1'
    # bastion: '4.3.2.1'
    # bastionPort: 22 # can be left out if using the default (22)
    # bastionUser: 'root' # can be left out if using the default ('root')
    # bastionHostPublicKey: "AAAAC3NzaC1lZDI1NTE5AAAAIGpmWkI5dl7GB3E1hB9LDuju87x9hX5Umw9fih+xXNU+"
    # sshPort: 22 # can be left out if using the default (22)
    sshUsername: worker0
    # You usually want to configure either a private key OR an
    # agent socket, but never both. The socket value can be
    # prefixed with "env:" to refer to an environment variable.
    # sshPrivateKeyFile: '/home/me/.ssh/id_rsa'
    sshAgentSocket: 'env:SSH_AUTH_SOCK'
    # Optional ssh host public key for verification of the connection to the static worker host
    # sshHostPublicKey: "AAAAC3NzaC1lZDI1NTE5AAAAIMBejAkW4AARsZZkC6PqWGuB14fkPzEQoZ4im4TuOkdD"
    # Taints is used to apply taints to the node.
    # Explicitly empty (i.e. taints: {}) means no taints will be applied.
    # taints:
    # - key: ""
    #   effect: ""
    # kubelet is used to control kubelet configuration
    # uncomment the following to set those kubelet parameters. More info at:
    # https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/#
    # kubelet:
    #   systemReserved:
    #     cpu: 200m
    #     memory: 200Mi
    #   kubeReserved:
    #     cpu: 200m
    #     memory: 300Mi
    #   evictionHard: {}
    #   maxPods: 110

# The API server can also be overwritten by Terraform. Provide the
# external address of your load balancer or the public addresses of
# the first control plane nodes.
apiEndpoint:
  host: ''
  port: 6443
  alternativeNames: []

# If the cluster runs on bare metal or an unsupported cloud provider,
# you can disable the machine-controller deployment entirely. In this
# case, anything you configure in your "workers" sections is ignored.
machineController:
  deploy: false

# Proxy is used to configure HTTP_PROXY, HTTPS_PROXY and NO_PROXY
# for Docker daemon and kubelet, and to be used when provisioning cluster
# (e.g. for curl, apt-get..).
# Also worker nodes managed by machine-controller will be configured according to
# proxy settings here. The caveat is that only proxy.http and proxy.noProxy will
# be used on worker machines.
# proxy:
#   http: ''
#   https: ''
#   noProxy: ''

# KubeOne can automatically create MachineDeployments to create
# worker nodes in your cluster. Each element in this "workers"
# list is a single deployment and must have a unique name.
# dynamicWorkers:
# - name: fra1-a
#   replicas: 1
#   providerSpec:
#     labels:
#       mylabel: 'fra1-a'
#     # SSH keys can be inferred from Terraform if this list is empty
#     # and your tf output contains a "ssh_public_keys" field.
#     # sshPublicKeys:
#     # - 'ssh-rsa ......'
#     # cloudProviderSpec corresponds 'provider.name' config
#     cloudProviderSpec:
#       ### the following params could be inferred by kubeone from terraform
#       ### output JSON:
#       # ami: 'ami-0332a5c40cf835528',
#       # availabilityZone: 'eu-central-1a',
#       # instanceProfile: 'mycool-profile',
#       # region: 'eu-central-1',
#       # securityGroupIDs: ['sg-01f34ffd8447e70c0']
#       # subnetId: 'subnet-2bff4f43',
#       # vpcId: 'vpc-819f62e9'
#       ### end of terraform inferred kubeone params
#       instanceType: 't3.medium'
#       diskSize: 50
#       diskType: 'gp2'
#       operatingSystem: 'ubuntu'
#       operatingSystemSpec:
#         distUpgradeOnBoot: true
# - name: fra1-b
#   replicas: 1
#   providerSpec:
#     labels:
#       mylabel: 'fra1-b'
#     cloudProviderSpec:
#       instanceType: 't3.medium'
#       diskSize: 50
#       diskType: 'gp2'
#       operatingSystem: 'ubuntu'
#       operatingSystemSpec:
#         distUpgradeOnBoot: true
# - name: fra1-c
#   replicas: 1
#   providerSpec:
#     labels:
#       mylabel: 'fra1-c'
#     cloudProviderSpec:
#       instanceType: 't3.medium'
#       diskSize: 50
#       diskType: 'gp2'
#       operatingSystem: 'ubuntu'
#       operatingSystemSpec:
#         distUpgradeOnBoot: true

loggingConfig:
  containerLogMaxSize: "100Mi"
  containerLogMaxFiles: 5
```

What cloud provider are you running on?

baremetal

What operating system are you running in your cluster?

Ubuntu 22.04.4 LTS x86_64

Additional information

cloudziu commented 8 months ago

I have the same issue.

kubeone version
{
  "kubeone": {
    "major": "1",
    "minor": "6",
    "gitVersion": "1.6.2",
    "gitCommit": "184adc3b7d0c1e2e7630ded518cbfdfab7300755",
    "gitTreeState": "",
    "buildDate": "2023-04-14T11:20:23Z",
    "goVersion": "go1.19.8",
    "compiler": "gc",
    "platform": "linux/amd64"
  },
  "machine_controller": {
    "major": "1",
    "minor": "56",
    "gitVersion": "v1.56.2",
    "gitCommit": "",
    "gitTreeState": "",
    "buildDate": "",
    "goVersion": "",
    "compiler": "",
    "platform": "linux/amd64"
  }
}

uname -a

Linux kube-eleven-6f4cc8f787-w6482 5.15.133+ #1 SMP Sat Dec 30 11:18:04 UTC 2023 x86_64 Linux
kron4eg commented 8 months ago

Minimal config to reproduce and simulate bare metal in the cloud:

apiVersion: kubeone.k8c.io/v1beta2
kind: KubeOneCluster

name: artioms
versions:
  kubernetes: 1.28.6

apiEndpoint:
  host: 172.31.82.15
  port: 6443

cloudProvider:
  none: {}

controlPlane:
  hosts:
  - bastion: 51.44.21.65
    bastionPort: 22
    bastionUser: ubuntu
    privateAddress: 172.31.82.15
    sshAgentSocket: env:SSH_AUTH_SOCK
    sshPort: 22
    sshUsername: ubuntu

machineController:
  deploy: false
$ kubeone version
{
  "kubeone": {
    "major": "1",
    "minor": "7",
    "gitVersion": "v1.7.2",
    "gitCommit": "00fd09d91da76e307f016afb3b4f42ad6281eb2c",
    "gitTreeState": "",
    "buildDate": "2024-02-22T12:30:21+02:00",
    "goVersion": "go1.22.0",
    "compiler": "gc",
    "platform": "linux/amd64"
  },
  "machine_controller": {
    "major": "1",
    "minor": "57",
    "gitVersion": "v1.57.4",
    "gitCommit": "",
    "gitTreeState": "",
    "buildDate": "",
    "goVersion": "",
    "compiler": "",
    "platform": "linux/amd64"
  }
}
$ kubeone apply -v -d --auto-approve -m kubeone_dump.yaml
INFO[12:24:05 EET] Determine hostname...
[172.31.82.15] + export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/sbin:/usr/local/bin:/opt/bin
[172.31.82.15] + PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/sbin:/usr/local/bin:/opt/bin
[172.31.82.15] ++ hostname -f
[172.31.82.15] + fqdn=ip-172-31-82-15.eu-west-3.compute.internal
[172.31.82.15] + '[' ip-172-31-82-15.eu-west-3.compute.internal = localhost ']'
[172.31.82.15] + echo -n ip-172-31-82-15.eu-west-3.compute.internal
[172.31.82.15] ip-172-31-82-15.eu-west-3.compute.internal
DEBU[12:24:11 EET] Hostname is detected: "ip-172-31-82-15.eu-west-3.compute.internal"  node=172.31.82.15
INFO[12:24:11 EET] Determine operating system...
DEBU[12:24:11 EET] Operating system detected: "ubuntu"           node=172.31.82.15
INFO[12:24:11 EET] Running host probes...
Host: "ip-172-31-82-15.eu-west-3.compute.internal"
        Host initialized: no
        containerd healthy: no (unknown)
        Kubelet healthy: no (unknown)

Everything else is cut for brevity, as it's evident it worked past `Determine operating system...`. Please help me to help you: what do you have in /etc/ssh/sshd_config?
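One way to capture that (a sketch, assuming OpenSSH; `sshd -T` prints the effective settings, including anything pulled in via `Include`):

```console
sudo sshd -T | grep -Ei 'pubkeyauthentication|permitrootlogin|usedns|maxsessions'
```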

cloudziu commented 8 months ago

sshd config:

Include /etc/ssh/sshd_config.d/*.conf
PasswordAuthentication no
KbdInteractiveAuthentication no
UsePAM yes
X11Forwarding yes
PrintMotd no
AcceptEnv LANG LC_*
Subsystem   sftp    /usr/lib/openssh/sftp-server
PermitRootLogin without-password
PubkeyAuthentication yes
PermitRootLogin without-password
PubkeyAuthentication yes
ClientAliveInterval 120
UseDNS no

I can access the target VM with ssh commands.

For more context: when I was running kubeone, these are the SSH auth logs from the target VM:

Feb 22 09:58:56 e2e-xw7je8s-gcp-control-fxhqomq-1 sshd[2106364]: Accepted publickey for root from 35.198.92.172 port 65190 ssh2: RSA SHA256:E4mHS/VOYiD9cXGcB7s35lOLX8T5nifhvBGN+skKFFU
Feb 22 09:58:56 e2e-xw7je8s-gcp-control-fxhqomq-1 sshd[2106364]: pam_unix(sshd:session): session opened for user root(uid=0) by (uid=0)
Feb 22 09:58:56 e2e-xw7je8s-gcp-control-fxhqomq-1 systemd-logind[872]: New session 1767 of user root.
Feb 22 09:58:57 e2e-xw7je8s-gcp-control-fxhqomq-1 sudo:     root : PWD=/root ; USER=root ; COMMAND=/usr/bin/cat /etc/os-release
Feb 22 09:58:57 e2e-xw7je8s-gcp-control-fxhqomq-1 sudo: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=0)
Feb 22 09:58:57 e2e-xw7je8s-gcp-control-fxhqomq-1 sudo: pam_unix(sudo:session): session closed for user root
Feb 22 09:58:57 e2e-xw7je8s-gcp-control-fxhqomq-1 sudo:     root : PWD=/root ; USER=root ; COMMAND=/usr/bin/cat /etc/kubernetes/pki/apiserver-etcd-client.crt
Feb 22 09:58:57 e2e-xw7je8s-gcp-control-fxhqomq-1 sudo: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=0)
Feb 22 09:58:57 e2e-xw7je8s-gcp-control-fxhqomq-1 sudo: pam_unix(sudo:session): session closed for user root
Feb 22 09:58:57 e2e-xw7je8s-gcp-control-fxhqomq-1 sudo:     root : PWD=/root ; USER=root ; COMMAND=/usr/bin/cat /etc/kubernetes/pki/apiserver-kubelet-client.crt
Feb 22 09:58:57 e2e-xw7je8s-gcp-control-fxhqomq-1 sudo: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=0)
Feb 22 09:58:57 e2e-xw7je8s-gcp-control-fxhqomq-1 sudo: pam_unix(sudo:session): session closed for user root
Feb 22 09:58:57 e2e-xw7je8s-gcp-control-fxhqomq-1 sudo:     root : PWD=/root ; USER=root ; COMMAND=/usr/bin/cat /etc/kubernetes/pki/apiserver.crt
Feb 22 09:58:57 e2e-xw7je8s-gcp-control-fxhqomq-1 sudo: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=0)
Feb 22 09:58:57 e2e-xw7je8s-gcp-control-fxhqomq-1 sudo: pam_unix(sudo:session): session closed for user root

When I terminate the kubeone process, I see these in the auth log:

Feb 22 09:59:40 e2e-xw7je8s-gcp-control-fxhqomq-1 sshd[2106364]: pam_unix(sshd:session): session closed for user root
Feb 22 09:59:40 e2e-xw7je8s-gcp-control-fxhqomq-1 systemd-logind[872]: Session 1767 logged out. Waiting for processes to exit.
Feb 22 09:59:40 e2e-xw7je8s-gcp-control-fxhqomq-1 systemd-logind[872]: Removed session 1767.

I've tried to run kubeone apply from two different machines: a VM in the cloud and my local PC. Both can access the target VM over SSH.

kron4eg commented 8 months ago

@cloudziu so what's the error message you're getting?

cloudziu commented 8 months ago

Output from kubeone apply:

INFO[12:02:49 CET] Running host probes...
ERRO[12:02:51 CET] ssh: popen
Process exited with status 1       node=34.147.205.117
WARN[12:02:51 CET] Task failed, error was: runtime: running task on "34.147.205.117"
ssh: popen
Process exited with status 1
WARN[12:03:01 CET] Retrying task...
INFO[12:03:01 CET] Running host probes...
ERRO[12:03:03 CET] ssh: popen
Process exited with status 1       node=34.147.205.117
WARN[12:03:03 CET] Task failed, error was: runtime: running task on "34.147.205.117"
ssh: popen
Process exited with status 1

I think I might have found the issue, but I need to confirm it. Running strace, I found that one of my control plane nodes is missing apiserver.crt. This is an strace of the sshd process on the target VM.

[pid  4982] openat(AT_FDCWD, "/etc/kubernetes/pki/apiserver.crt", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid  4982] write(2, "cat: ", 5)        = 5
[pid  4861] <... ppoll resumed>)        = 1 ([{fd=11, revents=POLLIN}], left {tv_sec=119, tv_nsec=874212158})

This is the point where the commands are looping, and I'm getting the `ssh: popen` error. If that is the issue, I think the error could be more descriptive.
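For illustration, the failing probe step can be reproduced by hand (a sketch based on the sudo commands from the auth log above; the exact invocation kubeone uses may differ):

```console
# on the control-plane node, with the certificate missing
$ sudo cat /etc/kubernetes/pki/apiserver.crt
cat: /etc/kubernetes/pki/apiserver.crt: No such file or directory
$ echo $?
1
```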

kron4eg commented 8 months ago

@cloudziu can you confirm that you have a passwordless sudo user? I.e. ssh user@host sudo id works.

cloudziu commented 8 months ago

@kron4eg yes, that's correct, but I need to use a private key (which I am also providing in the kubeone manifest).

ssh -o IdentitiesOnly=yes -i ./private.pem root@34.147.205.117 sudo id
uid=0(root) gid=0(root) groups=0(root),1001(google-sudoers)
kron4eg commented 8 months ago

Well... I don't know how to reproduce the problem; it works on my end.

cloudziu commented 8 months ago

Ok I was able to reproduce the issue.

1. Create a new k8s cluster:

       apiVersion: kubeone.k8c.io/v1beta2
       kind: KubeOneCluster
       name: 'test'

       versions:
         kubernetes: 'v1.26.3'

       features:
         coreDNS:
           replicas: 2
           deployPodDisruptionBudget: true

       clusterNetwork:
         cni:
           cilium:
             enableHubble: true

       cloudProvider:
         none: {}
         external: false

       controlPlane:
         hosts:

       staticWorkers:
         hosts:

       machineController:
         deploy: false

2. SSH into the control-plane node.
3. Move/remove the following files (see the sketch after this list):

       /etc/kubernetes/pki/apiserver.crt
       /etc/kubernetes/pki/apiserver.key

4. Try to run `./kubeone apply -m kubeone.yaml -y -d -v` again. You should get the `ssh: popen` error.
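A sketch of step 3, assuming the default PKI paths (moving the files aside rather than deleting them, so they can be restored afterwards):

```console
# on the control-plane node
sudo mv /etc/kubernetes/pki/apiserver.crt /root/apiserver.crt.bak
sudo mv /etc/kubernetes/pki/apiserver.key /root/apiserver.key.bak
```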

This was exactly why it was failing for me in the first place. I think it's not a problem with `kubeone` itself, but it would be great if the error message were more descriptive.
kron4eg commented 8 months ago

Without those files the apiserver wouldn't even start, would it?

cloudziu commented 8 months ago

Yep, it would not.
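A quick way to confirm that on the node (a sketch, assuming a containerd-based setup with crictl available):

```console
# should return no running container while the serving cert/key are missing
sudo crictl ps --name kube-apiserver
# the kubelet logs explain why the static pod cannot come up
sudo journalctl -u kubelet --since "10 minutes ago" | grep -i apiserver
```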