kubernetes-sigs / kubespray

Deploy a Production Ready Kubernetes Cluster

Installation fails: kubeadm [Initialize first master] #5139

Closed johnzheng1975 closed 4 years ago

johnzheng1975 commented 5 years ago

Environment:

Kubespray version (commit) (git rev-parse --short HEAD): tested on both the release-2.10 and release-2.11 branches (git checkout release-2.10 / git checkout release-2.11)

Network plugin used: cilium 1.3.7

Copy of your inventory file: inventory.zip

Command used to invoke ansible:

ssh-add  ~/.ssh/tempprivate
eval "$(ssh-agent -s)"
cd contrib/terraform/aws
vi terraform.tfvars
terraform init
terraform apply -var-file=credentials.tfvars
ansible-playbook -i ./inventory/hosts ./cluster.yml -e ansible_ssh_user=core -e bootstrap_os=coreos -b --become-user=root --flush-cache -e ansible_user=core

Output of ansible run:


Error TASK [kubernetes/master : kubeadm | Initialize first master] *** Tuesday 03 September 2019 07:14:25 +0000 (0:00:00.520) 0:22:02.910 * FAILED - RETRYING: kubeadm | Initialize first master (3 retries left). FAILED - RETRYING: kubeadm | Initialize first master (2 retries left). FAILED - RETRYING: kubeadm | Initialize first master (1 retries left). fatal: [kubernetes-dev0210john0903-master0]: FAILED! => {"attempts": 3, "changed": true, "cmd": ["timeout", "-k", "600s", "600s", "/opt/bin/kubeadm", "init", "--config=/etc/kubernetes/kubeadm-config.yaml", "--ignore-preflight-errors=all", "--skip-phases=addon/coredns", "--experimental-upload-certs", "--certificate-key=ecabe44f2d9ce1b2edbb702c8a9c77d5c84bb9cb4da05eb42fcba3dfe4ec5b5e"], "delta": "0:02:02.449063", "end": "2019-09-03 07:23:13.971380", "failed_when_result": true, "msg": "non-zero return code", "rc": 1, "start": "2019-09-03 07:21:11.522317", "stderr": "\t[WARNING Port-6443]: Port 6443 is in use\n\t[WARNING Port-10251]: Port 10251 is in use\n\t[WARNING Port-10252]: Port 10252 is in use\n\t[WARNING FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists\n\t[WARNING FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml]: /etc/kubernetes/manifests/kube-controller-manager.yaml already exists\n\t[WARNING FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml]: /etc/kubernetes/manifests/kube-scheduler.yaml already exists\n\t[WARNING IsDockerSystemdCheck]: detected \"cgroupfs\" as the Docker cgroup driver. The recommended driver is \"systemd\". Please follow the guide at https://kubernetes.io/docs/setup/cri/\n\t[WARNING Port-10250]: Port 10250 is in use\nerror execution phase upload-config/kubelet: Error writing Crisocket information for the control-plane node: timed out waiting for the condition", "stderr_lines": ["\t[WARNING Port-6443]: Port 6443 is in use", "\t[WARNING Port-10251]: Port 10251 is in use", "\t[WARNING Port-10252]: Port 10252 is in use", "\t[WARNING FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists", "\t[WARNING FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml]: /etc/kubernetes/manifests/kube-controller-manager.yaml already exists", "\t[WARNING FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml]: /etc/kubernetes/manifests/kube-scheduler.yaml already exists", "\t[WARNING IsDockerSystemdCheck]: detected \"cgroupfs\" as the Docker cgroup driver. The recommended driver is \"systemd\". 
Please follow the guide at https://kubernetes.io/docs/setup/cri/", "\t[WARNING Port-10250]: Port 10250 is in use", "error execution phase upload-config/kubelet: Error writing Crisocket information for the control-plane node: timed out waiting for the condition"], "stdout": "[init] Using Kubernetes version: v1.14.6\n[preflight] Running pre-flight checks\n[preflight] Pulling images required for setting up a Kubernetes cluster\n[preflight] This might take a minute or two, depending on the speed of your internet connection\n[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'\n[kubelet-start] Writing kubelet environment file with flags to file \"/var/lib/kubelet/kubeadm-flags.env\"\n[kubelet-start] Writing kubelet configuration to file \"/var/lib/kubelet/config.yaml\"\n[kubelet-start] Activating the kubelet service\n[certs] Using certificateDir folder \"/etc/kubernetes/ssl\"\n[certs] Using existing ca certificate authority\n[certs] Using existing apiserver certificate and key on disk\n[certs] Using existing apiserver-kubelet-client certificate and key on disk\n[certs] Using existing front-proxy-ca certificate authority\n[certs] Using existing front-proxy-client certificate and key on disk\n[certs] External etcd mode: Skipping etcd/ca certificate authority generation\n[certs] External etcd mode: Skipping etcd/healthcheck-client certificate authority generation\n[certs] External etcd mode: Skipping etcd/server certificate authority generation\n[certs] External etcd mode: Skipping etcd/peer certificate authority generation\n[certs] External etcd mode: Skipping apiserver-etcd-client certificate authority generation\n[certs] Using the existing \"sa\" key\n[kubeconfig] Using kubeconfig folder \"/etc/kubernetes\"\n[kubeconfig] Using existing kubeconfig file: \"/etc/kubernetes/admin.conf\"\n[kubeconfig] Using existing kubeconfig file: \"/etc/kubernetes/kubelet.conf\"\n[kubeconfig] Using existing kubeconfig file: \"/etc/kubernetes/controller-manager.conf\"\n[kubeconfig] Using existing kubeconfig file: \"/etc/kubernetes/scheduler.conf\"\n[control-plane] Using manifest folder \"/etc/kubernetes/manifests\"\n[control-plane] Creating static Pod manifest for \"kube-apiserver\"\n[controlplane] Adding extra host path mount \"cloud-config\" to \"kube-apiserver\"\n[controlplane] Adding extra host path mount \"usr-share-ca-certificates\" to \"kube-apiserver\"\n[controlplane] Adding extra host path mount \"cloud-config\" to \"kube-controller-manager\"\n[control-plane] Creating static Pod manifest for \"kube-controller-manager\"\n[controlplane] Adding extra host path mount \"cloud-config\" to \"kube-apiserver\"\n[controlplane] Adding extra host path mount \"usr-share-ca-certificates\" to \"kube-apiserver\"\n[controlplane] Adding extra host path mount \"cloud-config\" to \"kube-controller-manager\"\n[control-plane] Creating static Pod manifest for \"kube-scheduler\"\n[controlplane] Adding extra host path mount \"cloud-config\" to \"kube-apiserver\"\n[controlplane] Adding extra host path mount \"usr-share-ca-certificates\" to \"kube-apiserver\"\n[controlplane] Adding extra host path mount \"cloud-config\" to \"kube-controller-manager\"\n[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory \"/etc/kubernetes/manifests\". 
This can take up to 5m0s\n[apiclient] All control plane components are healthy after 0.010987 seconds\n[upload-config] storing the configuration used in ConfigMap \"kubeadm-config\" in the \"kube-system\" Namespace\n[kubelet] Creating a ConfigMap \"kubelet-config-1.14\" in namespace kube-system with the configuration for the kubelets in the cluster\n[kubelet-check] Initial timeout of 40s passed.", "stdout_lines": ["[init] Using Kubernetes version: v1.14.6", "[preflight] Running pre-flight checks", "[preflight] Pulling images required for setting up a Kubernetes cluster", "[preflight] This might take a minute or two, depending on the speed of your internet connection", "[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'", "[kubelet-start] Writing kubelet environment file with flags to file \"/var/lib/kubelet/kubeadm-flags.env\"", "[kubelet-start] Writing kubelet configuration to file \"/var/lib/kubelet/config.yaml\"", "[kubelet-start] Activating the kubelet service", "[certs] Using certificateDir folder \"/etc/kubernetes/ssl\"", "[certs] Using existing ca certificate authority", "[certs] Using existing apiserver certificate and key on disk", "[certs] Using existing apiserver-kubelet-client certificate and key on disk", "[certs] Using existing front-proxy-ca certificate authority", "[certs] Using existing front-proxy-client certificate and key on disk", "[certs] External etcd mode: Skipping etcd/ca certificate authority generation", "[certs] External etcd mode: Skipping etcd/healthcheck-client certificate authority generation", "[certs] External etcd mode: Skipping etcd/server certificate authority generation", "[certs] External etcd mode: Skipping etcd/peer certificate authority generation", "[certs] External etcd mode: Skipping apiserver-etcd-client certificate authority generation", "[certs] Using the existing \"sa\" key", "[kubeconfig] Using kubeconfig folder \"/etc/kubernetes\"", "[kubeconfig] Using existing kubeconfig file: \"/etc/kubernetes/admin.conf\"", "[kubeconfig] Using existing kubeconfig file: \"/etc/kubernetes/kubelet.conf\"", "[kubeconfig] Using existing kubeconfig file: \"/etc/kubernetes/controller-manager.conf\"", "[kubeconfig] Using existing kubeconfig file: \"/etc/kubernetes/scheduler.conf\"", "[control-plane] Using manifest folder \"/etc/kubernetes/manifests\"", "[control-plane] Creating static Pod manifest for \"kube-apiserver\"", "[controlplane] Adding extra host path mount \"cloud-config\" to \"kube-apiserver\"", "[controlplane] Adding extra host path mount \"usr-share-ca-certificates\" to \"kube-apiserver\"", "[controlplane] Adding extra host path mount \"cloud-config\" to \"kube-controller-manager\"", "[control-plane] Creating static Pod manifest for \"kube-controller-manager\"", "[controlplane] Adding extra host path mount \"cloud-config\" to \"kube-apiserver\"", "[controlplane] Adding extra host path mount \"usr-share-ca-certificates\" to \"kube-apiserver\"", "[controlplane] Adding extra host path mount \"cloud-config\" to \"kube-controller-manager\"", "[control-plane] Creating static Pod manifest for \"kube-scheduler\"", "[controlplane] Adding extra host path mount \"cloud-config\" to \"kube-apiserver\"", "[controlplane] Adding extra host path mount \"usr-share-ca-certificates\" to \"kube-apiserver\"", "[controlplane] Adding extra host path mount \"cloud-config\" to \"kube-controller-manager\"", "[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory 
\"/etc/kubernetes/manifests\". This can take up to 5m0s", "[apiclient] All control plane components are healthy after 0.010987 seconds", "[upload-config] storing the configuration used in ConfigMap \"kubeadm-config\" in the \"kube-system\" Namespace", "[kubelet] Creating a ConfigMap \"kubelet-config-1.14\" in namespace kube-system with the configuration for the kubelets in the cluster", "[kubelet-check] Initial timeout of 40s passed."]}

Anything else we need to know: this happens on both release-2.10 and release-2.11.

Previously this seemed to run correctly on some versions, but now it always fails.

Is the following warning the root cause? [WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd".

alijahnas commented 5 years ago

Do your nodes already have Kubernetes services running? The ports are already in use. It seems some people have already hit the same problem here: https://github.com/kubernetes/kubeadm/issues/1438 Maybe you can find some help there.
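
A quick way to see what is already bound to those ports on a node (a sketch; assumes ss is installed):

# List the processes listening on the kubeadm-related ports
sudo ss -lntp | grep -E ':(6443|10250|10251|10252)'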

ppcololo commented 5 years ago

same problem on openstack

Sryther commented 5 years ago

I received this kind of error yesterday. The actual failure cannot be any of the warnings you see in the logs, since those are only warnings, and the error is thrown only after the timeout is reached (and after 3 retries).

In my case, the issue was with the kubeadm join command, but the same approach may help you. Try running the kubeadm init command manually with the verbose option (-v 5, for example):

# This is the command Ansible is trying to run in your original post
timeout -k 600s 600s \
  /opt/bin/kubeadm init \
  --config=/etc/kubernetes/kubeadm-config.yaml \
  --ignore-preflight-errors=all \
  --skip-phases=addon/coredns \
  --experimental-upload-certs \
  --certificate-key=ecabe44f2d9ce1b2edbb702c8a9c77d5c84bb9cb4da05eb42fcba3dfe4ec5b5e \
  -v 5

This should give you some hints.

rguichard commented 5 years ago

Hi,

Had the same issue. In my case it was a domain/IP mismatch in group_vars/all/all.yml:

## External LB example config
apiserver_loadbalancer_domain_name: "kubernetes.tld"
loadbalancer_apiserver:
  address: 10.1.10.127
  port: 443

kubernetes.tld should resolve to 10.1.10.127, so either update the DNS record or change the variable.
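
A quick way to verify the mapping from one of the nodes (a sketch; substitute your own apiserver_loadbalancer_domain_name and expected address):

# Resolve the apiserver load balancer name and compare with loadbalancer_apiserver.address
getent hosts kubernetes.tld
# expected for the example above: 10.1.10.127   kubernetes.tld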

Hope it helps someone!

olehhordiienko commented 5 years ago

Any updates? I have the same issue with cloud_provider: aws, same scenario.

rstriedl5c commented 5 years ago

same problem on openstack. I've tried the above but the install does not finish the TASK [kubernetes/master : kubeadm | Initialize first master] . Any help would be appreciated.

ppcololo commented 5 years ago

same problem on openstack. I've tried the above but the install does not finish the TASK [kubernetes/master : kubeadm | Initialize first master] . Any help would be appreciated.

try to add kubelet_cgroup_driver: "cgroupfs" to group_vars/k8s-cluster/k8s-cluster.yaml
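
A minimal sketch of that change (the inventory path here is an assumption; use wherever your group_vars actually live):

# Append the override to the cluster group vars (path is an example)
echo 'kubelet_cgroup_driver: "cgroupfs"' >> inventory/mycluster/group_vars/k8s-cluster/k8s-cluster.yaml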

olehhordiienko commented 5 years ago

same problem on openstack. I've tried the above but the install does not finish the TASK [kubernetes/master : kubeadm | Initialize first master] . Any help would be appreciated.

try to add kubelet_cgroup_driver: "cgroupfs" to group_vars/k8s-cluster/k8s-cluster.yaml

Doesn't help in the AWS case: the ports are still in use and [kubernetes/master : kubeadm | Initialize first master] still fails.

rstriedl5c commented 5 years ago

same problem on openstack. I've tried the above but the install does not finish the TASK [kubernetes/master : kubeadm | Initialize first master] . Any help would be appreciated.

try to add kubelet_cgroup_driver: "cgroupfs" to group_vars/k8s-cluster/k8s-cluster.yaml

Still the same issue on OpenStack.


FAILED - RETRYING: kubeadm | Initialize first master (2 retries left).Result was: {
    "attempts": 2,
    "changed": true,
    "cmd": [
        "timeout",
        "-k",
        "300s",
        "300s",
        "/usr/local/bin/kubeadm",
        "init",
        "--config=/etc/kubernetes/kubeadm-config.yaml",
        "--ignore-preflight-errors=all",
        "--skip-phases=addon/coredns",
        "--upload-certs"
    ],
    "delta": "0:05:00.007142",
    "end": "2019-10-02 15:31:43.120801",
    "failed_when_result": true,
    "invocation": {
        "module_args": {
            "_raw_params": "timeout -k 300s 300s /usr/local/bin/kubeadm init --config=/etc/kubernetes/kubeadm-config.yaml --ignore-preflight-errors=all --skip-phases=addon/coredns   --upload-certs  ",
            "_uses_shell": false,
            "argv": null,
            "chdir": null,
            "creates": null,
            "executable": null,
            "removes": null,
            "stdin": null,
            "warn": true
        }
    },
    "msg": "non-zero return code",
    "rc": 124,
    "retries": 4,
    "start": "2019-10-02 15:26:43.113659",
    "stderr": "\t[WARNING Port-10251]: Port 10251 is in use\n\t[WARNING Port-10252]: Port 10252 is in use\n\t[WARNING FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists\n\t[WARNING FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml]: /etc/kubernetes/manifests/kube-controller-manager.yaml already exists\n\t[WARNING FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml]: /etc/kubernetes/manifests/kube-scheduler.yaml already exists\n\t[WARNING IsDockerSystemdCheck]: detected \"cgroupfs\" as the Docker cgroup driver. The recommended driver is \"systemd\". Please follow the guide at https://kubernetes.io/docs/setup/cri/\n\t[WARNING Port-10250]: Port 10250 is in use",
    "stderr_lines": [
        "\t[WARNING Port-10251]: Port 10251 is in use",
        "\t[WARNING Port-10252]: Port 10252 is in use",
        "\t[WARNING FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists",
        "\t[WARNING FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml]: /etc/kubernetes/manifests/kube-controller-manager.yaml already exists",
        "\t[WARNING FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml]: /etc/kubernetes/manifests/kube-scheduler.yaml already exists",
        "\t[WARNING IsDockerSystemdCheck]: detected \"cgroupfs\" as the Docker cgroup driver. The recommended driver is \"systemd\". Please follow the guide at https://kubernetes.io/docs/setup/cri/",
        "\t[WARNING Port-10250]: Port 10250 is in use"
    ],
    "stdout": "[init] Using Kubernetes version: v1.15.3\n[preflight] Running pre-flight checks\n[preflight] Pulling images required for setting up a Kubernetes cluster\n[preflight] This might take a minute or two, depending on the speed of your internet connection\n[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'\n[kubelet-start] Writing kubelet environment file with flags to file \"/var/lib/kubelet/kubeadm-flags.env\"\n[kubelet-start] Writing kubelet configuration to file \"/var/lib/kubelet/config.yaml\"\n[kubelet-start] Activating the kubelet service\n[certs] Using certificateDir folder \"/etc/kubernetes/ssl\"\n[certs] Using existing front-proxy-ca certificate authority\n[certs] Using existing front-proxy-client certificate and key on disk\n[certs] External etcd mode: Skipping etcd/ca certificate authority generation\n[certs] External etcd mode: Skipping etcd/healthcheck-client certificate authority generation\n[certs] External etcd mode: Skipping etcd/server certificate authority generation\n[certs] External etcd mode: Skipping etcd/peer certificate authority generation\n[certs] External etcd mode: Skipping apiserver-etcd-client certificate authority generation\n[certs] Using existing ca certificate authority\n[certs] Using existing apiserver certificate and key on disk\n[certs] Using existing apiserver-kubelet-client certificate and key on disk\n[certs] Using the existing \"sa\" key\n[kubeconfig] Using kubeconfig folder \"/etc/kubernetes\"\n[kubeconfig] Using existing kubeconfig file: \"/etc/kubernetes/admin.conf\"\n[kubeconfig] Using existing kubeconfig file: \"/etc/kubernetes/kubelet.conf\"\n[kubeconfig] Using existing kubeconfig file: \"/etc/kubernetes/controller-manager.conf\"\n[kubeconfig] Using existing kubeconfig file: \"/etc/kubernetes/scheduler.conf\"\n[control-plane] Using manifest folder \"/etc/kubernetes/manifests\"\n[control-plane] Creating static Pod manifest for \"kube-apiserver\"\n[controlplane] Adding extra host path mount \"cloud-config\" to \"kube-apiserver\"\n[controlplane] Adding extra host path mount \"usr-share-ca-certificates\" to \"kube-apiserver\"\n[controlplane] Adding extra host path mount \"cloud-config\" to \"kube-controller-manager\"\n[control-plane] Creating static Pod manifest for \"kube-controller-manager\"\n[controlplane] Adding extra host path mount \"cloud-config\" to \"kube-apiserver\"\n[controlplane] Adding extra host path mount \"usr-share-ca-certificates\" to \"kube-apiserver\"\n[controlplane] Adding extra host path mount \"cloud-config\" to \"kube-controller-manager\"\n[control-plane] Creating static Pod manifest for \"kube-scheduler\"\n[controlplane] Adding extra host path mount \"cloud-config\" to \"kube-apiserver\"\n[controlplane] Adding extra host path mount \"usr-share-ca-certificates\" to \"kube-apiserver\"\n[controlplane] Adding extra host path mount \"cloud-config\" to \"kube-controller-manager\"\n[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory \"/etc/kubernetes/manifests\". This can take up to 5m0s\n[kubelet-check] Initial timeout of 40s passed.",
    "stdout_lines": [
        "[init] Using Kubernetes version: v1.15.3",
        "[preflight] Running pre-flight checks",
        "[preflight] Pulling images required for setting up a Kubernetes cluster",
        "[preflight] This might take a minute or two, depending on the speed of your internet connection",
        "[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'",
        "[kubelet-start] Writing kubelet environment file with flags to file \"/var/lib/kubelet/kubeadm-flags.env\"",
        "[kubelet-start] Writing kubelet configuration to file \"/var/lib/kubelet/config.yaml\"",
        "[kubelet-start] Activating the kubelet service",
        "[certs] Using certificateDir folder \"/etc/kubernetes/ssl\"",
        "[certs] Using existing front-proxy-ca certificate authority",
        "[certs] Using existing front-proxy-client certificate and key on disk",
        "[certs] External etcd mode: Skipping etcd/ca certificate authority generation",
        "[certs] External etcd mode: Skipping etcd/healthcheck-client certificate authority generation",
        "[certs] External etcd mode: Skipping etcd/server certificate authority generation",
        "[certs] External etcd mode: Skipping etcd/peer certificate authority generation",
        "[certs] External etcd mode: Skipping apiserver-etcd-client certificate authority generation",
        "[certs] Using existing ca certificate authority",
        "[certs] Using existing apiserver certificate and key on disk",
        "[certs] Using existing apiserver-kubelet-client certificate and key on disk",
        "[certs] Using the existing \"sa\" key",
        "[kubeconfig] Using kubeconfig folder \"/etc/kubernetes\"",
        "[kubeconfig] Using existing kubeconfig file: \"/etc/kubernetes/admin.conf\"",
        "[kubeconfig] Using existing kubeconfig file: \"/etc/kubernetes/kubelet.conf\"",
        "[kubeconfig] Using existing kubeconfig file: \"/etc/kubernetes/controller-manager.conf\"",
        "[kubeconfig] Using existing kubeconfig file: \"/etc/kubernetes/scheduler.conf\"",
        "[control-plane] Using manifest folder \"/etc/kubernetes/manifests\"",
        "[control-plane] Creating static Pod manifest for \"kube-apiserver\"",
        "[controlplane] Adding extra host path mount \"cloud-config\" to \"kube-apiserver\"",
        "[controlplane] Adding extra host path mount \"usr-share-ca-certificates\" to \"kube-apiserver\"",
        "[controlplane] Adding extra host path mount \"cloud-config\" to \"kube-controller-manager\"",
        "[control-plane] Creating static Pod manifest for \"kube-controller-manager\"",
        "[controlplane] Adding extra host path mount \"cloud-config\" to \"kube-apiserver\"",
        "[controlplane] Adding extra host path mount \"usr-share-ca-certificates\" to \"kube-apiserver\"",
        "[controlplane] Adding extra host path mount \"cloud-config\" to \"kube-controller-manager\"",
        "[control-plane] Creating static Pod manifest for \"kube-scheduler\"",
        "[controlplane] Adding extra host path mount \"cloud-config\" to \"kube-apiserver\"",
        "[controlplane] Adding extra host path mount \"usr-share-ca-certificates\" to \"kube-apiserver\"",
        "[controlplane] Adding extra host path mount \"cloud-config\" to \"kube-controller-manager\"",
        "[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory \"/etc/kubernetes/manifests\". This can take up to 5m0s",
        "[kubelet-check] Initial timeout of 40s passed."
    ]
}

ppcololo commented 5 years ago

@rstriedl5c you know, I faced a lot of problems with kubespray + terraform + openstack. For example: init can't succeed because the master can't connect to OpenStack (it can't resolve the hostname). Try connecting to the master via SSH and running journalctl -u kubelet; you will see why kubelet can't start.
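
A minimal sketch of that check (the SSH user and master address are placeholders):

# On the first master, see why kubelet cannot start
ssh <user>@<master-1-ip> 'systemctl status kubelet --no-pager; journalctl -u kubelet --no-pager -n 100'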

Later on you will run into more OpenStack problems, like insufficient rules in the security groups and so on...

rstriedl5c commented 5 years ago

@ppcololo Thanks for the information.

What's odd is that it can't talk to the API endpoint on 6443 on my master, even though I've opened the security group to the world. See the logs below. I'm trying to use flannel instead of calico as the CNI to start with.

Also, you can see my /etc/hosts file is populated with the private IPs but not the floating IPs.

# Ansible inventory hosts BEGIN
10.0.0.1 my-cluster-master-nf-1.k8s-os-lab.cluster.local my-cluster-master-nf-1
10.0.0.2 my-cluster-master-nf-2.k8s-os-lab.cluster.local my-cluster-master-nf-2
10.0.0.3 my-cluster-master-nf-3.k8s-os-lab.cluster.local my-cluster-master-nf-3
10.0.0.4 my-cluster-node-nf-1.k8s-os-lab.cluster.local my-cluster-node-nf-1
10.0.0.5 my-cluster-node-nf-2.k8s-os-lab.cluster.local my-cluster-node-nf-2
# Ansible inventory hosts END

Oct 02 17:09:14 my-cluster-master-nf-1 kubelet[28559]: E1002 17:09:14.081591   28559 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://10.0.0.1:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dk8s-os-la
Oct 02 17:09:14 my-cluster-master-nf-1 kubelet[28559]: E1002 17:09:14.116793   28559 kubelet.go:2169] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Oct 02 17:09:14 my-cluster-master-nf-1 kubelet[28559]: E1002 17:09:14.163476   28559 kubelet.go:2248] node "my-cluster-master-nf-1" not found
Oct 02 17:09:14 my-cluster-master-nf-1 kubelet[28559]: E1002 17:09:14.264056   28559 kubelet.go:2248] node "my-cluster-master-nf-1" not found
Oct 02 17:09:14 my-cluster-master-nf-1 kubelet[28559]: E1002 17:09:14.281876   28559 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/kubelet.go:453: Failed to list *v1.Node: Get https://10.0.0.1:6443/api/v1/nodes?fieldSelector=metadata.name%3Dk8s-os-lab-k8s-
Oct 02 17:09:14 my-cluster-master-nf-1 kubelet[28559]: E1002 17:09:14.364658   28559 kubelet.go:2248] node "my-cluster-master-nf-1" not found
Oct 02 17:09:14 my-cluster-master-nf-1 kubelet[28559]: E1002 17:09:14.431705   28559 controller.go:125] failed to ensure node lease exists, will retry in 7s, error: Get https://10.0.0.1:6443/apis/coordination.k8s.io/v1beta1/namespaces/kube-node-lease/leases/
Oct 02 17:09:14 my-cluster-master-nf-1 kubelet[28559]: E1002 17:09:14.464956   28559 kubelet.go:2248] node "my-cluster-master-nf-1" not found
Oct 02 17:09:14 my-cluster-master-nf-1 kubelet[28559]: E1002 17:09:14.481289   28559 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/kubelet.go:444: Failed to list *v1.Service: Get https://10.0.0.1:6443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.0.5
Oct 02 17:09:14 my-cluster-master-nf-1 kubelet[28559]: E1002 17:09:14.565172   28559 kubelet.go:2248] node "my-cluster-master-nf-1" not found
Oct 02 17:09:14 my-cluster-master-nf-1 kubelet[28559]: E1002 17:09:14.665444   28559 kubelet.go:2248] node "my-cluster-master-nf-1" not found
Oct 02 17:09:14 my-cluster-master-nf-1 kubelet[28559]: E1002 17:09:14.681411   28559 reflector.go:125] k8s.io/client-go/informers/factory.go:133: Failed to list *v1beta1.RuntimeClass: Get https://10.0.0.1:6443/apis/node.k8s.io/v1beta1/runtimeclasses?limit=500&res
Oct 02 17:09:14 my-cluster-master-nf-1 kubelet[28559]: E1002 17:09:14.765792   28559 kubelet.go:2248] node "my-cluster-master-nf-1" not found
Oct 02 17:09:14 my-cluster-master-nf-1 kubelet[28559]: E1002 17:09:14.866077   28559 kubelet.go:2248] node "my-cluster-master-nf-1" not found
Oct 02 17:09:14 my-cluster-master-nf-1 kubelet[28559]: E1002 17:09:14.881913   28559 reflector.go:125] k8s.io/client-go/informers/factory.go:133: Failed to list *v1beta1.CSIDriver: Get https://10.0.0.1:6443/apis/storage.k8s.io/v1beta1/csidrivers?limit=500&resourc
Oct 02 17:09:14 my-cluster-master-nf-1 kubelet[28559]: E1002 17:09:14.966325   28559 kubelet.go:2248] node "my-cluster-master-nf-1" not found
Oct 02 17:09:15 my-cluster-master-nf-1 kubelet[28559]: E1002 17:09:15.066548   28559 kubelet.go:2248] node "my-cluster-master-nf-1" not found
Oct 02 17:09:15 my-cluster-master-nf-1 kubelet[28559]: E1002 17:09:15.082374   28559 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://10.0.0.1:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dk8s-os-la
Oct 02 17:09:15 my-cluster-master-nf-1 kubelet[28559]: E1002 17:09:15.166787   28559 kubelet.go:2248] node "my-cluster-master-nf-1" not found
Oct 02 17:09:15 my-cluster-master-nf-1 kubelet[28559]: E1002 17:09:15.267103   28559 kubelet.go:2248] node "my-cluster-master-nf-1" not found
Oct 02 17:09:15 my-cluster-master-nf-1 kubelet[28559]: E1002 17:09:15.283370   28559 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/kubelet.go:453: Failed to list *v1.Node: Get https://10.0.0.1:6443/api/v1/nodes?fieldSelector=metadata.name%3Dk8s-os-lab-k8s-
Oct 02 17:09:15 my-cluster-master-nf-1 kubelet[28559]: E1002 17:09:15.367429   28559 kubelet.go:2248] node "my-cluster-master-nf-1" not found
Oct 02 17:09:15 my-cluster-master-nf-1 kubelet[28559]: E1002 17:09:15.467807   28559 kubelet.go:2248] node "my-cluster-master-nf-1" not found
Oct 02 17:09:15 my-cluster-master-nf-1 kubelet[28559]: E1002 17:09:15.482494   28559 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/kubelet.go:444: Failed to list *v1.Service: Get https://10.0.0.1:6443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.0.5
Oct 02 17:09:15 my-cluster-master-nf-1 kubelet[28559]: E1002 17:09:15.568113   28559 kubelet.go:2248] node "my-cluster-master-nf-1" not found
Oct 02 17:09:15 my-cluster-master-nf-1 kubelet[28559]: E1002 17:09:15.668455   28559 kubelet.go:2248] node "my-cluster-master-nf-1" not found
Oct 02 17:09:15 my-cluster-master-nf-1 kubelet[28559]: I1002 17:09:15.676684   28559 kubelet_node_status.go:286] Setting node annotation to enable volume controller attach/detach
Oct 02 17:09:15 my-cluster-master-nf-1 kubelet[28559]: I1002 17:09:15.676988   28559 setters.go:73] Using node IP: "10.0.0.1"
Oct 02 17:09:15 my-cluster-master-nf-1 kubelet[28559]: I1002 17:09:15.680066   28559 kubelet_node_status.go:471] Recording NodeHasSufficientMemory event message for node my-cluster-master-nf-1
Oct 02 17:09:15 my-cluster-master-nf-1 kubelet[28559]: I1002 17:09:15.680122   28559 kubelet_node_status.go:471] Recording NodeHasNoDiskPressure event message for node my-cluster-master-nf-1
Oct 02 17:09:15 my-cluster-master-nf-1 kubelet[28559]: I1002 17:09:15.680140   28559 kubelet_node_status.go:471] Recording NodeHasSufficientPID event message for node my-cluster-master-nf-1
Oct 02 17:09:15 my-cluster-master-nf-1 kubelet[28559]: E1002 17:09:15.681140   28559 pod_workers.go:190] Error syncing pod bcb3aff273a63df587968bf0c241649e ("kube-apiserver-my-cluster-master-nf-1_kube-system(bcb3aff273a63df587968bf0c241649e)"), skipping:
Oct 02 17:09:15 my-cluster-master-nf-1 kubelet[28559]: E1002 17:09:15.682221   28559 reflector.go:125] k8s.io/client-go/informers/factory.go:133: Failed to list *v1beta1.RuntimeClass: Get https://10.0.0.1:6443/apis/node.k8s.io/v1beta1/runtimeclasses?limit=500&res
Oct 02 17:09:15 my-cluster-master-nf-1 kubelet[28559]: E1002 17:09:15.768769   28559 kubelet.go:2248] node "my-cluster-master-nf-1" not found
Oct 02 17:09:15 my-cluster-master-nf-1 kubelet[28559]: E1002 17:09:15.869116   28559 kubelet.go:2248] node "my-cluster-master-nf-1" not found
Oct 02 17:09:15 my-cluster-master-nf-1 kubelet[28559]: E1002 17:09:15.882533   28559 reflector.go:125] k8s.io/client-go/informers/factory.go:133: Failed to list *v1beta1.CSIDriver: Get https://10.0.0.1:6443/apis/storage.k8s.io/v1beta1/csidrivers?limit=500&resourc
Oct 02 17:09:15 my-cluster-master-nf-1 kubelet[28559]: E1002 17:09:15.969459   28559 kubelet.go:2248] node "my-cluster-master-nf-1" not found
Oct 02 17:09:16 my-cluster-master-nf-1 kubelet[28559]: E1002 17:09:16.029084   28559 event.go:249] Unable to write event: 'Patch https://10.0.0.1:6443/api/v1/namespaces/default/events/my-cluster-master-nf-1.15c9de871039b848: dial tcp 10.0.0.1:6443: conne
Oct 02 17:09:16 my-cluster-master-nf-1 kubelet[28559]: E1002 17:09:16.070139   28559 kubelet.go:2248] node "my-cluster-master-nf-1" not found
Oct 02 17:09:16 my-cluster-master-nf-1 kubelet[28559]: E1002 17:09:16.083251   28559 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://10.0.0.1:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dk8s-os-la
Oct 02 17:09:16 my-cluster-master-nf-1 kubelet[28559]: E1002 17:09:16.170395   28559 kubelet.go:2248] node "my-cluster-master-nf-1" not found
Oct 02 17:09:16 my-cluster-master-nf-1 kubelet[28559]: E1002 17:09:16.270668   28559 kubelet.go:2248] node "my-cluster-master-nf-1" not found
Oct 02 17:09:16 my-cluster-master-nf-1 kubelet[28559]: E1002 17:09:16.284576   28559 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/kubelet.go:453: Failed to list *v1.Node: Get https://10.0.0.1:6443/api/v1/nodes?fieldSelector=metadata.name%3Dk8s-os-lab-k8s-
Oct 02 17:09:16 my-cluster-master-nf-1 kubelet[28559]: E1002 17:09:16.371004   28559 kubelet.go:2248] node "my-cluster-master-nf-1" not found
Oct 02 17:09:16 my-cluster-master-nf-1 kubelet[28559]: E1002 17:09:16.471221   28559 kubelet.go:2248] node "my-cluster-master-nf-1" not found

ppcololo commented 5 years ago

Do you use the flannel CNI? I had this problem. Try checking it this way: create the file /etc/cni/net.d/10-flannel.conflist on your master1 with this content:

{
  "name": "cbr0",
  "cniVersion": "0.3.1",
  "plugins": [
    {
      "type": "flannel",
      "delegate": {
        "isDefaultGateway": true
      }
    },
    {
      "type": "portmap",
      "capabilities": {
        "portMappings": true
      }
    }
  ]
}

and then check the journal and systemctl status kubelet.

rstriedl5c commented 5 years ago

@ppcololo I did not have this file; I've added it to my master. I don't see any change in my journal or kubelet status. Do I need to re-run the Ansible playbook?

ppcololo commented 5 years ago

What network plugin do you use: flannel, calico, or canal? After adding the file, send the kubelet status and the last journal logs. And maybe try restarting kubelet: systemctl restart kubelet.

rstriedl5c commented 5 years ago

@ppcololo I'm using flannel. I restarted kubelet; the status is below.


● kubelet.service - Kubernetes Kubelet Server
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: enabled)
   Active: active (running) since Wed 2019-10-02 20:46:59 UTC; 1min 14s ago
     Docs: https://github.com/GoogleCloudPlatform/kubernetes
 Main PID: 6481 (kubelet)
    Tasks: 0 (limit: 4915)
   CGroup: /system.slice/kubelet.service
           └─6481 /usr/local/bin/kubelet --logtostderr=true --v=2 --node-ip=10.0.0.1 --hostname-override=my-cluster-master-nf-1 --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --config=/etc/kubernetes/kubelet-config.yaml --kubeconfig=/etc/kubernetes

Oct 02 20:48:12 my-cluster-master-nf-1 kubelet[6481]: E1002 20:48:12.567131    6481 kubelet.go:2248] node "my-cluster-master-nf-1" not found
Oct 02 20:48:12 my-cluster-master-nf-1 kubelet[6481]: E1002 20:48:12.667356    6481 kubelet.go:2248] node "my-cluster-master-nf-1" not found
Oct 02 20:48:12 my-cluster-master-nf-1 kubelet[6481]: E1002 20:48:12.725590    6481 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://10.0.0.1:6443/api/v1/pods?fieldSelector=spec.nodeName%3D-k8s-os-lab
Oct 02 20:48:12 my-cluster-master-nf-1 kubelet[6481]: E1002 20:48:12.767684    6481 kubelet.go:2248] node "my-cluster-master-nf-1" not found
Oct 02 20:48:12 my-cluster-master-nf-1 kubelet[6481]: E1002 20:48:12.868096    6481 kubelet.go:2248] node "my-cluster-master-nf-1" not found
Oct 02 20:48:12 my-cluster-master-nf-1 kubelet[6481]: E1002 20:48:12.925268    6481 reflector.go:125] k8s.io/client-go/informers/factory.go:133: Failed to list *v1beta1.RuntimeClass: Get https://10.0.0.1:6443/apis/node.k8s.io/v1beta1/runtimeclasses?limit=500&reso
Oct 02 20:48:12 my-cluster-master-nf-1 kubelet[6481]: E1002 20:48:12.968432    6481 kubelet.go:2248] node "my-cluster-master-nf-1" not found
Oct 02 20:48:13 my-cluster-master-nf-1 kubelet[6481]: E1002 20:48:13.068759    6481 kubelet.go:2248] node "my-cluster-master-nf-1" not found
Oct 02 20:48:13 my-cluster-master-nf-1 kubelet[6481]: E1002 20:48:13.125289    6481 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/kubelet.go:444: Failed to list *v1.Service: Get https://10.0.0.1:6443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.0.0.1
Oct 02 20:48:13 my-cluster-master-nf-1 kubelet[6481]: E1002 20:48:13.168976    6481 kubelet.go:2248] node "my-cluster-master-nf-1" not found

ppcololo commented 5 years ago

@rstriedl5c kubelet is running, but we need to see the full journalctl -u kubelet. And just for fun, try the canal driver; with this network plugin I have no problems on OpenStack.

olehhordiienko commented 5 years ago

Guys, look at this: https://github.com/kubernetes-sigs/kubespray/pull/4338/files https://github.com/kubernetes-sigs/kubespray/pull/4338

In my case it helped.

rstriedl5c commented 5 years ago

@olehhordiienko Thanks. We are both using OpenStack. The fix appears to be for AWS.

ppcololo commented 5 years ago

@rstriedl5c here are all the changes I made. In k8s-cluster.yml:

kube_version: v1.14.5
kube_network_plugin: canal
resolvconf_mode: host_resolvconf
node_volume_attach_limit: 26
kubelet_cgroup_driver: "cgroupfs"

all.yml

cloud_provider: openstack
upstream_dns_servers:
  - x.x.x.x
  - x.x.x.x

resource "openstack_networking_secgroup_rule_v2" "k8s_master_etcd" { direction = "ingress" ethertype = "IPv4" protocol = "tcp" port_range_min = "2370" port_range_max = "2380" remote_ip_prefix = "0.0.0.0/0" security_group_id = "${openstack_networking_secgroup_v2.k8s_master.id}" }

resource "openstack_networking_secgroup_rule_v2" "k8s_master_kube" { direction = "ingress" ethertype = "IPv4" protocol = "tcp" port_range_min = "10240" port_range_max = "10260" remote_ip_prefix = "0.0.0.0/0" security_group_id = "${openstack_networking_secgroup_v2.k8s_master.id}"


Now I have a working k8s cluster in OpenStack. But you know, I tried another tool (kops) and got a cluster without much pain (they added OpenStack support with an LB).

rstriedl5c commented 5 years ago

@ppcololo I believe you're using CentOS, correct?

Here are my configs; I'm using Ubuntu.

# Can be docker_dns, host_resolvconf or none
# Default:
resolvconf_mode: docker_dns
# For Container Linux by CoreOS:
# resolvconf_mode: host_resolvconf

in my k8s-cluster.yml:

kube_version: v1.14.5
kube_network_plugin: flannel
resolvconf_mode: host_resolvconf
node_volume_attach_limit: 26
kubelet_cgroup_driver: "cgroupfs"

I've added the SG rules in the compute module.

I've used Kops on AWS and other clouds before. Can you send me the Kops command you ran to create the OpenStack cluster, plus any other things to consider with a Kops OpenStack K8s cluster? Thanks in advance.

I will try the above Kubespray changes and let you know if it works for me. Thanks again.

rstriedl5c commented 5 years ago

@ppcololo silly question: how are you generating your inventory.ini file? Does yours look similar to the following? I'm trying to set up GlusterFS nodes too. At this time I'm not using a bastion host.


# ## Configure 'ip' variable to bind kubernetes services on a
# ## different ip than the default iface
# ## We should set etcd_member_name for etcd cluster. The node that is not a etcd member do not need to set the value, or can set the empty string value.
[all]
my-cluster-k8s-master-nf-1 ansible_host={floating_ip} ansible_become=yes ansible_user=ubuntu ansible_become_method=sudo ip=10.0.0.7 etcd_member_name=etcd1
my-cluster-k8s-master-nf-2 ansible_host={floating_ip} ansible_become=yes ansible_user=ubuntu ansible_become_method=sudo ip=10.0.0.8 etcd_member_name=etcd2
my-cluster-k8s-master-nf-3 ansible_host={floating_ip} ansible_become=yes ansible_user=ubuntu ansible_become_method=sudo ip=10.0.0.9 etcd_member_name=etcd3
my-cluster-k8s-node-nf-1 ansible_host={floating_ip} ansible_become=yes ansible_user=ubuntu ansible_become_method=sudo ip=10.0.0.10
my-cluster-k8s-node-nf-2 ansible_host={floating_ip} ansible_become=yes ansible_user=ubuntu ansible_become_method=sudo ip=10.0.0.11
my-cluster-gfs-node-nf-1 ansible_host=10.0.0.3 ansible_become=yes ansible_user=ubuntu ansible_become_method=sudo ip=10.0.0.3
my-cluster-gfs-node-nf-2 ansible_host=10.0.0.4 ansible_become=yes ansible_user=ubuntu ansible_become_method=sudo ip=10.0.0.4
my-cluster-gfs-node-nf-3 ansible_host=10.0.0.5 ansible_become=yes ansible_user=ubuntu ansible_become_method=sudo ip=10.0.0.5

[all:vars]
ansible_python_interpreter=/usr/bin/python3

# ## configure a bastion host if your nodes are not directly reachable
# bastion ansible_host=x.x.x.x ansible_user=some_user
# [bastion]
# my-cluster-bastion-1 ansible_host={floating_ip} ip=10.0.0.7 ansible_become=yes ansible_user=ubuntu ansible_become_method=sudo

# [bastion:vars]
# ansible_python_interpreter=/usr/bin/python3

[kube-master]
my-cluster-k8s-master-nf-1
my-cluster-k8s-master-nf-2
my-cluster-k8s-master-nf-3

[kube-master:vars]
# ansible_ssh_extra_args="-o StrictHostKeyChecking=no"
ansible_python_interpreter=/usr/bin/python3

[kube-node]
my-cluster-k8s-node-nf-1
my-cluster-k8s-node-nf-2

[kube-node:vars]
# ansible_ssh_extra_args="-o StrictHostKeyChecking=no"
ansible_python_interpreter=/usr/bin/python3

[etcd]
my-cluster-k8s-master-nf-1
my-cluster-k8s-master-nf-2
my-cluster-k8s-master-nf-3

[etcd:vars]
ansible_python_interpreter=/usr/bin/python3

[gfs-cluster]
my-cluster-gfs-node-nf-1
my-cluster-gfs-node-nf-2
my-cluster-gfs-node-nf-3

[gfs-cluster:vars]
ansible_python_interpreter=/usr/bin/python3

[network-storage]
my-cluster-gfs-node-nf-1
my-cluster-gfs-node-nf-2
my-cluster-gfs-node-nf-3

[network-storage:vars]
ansible_python_interpreter=/usr/bin/python3

[calico-rr]

[k8s-cluster:children]
kube-master
kube-node
calico-rr

ppcololo commented 5 years ago

@rstriedl5c kubespray uses a Python script that parses the terraform.tfstate file; that is the inventory for Ansible, I don't have another one. When you deploy VMs via Terraform, the hosts get their group names in metadata; that's how Ansible knows which group each host belongs to. I can share the command for kops, but not here, because this is the kubespray repo and issue :)

rushins commented 5 years ago

hello,

I got a very similar issue with kubespray: the first master fails every time with the following error. I have the hosts in DNS and I can ping them. Also, the kubelet service errors out due to a missing file (/etc/kubernetes/ssl/ca.crt).

fatal: [lvpaldbsvm28]: FAILED! => {"attempts": 3, "changed": true, "cmd": ["timeout", "-k", "300s", "300s", "/usr/local/bin/kubeadm", "init", "--config=/etc/kubernetes/kubeadm-config.yaml", "--ignore-preflight-errors=all", "--skip-phases=addon/coredns", "--upload-certs"], "delta": "0:00:00.108208", "end": "2019-10-20 15:59:24.267983", "failed_when_result": true, "msg": "non-zero return code", "rc": 3, "start": "2019-10-20 15:59:24.159775", "stderr": "[apiServer.certSANs: Invalid value: \"lvpaldbsvm28.pal.sap.corp\u00a0\": altname is not a valid IP address, DNS label or a DNS label

kubelet service shows the following error.

erver.go:251] unable to load client CA file /etc/kubernetes/ssl/ca.crt: open /etc/kubernetes/ssl/ca.crt: no such ... or directory Hint: Some lines were ellipsized, use -l to show in full.

khande-incomm commented 4 years ago

@johnzheng1975

Were you able to resolve the error that caused the failure on TASK [kubernetes/master : kubeadm | Initialize first master]? I'm having a similar error on my Kubernetes install. I created a GitHub issue for the same: https://github.com/kubernetes-sigs/kubespray/issues/5404.

Please let us know, it might help me with my issue. Thanks

posix4e commented 4 years ago

Hit the same, but only on my second and third test installs. I'll see if I can find which config option I'm using that triggers it.

alter commented 4 years ago

any news here?

alijahnas commented 4 years ago

any news here?

Do you have a similar problem?

alter commented 4 years ago

Yeah, I've found that if I disable the loadbalancer_apiserver option

#loadbalancer_apiserver:
#  address: 1.2.3.4
#  port: 443

the setup goes through successfully. And the setup gets stuck when I uncomment it (but it 100% worked half a year ago).

I use the latest release of kubespray from github and kubernetes 1.16.6

alter commented 4 years ago

I'm really sorry, but in this case it was my own issue: I had firewalled the balancer host half a year ago and forgotten about it, so the master hosts couldn't initialize because they couldn't connect to the balancer.

johnzheng1975 commented 4 years ago

Closing this since it is an old issue.

ak2766 commented 4 years ago

I'm really sorry, but in this case it was my own issue: I had firewalled the balancer host half a year ago and forgotten about it, so the master hosts couldn't initialize because they couldn't connect to the balancer.

I believe the kubespray scripts ought to make sure that all hosts (including the load balancer, where an external one is being used) are reachable. In my case, I was getting this same error using KVM VMs, and the load balancer VM was stuck during boot waiting for an fsck (or Ctrl+D) to complete booting. A manual reachability check is sketched below.
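
A manual pre-check along those lines (a sketch; the address and port are placeholders, substitute your loadbalancer_apiserver values from group_vars/all/all.yml):

# Verify the external apiserver load balancer answers on its port before running cluster.yml
nc -z -w 5 <lb_address> <lb_port> && echo "load balancer reachable" || echo "load balancer NOT reachable"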

Vulturem commented 1 year ago

https://github.com/kubernetes-sigs/kubespray/issues/5139#issuecomment-598141207

Worked for me; I realised that the keepalive server wasn't working properly.

deba10106 commented 1 year ago

Facing the same issue on OpenStack. Trying to install Kubernetes with kubespray and an Octavia external load balancer.

tompscanlan commented 9 months ago

Fresh install of 6 VMs on Ubuntu 20.04. After failing because of a missing "/etc/kubernetes" on the control plane nodes, I'm now stuck at the "Initialize first master" failure:

TASK [kubernetes/control-plane : Kubeadm | Initialize first master] ****************************************************************************************************************************************************************************************************************************************************************************************************************
task path: /home/tscanlan/projects/terraform-irr/terraform-kubespreay-infra/ansible/kubespray/roles/kubernetes/control-plane/tasks/kubeadm-setup.yml:178
fatal: [192.168.8.195]: FAILED! => {
    "attempts": 3,
    "changed": true,
    "cmd": [
        "timeout",
        "-k",
        "300s",
        "300s",
        "/usr/local/bin/kubeadm",
        "init",
        "--config=/etc/kubernetes/kubeadm-config.yaml",
        "--ignore-preflight-errors=all",
        "--v=8",
        "--skip-phases=addon/coredns",
        "--upload-certs"
    ],
    "delta": "0:00:00.307801",
    "end": "2024-01-29 15:04:36.878803",
    "failed_when_result": true,
    "invocation": {
        "module_args": {
            "_raw_params": "timeout -k 300s 300s /usr/local/bin/kubeadm init --config=/etc/kubernetes/kubeadm-config.yaml --ignore-preflight-errors=all --v=8 --skip-phases=addon/coredns --upload-certs",
            "_uses_shell": false,
            "argv": null,
            "chdir": null,
            "creates": null,
            "executable": null,
            "removes": null,
            "stdin": null,
            "stdin_add_newline": true,
            "strip_empty_ends": true
        }
    },
    "msg": "non-zero return code",
    "rc": 1,
    "start": "2024-01-29 15:04:36.571002",
    "stderr": "I0129 15:04:36.659732   11762 initconfiguration.go:255] loading configuration from \"/etc/kubernetes/kubeadm-config.yaml\"\nW0129 15:04:36.671137   11762 utils.go:69] The recommended value for \"clusterDNS\" in \"KubeletConfiguration\" is: [10.233.0.10]; the provided value is: [169.254.25.10]\nI0129 15:04:36.671342   11762 kubelet.go:196] the value of KubeletConfiguration.cgroupDriver is empty; setting it to \"systemd\"\nW0129 15:04:36.683469   11762 checks.go:1064] [preflight] WARNING: Couldn't create the interface used for talking to the container runtime: crictl is required by the container runtime: executable file not found in $PATH\nI0129 15:04:36.683523   11762 checks.go:563] validating Kubernetes and kubeadm version\nI0129 15:04:36.683570   11762 checks.go:168] validating if the firewall is enabled and active\nI0129 15:04:36.707112   11762 checks.go:203] validating availability of port 6443\nI0129 15:04:36.707496   11762 checks.go:203] validating availability of port 10259\nI0129 15:04:36.707563   11762 checks.go:203] validating availability of port 10257\nI0129 15:04:36.707616   11762 checks.go:280] validating the existence of file /etc/kubernetes/manifests/kube-apiserver.yaml\nI0129 15:04:36.707658   11762 checks.go:280] validating the existence of file /etc/kubernetes/manifests/kube-controller-manager.yaml\nI0129 15:04:36.707675   11762 checks.go:280] validating the existence of file /etc/kubernetes/manifests/kube-scheduler.yaml\nI0129 15:04:36.707688   11762 checks.go:280] validating the existence of file /etc/kubernetes/manifests/etcd.yaml\nI0129 15:04:36.707702   11762 checks.go:430] validating if the connectivity type is via proxy or direct\nI0129 15:04:36.707781   11762 checks.go:469] validating http connectivity to first IP address in the CIDR\nI0129 15:04:36.707824   11762 checks.go:469] validating http connectivity to first IP address in the CIDR\nI0129 15:04:36.707845   11762 checks.go:639] validating whether swap is enabled or not\nI0129 15:04:36.707918   11762 checks.go:370] validating the presence of executable crictl\n\t[WARNING FileExisting-crictl]: crictl not found in system path\nI0129 15:04:36.708034   11762 checks.go:370] validating the presence of executable conntrack\nI0129 15:04:36.708074   11762 checks.go:370] validating the presence of executable ip\nI0129 15:04:36.708126   11762 checks.go:370] validating the presence of executable iptables\nI0129 15:04:36.708164   11762 checks.go:370] validating the presence of executable mount\nI0129 15:04:36.708210   11762 checks.go:370] validating the presence of executable nsenter\nI0129 15:04:36.708246   11762 checks.go:370] validating the presence of executable ebtables\nI0129 15:04:36.708284   11762 checks.go:370] validating the presence of executable ethtool\nI0129 15:04:36.708320   11762 checks.go:370] validating the presence of executable socat\nI0129 15:04:36.708356   11762 checks.go:370] validating the presence of executable tc\nI0129 15:04:36.708398   11762 checks.go:370] validating the presence of executable touch\nI0129 15:04:36.708435   11762 checks.go:516] running all checks\nI0129 15:04:36.749160   11762 checks.go:401] checking whether the given node name is valid and reachable using net.LookupHost\nI0129 15:04:36.749369   11762 checks.go:605] validating kubelet version\n\t[WARNING KubeletVersion]: couldn't get kubelet version: cannot execute 'kubelet --version': executable file not found in $PATH\nI0129 15:04:36.749732   11762 checks.go:130] validating if the \"kubelet\" service is enabled 
and active\n\t[WARNING Service-Kubelet]: kubelet service is not enabled, please run 'systemctl enable kubelet.service'\nI0129 15:04:36.769970   11762 checks.go:203] validating availability of port 10250\nI0129 15:04:36.770132   11762 checks.go:329] validating the contents of file /proc/sys/net/bridge/bridge-nf-call-iptables\n\t[WARNING FileContent--proc-sys-net-bridge-bridge-nf-call-iptables]: /proc/sys/net/bridge/bridge-nf-call-iptables does not exist\nI0129 15:04:36.770212   11762 checks.go:329] validating the contents of file /proc/sys/net/ipv4/ip_forward\nI0129 15:04:36.770290   11762 checks.go:680] validating the external etcd version\nI0129 15:04:36.874511   11762 checks.go:304] validating the existence of file /etc/ssl/etcd/ssl/ca.pem\nI0129 15:04:36.874608   11762 checks.go:304] validating the existence of file /etc/ssl/etcd/ssl/node-192.168.8.195.pem\nI0129 15:04:36.874673   11762 checks.go:304] validating the existence of file /etc/ssl/etcd/ssl/node-192.168.8.195-key.pem\n[preflight] Some fatal errors occurred:\ncrictl is required by the container runtime: executable file not found in $PATH[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`\nerror execution phase preflight\nk8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run.func1\n\tcmd/kubeadm/app/cmd/phases/workflow/runner.go:260\nk8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).visitAll\n\tcmd/kubeadm/app/cmd/phases/workflow/runner.go:446\nk8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run\n\tcmd/kubeadm/app/cmd/phases/workflow/runner.go:232\nk8s.io/kubernetes/cmd/kubeadm/app/cmd.newCmdInit.func1\n\tcmd/kubeadm/app/cmd/init.go:111\ngithub.com/spf13/cobra.(*Command).execute\n\tvendor/github.com/spf13/cobra/command.go:940\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\tvendor/github.com/spf13/cobra/command.go:1068\ngithub.com/spf13/cobra.(*Command).Execute\n\tvendor/github.com/spf13/cobra/command.go:992\nk8s.io/kubernetes/cmd/kubeadm/app.Run\n\tcmd/kubeadm/app/kubeadm.go:50\nmain.main\n\tcmd/kubeadm/kubeadm.go:25\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1598",
    "stderr_lines": [
        "I0129 15:04:36.659732   11762 initconfiguration.go:255] loading configuration from \"/etc/kubernetes/kubeadm-config.yaml\"",
        "W0129 15:04:36.671137   11762 utils.go:69] The recommended value for \"clusterDNS\" in \"KubeletConfiguration\" is: [10.233.0.10]; the provided value is: [169.254.25.10]",
        "I0129 15:04:36.671342   11762 kubelet.go:196] the value of KubeletConfiguration.cgroupDriver is empty; setting it to \"systemd\"",
        "W0129 15:04:36.683469   11762 checks.go:1064] [preflight] WARNING: Couldn't create the interface used for talking to the container runtime: crictl is required by the container runtime: executable file not found in $PATH",
        "I0129 15:04:36.683523   11762 checks.go:563] validating Kubernetes and kubeadm version",
        "I0129 15:04:36.683570   11762 checks.go:168] validating if the firewall is enabled and active",
        "I0129 15:04:36.707112   11762 checks.go:203] validating availability of port 6443",
        "I0129 15:04:36.707496   11762 checks.go:203] validating availability of port 10259",
        "I0129 15:04:36.707563   11762 checks.go:203] validating availability of port 10257",
        "I0129 15:04:36.707616   11762 checks.go:280] validating the existence of file /etc/kubernetes/manifests/kube-apiserver.yaml",
        "I0129 15:04:36.707658   11762 checks.go:280] validating the existence of file /etc/kubernetes/manifests/kube-controller-manager.yaml",
        "I0129 15:04:36.707675   11762 checks.go:280] validating the existence of file /etc/kubernetes/manifests/kube-scheduler.yaml",
        "I0129 15:04:36.707688   11762 checks.go:280] validating the existence of file /etc/kubernetes/manifests/etcd.yaml",
        "I0129 15:04:36.707702   11762 checks.go:430] validating if the connectivity type is via proxy or direct",
        "I0129 15:04:36.707781   11762 checks.go:469] validating http connectivity to first IP address in the CIDR",
        "I0129 15:04:36.707824   11762 checks.go:469] validating http connectivity to first IP address in the CIDR",
        "I0129 15:04:36.707845   11762 checks.go:639] validating whether swap is enabled or not",
        "I0129 15:04:36.707918   11762 checks.go:370] validating the presence of executable crictl",
        "\t[WARNING FileExisting-crictl]: crictl not found in system path",
        "I0129 15:04:36.708034   11762 checks.go:370] validating the presence of executable conntrack",
        "I0129 15:04:36.708074   11762 checks.go:370] validating the presence of executable ip",
        "I0129 15:04:36.708126   11762 checks.go:370] validating the presence of executable iptables",
        "I0129 15:04:36.708164   11762 checks.go:370] validating the presence of executable mount",
        "I0129 15:04:36.708210   11762 checks.go:370] validating the presence of executable nsenter",
        "I0129 15:04:36.708246   11762 checks.go:370] validating the presence of executable ebtables",
        "I0129 15:04:36.708284   11762 checks.go:370] validating the presence of executable ethtool",
        "I0129 15:04:36.708320   11762 checks.go:370] validating the presence of executable socat",
        "I0129 15:04:36.708356   11762 checks.go:370] validating the presence of executable tc",
        "I0129 15:04:36.708398   11762 checks.go:370] validating the presence of executable touch",
        "I0129 15:04:36.708435   11762 checks.go:516] running all checks",
        "I0129 15:04:36.749160   11762 checks.go:401] checking whether the given node name is valid and reachable using net.LookupHost",
        "I0129 15:04:36.749369   11762 checks.go:605] validating kubelet version",
        "\t[WARNING KubeletVersion]: couldn't get kubelet version: cannot execute 'kubelet --version': executable file not found in $PATH",
        "I0129 15:04:36.749732   11762 checks.go:130] validating if the \"kubelet\" service is enabled and active",
        "\t[WARNING Service-Kubelet]: kubelet service is not enabled, please run 'systemctl enable kubelet.service'",
        "I0129 15:04:36.769970   11762 checks.go:203] validating availability of port 10250",
        "I0129 15:04:36.770132   11762 checks.go:329] validating the contents of file /proc/sys/net/bridge/bridge-nf-call-iptables",
        "\t[WARNING FileContent--proc-sys-net-bridge-bridge-nf-call-iptables]: /proc/sys/net/bridge/bridge-nf-call-iptables does not exist",
        "I0129 15:04:36.770212   11762 checks.go:329] validating the contents of file /proc/sys/net/ipv4/ip_forward",
        "I0129 15:04:36.770290   11762 checks.go:680] validating the external etcd version",
        "I0129 15:04:36.874511   11762 checks.go:304] validating the existence of file /etc/ssl/etcd/ssl/ca.pem",
        "I0129 15:04:36.874608   11762 checks.go:304] validating the existence of file /etc/ssl/etcd/ssl/node-192.168.8.195.pem",
        "I0129 15:04:36.874673   11762 checks.go:304] validating the existence of file /etc/ssl/etcd/ssl/node-192.168.8.195-key.pem",
        "[preflight] Some fatal errors occurred:",
        "crictl is required by the container runtime: executable file not found in $PATH[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`",
        "error execution phase preflight",
        "k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run.func1",
        "\tcmd/kubeadm/app/cmd/phases/workflow/runner.go:260",
        "k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).visitAll",
        "\tcmd/kubeadm/app/cmd/phases/workflow/runner.go:446",
        "k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run",
        "\tcmd/kubeadm/app/cmd/phases/workflow/runner.go:232",
        "k8s.io/kubernetes/cmd/kubeadm/app/cmd.newCmdInit.func1",
        "\tcmd/kubeadm/app/cmd/init.go:111",
        "github.com/spf13/cobra.(*Command).execute",
        "\tvendor/github.com/spf13/cobra/command.go:940",
        "github.com/spf13/cobra.(*Command).ExecuteC",
        "\tvendor/github.com/spf13/cobra/command.go:1068",
        "github.com/spf13/cobra.(*Command).Execute",
        "\tvendor/github.com/spf13/cobra/command.go:992",
        "k8s.io/kubernetes/cmd/kubeadm/app.Run",
        "\tcmd/kubeadm/app/kubeadm.go:50",
        "main.main",
        "\tcmd/kubeadm/kubeadm.go:25",
        "runtime.main",
        "\t/usr/local/go/src/runtime/proc.go:250",
        "runtime.goexit",
        "\t/usr/local/go/src/runtime/asm_amd64.s:1598"
    ],
    "stdout": "[init] Using Kubernetes version: v1.28.6\n[preflight] Running pre-flight checks\n[preflight] Pulling images required for setting up a Kubernetes cluster\n[preflight] This might take a minute or two, depending on the speed of your internet connection\n[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'",
    "stdout_lines": [
        "[init] Using Kubernetes version: v1.28.6",
        "[preflight] Running pre-flight checks",
        "[preflight] Pulling images required for setting up a Kubernetes cluster",
        "[preflight] This might take a minute or two, depending on the speed of your internet connection",
        "[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'"
    ]
}
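
For what it's worth, the fatal line in this run is the preflight error "crictl is required by the container runtime: executable file not found in $PATH"; the port and manifest messages above it are only warnings. A quick check on the control-plane node (a sketch):

# Confirm the binaries kubeadm complained about are actually on PATH
command -v crictl  || echo "crictl not found in PATH"
command -v kubelet || echo "kubelet not found in PATH"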