kinvolk / kube-spawn

A tool for creating multi-node Kubernetes clusters on a Linux machine using kubeadm & systemd-nspawn. Brought to you by the Kinvolk team.
https://kinvolk.io
Apache License 2.0

fails to start with a timeout with Kubernetes 1.11 #282

Open · alban opened this issue 6 years ago

alban commented 6 years ago

To Reproduce:

Workarounds

Disable SELinux enforcement:

sudo setenforce 0
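As an aside (not part of the original workaround): setenforce 0 only switches SELinux to permissive mode until the next reboot. Making the change persistent would typically mean editing /etc/selinux/config, for example:

# assumption: permanently switch SELinux from enforcing to permissive
sudo sed -i 's/^SELINUX=enforcing/SELINUX=permissive/' /etc/selinux/config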

Install dependencies
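The commands below reference $KUBERNETES_VERSION and $KUBE_SPAWN_VERSION without defining them; presumably they were exported beforehand, roughly like this (the values are assumptions, based on the issue title and the kubeadm log further down, not part of the original report):

export KUBERNETES_VERSION=v1.11.0   # assumed from the issue title and the kubeadm init output
export KUBE_SPAWN_VERSION=master    # assumption: any kube-spawn tag or branch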

sudo dnf install -y btrfs-progs git go iptables libselinux-utils polkit qemu-img systemd-container make docker
mkdir go
export GOPATH=$HOME/go
curl -fsSL -O https://github.com/containernetworking/plugins/releases/download/v0.6.0/cni-plugins-amd64-v0.6.0.tgz
sudo mkdir -p /opt/cni/bin
sudo tar -C /opt/cni/bin -xvf cni-plugins-amd64-v0.6.0.tgz
sudo curl -Lo /usr/local/bin/kubectl https://storage.googleapis.com/kubernetes-release/release/${KUBERNETES_VERSION}/bin/linux/amd64/kubectl
sudo chmod +x /usr/local/bin/kubectl

Compile and install

mkdir -p $GOPATH/src/github.com/kinvolk
cd $GOPATH/src/github.com/kinvolk
git clone https://github.com/kinvolk/kube-spawn.git
cd kube-spawn/
git checkout $KUBE_SPAWN_VERSION
make DOCKERIZED=n
sudo make install

First attempt to use kube-spawn

cd
sudo -E kube-spawn create --kubernetes-version $KUBERNETES_VERSION
sudo -E kube-spawn start --nodes=3
sudo -E kube-spawn destroy

Workaround for "no space left on device": https://github.com/kinvolk/kube-spawn/issues/281

sudo umount /var/lib/machines
sudo qemu-img resize -f raw /var/lib/machines.raw $((10*1024*1024*1024))
sudo mount -t btrfs -o loop /var/lib/machines.raw /var/lib/machines
sudo btrfs filesystem resize max /var/lib/machines
sudo btrfs quota disable /var/lib/machines
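To confirm the pool actually grew (a quick check that is not part of the original workaround), the mount can be inspected afterwards; the reporter's df output further down shows the expected 10G pool:

df -h /var/lib/machines                       # should now show the resized pool
sudo btrfs filesystem show /var/lib/machines  # optional: show the underlying btrfs device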

Start kube-spawn

cd
sudo -E kube-spawn create --kubernetes-version $KUBERNETES_VERSION
sudo -E kube-spawn start --nodes=3


Then the error message:

Download of https://alpha.release.flatcar-linux.net/amd64-usr/current/flatcar_developer_container.bin.bz2 complete.
Created new local image 'flatcar'.
Operation completed successfully. Exiting.
nf_conntrack module is not loaded: stat /sys/module/nf_conntrack/parameters/hashsize: no such file or directory
Warning: nf_conntrack module is not loaded.
loading nf_conntrack module...
making iptables FORWARD chain defaults to ACCEPT...
setting iptables rule to allow CNI traffic...
Starting 3 nodes in cluster default ...
Waiting for machine kube-spawn-default-worker-fjxan9 to start up ...
Waiting for machine kube-spawn-default-master-5y7clq to start up ...
Waiting for machine kube-spawn-default-worker-2ujr2f to start up ...
Started kube-spawn-default-worker-2ujr2f
Bootstrapping kube-spawn-default-worker-2ujr2f ...
Started kube-spawn-default-master-5y7clq
Bootstrapping kube-spawn-default-master-5y7clq ...
Cluster "default" started
Failed to start machine kube-spawn-default-worker-fjxan9: timeout waiting for "kube-spawn-default-worker-fjxan9" to start
Note: kubeadm init can take several minutes

master-5y7clq
I0630 14:22:29.999557 380 feature_gate.go:230] feature gates: &{map[]}
[init] using Kubernetes version: v1.11.0
[preflight] running pre-flight checks
  [WARNING Service-Docker]: docker service is not enabled, please run 'systemctl enable docker.service'
  [WARNING FileContent--proc-sys-net-bridge-bridge-nf-call-iptables]: /proc/sys/net/bridge/bridge-nf-call-iptables does not exist
  [WARNING FileExisting-crictl]: crictl not found in system path
I0630 14:22:30.050775 380 kernel_validator.go:81] Validating kernel version
I0630 14:22:30.051083 380 kernel_validator.go:96] Validating kernel config
  [WARNING SystemVerification]: docker version is greater than the most recently validated version. Docker version: 18.05.0-ce. Max validated version: 17.03
  [WARNING Hostname]: hostname "kube-spawn-default-master-5y7clq" could not be reached
  [WARNING Hostname]: hostname "kube-spawn-default-master-5y7clq" lookup kube-spawn-default-master-5y7clq on 8.8.8.8:53: no such host
[preflight/images] Pulling images required for setting up a Kubernetes cluster
[preflight/images] This might take a minute or two, depending on the speed of your internet connection
[preflight/images] You can also perform this action in beforehand using 'kubeadm config images pull'
[kubelet] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[preflight] Activating the kubelet service
[certificates] Generated ca certificate and key.
[certificates] Generated apiserver certificate and key.
[certificates] apiserver serving cert is signed for DNS names [kube-spawn-default-master-5y7clq kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 10.22.0.3]
[certificates] Generated apiserver-kubelet-client certificate and key.
[certificates] Generated sa key and public key.
[certificates] Generated front-proxy-ca certificate and key.
[certificates] Generated front-proxy-client certificate and key.
[certificates] Generated etcd/ca certificate and key.
[certificates] Generated etcd/server certificate and key.
[certificates] etcd/server serving cert is signed for DNS names [kube-spawn-default-master-5y7clq localhost] and IPs [127.0.0.1 ::1]
[certificates] Generated etcd/peer certificate and key.
[certificates] etcd/peer serving cert is signed for DNS names [kube-spawn-default-master-5y7clq localhost] and IPs [10.22.0.3 127.0.0.1 ::1]
[certificates] Generated etcd/healthcheck-client certificate and key.
[certificates] Generated apiserver-etcd-client certificate and key.
[certificates] valid certificates and keys now exist in "/etc/kubernetes/pki"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/admin.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/kubelet.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/controller-manager.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/scheduler.conf"
[controlplane] wrote Static Pod manifest for component kube-apiserver to "/etc/kubernetes/manifests/kube-apiserver.yaml"
[controlplane] wrote Static Pod manifest for component kube-controller-manager to "/etc/kubernetes/manifests/kube-controller-manager.yaml"
[controlplane] wrote Static Pod manifest for component kube-scheduler to "/etc/kubernetes/manifests/kube-scheduler.yaml"
[etcd] Wrote Static Pod manifest for a local etcd instance to "/etc/kubernetes/manifests/etcd.yaml"
[init] waiting for the kubelet to boot up the control plane as Static Pods from directory "/etc/kubernetes/manifests"
[init] this might take a minute or longer if the control plane images have to be pulled
[apiclient] All control plane components are healthy after 42.001677 seconds
[uploadconfig] storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.11" in namespace kube-system with the configuration for the kubelets in the cluster
[markmaster] Marking the node kube-spawn-default-master-5y7clq as master by adding the label "node-role.kubernetes.io/master=''"
[markmaster] Marking the node kube-spawn-default-master-5y7clq as master by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "kube-spawn-default-master-5y7clq" as an annotation
[bootstraptoken] using token: 1o71nu.v7s48wncryhbdmm7
[bootstraptoken] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstraptoken] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstraptoken] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstraptoken] creating the "cluster-info" ConfigMap in the "kube-public" namespace
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

Your Kubernetes master has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

You can now join any number of machines by running the following on each node as root:

  kubeadm join 10.22.0.3:6443 --token 1o71nu.v7s48wncryhbdmm7 --discovery-token-ca-cert-hash sha256:c8ac2337adc7ed01725bed7d78605661dc759257fce213838f1cb89486fe263c

I0630 14:23:47.569329 1140 feature_gate.go:230] feature gates: &{map[]}
aaaaaa.bbbbbbbbbbbbbbbb
serviceaccount/weave-net created
clusterrole.rbac.authorization.k8s.io/weave-net created
clusterrolebinding.rbac.authorization.k8s.io/weave-net created
daemonset.extensions/weave-net created

worker-2ujr2f
[preflight] running pre-flight checks
  [WARNING RequiredIPVSKernelModulesAvailable]: the IPVS proxier will not be used, because the following required kernel modules are not loaded: [ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh] or no builtin kernel ipvs support: map[ip_vs:{} ip_vs_rr:{} ip_vs_wrr:{} ip_vs_sh:{} nf_conntrack_ipv4:{}]
you can solve this problem with following methods:

  1. Run 'modprobe -- ' to load missing kernel modules;
  2. Provide the missing builtin kernel ipvs support
  [WARNING Service-Docker]: docker service is not enabled, please run 'systemctl enable docker.service'
  [WARNING FileContent--proc-sys-net-bridge-bridge-nf-call-iptables]: /proc/sys/net/bridge/bridge-nf-call-iptables does not exist
  [WARNING FileExisting-crictl]: crictl not found in system path
I0630 14:23:49.919029 449 kernel_validator.go:81] Validating kernel version
I0630 14:23:49.919338 449 kernel_validator.go:96] Validating kernel config
  [WARNING SystemVerification]: docker version is greater than the most recently validated version. Docker version: 18.05.0-ce. Max validated version: 17.03
  [WARNING Hostname]: hostname "kube-spawn-default-worker-2ujr2f" could not be reached
  [WARNING Hostname]: hostname "kube-spawn-default-worker-2ujr2f" lookup kube-spawn-default-worker-2ujr2f on 8.8.8.8:53: no such host
[discovery] Trying to connect to API Server "10.22.0.3:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://10.22.0.3:6443"
[discovery] Failed to connect to API Server "10.22.0.3:6443": token id "aaaaaa" is invalid for this cluster or it has expired. Use "kubeadm token create" on the master node to creating a new valid token
[discovery] Trying to connect to API Server "10.22.0.3:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://10.22.0.3:6443"
[discovery] Cluster info signature and contents are valid and no TLS pinning was specified, will use API Server "10.22.0.3:6443"
[discovery] Successfully established connection with API Server "10.22.0.3:6443"
[kubelet] Downloading configuration for the kubelet from the "kubelet-config-1.11" ConfigMap in the kube-system namespace
[kubelet] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[preflight] Activating the kubelet service
[tlsbootstrap] Waiting for the kubelet to perform the TLS Bootstrap...
[patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "kube-spawn-default-worker-2ujr2f" as an annotation

This node has joined the cluster:
    • Certificate signing request was sent to master and a response was received.
    • The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the master to see this node join the cluster.

Failed to start cluster: provisioning the worker nodes with kubeadm didn't succeed

More debug info:

$ kubectl get nodes
NAME                               STATUS    ROLES     AGE       VERSION
kube-spawn-default-master-5y7clq   Ready     master    1m        v1.11.0
kube-spawn-default-worker-2ujr2f   Ready     <none>    46s       v1.11.0
$ machinectl 
MACHINE                          CLASS     SERVICE        OS      VERSION  ADDRESSES
kube-spawn-default-master-5y7clq container systemd-nspawn flatcar 1814.0.0 10.22.0.3...
kube-spawn-default-worker-2ujr2f container systemd-nspawn flatcar 1814.0.0 10.22.0.2...

2 machines listed.
$ df -h /var/lib/machines
Filesystem      Size  Used Avail Use% Mounted on
/dev/loop0       10G  1.7G  7.8G  18% /var/lib/machines

The third machine does not exist anymore?
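As an aside (not from the original report), one way to dig into why the third machine vanished would be to check the machine list and the host journal right after the failure; the machine name here is simply the one from the log above:

machinectl list
sudo journalctl -b --no-pager | grep kube-spawn-default-worker-fjxan9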

alban commented 6 years ago

After a second attempt, it works.

arcolife commented 5 years ago

I get this timeout just as @alban described, except it's reproducible every time.

$ kube-spawn start
Warning: kube-proxy could crash due to insufficient nf_conntrack hashsize.
setting nf_conntrack hashsize to 131072... 
making iptables FORWARD chain defaults to ACCEPT...
new poolSize to be : 5490739200
Starting 3 nodes in cluster default ...
Waiting for machine kube-spawn-default-worker-naz6fc to start up ...
Waiting for machine kube-spawn-default-master-yz3twq to start up ...
Waiting for machine kube-spawn-default-worker-u5fu6n to start up ...
Failed to start machine kube-spawn-default-master-yz3twq: timeout waiting for "kube-spawn-default-master-yz3twq" to start
Failed to start machine kube-spawn-default-worker-naz6fc: timeout waiting for "kube-spawn-default-worker-naz6fc" to start
Failed to start cluster: starting the cluster didn't succeed

Note:

  1. I face the same timeout issue regardless of whether I destroy the cluster and start again, or mount a freshly formatted btrfs volume and redo it.
  2. The first time I launched kube-spawn it was with a manually formatted and mounted btrfs volume, and that's when it complained that "machine.raw" was not found. So I unmounted and re-ran, and systemd-nspawn did its job and created a machine.raw. When I re-spawned the cluster afterwards it obviously no longer complained about the .raw file, but it timed out regardless.
  3. Even though I've been through the troubleshooting.md guide, SELinux has been a pain; I've had to create about a dozen policies and semanage it all (roughly the workflow sketched after this list). Not the cake I was digging. pfft
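For context (an editorial sketch, not something posted in the thread; the module name is made up), a typical way to turn such SELinux denials into local policy modules is:

sudo ausearch -m avc -ts recent | audit2allow -M kube_spawn_local   # build a local policy module from recent AVC denials
sudo semodule -i kube_spawn_local.pp                                # install the generated module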

For debugging, is there any place this thing logs itself to?


Or, with the manually formatted btrfs partition mentioned in the notes above mounted at /var/lib/machines instead of the loopback:

/dev/sda4 btrfs 56G 1.7G 54G 4% /var/lib/machines

- `systemd-container-238-10.git438ac26.fc28.x86_64`
- `qemu-img-2.11.2-4.fc28.x86_64`
- machinectl limit set to 40G with the loopback mount (as evident in the df output above too; see the set-limit note after this list):

$ machinectl show
PoolPath=/var/lib/machines
PoolUsage=1866190848
PoolLimit=42949672960


- OS: `Linux 4.18.17-200.fc28.x86_64 GNU/Linux`
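
For reference (an editorial note, not from the comment), a 40G pool limit like the one shown above would typically have been set with machinectl's set-limit verb:

sudo machinectl set-limit 40G   # sets the overall /var/lib/machines pool limit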
arcolife commented 5 years ago

OK, never mind.

All I had to do was the following (combined into one sequence after the list):

  1. export KUBERNETES_VERSION=v1.12.0 (I hadn't done this earlier, before the create step)
  2. kube-spawn destroy
  3. kube-spawn create (this time it populated /var/lib/kube-spawn/clusters; earlier it had only left an empty trail of subdirectories)
  4. kube-spawn start

and it works. jeez
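Put together (keeping the sudo -E invocation and the --nodes flag from the original report's commands, which is an assumption about how the commenter actually ran them), the working sequence is roughly:

export KUBERNETES_VERSION=v1.12.0
sudo -E kube-spawn destroy
sudo -E kube-spawn create --kubernetes-version $KUBERNETES_VERSION
sudo -E kube-spawn start --nodes=3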

krnowak commented 5 years ago

Seems to be related to #325.

arcolife commented 5 years ago

> Seems to be related to #325.

Sure, except I didn't destroy it first. I got the timeout from start as per https://github.com/kinvolk/kube-spawn/issues/282#issuecomment-437786972 (that is, right after creating the cluster), then resolved the issue with https://github.com/kinvolk/kube-spawn/issues/282#issuecomment-437790311.

Apologies if the order in step 2 of the resolution comment created any confusion.

Also, I can't reproduce it now. :/