Failed to interact with cluster in wsl2 #707

Closed anjiawei1991 closed 4 years ago

anjiawei1991 commented 4 years ago

I set up the cluster in WSL2, but I can't interact with it.

How to reproduce it (as minimally and precisely as possible):

Then, after a long wait, I see: Error from server (InternalError): an error on the server ("") has prevented the request from succeeding

Anything else we need to know?:

the output of cat $KUBECONFIG is:

apiVersion: v1
- cluster:
  name: kind
- context:
    cluster: kind
    user: kubernetes-admin
  name: kubernetes-admin@kind
current-context: kubernetes-admin@kind
kind: Config
preferences: {}
- name: kubernetes-admin

the output of netstat -ano | grep 45991 is:

tcp        0      0*               LISTEN      off (0.00/0/0)

the output of docker exec kind-control-plane ps aux is:

root         1  0.0  0.0  17656  9320 ?        Ss   09:51   0:00 /sbin/init
root        28  0.1  0.0  22672 10996 ?        S<s  09:51   0:00 /lib/systemd/systemd-journald
root        39  2.2  0.3 2203172 52276 ?       Ssl  09:51   0:17 /usr/bin/containerd
root       193  2.4  0.5 1550088 86272 ?       Ssl  09:51   0:18 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --container-runtime=remote --container-runtime-endpoint=/run/containerd/containerd.sock --fail-swap-on=false --node-ip= --fail-swap-on=false
root       223  0.0  0.0  10744  4316 ?        Sl   09:52   0:00 containerd-shim -namespace k8s.io -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/k8s.io/9ac4b869efb1aeb0f1ecb9e355ab98694ea5fcd8bc85f241da3f9e611be49833 -address /run/containerd/containerd.sock -containerd-binary /usr/bin/containerd
root       235  0.0  0.0   9336  4964 ?        Sl   09:52   0:00 containerd-shim -namespace k8s.io -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/k8s.io/b8ecacb545e4e5494d0f9f9091559351c37ac623be77979876983e623599b5a0 -address /run/containerd/containerd.sock -containerd-binary /usr/bin/containerd
root       254  0.0  0.0   1024     4 ?        Ss   09:52   0:00 /pause
root       257  0.0  0.0   1024     4 ?        Ss   09:52   0:00 /pause
root       273  0.0  0.0   9336  4856 ?        Sl   09:52   0:00 containerd-shim -namespace k8s.io -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/k8s.io/16ff3c7ea1eb204351dfd6965c21e0a3dcebe61f6a51643fb18b02a5a49e0fc4 -address /run/containerd/containerd.sock -containerd-binary /usr/bin/containerd
root       286  0.0  0.0  10744  4900 ?        Sl   09:52   0:00 containerd-shim -namespace k8s.io -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/k8s.io/9f72767303cf43748717f43bc0b5c3179185eeba66aaef22298000ac26cf34e2 -address /run/containerd/containerd.sock -containerd-binary /usr/bin/containerd
root       325  0.0  0.0   1024     4 ?        Ss   09:52   0:00 /pause
root       345  0.0  0.0   1024     4 ?        Ss   09:52   0:00 /pause
root       451  0.2  0.0  11800  5840 ?        Sl   09:52   0:01 containerd-shim -namespace k8s.io -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/k8s.io/886f4a122b83039bf4be05f5c51190bb594bffb9ed5434df815abf278a9255e4 -address /run/containerd/containerd.sock -containerd-binary /usr/bin/containerd
root       459  0.0  0.0  10744  5016 ?        Sl   09:52   0:00 containerd-shim -namespace k8s.io -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/k8s.io/331e944e695051e09ebcbd12b0bee78543ee79946fe5b6bf1a4a5f32154d1c6c -address /run/containerd/containerd.sock -containerd-binary /usr/bin/containerd
root       483  0.0  0.0   9336  3976 ?        Sl   09:52   0:00 containerd-shim -namespace k8s.io -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/k8s.io/a0d18fd3bdc8fd3c64f754d631b14cd219eebda1fe03167992e9dc84427ddc7c -address /run/containerd/containerd.sock -containerd-binary /usr/bin/containerd
root       485  2.3  0.2 10536552 43808 ?      Ssl  09:52   0:17 etcd --advertise-client-urls= --cert-file=/etc/kubernetes/pki/etcd/server.crt --client-cert-auth=true --data-dir=/var/lib/etcd --initial-advertise-peer-urls= --initial-cluster=kind-control-plane= --key-file=/etc/kubernetes/pki/etcd/server.key --listen-client-urls=, --listen-peer-urls= --name=kind-control-plane --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt --peer-client-cert-auth=true --peer-key-file=/etc/kubernetes/pki/etcd/peer.key --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt --snapshot-count=10000 --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
root       490  4.1  1.8 403296 278432 ?       Ssl  09:52   0:30 kube-apiserver --advertise-address= --allow-privileged=true --authorization-mode=Node,RBAC --client-ca-file=/etc/kubernetes/pki/ca.crt --enable-admission-plugins=NodeRestriction --enable-bootstrap-token-auth=true --etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt --etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt --etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key --etcd-servers= --insecure-port=0 --kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt --kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname --proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt --proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key --requestheader-allowed-names=front-proxy-client --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt --requestheader-extra-headers-prefix=X-Remote-Extra- --requestheader-group-headers=X-Remote-Group --requestheader-username-headers=X-Remote-User --secure-port=6443 --service-account-key-file=/etc/kubernetes/pki/sa.pub --service-cluster-ip-range= --tls-cert-file=/etc/kubernetes/pki/apiserver.crt --tls-private-key-file=/etc/kubernetes/pki/apiserver.key
root       494  0.0  0.0   9336  5016 ?        Sl   09:52   0:00 containerd-shim -namespace k8s.io -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/k8s.io/3a3269e0a50fc0318fb17bdb8695b0f8c6e2bc80f7af11658506caecc6c5ac7c -address /run/containerd/containerd.sock -containerd-binary /usr/bin/containerd
root       530  0.3  0.2 141492 38792 ?        Ssl  09:52   0:02 kube-scheduler --bind-address= --kubeconfig=/etc/kubernetes/scheduler.conf --leader-elect=true
root       551  1.9  0.6 217304 105244 ?       Ssl  09:52   0:14 kube-controller-manager --allocate-node-cidrs=true --authentication-kubeconfig=/etc/kubernetes/controller-manager.conf --authorization-kubeconfig=/etc/kubernetes/controller-manager.conf --bind-address= --client-ca-file=/etc/kubernetes/pki/ca.crt --cluster-cidr= --cluster-signing-cert-file=/etc/kubernetes/pki/ca.crt --cluster-signing-key-file=/etc/kubernetes/pki/ca.key --controllers=*,bootstrapsigner,tokencleaner --enable-hostpath-provisioner=true --kubeconfig=/etc/kubernetes/controller-manager.conf --leader-elect=true --node-cidr-mask-size=24 --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt --root-ca-file=/etc/kubernetes/pki/ca.crt --service-account-private-key-file=/etc/kubernetes/pki/sa.key --use-service-account-credentials=true
root       834  0.0  0.0  10744  4988 ?        Sl   09:53   0:00 containerd-shim -namespace k8s.io -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/k8s.io/f72eb97e6cf183a96c59bed33893957c25f4b510b94e2a7f3a091f83d781ee1d -address /run/containerd/containerd.sock -containerd-binary /usr/bin/containerd
root       846  0.0  0.0   9336  4900 ?        Sl   09:53   0:00 containerd-shim -namespace k8s.io -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/k8s.io/7bed5b9f1b5f0b56c1511821fc1ff47807aa6ae809215610b61f55d30eaaae7f -address /run/containerd/containerd.sock -containerd-binary /usr/bin/containerd
root       857  0.0  0.0   1024     4 ?        Ss   09:53   0:00 /pause
root       871  0.0  0.0   1024     4 ?        Ss   09:53   0:00 /pause
root       936  0.0  0.0  10744  4892 ?        Sl   09:53   0:00 containerd-shim -namespace k8s.io -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/k8s.io/2b426d366b0fdeefa7e90600b40de834303c065544cd38f0275b829724861f55 -address /run/containerd/containerd.sock -containerd-binary /usr/bin/containerd
root       959  0.0  0.2 139736 32216 ?        Ssl  09:53   0:00 /usr/local/bin/kube-proxy --config=/var/lib/kube-proxy/config.conf --hostname-override=kind-control-plane
root      7272  0.0  0.0   5836  2856 ?        Rs   10:04   0:00 ps aux
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.0", GitCommit:"e8462b5b5dc2584fdcd18e6bcfe9f1e4d970a529", GitTreeState:"clean", BuildDate:"2019-06-19T16:40:16Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
Error from server (InternalError): an error on the server ("") has prevented the request from succeeding
 Version:           18.09.7
 API version:       1.39
 Go version:        go1.10.8
 Git commit:        2d0083d
 Built:             Thu Jun 27 17:56:23 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
  Version:          18.09.7
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.8
  Git commit:       2d0083d
  Built:            Thu Jun 27 17:23:02 2019
  OS/Arch:          linux/amd64
  Experimental:     false
VERSION="18.04.2 LTS (Bionic Beaver)"
PRETTY_NAME="Ubuntu 18.04.2 LTS"
anjiawei1991 commented 4 years ago

More information:

The output of docker logs kind-control-plane is:

Initializing machine ID from random generator.
Detected virtualization docker.
Detected architecture x86-64.
Failed to create symlink /sys/fs/cgroup/cpu: File exists
Failed to create symlink /sys/fs/cgroup/cpuacct: File exists

Welcome to Ubuntu Disco Dingo (development branch)!

Set hostname to <kind-control-plane>.
Configuration file /kind/systemd/kubelet.service is marked world-inaccessible. This has no effect as configuration data is accessible via APIs without restrictions. Proceeding anyway.
Configuration file /etc/systemd/system/kubelet.service.d/10-kubeadm.conf is marked world-inaccessible. This has no effect as configuration data is accessible via APIs without restrictions. Proceeding anyway.
[  OK  ] Listening on Journal Socket (/dev/log).
[  OK  ] Set up automount Arbitrary…s File System Automount Point.
[  OK  ] Reached target Local File Systems.
[  OK  ] Reached target Slices.
[  OK  ] Listening on Journal Socket.
         Starting Journal Service...
         Starting Create System Users...
         Mounting Kernel Debug File System...
         Starting Update UTMP about System Boot/Shutdown...
[  OK  ] Reached target Sockets.
[  OK  ] Reached target Swap.
[  OK  ] Started Dispatch Password …ts to Console Directory Watch.
[  OK  ] Reached target Paths.
[  OK  ] Reached target Local Encrypted Volumes.
         Mounting FUSE Control File System...
         Starting Apply Kernel Variables...
         Mounting Huge Pages File System...
[  OK  ] Mounted Kernel Debug File System.
[  OK  ] Mounted FUSE Control File System.
[  OK  ] Started Update UTMP about System Boot/Shutdown.
[  OK  ] Mounted Huge Pages File System.
[  OK  ] Started Apply Kernel Variables.
[  OK  ] Started Journal Service.
         Starting Flush Journal to Persistent Storage...
[  OK  ] Started Flush Journal to Persistent Storage.
[  OK  ] Started Create System Users.
         Starting Create Static Device Nodes in /dev...
[  OK  ] Started Create Static Device Nodes in /dev.
[  OK  ] Reached target System Initialization.
[  OK  ] Reached target Basic System.
[  OK  ] Started kubelet: The Kubernetes Node Agent.
[  OK  ] Started Daily Cleanup of Temporary Directories.
[  OK  ] Reached target Timers.
         Starting containerd container runtime...
[  OK  ] Started containerd container runtime.
[  OK  ] Reached target Multi-User System.
[  OK  ] Reached target Graphical Interface.
         Starting Update UTMP about System Runlevel Changes...
[  OK  ] Started Update UTMP about System Runlevel Changes.
BenTheElder commented 4 years ago

cc @PatrickLang

aojea commented 4 years ago

@anjiawei1991 what's the output of docker exec --privileged kind-control-plane kubectl --kubeconfig=/etc/kubernetes/admin.conf get pods --all-namespaces?

anjiawei1991 commented 4 years ago
root@kind-control-plane:/# export KUBECONFIG=/etc/kubernetes/admin.conf
root@kind-control-plane:/# kubectl get pods -A
NAMESPACE     NAME                                         READY   STATUS             RESTARTS   AGE
kube-system   coredns-5c98db65d4-5xscs                     0/1     Pending            0          6m22s
kube-system   coredns-5c98db65d4-gqzfm                     0/1     Pending            0          6m22s
kube-system   etcd-kind-control-plane                      1/1     Running            0          5m19s
kube-system   kindnet-fnmdn                                0/1     CrashLoopBackOff   6          6m21s
kube-system   kube-apiserver-kind-control-plane            1/1     Running            0          5m16s
kube-system   kube-controller-manager-kind-control-plane   1/1     Running            0          5m23s
kube-system   kube-proxy-tqsp6                             1/1     Running            0          6m21s
kube-system   kube-scheduler-kind-control-plane            1/1     Running            0          5m14s

root@kind-control-plane:/# kubectl describe pod coredns-5c98db65d4-5xscs --namespace=kube-system
Name:                 coredns-5c98db65d4-5xscs
Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Node:                 <none>
Labels:               k8s-app=kube-dns
Annotations:          <none>
Status:               Pending
Controlled By:        ReplicaSet/coredns-5c98db65d4
    Image:       k8s.gcr.io/coredns:1.3.1
    Ports:       53/UDP, 53/TCP, 9153/TCP
    Host Ports:  0/UDP, 0/TCP, 0/TCP
      memory:  170Mi
      cpu:        100m
      memory:     70Mi
    Liveness:     http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
    Readiness:    http-get http://:8080/health delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:  <none>
      /etc/coredns from config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from coredns-token-xk67m (ro)
  Type           Status
  PodScheduled   False
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns
    Optional:  false
    Type:        Secret (a volume populated by a Secret)
    SecretName:  coredns-token-xk67m
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  beta.kubernetes.io/os=linux
Tolerations:     CriticalAddonsOnly
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
  Type     Reason            Age                   From               Message
  ----     ------            ----                  ----               -------
  Warning  FailedScheduling  40s (x10 over 8m48s)  default-scheduler  0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.

root@kind-control-plane:/# kubectl describe pod kindnet-fnmdn --namespace=kube-system
Name:           kindnet-fnmdn
Namespace:      kube-system
Priority:       0
Node:           kind-control-plane/
Start Time:     Tue, 16 Jul 2019 07:32:32 +0000
Labels:         app=kindnet
Annotations:    <none>
Status:         Running
Controlled By:  DaemonSet/kindnet
    Container ID:   containerd://d43a35e20c94947395b46ad71b07c000459b9b0621655fc33eb8e984558b96f6
    Image:          kindest/kindnetd:0.5.0
    Image ID:       sha256:ef97cccdfdb5048fe112cf868b2779e06dea11b0d742aad14d4bea690f653549
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    2
      Started:      Tue, 16 Jul 2019 07:38:18 +0000
      Finished:     Tue, 16 Jul 2019 07:38:18 +0000
    Ready:          False
    Restart Count:  6
      cpu:     100m
      memory:  50Mi
      cpu:     100m
      memory:  50Mi
      HOST_IP:      (v1:status.hostIP)
      POD_IP:       (v1:status.podIP)
      /etc/cni/net.d from cni-cfg (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kindnet-token-5dmqc (ro)
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
    Type:          HostPath (bare host directory volume)
    Path:          /etc/cni/net.d
    Type:        Secret (a volume populated by a Secret)
    SecretName:  kindnet-token-5dmqc
    Optional:    false
QoS Class:       Guaranteed
Node-Selectors:  <none>
Tolerations:     :NoSchedule
  Type     Reason     Age                     From                         Message
  ----     ------     ----                    ----                         -------
  Normal   Scheduled  9m34s                   default-scheduler            Successfully assigned kube-system/kindnet-fnmdn to kind-control-plane
  Normal   Pulled     7m58s (x5 over 9m34s)   kubelet, kind-control-plane  Container image "kindest/kindnetd:0.5.0" already present on machine
  Normal   Created    7m58s (x5 over 9m32s)   kubelet, kind-control-plane  Created container kindnet-cni
  Normal   Started    7m58s (x5 over 9m32s)   kubelet, kind-control-plane  Started container kindnet-cni
  Warning  BackOff    4m26s (x26 over 9m31s)  kubelet, kind-control-plane  Back-off restarting failed container

root@kind-control-plane:/# kubectl logs kindnet-fnmdn --namespace=kube-system
hostIP =
podIP =
panic: failed to ensure that nat chain KIND-MASQ-AGENT jumps to MASQUERADE: error appending rule: exit status 1: iptables: No chain/target/match by that name.

goroutine 34 [running]:
        /src/main.go:76 +0x67
created by main.main
        /src/main.go:73 +0x318


BenTheElder commented 4 years ago

Not a WSL2 expert, but it seems our network setup is failing. Can you get the logs from the kindnetd pod? (kind export logs can grab these without kubectl.)

Also, were you following https://kind.sigs.k8s.io/docs/user/using-wsl2/ ?

aojea commented 4 years ago

panic: failed to ensure that nat chain KIND-MASQ-AGENT jumps to MASQUERADE: error appending rule: exit status 1: iptables: No chain/target/match by that name.

can you check if iptables is working inside the container? what's the output of iptables-save?
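A minimal sketch for skimming such a dump, under stated assumptions: iptables-save declares each chain on a line of the form `:NAME POLICY [packets:bytes]`, so the declared chain names can be pulled out with sed. The helper name `list_chains` is made up for illustration; it only reads text on stdin and never touches the live ruleset.

```shell
# Print the chain names declared in an iptables-save dump read from stdin.
# Chain declarations look like ":NAME POLICY [packets:bytes]";
# table headers ("*nat"), rules ("-A ...") and "COMMIT" are skipped.
list_chains() {
    sed -n 's/^:\([^ ]*\) .*/\1/p'
}
```

For example, `docker exec kind-control-plane iptables-save | list_chains` would print one chain name per line. A chain being listed says nothing about whether a target such as MASQUERADE is available in the kernel.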

anjiawei1991 commented 4 years ago
$ cat /tmp/051921006/kind-control-plane/pods/kube-system_kindnet-fnmdn_122abfb6-a0c0-4567-8447-147f7ee6a1de/kindnet-cni/10.log
2019-07-16T07:58:53.6839625Z stdout F hostIP =
2019-07-16T07:58:53.6839959Z stdout F podIP =
2019-07-16T07:58:53.7999526Z stderr F panic: failed to ensure that nat chain KIND-MASQ-AGENT jumps to MASQUERADE: error appending rule: exit status 1: iptables: No chain/target/match by that name.
2019-07-16T07:58:53.7999883Z stderr F
2019-07-16T07:58:53.7999966Z stderr F
2019-07-16T07:58:53.8000016Z stderr F goroutine 22 [running]:
2019-07-16T07:58:53.8000103Z stderr F main.main.func1(0xc00021f440)
2019-07-16T07:58:53.8000154Z stderr F   /src/main.go:76 +0x67
2019-07-16T07:58:53.8000203Z stderr F created by main.main
2019-07-16T07:58:53.8000258Z stderr F   /src/main.go:73 +0x318

@BenTheElder this is the log of the kindnetd pod.

I was not following the using-wsl2 doc; I followed Microsoft's official doc instead.

anjiawei1991 commented 4 years ago


root@kind-control-plane:/# iptables-save
# Generated by iptables-save v1.6.1 on Tue Jul 16 08:05:33 2019
:INPUT ACCEPT [198810:33399528]
:OUTPUT ACCEPT [198822:33400248]
# Completed on Tue Jul 16 08:05:33 2019
# Generated by iptables-save v1.6.1 on Tue Jul 16 08:05:33 2019
-A KUBE-MARK-DROP -j MARK --set-xmark 0x8000/0x8000
# Completed on Tue Jul 16 08:05:33 2019
BenTheElder commented 4 years ago

can you please try following the WSL2 doc? those instructions are from a microsoft employee :-)

BenTheElder commented 4 years ago

2019-07-16T07:58:53.7999526Z stderr F panic: failed to ensure that nat chain KIND-MASQ-AGENT jumps to MASQUERADE: error appending rule: exit status 1: iptables: No chain/target/match by that name.

that's super helpful, that shouldn't happen.

BenTheElder commented 4 years ago

seems we might need a modprobe ipt_MASQUERADE (though it's pretty strange that this isn't loaded already..?), what's the output of that on your WSL2 host?

if that fixes things, we can probably ensure kindnetd runs it.
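One way to answer the "is it loaded?" question from a script, sketched under assumptions: lsmod prints one module per line with the name in the first column, so an exact match on that column avoids false hits from the "Used by" column. `module_listed` is a hypothetical helper name, not part of kind.

```shell
# Succeed (exit 0) if module "$1" appears as a module name in lsmod-style
# text on stdin; fail (exit 1) otherwise. Only the first column is compared,
# so a module that is merely a dependency of another one does not count.
module_listed() {
    awk -v mod="$1" '$1 == mod { found = 1 } END { exit !found }'
}
```

Usage might look like `lsmod | module_listed ipt_MASQUERADE && echo loaded`. Note that lsmod only shows loadable modules that are currently loaded; a feature compiled directly into the kernel will not appear there, so "not listed" does not by itself mean "not available".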

aojea commented 4 years ago

great catch @BenTheElder, it seems that the WSL2 kernel doesn't have the netfilter modules needed: https://blog.simos.info/how-to-run-lxd-containers-in-wsl2/

anjiawei1991 commented 4 years ago

can you please try following the WSL2 doc? those instructions are from a microsoft employee :-)

I have glanced over those instructions and found nothing special in them; they are almost the same as what I did for my WSL2 installation.

But I do think there is some problem in WSL2's Docker environment, because I just found that I can't even run a very simple nginx container in Docker.

I'm going to install some other VM. Also, I will keep the WSL2 environment around for tracking this issue.

BenTheElder commented 4 years ago

They do mention a particular way of starting docker fwiw, but the missing module is the real problem here. (https://github.com/kubernetes-sigs/kind/issues/707#issuecomment-511992131)

brunowego commented 4 years ago

Similar issue with docker-machine:

kubectl get pods -A                                                                                                                                                            
NAMESPACE     NAME                                         READY   STATUS             RESTARTS   AGE
kube-system   coredns-5c98db65d4-h57mp                     0/1     Pending            0          8m5s
kube-system   coredns-5c98db65d4-tv76c                     0/1     Pending            0          8m5s
kube-system   etcd-kind-control-plane                      1/1     Running            0          7m27s
kube-system   kindnet-77trs                                0/1     CrashLoopBackOff   6          7m50s
kube-system   kindnet-rbhzn                                0/1     CrashLoopBackOff   6          7m50s
kube-system   kindnet-tvdtg                                0/1     CrashLoopBackOff   6          8m6s
kube-system   kube-apiserver-kind-control-plane            1/1     Running            0          7m4s
kube-system   kube-controller-manager-kind-control-plane   1/1     Running            0          7m14s
kube-system   kube-proxy-2hnkr                             1/1     Running            0          8m6s
kube-system   kube-proxy-ddvbr                             1/1     Running            0          7m50s
kube-system   kube-proxy-slxkg                             1/1     Running            0          7m50s
kube-system   kube-scheduler-kind-control-plane            1/1     Running            0          7m23s
kube-system   tiller-deploy-7bf78cdbf7-xwrfq               0/1     Pending            0          5m27s

How I ran it:

tee ~/.kind-config.yaml << EOF
kind: Cluster
apiVersion: kind.sigs.k8s.io/v1alpha3
networking:
  apiServerAddress: $(docker-machine ip)
nodes:
  - role: control-plane
  - role: worker
  - role: worker
EOF
kind create cluster --config ~/.kind-config.yaml

OS X Mojave
docker-machine version 0.16.1, build cce350d7
kind version v0.4.0
Docker version 18.09.2, build 6247962
kubectl Client Version: v1.15.0
kubectl Server Version: v1.15.0
BenTheElder commented 4 years ago

@PatrickLang I suspect lacking the MASQUERADE module is a bit of a problem for many Kubernetes setups? Is it totally not available or just not loaded?

PatrickLang commented 4 years ago

https://github.com/microsoft/WSL/issues/4165 looks related

what kernel config does that relate to?

gunzip /proc/config.gz -c | grep -i masq                                 
# CONFIG_NFT_MASQ is not set
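A small sketch that turns the grep above into a yes/no style answer, assuming the usual kconfig text format (`OPTION=y`, `OPTION=m`, or `# OPTION is not set`). The function name `kconfig_state` is made up for this example, and it reads the config text from stdin so it works with any source of that text.

```shell
# Report how a kernel option is configured in kconfig-style text on stdin.
# $1: option name, e.g. CONFIG_NFT_MASQ.
# Prints the configured value (typically "y" or "m"), "unset" if the option
# is explicitly marked "is not set", or "absent" if it is never mentioned.
kconfig_state() {
    awk -v opt="$1" '
        $0 ~ ("^" opt "=")             { sub(/^[^=]*=/, ""); state = $0; next }
        $0 == ("# " opt " is not set") { state = "unset" }
        END { print (state == "" ? "absent" : state) }
    '
}
```

On a kernel that exposes /proc/config.gz (not every kernel does), `gunzip -c /proc/config.gz | kconfig_state CONFIG_NFT_MASQ` would print `unset` for the config line shown above.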
BenTheElder commented 4 years ago



aside: modules on a working host:

lsmod | grep -i masq
ipt_MASQUERADE         16384  4
nf_nat_ipv4            16384  3 ipt_MASQUERADE,nft_chain_nat_ipv4,iptable_nat
nf_conntrack          163840  8 xt_conntrack,nf_nat,ipt_MASQUERADE,nf_nat_ipv4,xt_nat,nf_conntrack_netlink,xt_connmark,ip_vs
x_tables               45056  13 xt_conntrack,xt_statistic,iptable_filter,nft_compat,xt_tcpudp,ipt_MASQUERADE,xt_addrtype,xt_nat,xt_comment,ipt_REJECT,xt_connmark,ip_tables,xt_mark
BenTheElder commented 4 years ago

on that same host (my linux workstation, just the first handy linux box I know it works on :^)):

$ cat /lib/modules/$(uname -r)/build/.config | grep -i masq
PatrickLang commented 4 years ago

thanks, will do some more digging there. my initial guess was that it was actually missing the chains since WSL2 doesn't run init.d or systemd by default. I'll see if I can duplicate the default chains from a normal Ubuntu setup.

PatrickLang commented 4 years ago

actually looks like CONFIG_NFT_MASQ=y is the first thing to try

BenTheElder commented 4 years ago

I'm pretty sure the error is somewhat misleading and that MASQUERADE is a target set up by the kernel module rather than, like, a chain you would set up yourself ... (or systemd or ...) :sweat_smile:

I'd definitely go with CONFIG_NFT_MASQ :+1:

PatrickLang commented 4 years ago

making some progress...

kconfig diff

diff --git a/Microsoft/config-wsl b/Microsoft/config-wsl
index 646309095..be2158f8c 100644
--- a/Microsoft/config-wsl
+++ b/Microsoft/config-wsl
@@ -1,13 +1,13 @@
 # Automatically generated file; DO NOT EDIT.
-# Linux/x86 4.19.52 Kernel Configuration
+# Linux/x86 4.19.57 Kernel Configuration

-# Compiler: x86_64-msft-linux-gcc (GCC) 7.3.0
+# Compiler: gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
@@ -869,7 +869,7 @@ CONFIG_NF_TABLES_INET=y
 # CONFIG_NFT_LOG is not set
 # CONFIG_NFT_LIMIT is not set
-# CONFIG_NFT_MASQ is not set
 # CONFIG_NFT_REDIR is not set
 # CONFIG_NFT_NAT is not set
 # CONFIG_NFT_TUNNEL is not set
@@ -1033,6 +1033,7 @@ CONFIG_NF_REJECT_IPV4=y
@@ -1066,6 +1067,7 @@ CONFIG_IP_NF_ARP_MANGLE=y
 # CONFIG_NFT_DUP_IPV6 is not set
 # CONFIG_NFT_FIB_IPV6 is not set
 # CONFIG_NF_DUP_IPV6 is not set

now kindnet gets further before crashing

2019-07-19T19:20:51.7131909Z stdout F hostIP =
2019-07-19T19:20:51.7132396Z stdout F podIP =
2019-07-19T19:20:52.1266101Z stdout F Handling iptables: *nat
2019-07-19T19:20:52.1267564Z stdout F 
2019-07-19T19:20:52.1267789Z stdout F Handling iptables: *nat
2019-07-19T19:20:52.1267897Z stdout F :KIND-MASQ-AGENT - [0:0]
2019-07-19T19:20:52.1267973Z stdout F 
2019-07-19T19:20:52.1268055Z stdout F Handling iptables: *nat
2019-07-19T19:20:52.126849Z stdout F :KIND-MASQ-AGENT - [0:0]
2019-07-19T19:20:52.1268596Z stdout F -A KIND-MASQ-AGENT -m comment --comment "kind-masq-agent: local traffic is not subject to MASQUERADE" -d -j RETURN
2019-07-19T19:20:52.1268708Z stdout F 
2019-07-19T19:20:52.1268787Z stdout F Handling iptables: *nat
2019-07-19T19:20:52.1268865Z stdout F :KIND-MASQ-AGENT - [0:0]
2019-07-19T19:20:52.1268944Z stdout F -A KIND-MASQ-AGENT -m comment --comment "kind-masq-agent: local traffic is not subject to MASQUERADE" -d -j RETURN
2019-07-19T19:20:52.126903Z stdout F -A KIND-MASQ-AGENT -m comment --comment "ip-masq-agent: outbound traffic is subject to MASQUERADE (must be last in chain)" -j MASQUERADE
2019-07-19T19:20:52.1269104Z stdout F 
2019-07-19T19:20:52.126948Z stdout F Handling iptables: *nat
2019-07-19T19:20:52.1269586Z stdout F :KIND-MASQ-AGENT - [0:0]
2019-07-19T19:20:52.1269671Z stdout F -A KIND-MASQ-AGENT -m comment --comment "kind-masq-agent: local traffic is not subject to MASQUERADE" -d -j RETURN
2019-07-19T19:20:52.1269751Z stdout F -A KIND-MASQ-AGENT -m comment --comment "ip-masq-agent: outbound traffic is subject to MASQUERADE (must be last in chain)" -j MASQUERADE
2019-07-19T19:20:52.1269824Z stdout F COMMIT
2019-07-19T19:20:52.1269899Z stdout F 
2019-07-19T19:21:21.9410885Z stderr F panic: Get dial tcp i/o timeout
2019-07-19T19:21:21.9411799Z stderr F 
2019-07-19T19:21:21.9413272Z stderr F goroutine 1 [running]:
2019-07-19T19:21:21.9415423Z stderr F main.main()
2019-07-19T19:21:21.9415715Z stderr F   /src/main.go:89 +0x4aa
PatrickLang commented 4 years ago

oh nevermind - kindnet restarted after that and now it's running :)

BenTheElder commented 4 years ago


We should retry the node listing / rewrite as a proper controller 😅

Right now we panic a little too aggressively, but it does help surface problems and it should restart.
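The retry idea can be sketched as follows. kindnetd itself is written in Go, so this shell function only illustrates the pattern and is not the project's code; the name `retry` and its argument order are made up for this example.

```shell
# Run a command up to $1 times, sleeping $2 seconds between failed attempts.
# Returns 0 on the first success, otherwise the exit status of the last try.
# Note: POSIX sh has no local variables, so the names below are global.
retry() {
    attempts=$1; delay=$2; shift 2
    n=1
    while :; do
        "$@" && return 0
        status=$?                      # exit status of the failed command
        [ "$n" -ge "$attempts" ] && return "$status"
        n=$((n + 1))
        sleep "$delay"
    done
}
```

Something like `retry 5 2 kubectl get nodes` would then tolerate an API server that is briefly unreachable, instead of giving up on the first timeout.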

PatrickLang commented 4 years ago

Digging into a kube-proxy problem next:

2019-07-19T22:48:01.7326624Z stderr F W0719 22:48:01.732310       1 proxier.go:500] Failed to read file /lib/modules/4.19.57-microsoft-standard+/modules.builtin with error open /lib/modules/4.19.57-microsoft-standard+/modules.builtin: no such file or directory. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
2019-07-19T22:48:01.7370165Z stderr F W0719 22:48:01.736387       1 proxier.go:513] Failed to load kernel module ip_vs with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
2019-07-19T22:48:01.7394852Z stderr F W0719 22:48:01.738833       1 proxier.go:513] Failed to load kernel module ip_vs_rr with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
2019-07-19T22:48:01.7441959Z stderr F W0719 22:48:01.743002       1 proxier.go:513] Failed to load kernel module ip_vs_wrr with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
2019-07-19T22:48:01.7457162Z stderr F W0719 22:48:01.745034       1 proxier.go:513] Failed to load kernel module ip_vs_sh with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
2019-07-19T22:48:01.7511346Z stderr F W0719 22:48:01.750078       1 proxier.go:513] Failed to load kernel module nf_conntrack with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
2019-07-19T22:48:01.7589703Z stderr F W0719 22:48:01.758509       1 server_others.go:249] Flag proxy-mode="" unknown, assuming iptables proxy
2019-07-19T22:48:01.7892125Z stderr F I0719 22:48:01.788965       1 server_others.go:143] Using iptables Proxier.
2019-07-19T22:48:01.7913181Z stderr F I0719 22:48:01.789627       1 server.go:534] Version: v1.15.0
2019-07-19T22:48:01.8155826Z stderr F I0719 22:48:01.815006       1 conntrack.go:52] Setting nf_conntrack_max to 131072
2019-07-19T22:48:01.8159714Z stderr F I0719 22:48:01.815649       1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_established' to 86400
2019-07-19T22:48:01.8163704Z stderr F I0719 22:48:01.816048       1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_close_wait' to 3600
2019-07-19T22:48:01.8361069Z stderr F I0719 22:48:01.835141       1 config.go:96] Starting endpoints config controller
2019-07-19T22:48:01.8361483Z stderr F I0719 22:48:01.835416       1 controller_utils.go:1029] Waiting for caches to sync for endpoints config controller
2019-07-19T22:48:01.8377322Z stderr F I0719 22:48:01.835829       1 config.go:187] Starting service config controller
2019-07-19T22:48:01.8377734Z stderr F I0719 22:48:01.836680       1 controller_utils.go:1029] Waiting for caches to sync for service config controller
2019-07-19T22:48:01.9480658Z stderr F I0719 22:48:01.947179       1 controller_utils.go:1036] Caches are synced for service config controller
2019-07-19T22:48:01.9481047Z stderr F I0719 22:48:01.947403       1 controller_utils.go:1036] Caches are synced for endpoints config controller
2019-07-19T22:48:21.8033521Z stderr F E0719 22:48:21.801615       1 proxier.go:1442] Failed to delete stale service IP connections, error: error deleting connection tracking state for UDP service IP:, error: conntrack command returned: "conntrack v1.4.4 (conntrack-tools): Operation failed: invalid parameters\n", error message: exit status 1
2019-07-19T22:48:23.6152989Z stderr F E0719 22:48:23.610658       1 proxier.go:1402] Failed to execute iptables-restore: exit status 1 (iptables-restore: line 58 failed
2019-07-19T22:48:23.6153673Z stderr F )
2019-07-19T22:48:53.7783046Z stderr F E0719 22:48:53.778140       1 proxier.go:1402] Failed to execute iptables-restore: exit status 1 (iptables-restore: line 58 failed
2019-07-19T22:48:53.7783618Z stderr F )
2019-07-19T22:49:23.8560315Z stderr F E0719 22:49:23.855245       1 proxier.go:1402] Failed to execute iptables-restore: exit status 1 (iptables-restore: line 58 failed
2019-07-19T22:49:23.8561166Z stderr F )
PatrickLang commented 4 years ago

The module load failures at the top are not relevant. All the modules it's probing are compiled in already. The best clue at this point is the iptables-restore error, which is truncated.
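One way to cut through the module-load noise is to filter the log down to error and fatal lines only. This is a minimal sketch, assuming the usual klog header (a severity letter I/W/E/F followed by month and day, then a timestamp), optionally preceded by a CRI prefix as in the log above. `klog_errors` is a hypothetical helper name, not part of kubectl or kind.

```shell
# Hypothetical helper: keep only klog error (E) and fatal (F) lines.
# Assumes the header format "<severity><MMDD> <HH:MM:SS.micros> ...",
# optionally preceded by a CRI prefix such as "<timestamp> stderr F ".
klog_errors() {
  grep -E '(^| )[EF][0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]+ '
}

# Example usage (substitute the real pod name):
#   kubectl logs -n kube-system <kube-proxy-pod> | klog_errors
```

Continuation lines, such as the lone `)` that follows each iptables-restore error, are dropped by this filter, so it is only a first pass.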

BenTheElder commented 4 years ago

might have to up the kube-proxy verbosity to see more of what it was doing.
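A rough sketch of how the verbosity might be raised, assuming kube-proxy runs as the kubeadm-style `kube-proxy` DaemonSet in `kube-system` with its arguments in the first container's `command` list (not verified against the cluster in this issue). `--v` is the standard klog verbosity flag; both function names are hypothetical.

```shell
# Hypothetical helper: emit a JSON patch that appends --v=<level> to the
# command list of the first container in the pod template.
kube_proxy_verbosity_patch() {
  level="${1:-4}"
  printf '[{"op":"add","path":"/spec/template/spec/containers/0/command/-","value":"--v=%s"}]\n' "$level"
}

# Hypothetical wrapper: apply the patch to the kube-proxy DaemonSet.
# The DaemonSet then replaces its pods, which restart with the new flag.
raise_kube_proxy_verbosity() {
  kubectl -n kube-system patch daemonset kube-proxy --type=json \
    -p "$(kube_proxy_verbosity_patch "$1")"
}
```

Calling `raise_kube_proxy_verbosity 4` and then reading the new pod's logs should show more of what kube-proxy was doing around the iptables-restore call, if the assumptions about the DaemonSet layout hold.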

BenTheElder commented 4 years ago

I think #726 technically fixed this, but hopefully a future version of WSL2 will not need those workarounds.

jpvosloo commented 4 years ago

If you are using docker with WSL 2 in May 2020 then do the following in a terminal:

cd /d %LOCALAPPDATA%\Docker\pki
copy apiserver-etcd-client.crt apiserver.crt

Now try to restart docker.

BenTheElder commented 4 years ago

@jpvosloo is this something we need to add to the guide? Unfortunately I haven't had a chance to do much with this myself yet, and I think Patrick has moved on to other things.

goodwill commented 3 years ago

May I know how to change the flag and recompile my wsl2 kernel? Seems this is also stopping docker directly installed inside wsl2 from working (as Docker Desktop is not there for Surface Pro X yet, so wanna try to get docker running inside wsl2 as a workaround for now)

goodwill commented 3 years ago

Related issue: https://github.com/docker/roadmap/issues/91

BenTheElder commented 3 years ago

I don't use WSL2. If anyone does and knows about this, please also look at https://github.com/kubernetes-sigs/kind/issues/1740

razlani commented 1 year ago

Used https://kubernetes.io/blog/2020/05/21/wsl-docker-kubernetes-on-the-windows-desktop/ and https://kind.sigs.k8s.io/docs/user/using-wsl2/ and reflashed the kernel, but same issue for me :<

k logs -n kube-system pod/kube-proxy-bfbvn
W1218 10:44:51.175215       1 proxier.go:598] Failed to read file /lib/modules/ with error open /lib/modules/ no such file or directory. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
W1218 10:44:51.175927       1 proxier.go:608] Failed to load kernel module ip_vs with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
W1218 10:44:51.176608       1 proxier.go:608] Failed to load kernel module ip_vs_rr with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
W1218 10:44:51.177156       1 proxier.go:608] Failed to load kernel module ip_vs_wrr with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
W1218 10:44:51.177734       1 proxier.go:608] Failed to load kernel module ip_vs_sh with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
W1218 10:44:51.178371       1 proxier.go:608] Failed to load kernel module nf_conntrack with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
W1218 10:44:51.179227       1 server_others.go:323] Unknown proxy mode "", assuming iptables proxy
I1218 10:44:51.181599       1 node.go:135] Successfully retrieved node IP:
I1218 10:44:51.181616       1 server_others.go:145] Using iptables Proxier.
I1218 10:44:51.182888       1 server.go:571] Version: v1.17.0
I1218 10:44:51.183073       1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_max' to 655360
F1218 10:44:51.183087       1 server.go:485] open /proc/sys/net/netfilter/nf_conntrack_max: permission denied
docker version
Client: Docker Engine - Community
 Cloud integration: v1.0.29
 Version:           20.10.21
 API version:       1.41
 Go version:        go1.18.7
 Git commit:        baeda1f
 Built:             Tue Oct 25 18:02:28 2022
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Desktop
 Engine:
  Version:          20.10.21
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.18.7
  Git commit:       3056208
  Built:            Tue Oct 25 18:00:19 2022
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.10
  GitCommit:        770bd0108c32f3fb5c73ae1264f7e503fe7b2661
 runc:
  Version:          1.1.4
  GitCommit:        v1.1.4-0-g5fd4c4d
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
k version
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short.  Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.2", GitCommit:"5835544ca568b757a8ecae5c153f317e5736700e", GitTreeState:"clean", BuildDate:"2022-09-21T14:33:49Z", GoVersion:"go1.19.1", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.0", GitCommit:"70132b0f130acc0bed193d9ba59dd186f0e634cf", GitTreeState:"clean", BuildDate:"2020-01-14T00:09:19Z", GoVersion:"go1.13.4", Compiler:"gc", Platform:"linux/amd64"}
WARNING: version difference between client (1.25) and server (1.17) exceeds the supported minor version skew of +/-1

In writing that I see a pretty ominous server/client version warning there..

uname -r

(flashed as per the WSL guide in k8s)

BenTheElder commented 1 year ago

In writing that I see a pretty ominous server/client version warning there..

That doesn't matter yet; kubectl from the host isn't used by kind itself. You'll want to avoid unsupported (by Kubernetes upstream) skew when you get to using the cluster, but that's not why setup is failing.
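For reference, the skew in the warning is just the difference between the two minor version numbers. A minimal sketch, assuming both versions share the same major version and use the `vMAJOR.MINOR.PATCH` form shown in the `k version` output; `minor_skew` is a hypothetical helper name.

```shell
# Hypothetical helper: print the absolute difference between the minor
# versions of two GitVersion strings such as v1.25.2 and v1.17.0.
# Assumes the major versions match; they are not compared here.
minor_skew() {
  a=$(printf '%s' "$1" | cut -d. -f2)
  b=$(printf '%s' "$2" | cut -d. -f2)
  d=$((a - b))
  if [ "$d" -lt 0 ]; then
    d=$((0 - d))
  fi
  echo "$d"
}

# Client and server versions from the output above.
minor_skew v1.25.2 v1.17.0   # prints 8
```

A result above 1 is what triggers the kubectl warning quoted earlier.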

Please file a new bug with the full bug template details instead of commenting on a 2+ year old issue. Things will have changed quite a bit, and we'll need to know things like the kind version you're using (which is in the bug template) if anyone is going to be able to help.

razlani commented 1 year ago

@BenTheElder Thanks for your comment.

For whoever else this could help: I solved it earlier using https://github.com/kubernetes-sigs/kind/issues/1740#issuecomment-704559467 and by upgrading my kind version.

I am not sure whether upgrading kind alone would have been enough, even together with the k8s/kind guides I linked above (contrary to the guides, I was not using the latest kind version).

Since it's a known issue as linked above, I don't think it's worth creating a new issue, but thank you all for the help!