weaveworks / weave

Simple, resilient multi-host container networking and more.
https://www.weave.works
Apache License 2.0

Crashing weave-net pod when adding node to k8 cluster without supplying network-CIDR #3758

Open ryan-g2 opened 4 years ago

ryan-g2 commented 4 years ago

What you expected to happen?

To not have to supply the --pod-network-cidr=10.32.0.0/12 flag when setting up a Weave network via kubeadm init, and for the weave-net pod to remain stable when adding a node to the cluster.

What happened?

When I set up a k8 cluster using kubeadm init --apiserver-advertise-address=192.168.1.31 and add one node, the newly created weave-net pod enters a CrashLoop while starting its 2nd container. This prevents the new node from ever leaving the NotReady state.

The weave-net pod for the master node looks healthy and has 2/2 RUNNING the entire time.

How to reproduce it?

NOTE - The k8 master and node are Ubuntu 18.04 VMs running on an Ubuntu 19.10 box.

  1. Tear down existing k8 cluster to get to square 1
    • drain and delete all nodes
    • kubeadm reset on all nodes and master
    • On master: delete /etc/cni/net.d and $HOME/.kube/config folders.
  2. On master - run kubeadm init --apiserver-advertise-address=192.168.1.31
    • Run the commands kubeadm prints at the end to set up the kubeconfig correctly (mkdir....)
    • Run kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')" to deploy weave
  3. Wait for all pods to correctly come online
  4. Add one node to the cluster with join cmd in the kubeadm output from the master.
  5. On master - run kubectl get pods --all-namespaces

At this point you can monitor the pods being created. If the issue occurs, the 2nd container in the newest weave-net pod will start crashing and not come online - which keeps the node in a NotReady state.

Anything else we need to know?

I have recreated this issue a few times now to debug - the reproduction rate is not 100% (it has happened about 4 out of 5 times with the above steps).

NOTE - when I add --pod-network-cidr=10.32.0.0/12 to my init command when creating the cluster, I have not hit this issue in 4 out of 4 attempts; all pods/containers come up as expected.

I opened an issue with K8 thinking this was just a documentation issue (I did not see the CIDR flag in the K8 setup docs or in the Weave docs). I am opening an issue here since we did not see a CIDR address supplied in the log files when reproducing this bug, but did see one once I got a working cluster up.

Before trying Weave I once set up a Calico network for my cluster, but kept seeing crashing pods with that too, so I moved to Weave.

Versions:

KubeCtl:

Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.2", GitCommit:"59603c6e503c87169aea6106f57b9f242f64df89", GitTreeState:"clean", BuildDate:"2020-01-18T23:30:10Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.2", GitCommit:"59603c6e503c87169aea6106f57b9f242f64df89", GitTreeState:"clean", BuildDate:"2020-01-18T23:22:30Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"linux/amd64"}

Weave

Using Weave CNI plugin for Kubernetes

Docker:

Client:
 Version:           18.09.7
 API version:       1.39
 Go version:        go1.10.1
 Git commit:        2d0083d
 Built:             Fri Aug 16 14:20:06 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          18.09.7
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.1
  Git commit:       2d0083d
  Built:            Wed Aug 14 19:41:23 2019
  OS/Arch:          linux/amd64
  Experimental:     false

uname -a

Linux kubemaster 5.3.0-26-generic #28~18.04.1-Ubuntu SMP Wed Dec 18 16:40:14 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Logs:

The kube-proxy output once I have setup the master (before adding the node that has the crashing net pod):

apiVersion: v1
data:
  config.conf: |-
    apiVersion: kubeproxy.config.k8s.io/v1alpha1
    bindAddress: 0.0.0.0
    clientConnection:
      acceptContentTypes: ""
      burst: 0
      contentType: ""
      kubeconfig: /var/lib/kube-proxy/kubeconfig.conf
      qps: 0
    clusterCIDR: ""
    configSyncPeriod: 0s
    conntrack:
      maxPerCore: null
      min: null
      tcpCloseWaitTimeout: null
      tcpEstablishedTimeout: null
    enableProfiling: false
    healthzBindAddress: ""
    hostnameOverride: ""
    iptables:
      masqueradeAll: false
      masqueradeBit: null
      minSyncPeriod: 0s
      syncPeriod: 0s
    ipvs:
      excludeCIDRs: null
      minSyncPeriod: 0s
      scheduler: ""
      strictARP: false
      syncPeriod: 0s
    kind: KubeProxyConfiguration
    metricsBindAddress: ""
    mode: ""
    nodePortAddresses: null
    oomScoreAdj: null
    portRange: ""
    udpIdleTimeout: 0s
    winkernel:
      enableDSR: false
      networkName: ""
      sourceVip: ""
  kubeconfig.conf: |-
    apiVersion: v1
    kind: Config
    clusters:
    - cluster:
        certificate-authority: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        server: https://192.168.1.31:6443
      name: default
    contexts:
    - context:
        cluster: default
        namespace: default
        user: default
      name: default
    current-context: default
    users:
    - name: default
      user:
        tokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
kind: ConfigMap
metadata:
  creationTimestamp: "2020-01-24T01:15:43Z"
  labels:
    app: kube-proxy
  name: kube-proxy
  namespace: kube-system
  resourceVersion: "238"
  selfLink: /api/v1/namespaces/kube-system/configmaps/kube-proxy
  uid: f12e3e4b-73b2-439e-8be9-7289d2bce49a

And the output once I added the one node and started seeing the crashing net pod:

apiVersion: v1
data:
  config.conf: |-
    apiVersion: kubeproxy.config.k8s.io/v1alpha1
    bindAddress: 0.0.0.0
    clientConnection:
      acceptContentTypes: ""
      burst: 0
      contentType: ""
      kubeconfig: /var/lib/kube-proxy/kubeconfig.conf
      qps: 0
    clusterCIDR: ""
    configSyncPeriod: 0s
    conntrack:
      maxPerCore: null
      min: null
      tcpCloseWaitTimeout: null
      tcpEstablishedTimeout: null
    enableProfiling: false
    healthzBindAddress: ""
    hostnameOverride: ""
    iptables:
      masqueradeAll: false
      masqueradeBit: null
      minSyncPeriod: 0s
      syncPeriod: 0s
    ipvs:
      excludeCIDRs: null
      minSyncPeriod: 0s
      scheduler: ""
      strictARP: false
      syncPeriod: 0s
    kind: KubeProxyConfiguration
    metricsBindAddress: ""
    mode: ""
    nodePortAddresses: null
    oomScoreAdj: null
    portRange: ""
    udpIdleTimeout: 0s
    winkernel:
      enableDSR: false
      networkName: ""
      sourceVip: ""
  kubeconfig.conf: |-
    apiVersion: v1
    kind: Config
    clusters:
    - cluster:
        certificate-authority: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        server: https://192.168.1.31:6443
      name: default
    contexts:
    - context:
        cluster: default
        namespace: default
        user: default
      name: default
    current-context: default
    users:
    - name: default
      user:
        tokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
kind: ConfigMap
metadata:
  creationTimestamp: "2020-01-24T01:15:43Z"
  labels:
    app: kube-proxy
  name: kube-proxy
  namespace: kube-system
  resourceVersion: "238"
  selfLink: /api/v1/namespaces/kube-system/configmaps/kube-proxy
  uid: f12e3e4b-73b2-439e-8be9-7289d2bce49a

And for a compare, here is the output after I add the one node when I supply the CIDR in the init command:

apiVersion: v1
data:
  config.conf: |-
    apiVersion: kubeproxy.config.k8s.io/v1alpha1
    bindAddress: 0.0.0.0
    clientConnection:
      acceptContentTypes: ""
      burst: 0
      contentType: ""
      kubeconfig: /var/lib/kube-proxy/kubeconfig.conf
      qps: 0
    clusterCIDR: 10.32.0.0/12
    configSyncPeriod: 0s
    conntrack:
      maxPerCore: null
      min: null
      tcpCloseWaitTimeout: null
      tcpEstablishedTimeout: null
    enableProfiling: false
    healthzBindAddress: ""
    hostnameOverride: ""
    iptables:
      masqueradeAll: false
      masqueradeBit: null
      minSyncPeriod: 0s
      syncPeriod: 0s
    ipvs:
      excludeCIDRs: null
      minSyncPeriod: 0s
      scheduler: ""
      strictARP: false
      syncPeriod: 0s
    kind: KubeProxyConfiguration
    metricsBindAddress: ""
    mode: ""
    nodePortAddresses: null
    oomScoreAdj: null
    portRange: ""
    udpIdleTimeout: 0s
    winkernel:
      enableDSR: false
      networkName: ""
      sourceVip: ""
  kubeconfig.conf: |-
    apiVersion: v1
    kind: Config
    clusters:
    - cluster:
        certificate-authority: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        server: https://192.168.1.31:6443
      name: default
    contexts:
    - context:
        cluster: default
        namespace: default
        user: default
      name: default
    current-context: default
    users:
    - name: default
      user:
        tokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
kind: ConfigMap
metadata:
  creationTimestamp: "2020-01-24T01:27:20Z"
  labels:
    app: kube-proxy
  name: kube-proxy
  namespace: kube-system
  resourceVersion: "242"
  selfLink: /api/v1/namespaces/kube-system/configmaps/kube-proxy
  uid: 10ce9974-f193-4b5c-9efb-78c0317746d2
murali-reddy commented 4 years ago

When you pass --pod-network-cidr=10.32.0.0/12 to kubeadm init, the specified CIDR is handed on to kube-proxy, which helps kube-proxy tell internal and external traffic apart. That should not in any way affect weave-net pods, or any pods.

Please check the logs to see why the second container, which is weave-npc, is crashing for you.

ryan-g2 commented 4 years ago

I recreated the issue, here is the description from the crashing weave-net container:

ERROR: logging before flag.Parse: E0128 00:36:32.053295 24752 reflector.go:205] github.com/weaveworks/weave/prog/weave-npc/main.go:321: Failed to list *v1.Pod: Get https://10.96.0.1:443/api/v1/pods?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout

There are many other errors in the logs like the one above - they all say 'timeout' with the same address/port listed.

And here is the log from the crashing weave container:

kubemaster@kubemaster:~/git_repo/test$ kubectl logs -n kube-system pod/weave-net-qhgz5 weave
FATA: 2020/01/28 00:29:09.098127 [kube-peers] Could not get peers: Get https://10.96.0.1:443/api/v1/nodes: dial tcp 10.96.0.1:443: i/o timeout
Failed to get peers

The 10.96.0.1:443 address/port combo maps to my service/kubernetes. And here is the description of that service:

Name:              kubernetes
Namespace:         default
Labels:            component=apiserver
                   provider=kubernetes
Annotations:
Selector:
Type:              ClusterIP
IP:                10.96.0.1
Port:              https 443/TCP
TargetPort:        6443/TCP
Endpoints:         192.168.1.31:6443
Session Affinity:  None
Events:

And here is the description of the crashing weave pod - just in case.

Name:           weave-net-qhgz5
Namespace:      kube-system
Priority:       0
Node:           kube-node-1/192.168.0.11
Start Time:     Mon, 27 Jan 2020 16:27:16 -0800
Labels:         controller-revision-hash=7f54576664
                name=weave-net
                pod-template-generation=1
Annotations:
Status:         Running
IP:             192.168.0.11
IPs:
  IP:           192.168.0.11
Controlled By:  DaemonSet/weave-net
Containers:
  weave:
    Container ID:   docker://8256c6077ed0b2cf2eefb5d3a359500c87e998140a3043cd7e79f8b9ebade9df
    Image:          docker.io/weaveworks/weave-kube:2.6.0
    Image ID:       docker-pullable://weaveworks/weave-kube@sha256:e4a3a5b9bf605a7ff5ad5473c7493d7e30cbd1ed14c9c2630a4e409b4dbfab1c
    Port:
    Host Port:
    Command:        /home/weave/launch.sh
    State:          Running
      Started:      Mon, 27 Jan 2020 16:28:38 -0800
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Mon, 27 Jan 2020 16:27:51 -0800
      Finished:     Mon, 27 Jan 2020 16:28:22 -0800
    Ready:          False
    Restart Count:  2
    Requests:
      cpu:          10m
    Readiness:      http-get http://127.0.0.1:6784/status delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
      HOSTNAME:     (v1:spec.nodeName)
    Mounts:
      /host/etc from cni-conf (rw)
      /host/home from cni-bin2 (rw)
      /host/opt from cni-bin (rw)
      /host/var/lib/dbus from dbus (rw)
      /lib/modules from lib-modules (rw)
      /run/xtables.lock from xtables-lock (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from weave-net-token-nbhwn (ro)
      /weavedb from weavedb (rw)
  weave-npc:
    Container ID:   docker://e128a80d16db155238c1ce17382de7b68790f9a13942056a023672558b87071e
    Image:          docker.io/weaveworks/weave-npc:2.6.0
    Image ID:       docker-pullable://weaveworks/weave-npc@sha256:985de9ff201677a85ce78703c515466fe45c9c73da6ee21821e89d902c21daf8
    Port:
    Host Port:
    State:          Running
      Started:      Mon, 27 Jan 2020 16:27:39 -0800
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:          10m
    Environment:
      HOSTNAME:     (v1:spec.nodeName)
    Mounts:
      /run/xtables.lock from xtables-lock (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from weave-net-token-nbhwn (ro)
Conditions:
  Type             Status
  Initialized      True
  Ready            False
  ContainersReady  False
  PodScheduled     True
Volumes:
  weavedb:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/weave
    HostPathType:
  cni-bin:
    Type:          HostPath (bare host directory volume)
    Path:          /opt
    HostPathType:
  cni-bin2:
    Type:          HostPath (bare host directory volume)
    Path:          /home
    HostPathType:
  cni-conf:
    Type:          HostPath (bare host directory volume)
    Path:          /etc
    HostPathType:
  dbus:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/dbus
    HostPathType:
  lib-modules:
    Type:          HostPath (bare host directory volume)
    Path:          /lib/modules
    HostPathType:
  xtables-lock:
    Type:          HostPath (bare host directory volume)
    Path:          /run/xtables.lock
    HostPathType:  FileOrCreate
  weave-net-token-nbhwn:
    Type:          Secret (a volume populated by a Secret)
    SecretName:    weave-net-token-nbhwn
    Optional:      false
QoS Class:       Burstable
Node-Selectors:
Tolerations:     :NoSchedule
                 node.kubernetes.io/disk-pressure:NoSchedule
                 node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/network-unavailable:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute
                 node.kubernetes.io/pid-pressure:NoSchedule
                 node.kubernetes.io/unreachable:NoExecute
                 node.kubernetes.io/unschedulable:NoSchedule

Events:

Type     Reason     Age                From                  Message
Normal   Scheduled  108s               default-scheduler     Successfully assigned kube-system/weave-net-qhgz5 to kube-node-1
Normal   Pulled     89s                kubelet, kube-node-1  Container image "docker.io/weaveworks/weave-npc:2.6.0" already present on machine
Normal   Created    85s                kubelet, kube-node-1  Created container weave-npc
Normal   Started    83s                kubelet, kube-node-1  Started container weave-npc
Warning  BackOff    40s                kubelet, kube-node-1  Back-off restarting failed container
Normal   Pulled     29s (x3 over 95s)  kubelet, kube-node-1  Container image "docker.io/weaveworks/weave-kube:2.6.0" already present on machine
Normal   Created    26s (x3 over 91s)  kubelet, kube-node-1  Created container weave
Normal   Started    24s (x3 over 89s)  kubelet, kube-node-1  Started container weave
Warning  Unhealthy  6s (x6 over 76s)   kubelet, kube-node-1  Readiness probe failed: Get http://127.0.0.1:6784/status: dial tcp 127.0.0.1:6784: connect: connection refused

For a compare, here is the description of the other weave-net pod which is reporting 2/2 Running:

Name:           weave-net-gn9vq
Namespace:      kube-system
Priority:       0
Node:           kubemaster/192.168.0.10
Start Time:     Mon, 27 Jan 2020 16:25:20 -0800
Labels:         controller-revision-hash=7f54576664
                name=weave-net
                pod-template-generation=1
Annotations:
Status:         Running
IP:             192.168.0.10
IPs:
  IP:           192.168.0.10
Controlled By:  DaemonSet/weave-net
Containers:
  weave:
    Container ID:   docker://47bc973aa1a9360519bacc4e449102b95e54ea29ceffbe356ad681cd2b33e93e
    Image:          docker.io/weaveworks/weave-kube:2.6.0
    Image ID:       docker-pullable://weaveworks/weave-kube@sha256:e4a3a5b9bf605a7ff5ad5473c7493d7e30cbd1ed14c9c2630a4e409b4dbfab1c
    Port:
    Host Port:
    Command:        /home/weave/launch.sh
    State:          Running
      Started:      Mon, 27 Jan 2020 16:25:29 -0800
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:          10m
    Readiness:      http-get http://127.0.0.1:6784/status delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
      HOSTNAME:     (v1:spec.nodeName)
    Mounts:
      /host/etc from cni-conf (rw)
      /host/home from cni-bin2 (rw)
      /host/opt from cni-bin (rw)
      /host/var/lib/dbus from dbus (rw)
      /lib/modules from lib-modules (rw)
      /run/xtables.lock from xtables-lock (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from weave-net-token-nbhwn (ro)
      /weavedb from weavedb (rw)
  weave-npc:
    Container ID:   docker://dc826b837d21f0165bc4b4a7f0aaa45f020991a8ae0ad36d36e664d9e4b08e22
    Image:          docker.io/weaveworks/weave-npc:2.6.0
    Image ID:       docker-pullable://weaveworks/weave-npc@sha256:985de9ff201677a85ce78703c515466fe45c9c73da6ee21821e89d902c21daf8
    Port:
    Host Port:
    State:          Running
      Started:      Mon, 27 Jan 2020 16:25:33 -0800
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:          10m
    Environment:
      HOSTNAME:     (v1:spec.nodeName)
    Mounts:
      /run/xtables.lock from xtables-lock (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from weave-net-token-nbhwn (ro)
Conditions:
  Type             Status
  Initialized      True
  Ready            True
  ContainersReady  True
  PodScheduled     True
Volumes:
  weavedb:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/weave
    HostPathType:
  cni-bin:
    Type:          HostPath (bare host directory volume)
    Path:          /opt
    HostPathType:
  cni-bin2:
    Type:          HostPath (bare host directory volume)
    Path:          /home
    HostPathType:
  cni-conf:
    Type:          HostPath (bare host directory volume)
    Path:          /etc
    HostPathType:
  dbus:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/dbus
    HostPathType:
  lib-modules:
    Type:          HostPath (bare host directory volume)
    Path:          /lib/modules
    HostPathType:
  xtables-lock:
    Type:          HostPath (bare host directory volume)
    Path:          /run/xtables.lock
    HostPathType:  FileOrCreate
  weave-net-token-nbhwn:
    Type:          Secret (a volume populated by a Secret)
    SecretName:    weave-net-token-nbhwn
    Optional:      false
QoS Class:       Burstable
Node-Selectors:
Tolerations:     :NoSchedule
                 node.kubernetes.io/disk-pressure:NoSchedule
                 node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/network-unavailable:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute
                 node.kubernetes.io/pid-pressure:NoSchedule
                 node.kubernetes.io/unreachable:NoExecute
                 node.kubernetes.io/unschedulable:NoSchedule

Events:

Type     Reason     Age                From                 Message
Normal   Scheduled  23m                default-scheduler    Successfully assigned kube-system/weave-net-gn9vq to kubemaster
Normal   Pulled     23m                kubelet, kubemaster  Container image "docker.io/weaveworks/weave-kube:2.6.0" already present on machine
Normal   Created    23m                kubelet, kubemaster  Created container weave
Normal   Started    23m                kubelet, kubemaster  Started container weave
Normal   Pulled     23m                kubelet, kubemaster  Container image "docker.io/weaveworks/weave-npc:2.6.0" already present on machine
Normal   Created    23m                kubelet, kubemaster  Created container weave-npc
Normal   Started    23m                kubelet, kubemaster  Started container weave-npc
Warning  Unhealthy  23m (x2 over 23m)  kubelet, kubemaster  Readiness probe failed: Get http://127.0.0.1:6784/status: dial tcp 127.0.0.1:6784: connect: connection refused

murali-reddy commented 4 years ago

github.com/weaveworks/weave/prog/weave-npc/main.go:321: Failed to list *v1.Pod: Get https://10.96.0.1:443/api/v1/pods?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout

FATA: 2020/01/28 00:29:09.098127 [kube-peers] Could not get peers: Get https://10.96.0.1:443/api/v1/nodes: dial tcp 10.96.0.1:443: i/o timeout
Failed to get peers

The above errors from the weave container logs indicate that the service IP 10.96.0.1 is not accessible from the node; since that is a fatal condition, the weave-net pod shuts down. You need to debug why services are not accessible. Do you have kube-proxy running on the node? Check for any errors in the kube-proxy logs.

ryan-g2 commented 4 years ago

Yes, kube-proxy is running on the node - Node-1. Here is the log:

kubemaster@kubemaster:~/git_repo/test$ kubectl logs -n kube-system pod/kube-proxy-7d82k
W0128 05:05:17.169516       1 server_others.go:323] Unknown proxy mode "", assuming iptables proxy
I0128 05:05:17.328464       1 node.go:135] Successfully retrieved node IP: 192.168.0.11
I0128 05:05:17.328527       1 server_others.go:145] Using iptables Proxier.
W0128 05:05:17.328913       1 proxier.go:286] clusterCIDR not specified, unable to distinguish between internal and external traffic
I0128 05:05:17.329278       1 server.go:571] Version: v1.17.2
I0128 05:05:17.330095       1 conntrack.go:52] Setting nf_conntrack_max to 131072
I0128 05:05:17.331975       1 config.go:313] Starting service config controller
I0128 05:05:17.332004       1 shared_informer.go:197] Waiting for caches to sync for service config
I0128 05:05:17.332154       1 config.go:131] Starting endpoints config controller
I0128 05:05:17.332172       1 shared_informer.go:197] Waiting for caches to sync for endpoints config
I0128 05:05:17.432301       1 shared_informer.go:204] Caches are synced for endpoints config
I0128 05:05:17.432495       1 shared_informer.go:204] Caches are synced for service config

And the events for the kube-proxy pod on Node-1:

Events:

Type    Reason     Age    From                  Message
Normal  Scheduled  3m59s  default-scheduler     Successfully assigned kube-system/kube-proxy-7d82k to kube-node-1
Normal  Pulled     3m50s  kubelet, kube-node-1  Container image "k8s.gcr.io/kube-proxy:v1.17.2" already present on machine
Normal  Created    3m45s  kubelet, kube-node-1  Created container kube-proxy
Normal  Started    3m42s  kubelet, kube-node-1  Started container kube-proxy

ryan-g2 commented 4 years ago

Looking at the kube-proxy logs I see iptables mentioned. Could this have anything to do with the fact that I am running all my VMs on an Ubuntu 19.10 system? I read that Weave only likes iptables 1.6, and 19.10 ships 1.8.

murali-reddy commented 4 years ago

I read that Weave only likes iptables 1.6 and 19.10 has 1.8.

It's a prerequisite for weave-net pods to be able to reach the Kubernetes API server in order to even start, as you have noticed. So the real problem is not with weave-net but with the service proxy.

You need to debug and ensure that the Kubernetes service IP 10.96.0.1 is accessible from the node.

ryan-g2 commented 4 years ago

I'm not sure why weave-npc can't see kube-proxy.

Kube proxy and all the pods associated with it are running. Are there any other logs that would help? I'm new to k8 and Weave so I am not sure what all needs checking.

neolit123 commented 4 years ago

you need to debug and ensure kubernetes service IP 10.96.0.1 is accessible from the node

you can do kubectl get svc kubernetes, which should give you the IP / port of the kubernetes service.

telnet <ip> <port> will then tell you whether the node has connectivity to the service.
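The same probe can be scripted. This is a minimal sketch (mine, not from the thread) using Python's standard socket module; it does what telnet <ip> <port> does, but with an explicit timeout, so a hung attempt shows up as False instead of sitting forever:

```python
import socket

def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """TCP connectivity probe, equivalent to `telnet host port`.

    Returns True if the three-way handshake completes within the
    timeout, False on refusal or timeout (the symptom in this issue).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# On the affected node this would be called against the kubernetes
# service shown by `kubectl get svc kubernetes`:
#   can_connect("10.96.0.1", 443)
```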

ryan-g2 commented 4 years ago

Thanks for the response.

I telnetted from node-1 to the kubernetes service telnet 10.96.0.1 443 and telnet 10.96.0.1 6443 and both attempts just sat there trying with no response.

443 is the port seen open when I use kubectl get svc kubernetes.

neolit123 commented 4 years ago

did it say the following?

Trying 10.96.0.1... Connected to 10.96.0.1.

if yes, the connection is fine. this verifies that the node has connectivity.

neolit123 commented 4 years ago

if no, i have no immediate explanation what can cause that. do you have firewall rules enabled?

ryan-g2 commented 4 years ago

No, it just tried and never connected. I only let it sit maybe less than 20 seconds - I think it would have connected by then if things were ok.

I have a firewall, but I don't think this would touch the firewall at all since the 10.x.x.x addresses are a virtual network hosted on my linux box which has my kube cluster running on it with 3 VMs.

Plus, all this works if I supply the CIDR command when running the initial init command. So I don't need this fixed since I can just remake my cluster, supply the CIDR and have everything work. Maybe a note can be added to the setup process that supplying the CIDR could help people who run into this - for whatever reason.

neolit123 commented 4 years ago

Maybe a note can be added to the setup process that supplying the CIDR could help people who run into this - for whatever reason.

if weavenet does not require a CIDR, but in some cases it does (other than https://www.weave.works/docs/net/latest/kubernetes/kube-addon/#-things-to-watch-out-for), then this is breaking a UX contract and it is better to understand the reason.

murali-reddy commented 4 years ago

Plus, all this works if I supply the CIDR command when running the initial init command.

When you specify a CIDR for kubeadm init, it goes to the --cluster-cidr argument of kube-proxy (please see https://kubernetes.io/docs/reference/command-line-tools-reference/kube-proxy/). This helps kube-proxy figure out what is internal traffic and what is external traffic.

If you don't specify a CIDR, possibly traffic is not getting masqueraded (i.e. not SNATed). And if traffic is not going through, it means a wrong source IP address is perhaps being picked which is unroutable.

This sounds similar to https://github.com/kubernetes/kubeadm/issues/102. Are you using hosts with multiple interfaces?
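The internal/external distinction described above can be illustrated with a small sketch (my own illustration, not kube-proxy code), using Python's ipaddress module and the CIDR values from this thread:

```python
import ipaddress

# --pod-network-cidr=10.32.0.0/12 ends up as kube-proxy's --cluster-cidr;
# with it set, the proxy can classify a packet by its source address.
CLUSTER_CIDR = ipaddress.ip_network("10.32.0.0/12")

def is_pod_traffic(src_ip: str) -> bool:
    """True if the source address lies inside the pod network."""
    return ipaddress.ip_address(src_ip) in CLUSTER_CIDR

print(is_pod_traffic("10.32.0.1"))     # weave bridge address on the master -> True
print(is_pod_traffic("192.168.0.11"))  # the worker node's LAN address -> False
print(is_pod_traffic("10.96.0.1"))     # the kubernetes service ClusterIP -> False
```

Note that the service IP 10.96.0.1 sits in a separate service CIDR, outside the pod network; without --cluster-cidr, kube-proxy emits the "clusterCIDR not specified, unable to distinguish between internal and external traffic" warning seen in the logs above and cannot make this classification at all.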

ryan-g2 commented 4 years ago

yes, the master node has multiple interfaces:

datapath: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1376 inet6 fe80::10b7:87ff:fe21:f821 prefixlen 64 scopeid 0x20 ether 12:b7:87:21:f8:21 txqueuelen 1000 (Ethernet) RX packets 6788 bytes 498949 (498.9 KB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 2603 bytes 226461 (226.4 KB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

docker0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500 inet 172.17.0.1 netmask 255.255.0.0 broadcast 172.17.255.255 ether 02:42:d5:e3:f5:16 txqueuelen 0 (Ethernet) RX packets 0 bytes 0 (0.0 B) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 0 bytes 0 (0.0 B) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

enp0s3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500 inet 192.168.0.10 netmask 255.255.255.0 broadcast 192.168.0.255 inet6 fe80::9bdb:ca2a:31ee:beba prefixlen 64 scopeid 0x20 ether 08:00:27:ab:c8:91 txqueuelen 1000 (Ethernet) RX packets 1119999 bytes 634571114 (634.5 MB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 395661 bytes 145955863 (145.9 MB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

enp0s8: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500 inet 192.168.1.31 netmask 255.255.255.0 broadcast 192.168.1.255 inet6 fe80::a00:27ff:fe61:3d37 prefixlen 64 scopeid 0x20 ether 08:00:27:61:3d:37 txqueuelen 1000 (Ethernet) RX packets 3591560 bytes 337031073 (337.0 MB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 3193847 bytes 1423071293 (1.4 GB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536 inet 127.0.0.1 netmask 255.0.0.0 inet6 ::1 prefixlen 128 scopeid 0x10 loop txqueuelen 1000 (Local Loopback) RX packets 195097965 bytes 28953233045 (28.9 GB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 195097965 bytes 28953233045 (28.9 GB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

vethwe-bridge: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1376 inet6 fe80::d083:e0ff:fe25:5337 prefixlen 64 scopeid 0x20 ether d2:83:e0:25:53:37 txqueuelen 0 (Ethernet) RX packets 6385 bytes 558005 (558.0 KB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 4089 bytes 357758 (357.7 KB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

vethwe-datapath: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1376 inet6 fe80::d483:37ff:fee3:ef7d prefixlen 64 scopeid 0x20 ether d6:83:37:e3:ef:7d txqueuelen 0 (Ethernet) RX packets 4089 bytes 357758 (357.7 KB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 6385 bytes 558005 (558.0 KB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

vethwepl7096ee8: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1376 inet6 fe80::e434:64ff:febe:f8ea prefixlen 64 scopeid 0x20 ether e6:34:64:be:f8:ea txqueuelen 0 (Ethernet) RX packets 659 bytes 55426 (55.4 KB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 746 bytes 207812 (207.8 KB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

vethweplecda175: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1376 inet6 fe80::583f:ecff:fe34:67c8 prefixlen 64 scopeid 0x20 ether 5a:3f:ec:34:67:c8 txqueuelen 0 (Ethernet) RX packets 649 bytes 54957 (54.9 KB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 737 bytes 207364 (207.3 KB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

vxlan-6784: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 65535 inet6 fe80::4850:93ff:fe79:a9d0 prefixlen 64 scopeid 0x20 ether 4a:50:93:79:a9:d0 txqueuelen 1000 (Ethernet) RX packets 0 bytes 0 (0.0 B) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 0 bytes 0 (0.0 B) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

weave: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1376 inet 10.32.0.1 netmask 255.240.0.0 broadcast 10.47.255.255 inet6 fe80::48ee:bcff:fe1d:6319 prefixlen 64 scopeid 0x20 ether 4a:ee:bc:1d:63:19 txqueuelen 1000 (Ethernet) RX packets 4836959 bytes 329922997 (329.9 MB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 5229000 bytes 1453684418 (1.4 GB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

And here are the interfaces for the Linux host which is hosting the Master and Worker nodes through VirtualBox:

docker0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500 inet 172.17.0.1 netmask 255.255.0.0 broadcast 172.17.255.255 ether 02:42:0b:d5:50:03 txqueuelen 0 (Ethernet) RX packets 0 bytes 0 (0.0 B) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 0 bytes 0 (0.0 B) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

enp3s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500 inet 192.168.0.30 netmask 255.255.255.0 broadcast 192.168.0.255 inet6 fe80::153:d098:f582:d701 prefixlen 64 scopeid 0x20 ether 18:03:73:1e:27:a9 txqueuelen 1000 (Ethernet) RX packets 3645726 bytes 3221355692 (3.2 GB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 1503963 bytes 196857989 (196.8 MB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536 inet 127.0.0.1 netmask 255.0.0.0 inet6 ::1 prefixlen 128 scopeid 0x10 loop txqueuelen 1000 (Local Loopback) RX packets 27871 bytes 2507853 (2.5 MB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 27871 bytes 2507853 (2.5 MB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

vboxnet0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500 inet 192.168.1.30 netmask 255.255.255.0 broadcast 192.168.1.255 inet6 fe80::800:27ff:fe00:0 prefixlen 64 scopeid 0x20 ether 0a:00:27:00:00:00 txqueuelen 1000 (Ethernet) RX packets 0 bytes 0 (0.0 B) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 15743 bytes 1366878 (1.3 MB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
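One cross-check on the interface dumps above (my own sketch, not from the thread): the weave bridge on the master reports inet 10.32.0.1 with netmask 255.240.0.0, which is exactly the 10.32.0.0/12 range suggested for --pod-network-cidr:

```python
import ipaddress

# Netmask 255.240.0.0 on the `weave` interface corresponds to a /12 prefix,
# so the bridge sits in the same range passed as --pod-network-cidr.
weave_if = ipaddress.ip_interface("10.32.0.1/255.240.0.0")
print(weave_if.network)  # 10.32.0.0/12
```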

ryan-g2 commented 4 years ago

This also sounds like https://github.com/weaveworks/weave/issues/3363