microsoft / SDN

This repo includes PowerShell scripts and VMM service templates for setting up the Microsoft Software Defined Networking (SDN) Stack using Windows Server 2016

Flannel doesn't work properly on Windows node when following the README #256

Closed: WenlongWang closed this issue 4 years ago

WenlongWang commented 5 years ago

flanneld and the flannel CNI plugin use an env file (set via flanneld's --subnet-file argument) whose location defaults to "/run/flannel/subnet.env", but this is not a Windows-style path. Apparently the guide relies on the default value, so after following this README to build a Kubernetes Windows cluster and starting a Windows app, I get an error like "createPodSandbox for pod xxx failed...... network: open /run/flannel/subnet.env: The system cannot find the path specified." I had to edit "C:\k\cni\config\cni.conf" to add the subnetFile location manually. This should be added to start-kubelet.ps1, or at least the README should mention it.
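For context, the file the plugin fails to open is a plain KEY=VALUE file that flanneld writes once it holds a subnet lease (keys such as FLANNEL_NETWORK, FLANNEL_SUBNET, FLANNEL_MTU and FLANNEL_IPMASQ). A minimal Go sketch of reading it follows; parseSubnetEnv is a hypothetical helper, not flannel's own code, and the sample values are made up for illustration (Go 1.18+ for strings.Cut).

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseSubnetEnv parses the KEY=VALUE lines of a flannel subnet file.
// Blank lines and lines starting with '#' are skipped; a line without '='
// is reported as an error.
func parseSubnetEnv(content string) (map[string]string, error) {
	vars := make(map[string]string)
	scanner := bufio.NewScanner(strings.NewReader(content))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text()) // also drops a trailing \r
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		key, value, ok := strings.Cut(line, "=")
		if !ok {
			return nil, fmt.Errorf("malformed line %q", line)
		}
		vars[strings.TrimSpace(key)] = strings.TrimSpace(value)
	}
	return vars, scanner.Err()
}

func main() {
	// Illustrative values only; a real file holds your cluster's ranges.
	sample := "FLANNEL_NETWORK=10.244.0.0/16\nFLANNEL_SUBNET=10.244.1.1/24\nFLANNEL_MTU=1450\nFLANNEL_IPMASQ=true\n"
	vars, err := parseSubnetEnv(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(vars["FLANNEL_SUBNET"]) // 10.244.1.1/24
}
```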

mgroux520 commented 5 years ago

So I am having a similar problem/error. First, I don't have a subnet.env file on my Windows node, only on the Linux master. Are you manually placing a file somewhere in the filesystem on that Windows node?
Also, what format/naming (i.e. JSON keys) are you using for the file location in the cni.conf file on the Windows node? I could not find any documentation on configuring the cni.conf file on a Windows node, let alone on what the subnet.env setting should be, and the few things I tried didn't make a difference. Edit: I did a bit more digging and found that the kubelet (1.11.3) for Windows has CNI config flags/options; you can see some of them in use in the start-kubelet.ps1 file. I'm still trying to find a solution to this specific error...

alfianabdi commented 5 years ago

I hit a similar issue with Windows Server 1709. I then rebuilt the cluster from scratch (by deleting etcd) and used Windows Server 1803, and it works like a charm: the IIS pod is successfully deployed on the Windows worker and gets an IP address from the range specified via CNI flannel. My problem now is that multi-host networking does not work. I have an Alpine container on host 1 (Linux) with IP address 10.33.0.180 and an IIS container on host 2 (Windows 1803), but they cannot communicate with each other. Where should I start debugging? In fact, the Windows container cannot connect to anything at all.

pablodav commented 5 years ago

The comment at https://github.com/pablodav/kubernetes-for-windows/issues/13#issuecomment-425274501 could probably help.

daschott commented 5 years ago

@alfianabdi did you enable MAC spoofing or promiscuous mode on your VM network adapter?

@WenlongWang I have never hit this issue myself, but you are not the only person reporting it. Can you copy /run/flannel/subnet.env from the master to c:/run/flannel/subnet.env?

We are working on improving the docs and I have a PR open. However, the main time squeeze right now, until the 1.13 coding milestone, is making sure the base network plugins are stable on WS2019 and that conformance tests pass so that we can graduate from beta.

dz-pyps commented 5 years ago

@daschott Copying /run/flannel/subnet.env to c:/run/flannel/subnet.env on a Windows worker node (1803) actually helped in my case. I followed an official Microsoft guide on how to add a Windows worker (https://onedrive.live.com/view.aspx?resid=E2B6765015E5FA01!339&ithint=file%2cdocx&app=Word&authkey=!AGvs_s_hWs7xHGs), and this step wasn't mentioned there. Having to add an env file and keep track of the flannel networks assigned to each Windows node seems very confusing. In addition, hostgw_windows.go correctly allocates a /24 subnet from flannel's range.

daschott commented 4 years ago

Sorry for the long delay here... We have made a number of improvements since last year, and Windows Server nodes are now supported on Kubernetes v1.14 or above. Can you confirm whether you still see this issue on a cluster running K8s v1.14 or above? Otherwise, the first place I would check is that all the CIDRs are correct and in place, and that Flannel is configured correctly:

kubectl exec -n kube-system kube-flannel-ds-amd64-<someid> -- cat /etc/kube-flannel/net-conf.json
kubectl exec -n kube-system kube-flannel-ds-amd64-<someid> -- cat /etc/kube-flannel/cni-conf.json
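One way to act on the CIDR check is to compare each node's pod CIDR (for example from `kubectl get node <name> -o jsonpath='{.spec.podCIDR}'`) with the Network value printed from net-conf.json. A minimal Go sketch follows, assuming net-conf.json uses flannel's usual top-level "Network" key; checkNodeCIDR is a hypothetical helper and the ranges are examples only (Go 1.18+ for net/netip).

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/netip"
)

// netConf holds the one field of flannel's net-conf.json this check reads;
// other fields (such as Backend) are ignored by json.Unmarshal.
type netConf struct {
	Network string `json:"Network"`
}

// checkNodeCIDR returns an error unless podCIDR (a node's assigned pod
// range) lies entirely inside the flannel Network from net-conf.json.
func checkNodeCIDR(netConfJSON []byte, podCIDR string) error {
	var c netConf
	if err := json.Unmarshal(netConfJSON, &c); err != nil {
		return fmt.Errorf("parse net-conf.json: %w", err)
	}
	network, err := netip.ParsePrefix(c.Network)
	if err != nil {
		return fmt.Errorf("bad Network %q: %w", c.Network, err)
	}
	node, err := netip.ParsePrefix(podCIDR)
	if err != nil {
		return fmt.Errorf("bad pod CIDR %q: %w", podCIDR, err)
	}
	// node is inside network iff it is no wider and its base address is in network.
	if node.Bits() < network.Bits() || !network.Contains(node.Masked().Addr()) {
		return fmt.Errorf("pod CIDR %s is outside flannel Network %s", node, network)
	}
	return nil
}

func main() {
	conf := []byte(`{"Network": "10.244.0.0/16", "Backend": {"Type": "host-gw"}}`)
	fmt.Println(checkNodeCIDR(conf, "10.244.3.0/24")) // <nil>
	fmt.Println(checkNodeCIDR(conf, "192.168.5.0/24"))
}
```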

Please also make sure you are using Flannel v0.11 or above.

Also, FYI: accessing a NodePort service from the node itself fails on Windows: https://docs.microsoft.com/en-us/virtualization/windowscontainers/kubernetes/common-problems#my-windows-node-cannot-access-a-nodeport-service

More information on how to troubleshoot Kubernetes networking can be found here: https://techcommunity.microsoft.com/t5/Networking-Blog/Troubleshooting-Kubernetes-Networking-on-Windows-Part-1/ba-p/508648