kubernetes-sigs / sig-windows-tools

Repository for tools and artifacts related to the sig-windows charter in Kubernetes. Scripts to assist kubeadm and wincat and flannel will be hosted here.
Apache License 2.0

No connection to service network in Windows pod when using flannel vxlan (overlay) network #340

Closed uli-fischer closed 3 months ago

uli-fischer commented 1 year ago

**Describe the bug**
I have a problem when installing a Windows node.

  1. Is there a typo in the documentation? The flannel guide (guides/flannel.md) contains this reference for installing flannel on Windows:

    controlPlaneEndpoint=$(kubectl get configmap -n kube-system kube-proxy -o jsonpath="{.data['kubeconfig\.conf']}" | grep server: | sed 's/.*\:\/\///g')
    kubernetesServiceHost=$(echo $controlPlaneEndpoint | cut -d ":" -f 1)
    kubernetesServicePort=$(echo $controlPlaneEndpoint | cut -d ":" -f 2)
    curl -L https://raw.githubusercontent.com/kubernetes-sigs/sig-windows-tools/master/hostprocess/flannel/flanneld/flannel-overlay.yml | sed 's/FLANNEL_VERSION/v0.21.5/g' | sed "s/KUBERNETES_SERVICE_HOST_VALUE/$kubernetesServiceHost/g" | sed "s/KUBERNETES_SERVICE_PORT_VALUE/$kubernetesServicePort/g" | kubectl apply -f -

    It refers to version v0.21.5, but the newest version I could find is v0.14.0-hostprocess. I changed the image to mik4sa/flannel:v0.21.5, though I am not sure this is correct. With this change the kube-proxy and host-process pods come up, but it then fails with the next error: no connection to the service network.

  2. When I install the node and start a pod (see config below), I can ping all networks except the service network, so DNS is not working.
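For context, the three shell lines in the quoted guide snippet just split the control-plane endpoint on the colon. A minimal standalone sketch, with the endpoint value from this cluster (10.10.13.201:6443) hard-coded in place of the kube-proxy ConfigMap lookup:

```shell
# Sketch of the guide's endpoint parsing; the endpoint is hard-coded here,
# whereas the guide extracts it from the kube-proxy ConfigMap.
controlPlaneEndpoint="10.10.13.201:6443"
kubernetesServiceHost=$(echo "$controlPlaneEndpoint" | cut -d ":" -f 1)
kubernetesServicePort=$(echo "$controlPlaneEndpoint" | cut -d ":" -f 2)
echo "host=$kubernetesServiceHost port=$kubernetesServicePort"
```

If either value comes out empty, the sed substitutions in the curl pipeline silently produce a broken manifest, so it is worth echoing both values before applying.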

**To Reproduce**
On a running cluster, run:

$curl -L https://raw.githubusercontent.com/kubernetes-sigs/sig-windows-tools/master/hostprocess/flannel/flanneld/flannel-overlay.yml | sed 's/sigwindowstools\/flannel:FLANNEL_VERSION/mik4sa\/flannel:v0.21.5/g' | kubectl apply -f -
$curl -L https://raw.githubusercontent.com/kubernetes-sigs/sig-windows-tools/master/hostprocess/flannel/kube-proxy/kube-proxy.yml | sed 's/KUBE_PROXY_VERSION/v1.27.3/g' | kubectl apply -f -
$kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/sig-windows-tools/master/hostprocess/flannel/flanneld/kube-flannel-rbac.yml
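The sed in the first command above only rewrites the image reference before the manifest is applied. In isolation, on a single sample line and with the tag used in this issue, the substitution does this:

```shell
# Just the image substitution performed by the repro command, applied to one
# sample line from the flannel-overlay manifest.
line='image: sigwindowstools/flannel:FLANNEL_VERSION'
rewritten=$(echo "$line" | sed 's/sigwindowstools\/flannel:FLANNEL_VERSION/mik4sa\/flannel:v0.21.5/g')
echo "$rewritten"   # -> image: mik4sa/flannel:v0.21.5
```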
$ kubectl get pods -n kube-flannel
NAME                                  READY   STATUS    RESTARTS        AGE
kube-flannel-ds-8vvv2                 1/1     Running   1 (14d ago)     14d
kube-flannel-ds-94v42                 1/1     Running   1 (4d16h ago)   14d
kube-flannel-ds-hhzhk                 1/1     Running   0               14d
kube-flannel-ds-windows-amd64-4wkmb   1/1     Running   0               23h
 $ kubectl describe pod kube-flannel-ds-windows-amd64-4wkmb -n kube-flannel
Name:             kube-flannel-ds-windows-amd64-4wkmb
Namespace:        kube-flannel
Priority:         0
Service Account:  flannel
Node:             k8t-win-node-1/10.10.13.204
Start Time:       Fri, 04 Aug 2023 18:37:22 +0200
Labels:           app=flannel
                  controller-revision-hash=64d67796cc
                  pod-template-generation=8
                  tier=node
Annotations:      <none>
Status:           Running
IP:               10.10.13.204
IPs:
  IP:           10.10.13.204
Controlled By:  DaemonSet/kube-flannel-ds-windows-amd64
Containers:
  kube-flannel:
    Container ID:   containerd://7b86da67e60a8c0d41b0ecdb6523aa84b542dd85f3c7345ec89ab288e44ca331
    Image:          mik4sa/flannel:v0.21.5-hostprocess
    Image ID:       docker.io/mik4sa/flannel@sha256:71b187a72810d9da27d304bbe8557487c69e95c60942f43940074e0d8caecf96
    Port:           <none>
    Host Port:      <none>
    State:          Running
      Started:      Fri, 04 Aug 2023 18:37:24 +0200
    Ready:          True
    Restart Count:  0
    Environment:
      CNI_BIN_PATH:             C:\\opt\\cni\\bin
      CNI_CONFIG_PATH:          C:\\etc\\cni\\net.d
      SERVICE_SUBNET:           10.96.0.0/12
      KUBERNETES_SERVICE_HOST:  10.10.13.201
      KUBERNETES_SERVICE_PORT:  6443
      POD_NAME:                 kube-flannel-ds-windows-amd64-4wkmb (v1:metadata.name)
      POD_NAMESPACE:            kube-flannel (v1:metadata.namespace)
    Mounts:
      /mounts/kube-flannel-windows/ from flannel-windows-cfg (rw)
      /mounts/kube-flannel/ from flannel-cfg (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-sdkzs (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  flannel-cfg:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      kube-flannel-cfg
    Optional:  false
  flannel-windows-cfg:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      kube-flannel-windows-cfg
    Optional:  false
  kube-api-access-sdkzs:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 :NoSchedule op=Exists
                             :NoExecute op=Exists
                             CriticalAddonsOnly op=Exists
                             node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/network-unavailable:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists
                             node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                             node.kubernetes.io/unreachable:NoExecute op=Exists
                             node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:                      <none>

Using this test pod config:

$ cat winTest.yaml
# windows-pod-with-pvc.yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: test-claim
spec:
  storageClassName: synology-iscsi-win
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1G
---
apiVersion: v1
kind: Pod
metadata:
  name: my-windows-pod
spec:
  containers:
  - name: windows-server-container
    image: mcr.microsoft.com/windows/servercore:ltsc2019
    command:
    - powershell.exe
    args:
    - "-NoLogo"
    - "-Command"
    - "while ($true) { Write-Host 'Hello from Windows Server 2019'; Start-Sleep -Seconds 5 }"
    volumeMounts:
    - name: my-pvc-volume
      mountPath: "D:"
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/os
            operator: In
            values:
            - windows
  volumes:
  - name: my-pvc-volume
    persistentVolumeClaim:
      claimName: test-claim

and run these tests:

$ kubectl get pods  -n kube-system -o=wide
NAME                                   READY   STATUS    RESTARTS          AGE   IP             NODE             NOMINATED NODE   READINESS GATES
coredns-f47c568f5-l4twx                1/1     Running   0                 7d    10.244.0.20    k8t-master-1     <none>           <none>
coredns-f47c568f5-wzpcv                1/1     Running   0                 7d    10.244.1.31    k8t-node-1       <none>           <none>
etcd-k8t-master-1                      1/1     Running   2 (14d ago)       33d   10.10.13.201   k8t-master-1     <none>           <none>
kube-apiserver-k8t-master-1            1/1     Running   368               14d   10.10.13.201   k8t-master-1     <none>           <none>
kube-controller-manager-k8t-master-1   1/1     Running   3 (14d ago)       33d   10.10.13.201   k8t-master-1     <none>           <none>
kube-proxy-4zzsq                       1/1     Running   1 (14d ago)       33d   10.10.13.202   k8t-node-1       <none>           <none>
kube-proxy-7lkjg                       1/1     Running   2 (14d ago)       33d   10.10.13.201   k8t-master-1     <none>           <none>
kube-proxy-8djmb                       1/1     Running   2 (4d16h ago)     33d   10.10.13.203   k8t-node-2       <none>           <none>
kube-proxy-windows-9rqgv               1/1     Running   5 (13d ago)       14d   10.10.13.204   k8t-win-node-1   <none>           <none>
kube-scheduler-k8t-master-1            1/1     Running   3 (14d ago)       33d   10.10.13.201   k8t-master-1     <none>           <none>
snapshot-controller-9695c8478-4xbdj    1/1     Running   438 (4d16h ago)   31d   10.244.2.33    k8t-node-2       <none>           <none>
snapshot-controller-9695c8478-cn6lt    1/1     Running   1 (14d ago)       31d   10.244.1.18    k8t-node-1       <none>           <none>
$ kubectl get pods  -n kube-flannel -o=wide
NAME                                  READY   STATUS    RESTARTS        AGE   IP             NODE             NOMINATED NODE   READINESS GATES
kube-flannel-ds-8vvv2                 1/1     Running   1 (14d ago)     14d   10.10.13.201   k8t-master-1     <none>           <none>
kube-flannel-ds-94v42                 1/1     Running   1 (4d16h ago)   14d   10.10.13.203   k8t-node-2       <none>           <none>
kube-flannel-ds-hhzhk                 1/1     Running   0               14d   10.10.13.202   k8t-node-1       <none>           <none>
kube-flannel-ds-windows-amd64-4wkmb   1/1     Running   0               23h   10.10.13.204   k8t-win-node-1   <none>           <none>
$kubectl exec -it my-windows-pod -- powershell
PS C:\> ipconfig /all

Windows IP Configuration

   Host Name . . . . . . . . . . . . : my-windows-pod
   Primary Dns Suffix  . . . . . . . :
   Node Type . . . . . . . . . . . . : Hybrid
   IP Routing Enabled. . . . . . . . : No
   WINS Proxy Enabled. . . . . . . . : No
   DNS Suffix Search List. . . . . . : default.svc.cluster.local
                                       svc.cluster.local
                                       cluster.local

Ethernet adapter vEthernet (5195b5da3e3bb0b8f92bcbdfce384d2c2a7eac5e55220691f91bdb64dd671f1a_flannel.4096):

   Connection-specific DNS Suffix  . : default.svc.cluster.local
   Description . . . . . . . . . . . : Hyper-V Virtual Ethernet Adapter #4
   Physical Address. . . . . . . . . : 00-15-5D-34-6F-58
   DHCP Enabled. . . . . . . . . . . : No
   Autoconfiguration Enabled . . . . : Yes
   Link-local IPv6 Address . . . . . : fe80::3c2d:f9da:9aec:d253%29(Preferred)
   IPv4 Address. . . . . . . . . . . : 10.244.4.28(Preferred)
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   Default Gateway . . . . . . . . . : 10.244.4.1
   DNS Servers . . . . . . . . . . . : 10.96.0.10
   NetBIOS over Tcpip. . . . . . . . : Disabled
   Connection-specific DNS Suffix Search List :
                                       default.svc.cluster.local
                                       svc.cluster.local
                                       cluster.local
PS C:\> nslookup www.google.de
DNS request timed out.
    timeout was 2 seconds.
Server:  UnKnown
Address:  10.96.0.10

DNS request timed out.
    timeout was 2 seconds.
PS C:\> nslookup www.google.de 10.10.13.1    # 10.10.13.1 is my external dns
Server:  UnKnown
Address:  10.10.13.1

Non-authoritative answer:
Name:    www.google.de
Addresses:  2a00:1450:4016:80c::2003
          172.217.16.163
PS C:\> nslookup www.google.de 10.244.0.20
Server:  10-244-0-20.kube-dns.kube-system.svc.cluster.local
Address:  10.244.0.20

Non-authoritative answer:
Name:    www.google.de
Addresses:  2a00:1450:4016:80c::2003
          172.217.16.163
PS C:\>  Test-NetConnection -ComputerName 10.96.0.10 -Port 53 -InformationLevel Detailed
WARNING: TCP connect to (10.96.0.10 : 53) failed
WARNING: Ping to 10.96.0.10 failed with status: TimedOut

ComputerName            : 10.96.0.10
RemoteAddress           : 10.96.0.10
RemotePort              : 53
NameResolutionResults   : 10.96.0.10
MatchingIPsecRules      :
NetworkIsolationContext :
InterfaceAlias          : vEthernet (5195b5da3e3bb0b8f92bcbdfce384d2c2a7eac5e55220691f91bdb64dd671f1a_flannel.4096)
SourceAddress           : 10.244.4.28
NetRoute (NextHop)      : 10.244.4.1
PingSucceeded           : False
PingReplyDetails (RTT)  : 0 ms
TcpTestSucceeded        : False
PS C:\>  Test-NetConnection -ComputerName 10.10.13.1 -Port 53 -InformationLevel Detailed

ComputerName            : 10.10.13.1
RemoteAddress           : 10.10.13.1
RemotePort              : 53
NameResolutionResults   : 10.10.13.1
MatchingIPsecRules      :
NetworkIsolationContext :
InterfaceAlias          : vEthernet (5195b5da3e3bb0b8f92bcbdfce384d2c2a7eac5e55220691f91bdb64dd671f1a_flannel.4096)
SourceAddress           : 10.244.4.28
NetRoute (NextHop)      : 10.244.4.1
TcpTestSucceeded        : True
PS C:\>  Test-NetConnection -ComputerName 10.244.0.20 -Port 53 -InformationLevel Detailed

ComputerName            : 10.244.0.20
RemoteAddress           : 10.244.0.20
RemotePort              : 53
NameResolutionResults   : 10.244.0.20
MatchingIPsecRules      :
NetworkIsolationContext :
InterfaceAlias          : vEthernet (5195b5da3e3bb0b8f92bcbdfce384d2c2a7eac5e55220691f91bdb64dd671f1a_flannel.4096)
SourceAddress           : 10.244.4.28
NetRoute (NextHop)      : 10.244.4.1
TcpTestSucceeded        : True

**Expected behavior**
`nslookup www.google.de` from inside the pod should work.

Kubernetes (please complete the following information):

 - Windows Version (outside the pod):

PS C:\> Get-ComputerInfo | Select-Object WindowsVersion

WindowsVersion

1809

 - Kubernetes Version:

$ kubectl version --output=yaml
clientVersion:
  buildDate: "2023-06-14T09:53:42Z"
  compiler: gc
  gitCommit: 25b4e43193bcda6c7328a6d147b1fb73a33f1598
  gitTreeState: clean
  gitVersion: v1.27.3
  goVersion: go1.20.5
  major: "1"
  minor: "27"
  platform: linux/arm64
kustomizeVersion: v5.0.1
serverVersion:
  buildDate: "2023-06-14T09:47:40Z"
  compiler: gc
  gitCommit: 25b4e43193bcda6c7328a6d147b1fb73a33f1598
  gitTreeState: clean
  gitVersion: v1.27.3
  goVersion: go1.20.5
  major: "1"
  minor: "27"
  platform: linux/arm64

 - CNI:

$ kubectl get daemonsets -n kube-flannel -o=wide
NAME                            DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE   CONTAINERS     IMAGES                               SELECTOR
kube-flannel-ds                 3         3         3       3            3           <none>          14d   kube-flannel   docker.io/flannel/flannel:v0.22.0    app=flannel
kube-flannel-ds-windows-amd64   1         1         1       1            1           <none>          14d   kube-flannel   mik4sa/flannel:v0.21.5-hostprocess   app=flannel


**Additional context**
I tried to test it with [uweerikmartin/flannel](https://hub.docker.com/r/uweerikmartin/flannel/tags), but had no success; I get this error:

$ kubectl logs kube-flannel-ds-windows-amd64-qtgvk -n kube-flannel
Copying SDN CNI binaries to host

    Directory: C:\opt\cni

Mode                LastWriteTime         Length Name
----                -------------         ------ ----
d-----         7/2/2023   3:25 PM                bin
copy flannel config

    Directory: C:\etc

Mode                LastWriteTime         Length Name
----                -------------         ------ ----
d-----         7/2/2023   4:38 PM                kube-flannel

    Directory: C:\etc\kube-flannel

Mode                LastWriteTime         Length Name
----                -------------         ------ ----
-a----         8/5/2023   9:16 AM            109 net-conf.json

    Directory: C:\hpc\mounts\kube-flannel

Mode                LastWriteTime         Length Name
----                -------------         ------ ----
d-----         8/5/2023   9:16 AM                ..2023_08_05_16_16_31.325047069
d----l         8/5/2023   9:16 AM                ..data
-a---l         8/5/2023   9:16 AM              0 cni-conf.json
-a---l         8/5/2023   9:16 AM              0 net-conf.json
update cni config
get-content : Cannot find path 'C:\hpc\mounts\kubeadm-config\ClusterConfiguration' because it does not exist.
At C:\hpc\flannel\start.ps1:18 char:18

uli-fischer commented 1 year ago

Tested with oguertlertt/flannel:v0.22.0. Same problem. 👎 No clue what's wrong.

$ kubectl get daemonsets -n kube-flannel -o=wide
NAME                            DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE   CONTAINERS     IMAGES                                    SELECTOR
kube-flannel-ds                 3         3         3       3            3           <none>          14d   kube-flannel   docker.io/flannel/flannel:v0.22.0         app=flannel
kube-flannel-ds-windows-amd64   1         1         1       1            1           <none>          14d   kube-flannel   oguertlertt/flannel:v0.22.0-hostprocess   app=flannel

I get these errors in the log of the host process:


$ kubectl logs kube-flannel-ds-windows-amd64-bf4bc -n kube-flannel -f
Copying SDN CNI binaries to host

    Directory: C:\opt\cni

Mode                LastWriteTime         Length Name
----                -------------         ------ ----
d-----         7/2/2023   3:25 PM                bin
copy flannel config

    Directory: C:\etc

Mode                LastWriteTime         Length Name
----                -------------         ------ ----
d-----         7/2/2023   4:38 PM                kube-flannel

    Directory: C:\etc\kube-flannel

Mode                LastWriteTime         Length Name
----                -------------         ------ ----
-a----         8/5/2023   9:39 AM            109 net-conf.json

    Directory: C:\hpc\mounts\kube-flannel

Mode                LastWriteTime         Length Name
----                -------------         ------ ----
d-----         8/5/2023   9:45 AM                ..2023_08_05_16_45_21.1802066358
d----l         8/5/2023   9:45 AM                ..data
-a---l         8/5/2023   9:45 AM              0 cni-conf.json
-a---l         8/5/2023   9:45 AM              0 net-conf.json
update cni config

    Directory: C:\etc\cni

Mode                LastWriteTime         Length Name
----                -------------         ------ ----
d-----         7/2/2023   4:38 PM                net.d
add route
The route addition failed: The object already exists.

envs
kube-flannel-ds-windows-amd64-bf4bc
kube-flannel
Starting flannel
I0805 09:45:24.184051  246316 main.go:212] CLI flags config: {etcdEndpoints:http://127.0.0.1:4001,http://127.0.0.1:2379 etcdPrefix:/coreos.com/network etcdKeyfile: etcdCertfile: etcdCAFile: etcdUsername: etcdPassword: version:false kubeSubnetMgr:true kubeApiUrl: kubeAnnotationPrefix:flannel.alpha.coreos.com kubeConfigFile: iface:[10.10.13.204] ifaceRegex:[] ipMasq:false ifaceCanReach: subnetFile:/run/flannel/subnet.env publicIP: publicIPv6: subnetLeaseRenewMargin:60 healthzIP:0.0.0.0 healthzPort:0 iptablesResyncSeconds:5 iptablesForwardRules:true netConfPath:/etc/kube-flannel/net-conf.json setNodeNetworkUnavailable:true useMultiClusterCidr:false}
W0805 09:45:24.186802  246316 client_config.go:617] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0805 09:45:24.254216  246316 kube.go:486] Starting kube subnet manager
I0805 09:45:24.254216  246316 kube.go:145] Waiting 10m0s for node controller to sync
I0805 09:45:24.267665  246316 kube.go:507] Creating the node lease for IPv4. This is the n.Spec.PodCIDRs: [10.244.0.0/24]
I0805 09:45:24.267665  246316 kube.go:507] Creating the node lease for IPv4. This is the n.Spec.PodCIDRs: [10.244.1.0/24]
I0805 09:45:24.267665  246316 kube.go:507] Creating the node lease for IPv4. This is the n.Spec.PodCIDRs: [10.244.2.0/24]
I0805 09:45:24.267665  246316 kube.go:507] Creating the node lease for IPv4. This is the n.Spec.PodCIDRs: [10.244.4.0/24]
I0805 09:45:25.255069  246316 kube.go:152] Node controller sync successful
I0805 09:45:25.255069  246316 main.go:232] Created subnet manager: Kubernetes Subnet Manager - k8t-win-node-1
I0805 09:45:25.255069  246316 main.go:235] Installing signal handlers
I0805 09:45:25.255713  246316 main.go:543] Found network config - Backend type: vxlan
I0805 09:45:25.256988  246316 match.go:73] Searching for interface using 10.10.13.204
I0805 09:45:25.272671  246316 match.go:259] Using interface with name vEthernet (Ethernet) and address 10.10.13.204
I0805 09:45:25.275260  246316 match.go:281] Defaulting external address to interface address (10.10.13.204)
I0805 09:45:25.275327  246316 vxlan_windows.go:126] VXLAN config: Name=flannel.4096 MacPrefix=0E-2A VNI=4096 Port=4789 GBP=false DirectRouting=false
time="2023-08-05T09:45:25-07:00" level=info msg="HCN feature check" supportedFeatures="{Acl:{AclAddressLists:true AclNoHostRulePriority:true AclPortRanges:true AclRuleId:true} Api:{V1:true V2:true} RemoteSubnet:true HostRoute:true DSR:true Slash32EndpointPrefixes:true AclSupportForProtocol252:false SessionAffinity:false IPv6DualStack:false SetPolicy:false VxlanPort:false L4Proxy:true L4WfpProxy:false TierAcl:false NetworkACL:false NestedIpSet:false}" version="{Major:9 Minor:5}"
I0805 09:45:25.354942  246316 kube.go:507] Creating the node lease for IPv4. This is the n.Spec.PodCIDRs: [10.244.4.0/24]
I0805 09:45:25.354994  246316 device_windows.go:103] Found existing HostComputeNetwork flannel.4096
I0805 09:45:25.381913  246316 main.go:408] Changing default FORWARD chain policy to ACCEPT
I0805 09:45:25.383161  246316 kube.go:507] Creating the node lease for IPv4. This is the n.Spec.PodCIDRs: [10.244.4.0/24]
I0805 09:45:25.383739  246316 main.go:436] Wrote subnet file to /run/flannel/subnet.env
I0805 09:45:25.383739  246316 main.go:440] Running backend.
I0805 09:45:25.383739  246316 vxlan_network_windows.go:63] Watching for new subnet leases
I0805 09:45:25.383739  246316 subnet.go:159] Batch elem [0] is { lease.Event{Type:0, Lease:lease.Lease{EnableIPv4:true, EnableIPv6:false, Subnet:ip.IP4Net{IP:0xaf40000, PrefixLen:0x18}, IPv6Subnet:ip.IP6Net{IP:(*ip.IP6)(nil), PrefixLen:0x0}, Attrs:lease.LeaseAttrs{PublicIP:0xa0a0dc9, PublicIPv6:(*ip.IP6)(nil), BackendType:"vxlan", BackendData:json.RawMessage{0x7b, 0x22, 0x56, 0x4e, 0x49, 0x22, 0x3a, 0x34, 0x30, 0x39, 0x36, 0x2c, 0x22, 0x56, 0x74, 0x65, 0x70, 0x4d, 0x41, 0x43, 0x22, 0x3a, 0x22, 0x38, 0x32, 0x3a, 0x62, 0x66, 0x3a, 0x66, 0x32, 0x3a, 0x33, 0x65, 0x3a, 0x30, 0x65, 0x3a, 0x62, 0x33, 0x22, 0x7d}, BackendV6Data:json.RawMessage(nil)}, Expiration:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Asof:0}} }
I0805 09:45:25.383739  246316 subnet.go:159] Batch elem [0] is { lease.Event{Type:0, Lease:lease.Lease{EnableIPv4:true, EnableIPv6:false, Subnet:ip.IP4Net{IP:0xaf40100, PrefixLen:0x18}, IPv6Subnet:ip.IP6Net{IP:(*ip.IP6)(nil), PrefixLen:0x0}, Attrs:lease.LeaseAttrs{PublicIP:0xa0a0dca, PublicIPv6:(*ip.IP6)(nil), BackendType:"vxlan", BackendData:json.RawMessage{0x7b, 0x22, 0x56, 0x4e, 0x49, 0x22, 0x3a, 0x34, 0x30, 0x39, 0x36, 0x2c, 0x22, 0x56, 0x74, 0x65, 0x70, 0x4d, 0x41, 0x43, 0x22, 0x3a, 0x22, 0x64, 0x32, 0x3a, 0x30, 0x33, 0x3a, 0x35, 0x62, 0x3a, 0x34, 0x32, 0x3a, 0x32, 0x30, 0x3a, 0x33, 0x39, 0x22, 0x7d}, BackendV6Data:json.RawMessage(nil)}, Expiration:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Asof:0}} }
I0805 09:45:25.388437  246316 subnet.go:159] Batch elem [0] is { lease.Event{Type:0, Lease:lease.Lease{EnableIPv4:true, EnableIPv6:false, Subnet:ip.IP4Net{IP:0xaf40200, PrefixLen:0x18}, IPv6Subnet:ip.IP6Net{IP:(*ip.IP6)(nil), PrefixLen:0x0}, Attrs:lease.LeaseAttrs{PublicIP:0xa0a0dcb, PublicIPv6:(*ip.IP6)(nil), BackendType:"vxlan", BackendData:json.RawMessage{0x7b, 0x22, 0x56, 0x4e, 0x49, 0x22, 0x3a, 0x34, 0x30, 0x39, 0x36, 0x2c, 0x22, 0x56, 0x74, 0x65, 0x70, 0x4d, 0x41, 0x43, 0x22, 0x3a, 0x22, 0x38, 0x32, 0x3a, 0x61, 0x33, 0x3a, 0x35, 0x39, 0x3a, 0x36, 0x63, 0x3a, 0x63, 0x36, 0x3a, 0x62, 0x61, 0x22, 0x7d}, BackendV6Data:json.RawMessage(nil)}, Expiration:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Asof:0}} }
I0805 09:45:25.426776  246316 main.go:461] Waiting for all goroutines to exit
....
Mik4sa commented 1 year ago

I have no time this weekend to assist you. I haven't read your description deeply tbh, but try to:

  1. build your own images for flannel and kube-proxy (I'm using v0.21.5 for flannel currently)
  2. Exactly follow the guide step by step but use your own images
  3. Then everything should work

Note: The RBAC file you are executing is no longer needed and should be deleted from the cluster when upgrading

uli-fischer commented 1 year ago

Okay, I'll check this, but unfortunately I'm a newbie in Kubernetes/Docker, so building a Windows image is difficult for me :-) I'll try it and come back to you.

> Note: The RBAC file you are executing is no longer needed and should be deleted from the cluster when upgrading

I've seen this and will check whether I have the correct RBAC in place.

Do you know why there is no newer version in sigwindowstools, and whether this is an error in the documentation?

iankingori commented 1 year ago

@uli-fischer I built this recently: docker.io/syck0/flannel:v0.21.5-hostprocess. It works for me. Try it out

FangKee commented 1 year ago

Use your own images, or use [Mik4sa]'s flannel v0.21.5 for now. Please see #336.

uli-fischer commented 1 year ago

OK I've tested it with all versions mentioned. No changes here. Upon further investigation, I found this error in the Windows kube proxy. I think that could be the error, but have no idea what's wrong.

I0822 07:12:51.168375   23324 config.go:133] "Calling handler.OnEndpointSliceUpdate"
I0822 07:13:01.157594   23324 config.go:133] "Calling handler.OnEndpointSliceUpdate"
I0822 07:13:11.162851   23324 config.go:133] "Calling handler.OnEndpointSliceUpdate"
I0822 07:13:14.464328   23324 hns.go:135] "Queried endpoints from network" network="flannel.4096"
I0822 07:13:14.464441   23324 hns.go:136] "Queried endpoints details" network="flannel.4096" endpointInfos=map[10.244.7.3:10.244.7.3:0 8f57c1ba-d61d-4a9c-9a92-5dadf07250dc:10.244.7.3:0]
I0822 07:13:14.464441   23324 hns.go:306] "Queried load balancers" count=0
E0822 07:13:14.477518   23324 proxier.go:1236] "Source Vip endpoint creation failed" err="hcnCreateEndpoint failed in Win32: IP address is either invalid or not part of any configured subnet(s). (0x803b001e) {\"Success\":false,\"Error\":\"IP address is either invalid or not part of any configured subnet(s). \",\"ErrorCode\":2151350302}"
I0822 07:13:14.477693   23324 proxier.go:1177] "Syncing proxy rules complete" elapsed="18.6334ms"
I0822 07:13:14.477693   23324 bounded_frequency_runner.go:296] sync-runner: ran, next possible in 1s, periodic in 30s
uli-fischer commented 8 months ago

Hi Pexeus, sorry, I have not found a solution for this so far. My next step is to build my own images and try once more, but I've had no time yet. If you find a solution, please let me know.

Mik4sa commented 8 months ago

How did you guys initialize your cluster with kubeadm? Do you still have the exact command?

uli-fischer commented 8 months ago

Hi,

As documented, I ran `sudo kubeadm init --pod-network-cidr=10.244.0.0/16` on the Debian master node.

Mik4sa commented 8 months ago

Hmm this is actually the same command I used

Zombro commented 8 months ago

troubleshooting this issue as well... trying out a bunch of new stuff... considering refactoring my setup to host-gw...

As for this error in kube-proxy:

E0822 07:13:14.477518   23324 proxier.go:1236] "Source Vip endpoint creation failed" err="hcnCreateEndpoint failed in Win32: IP address is either invalid or not part of any configured subnet(s). (0x803b001e) {\"Success\":false,\"Error\":\"IP address is either invalid or not part of any configured subnet(s). \",\"ErrorCode\":2151350302}"

Check the kube-proxy start script. Did you unjoin and rejoin your Windows worker node to the cluster? Flannel probably decided to pick a new 10.244.X.0/24 subnet for your node, but the logic in the script checks for an existing file:

https://github.com/kubernetes-sigs/sig-windows-tools/blob/master/hostprocess/flannel/kube-proxy/start.ps1#L9-L10

Try deleting the contents of C:\sourcevip and restarting the Windows kube-proxy.

Zombro commented 8 months ago

As for the reported issue, the following stands out to me. Inside the test my-windows-pod, the vEthernet adapter looks properly configured...

PS C:\> ipconfig /all

Windows IP Configuration

   Host Name . . . . . . . . . . . . : my-windows-pod
   Primary Dns Suffix  . . . . . . . : 
   Node Type . . . . . . . . . . . . : Hybrid
   IP Routing Enabled. . . . . . . . : No
   WINS Proxy Enabled. . . . . . . . : No
   DNS Suffix Search List. . . . . . : development.svc.cluster.local
                                       svc.cluster.local
                                       cluster.local

Ethernet adapter vEthernet (d804a0f1ccc4bceb0754f85022a8a16fb9db520b948689f7a4d9ba4b26c44082_flannel.4096):

   Connection-specific DNS Suffix  . : development.svc.cluster.local
   Description . . . . . . . . . . . : Hyper-V Virtual Ethernet Container Adapter #4
   Physical Address. . . . . . . . . : 00-15-5D-CD-19-B6
   DHCP Enabled. . . . . . . . . . . : No
   Autoconfiguration Enabled . . . . : Yes
   Link-local IPv6 Address . . . . . : fe80::be82:2ea:9ff1:e51f%53(Preferred) 
   IPv4 Address. . . . . . . . . . . : 10.244.11.7(Preferred) 
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   Default Gateway . . . . . . . . . : 10.244.11.1
   DNS Servers . . . . . . . . . . . : 10.96.0.10
   NetBIOS over Tcpip. . . . . . . . : Disabled
   Connection-specific DNS Suffix Search List :
                                       development.svc.cluster.local
                                       svc.cluster.local
                                       cluster.local

... but the routes are screwed up. I expect to see something for 10.244.0.0/16, at least on ifIndex 53:

PS C:\> Get-NetRoute

ifIndex DestinationPrefix                              NextHop                                  RouteMetric ifMetric PolicyStore
------- -----------------                              -------                                  ----------- -------- -----------
53      255.255.255.255/32                             0.0.0.0                                          256 25       ActiveStore
52      255.255.255.255/32                             0.0.0.0                                          256 75       ActiveStore
53      224.0.0.0/4                                    0.0.0.0                                          256 25       ActiveStore
52      224.0.0.0/4                                    0.0.0.0                                          256 75       ActiveStore
52      127.255.255.255/32                             0.0.0.0                                          256 75       ActiveStore
52      127.0.0.1/32                                   0.0.0.0                                          256 75       ActiveStore
52      127.0.0.0/8                                    0.0.0.0                                          256 75       ActiveStore
53      10.244.11.255/32                               0.0.0.0                                          256 25       ActiveStore
53      10.244.11.7/32                                 0.0.0.0                                          256 25       ActiveStore
53      10.244.11.0/24                                 0.0.0.0                                          256 25       ActiveStore
53      0.0.0.0/0                                      10.244.11.1                                      256 25       ActiveStore
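To make the gap in the route table concrete, here is a small standalone sketch (the `ip_to_int`/`in_cidr` helpers are hypothetical, not part of any repo tooling) showing that the service VIP 10.96.0.10 matches none of the pod-subnet routes above, so that traffic can only follow the 0.0.0.0/0 default route:

```shell
# Hypothetical CIDR-membership check, illustrating why 10.96.0.10 is not
# covered by the /24 pod routes in the Get-NetRoute output above.
ip_to_int() {
  local IFS=.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}
in_cidr() {
  local ip net bits mask
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")
  bits=${2#*/}
  mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( ip & mask )) -eq $(( net & mask )) ] && echo yes || echo no
}
in_cidr 10.96.0.10  10.244.11.0/24   # -> no: the service VIP misses the pod route
in_cidr 10.244.11.7 10.244.11.0/24   # -> yes: the pod's own subnet
```

Service VIPs are never directly routable anyway; the question is whether kube-proxy's HNS load-balancer rules intercept that default-route traffic and rewrite it to a backend pod.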

Everything on the Linux side is working fine.

my theory is

k8s-triage-robot commented 5 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 4 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented 3 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Reopen this issue with `/reopen`
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-ci-robot commented 3 months ago

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to [this](https://github.com/kubernetes-sigs/sig-windows-tools/issues/340#issuecomment-2118554662):

> The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
>
> This bot triages issues according to the following rules:
> - After 90d of inactivity, `lifecycle/stale` is applied
> - After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
> - After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
>
> You can:
> - Reopen this issue with `/reopen`
> - Mark this issue as fresh with `/remove-lifecycle rotten`
> - Offer to help out with [Issue Triage][1]
>
> Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
>
> /close not-planned
>
> [1]: https://www.kubernetes.dev/docs/guide/issue-triage/

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.