kubernetes-sigs / sig-windows-tools

Repository for tools and artifacts related to the sig-windows charter in Kubernetes. Scripts to assist kubeadm and wincat and flannel will be hosted here.
Apache License 2.0

Kube-proxy update leads to "Cannot find path" errors #337

Closed oscgu closed 1 year ago

oscgu commented 1 year ago

Describe the bug

Hello, I'm trying to upgrade kube-proxy from 1.23.7 to 1.24.15, and the container keeps crashlooping with the following error:

Write files so the kubeconfig points to correct locations

    Directory: C:\var\lib

Mode                LastWriteTime         Length Name                                                                  
----                -------------         ------ ----                                                                  
d-----         5/4/2023   6:53 AM                kube-proxy                                                            
Get-Content : Cannot find path 
'C:\hpc\08e67f231fac02709e443f7e8e688399e8eab64e9c633e994f2e5bc889766dbd\mounts\var\lib\kube-proxy\kubeconfig.conf' 
because it does not exist.
At C:\hpc\08e67f231fac02709e443f7e8e688399e8eab64e9c633e994f2e5bc889766dbd\kube-proxy\start.ps1:55 char:3
+ ((Get-Content -path $env:CONTAINER_SANDBOX_MOUNT_POINT/mounts/var/lib ...
+   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : ObjectNotFound: (C:\hpc\08e67f23...kubeconfig.conf:String) [Get-Content], ItemNotFoundException
    + FullyQualifiedErrorId : PathNotFound,Microsoft.PowerShell.Commands.GetContentCommand

I also tried to find the path on the host myself, but it's not there. I also tried higher versions and updated containerd from 1.6.20 to 1.7, but I still get this error.

To Reproduce

Try updating kube-proxy from 1.23.7 on k8s 1.24.15.

Expected behavior

Kube-proxy should work after the update.

Additional context

I'm running a mixed cluster (Linux and Windows) and am currently trying to get it up to date. I'm also using the flannel deployment from before this PR, since the newer one didn't work.

Mik4sa commented 1 year ago

The images are outdated and no longer match the manifests in the master branch. If you want to use the latest versions of both, you should build your own images and use those. I did so and am running successfully with flannel 0.21.5 and kube-proxy 1.27.1.

oscgu commented 1 year ago

I'm using a self-built image for flannel, which works fine, but kube-proxy still doesn't work. This is the image I built: oguertlertt/kube-proxy:v1.24.15-flannel-hostprocess. Building v1.23.7 myself also leads to this error, even though it works with the sigwindowstools image.

Mik4sa commented 1 year ago

Since #297, containerd 1.7 is required. If you want to stay on containerd 1.6.x, you could try building an image from before that change.

oscgu commented 1 year ago

I'm also getting the issue with containerd 1.7; only the start of the path in the error differs slightly. With 1.7 the path starts with C:\hpc\ and with 1.6 it starts with C:\C\, but the error is otherwise the same. Nonetheless, I'll see if it works when I build from before the PR you mentioned.

oscgu commented 1 year ago

Looks like I did have an invalid path in my deployment file from all the testing 😓 Kube-proxy is still crashlooping but for a different reason now:

Write files so the kubeconfig points to correct locations

    Directory: C:\var\lib

Mode                LastWriteTime         Length Name
----                -------------         ------ ----
d-----       24.05.2023     11:21                kube-proxy
Finding sourcevip
Cannot index into a null array.
At C:\C\708cdd1699ae0925b888a355014d8780b68733f323e30ba92be256d071303366\kube-proxy\start.ps1:14 char:17
+ ...                $vip = $sourceVipJSONData.ips[0].address.Split("/")[0]
+                    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (:) [], ParentContainsErrorRecordException
    + FullyQualifiedErrorId : NullArray

When I look into the sourceVip.json file I see the following:

{
    "code": 999,
    "msg": "failed to allocate for range 0: 10.244.1.2 has been allocated to dummy, duplicate allocation is not allowed"
}
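
The "Cannot index into a null array" failure above follows from this response having no `ips` field. A minimal Python sketch of a guard for that case is shown here; `extract_source_vip` is a hypothetical helper, not part of start.ps1, and it assumes a successful result carries `ips[0].address` in CIDR form, as the failing line in start.ps1 suggests:

```python
import json


def extract_source_vip(raw: str) -> str:
    """Return the source VIP from an IPAM result, or raise with the plugin's message."""
    data = json.loads(raw)
    ips = data.get("ips")
    if not ips:
        # Error responses (like the one above) carry "code" and "msg" instead of "ips".
        code = data.get("code")
        msg = data.get("msg", "no ips in result")
        raise ValueError(f"IPAM error {code}: {msg}")
    # The address is assumed to be in CIDR form, e.g. "10.244.1.2/24"; keep only the IP.
    return ips[0]["address"].split("/")[0]
```

With a check like this, the failure would surface the plugin's own message (the duplicate allocation) rather than a null-index error.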

I already tried deleting the files and restarting the kube-proxy pods, but I still get this error. This is what sourceVipRequest.json looks like:

{"cniVersion": "0.3.0", "name": "flannel.4096", "ipam":{"type":"host-local","ranges":[[{"subnet":"10.244.1.0/24"}]],"dataDir":"/var/lib/cni/networks"}}

oscgu commented 1 year ago

Deleting C:\var\lib\cni\networks\flannel.4096 seems to have resolved the above issue.
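
For anyone who wants to inspect the directory before deleting it, here is a rough Python sketch. It assumes the host-local plugin keeps one file per reserved IP in that directory, with the owner ID (here `dummy`, per the error message) on the first line. That layout is an assumption, not something confirmed in this thread, and `find_reservations` is a hypothetical helper:

```python
from pathlib import Path


def find_reservations(network_dir, owner_id):
    """Return files in network_dir whose first line equals owner_id."""
    matches = []
    for entry in sorted(Path(network_dir).iterdir()):
        if not entry.is_file():
            continue
        try:
            lines = entry.read_text(errors="replace").splitlines()
        except OSError:
            # Skip files that cannot be read (for example, a held lock file).
            continue
        if lines and lines[0].strip() == owner_id:
            matches.append(entry)
    return matches
```

Calling `find_reservations(r"C:\var\lib\cni\networks\flannel.4096", "dummy")` would list the candidate stale reservations. Whether removing only those files is enough was not tested here; deleting the whole directory is what worked.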

Mik4sa commented 1 year ago

So is there still any problem?

oscgu commented 1 year ago

Everything works now, thanks!