kubernetes-sigs / kind

Kubernetes IN Docker - local clusters for testing Kubernetes
https://kind.sigs.k8s.io/
Apache License 2.0
13.41k stars 1.55k forks

[WSL2] Sync failed errors in kube-proxy for Service with SessionAffinity: ClientIP #1740

Closed. valeneiko closed this issue 3 years ago.

valeneiko commented 4 years ago

What happened: iptables rules fail to be updated on the nodes after a Service with sessionAffinity: ClientIP is created. As a result, requests to any Services created after the Service with session affinity are dropped.

The kube-proxy pod logs the following error:

E0720 14:29:10.934607       1 proxier.go:1507] Failed to execute iptables-restore: exit status 2 (iptables-restore v1.8.3 (legacy): Couldn't load match `recent':No such file or directory

Error occurred at line: 96
Try `iptables-restore -h' or 'iptables-restore --help' for more information.
)
I0720 14:29:10.934636       1 proxier.go:779] Sync failed; retrying in 30s

What you expected to happen: iptables rules to be updated correctly, so that requests could be routed to any Service in the cluster.

How to reproduce it (as minimally and precisely as possible): Create a Service with sessionAffinity: ClientIP, for example:

apiVersion: v1
kind: Service
metadata:
  labels:
    alertmanager: main
  name: alertmanager-main
  namespace: monitoring
spec:
  ports:
  - name: web
    port: 9093
    targetPort: web
  selector:
    alertmanager: main
    app: alertmanager
  sessionAffinity: ClientIP

Anything else we need to know?: The issue is reproducible with both kubeProxyMode: iptables (the default) and kubeProxyMode: ipvs.

Environment:

Server:
 Containers: 1
  Running: 1
  Paused: 0
  Stopped: 0
 Images: 1
 Server Version: 19.03.8
 Storage Driver: overlay2
  Backing Filesystem:
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 7ad184331fa3e55e52b890ea95e65ba581ae3429
 runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
 init version: fec3683
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 4.19.104-microsoft-standard
 Operating System: Docker Desktop
 OSType: linux
 Architecture: x86_64
 CPUs: 24
 Total Memory: 25GiB
 Name: docker-desktop
 ID: D4I2:L4Y5:PGPS:CEUY:H3TU:C33L:HASQ:VZKB:53SE:SHQG:OOQV:BZMQ
 Docker Root Dir: /var/lib/docker
 Debug Mode: true
  File Descriptors: 48
  Goroutines: 57
  System Time: 2020-07-20T15:35:34.5632783Z
  EventsListeners: 3
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
 Product License: Community Engine

WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled

- OS: Windows 10 (Build: 19041.388)

aojea commented 4 years ago

Hmm, I think one kernel module is missing. If I'm correct, it should be xt_recent. @PatrickLang, you are the WSL2 expert: how is it possible to include this module?

BenTheElder commented 4 years ago

kind is not going to mess with your kernel modules, so bug => support.

If Docker Desktop is missing a module, that's probably hard to fix as an end user. They might be willing to fix it, though, seeing as they also offer running the Docker Desktop VM as a single fixed-version Kubernetes node instead of just dockerd.

BenTheElder commented 4 years ago

/kind external

tallaxes commented 4 years ago

Yes, it looks like the current WSL2 kernel is built without xt_recent. That module is needed by iptables -m recent ..., which kube-proxy uses to implement sessionAffinity: ClientIP. A custom kernel built with CONFIG_NETFILTER_XT_MATCH_RECENT=y fixed it for me. I submitted https://github.com/microsoft/WSL2-Linux-Kernel/pull/198 (4.19.y) and https://github.com/microsoft/WSL2-Linux-Kernel/pull/199 (5.4.y).
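Because kube-proxy relies on the recent match, a quick check is whether the running kernel's config enables CONFIG_NETFILTER_XT_MATCH_RECENT. The sketch below assumes one of two things: either the kernel exposes its config at /proc/config.gz (only true when it was built with CONFIG_IKCONFIG_PROC), or you have the config file that was used for the build. The helper name xt_recent_enabled is hypothetical. All WSL2 distros, including the docker-desktop one, share a single kernel, so a check from any WSL2 shell should reflect what the kind nodes see.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) if a kernel config enables the xt_recent
# match, either built in (=y) or as a module (=m). Note that =m only helps if
# the module can actually be loaded on that kernel.
xt_recent_enabled() {
  local cfg="$1"
  local reader="cat"
  # /proc/config.gz (and any *.gz copy) is gzip-compressed; plain configs are not.
  case "$cfg" in
    *.gz) reader="zcat" ;;
  esac
  "$reader" "$cfg" | grep -qE '^CONFIG_NETFILTER_XT_MATCH_RECENT=(y|m)$'
}

# Example usage:
#   xt_recent_enabled /proc/config.gz && echo "enabled" || echo "not enabled"
#   xt_recent_enabled Microsoft/config-wsl   # config used for a custom build
```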

BenTheElder commented 4 years ago

thanks @tallaxes!

WSLUser commented 4 years ago

If someone wants to compile the 5.10 LTS kernel for WSL2 with this option enabled, take a look at https://github.com/WSLUser/WSL2-Linux-Kernel/blob/linux-msft-wsl-5.10.y/Microsoft/config-wsl. Follow https://wsl.dev/wsl2-kernel-zfs/ for the steps to compile your own kernel.
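When building a different kernel version, the config file may not contain the exact "# CONFIG_NETFILTER_XT_MATCH_RECENT is not set" line that a simple sed substitution expects. In that case the substitution silently does nothing. Below is a slightly more defensive sketch. It assumes GNU sed, as in an Ubuntu build container, and uses a hypothetical helper name, enable_kconfig_option.

```shell
#!/usr/bin/env bash
# Hypothetical helper: force a Kconfig option to =y in a config file, whether it
# currently reads "# CONFIG_X is not set", "CONFIG_X=m"/"=n", or is absent.
# Assumes GNU sed (for -i without a backup suffix) and an option name made of
# [A-Z0-9_] only, so it is safe to splice into the regex.
enable_kconfig_option() {
  local file="$1" opt="CONFIG_$2"
  local pattern="^(# ${opt} is not set|${opt}=.*)\$"
  if grep -qE "$pattern" "$file"; then
    # Rewrite the existing line in place.
    sed -i -E "s/${pattern}/${opt}=y/" "$file"
  else
    # The option is not mentioned at all: append it.
    printf '%s=y\n' "$opt" >> "$file"
  fi
}

# Example usage inside the kernel source tree:
#   enable_kconfig_option Microsoft/config-wsl NETFILTER_XT_MATCH_RECENT
```

If the option has unmet dependencies, one way to let Kconfig reconcile the file is to run make olddefconfig KCONFIG_CONFIG=Microsoft/config-wsl. Afterwards, re-check that the line still reads =y.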

hawk29 commented 3 years ago

> CONFIG_NETFILTER_XT_MATCH_RECENT=y

Sorry, this is a newbie question. I have come across the same issue using Docker Desktop. I downloaded and installed the latest Docker Desktop, but to no avail. Is there a release where this will be included for end users, or do we have to compile the kernel ourselves?

Client:
 Debug Mode: false
 Plugins:
  scan: Docker Scan (Docker Inc., v0.3.4)

Server:
 Containers: 86
  Running: 80
  Paused: 0
  Stopped: 6
 Images: 24
 Server Version: 19.03.13
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 8fba4e9a7d01810a393d5d25a3621dc101981175
 runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
 init version: fec3683
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 4.19.128-microsoft-standard
 Operating System: Docker Desktop
 OSType: linux
 Architecture: x86_64
 CPUs: 8
 Total Memory: 24.77GiB
 Name: docker-desktop
 ID: 4IEZ:4LGJ:N7EI:P4FA:XJYC:5TTB:X7HG:FCPV:BTFP:YTO2:M75E:QDKH
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
 Product License: Community Engine

WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled

tallaxes commented 3 years ago

@hawk29, that would be a question for the WSL2 maintainers; as far as I can tell, it is not included in any recent release. (And I don't see any PR merging activity at microsoft/WSL2-Linux-Kernel, so maybe they just don't accept contributions...)

FWIW, in the tallaxes/WSL2-Linux-Kernel fork I have configured a GitHub Action to build it. You should be able to get a built kernel image from there without worrying about downloading/running "mystery meat" bits, since the build process is transparent. The kernel image is captured as a build artifact: click on a build run, scroll to Artifacts, and look for bzImage. Then follow the instructions for configuring global options in .wslconfig, setting the kernel key to point to the custom kernel. (Obviously, use at your own risk, #include <disclaimer.h> ...)

k8s-ci-robot commented 3 years ago

@fejta-bot: Closing this issue.

In response to [this](https://github.com/kubernetes-sigs/kind/issues/1740#issuecomment-842724515):

> Rotten issues close after 30d of inactivity.
> Reopen the issue with `/reopen`.
> Mark the issue as fresh with `/remove-lifecycle rotten`.
>
> Send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

thavlik commented 3 years ago

@tallaxes FYI your build artifact was removed due to age.

tallaxes commented 3 years ago

@thavlik Rebuilt

BenTheElder commented 3 years ago

I don't recall if this existed then, but https://kind.sigs.k8s.io/docs/user/using-wsl2/ is where we document what we know needs to be done for WSL2. Since the maintainers don't use WSL2, we could really use any missing bits contributed there: https://kind.sigs.k8s.io/docs/contributing/development/#documentation

thanks!

OP: if your issue is not resolved, please file a new one. I've eliminated that bot from this repo, but I think this issue may be stale now anyhow 🤔

valeneiko commented 3 years ago

I haven't needed SessionAffinity for a while now, so I'm not sure if the issue is resolved. I can try to check when I get some time.

@thavlik Did you run into this issue recently? Is it still reproducible?

If so, it might be worth adding the information about the custom kernel to the WSL2 docs.

thavlik commented 3 years ago

> @thavlik Did you run into this issue recently? Is it still reproducible?

Yes. On both the WSL2 and Hyper-V backends, I have an issue where a microservice that issues a token is a few seconds ahead of the test code, and the Go JWT library errors if you use a token before it is issued. I worked around it by catching the error in development environments only.

valeneiko commented 3 years ago

I can confirm that the issue is still reproducible and that the custom kernel solution works. I compiled 5.4.72 (the version currently used by WSL2) to check.

The solution:

  1. Build a kernel with the xt_recent kernel module enabled:

    # Start a throwaway build container (it is removed on exit because of --rm)
    docker run --name wsl-kernel-builder --rm -it ubuntu:latest bash
    
    WSL_COMMIT_REF=linux-msft-5.4.72 # change this line to the version you want to build
    
    # Install dependencies
    apt update
    apt install -y git build-essential flex bison libssl-dev libelf-dev bc
    
    # Checkout WSL2 Kernel repo
    mkdir src
    cd src
    git init
    git remote add origin https://github.com/microsoft/WSL2-Linux-Kernel.git
    git config --local gc.auto 0
    git -c protocol.version=2 fetch --no-tags --prune --progress --no-recurse-submodules --depth=1 origin +${WSL_COMMIT_REF}:refs/remotes/origin/build/linux-msft-wsl-5.4.y
    git checkout --progress --force -B build/linux-msft-wsl-5.4.y refs/remotes/origin/build/linux-msft-wsl-5.4.y
    
    # Enable xt_recent kernel module
    sed -i 's/# CONFIG_NETFILTER_XT_MATCH_RECENT is not set/CONFIG_NETFILTER_XT_MATCH_RECENT=y/' Microsoft/config-wsl
    
    # Compile the kernel 
    make -j2 KCONFIG_CONFIG=Microsoft/config-wsl
    
    # From a separate host terminal, while the container is still running, copy the built kernel out
    docker cp wsl-kernel-builder:/src/arch/x86/boot/bzImage .
  2. Configure WSL to use the newly built kernel: https://docs.microsoft.com/en-us/windows/wsl/wsl-config#configure-global-options-with-wslconfig

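For step 2, the global .wslconfig file takes a kernel key under a [wsl2] section, and backslashes in the Windows path must be escaped. Below is a minimal sketch that prints such a file. The helper name and the example path C:\kernels\bzImage are hypothetical. Redirecting the output over an existing .wslconfig would replace any other settings in it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a minimal .wslconfig pointing WSL2 at a custom kernel.
# Input is a Windows-style path such as 'C:\kernels\bzImage'; each backslash is
# doubled because .wslconfig expects escaped Windows paths.
wslconfig_for_kernel() {
  local win_path="$1"
  local escaped
  escaped=$(printf '%s' "$win_path" | sed 's/\\/\\\\/g')
  printf '[wsl2]\nkernel=%s\n' "$escaped"
}

# Example (from a WSL shell; <you> is a placeholder for your Windows user name,
# and this overwrites any existing .wslconfig):
#   wslconfig_for_kernel 'C:\kernels\bzImage' > /mnt/c/Users/<you>/.wslconfig
# Then run `wsl --shutdown` from Windows so the next WSL start picks up the kernel.
```

After WSL restarts, uname -r inside a distro should report the version of the custom build.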
BenTheElder commented 3 years ago

This is at least documented now, thanks @anyname2. Also thanks for opening https://github.com/microsoft/WSL/issues/7124 to track this upstream.

Itnotf commented 1 year ago

thanks @valeneiko