weaveworks / weave

Simple, resilient multi-host containers networking and more.
https://www.weave.works
Apache License 2.0

Difficulties trying to reactivate fastdatapath when changing WEAVE_NO_FASTDP #4004

Open arthurzenika opened 7 months ago

arthurzenika commented 7 months ago

What you expected to happen?

At some point in the history of our Kubernetes cluster, WEAVE_NO_FASTDP was set to true, and sleeve has been used by default ever since. We'd like to switch back to the default behavior (fastdatapath, with fallback to sleeve under certain conditions).
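For reference, here is a minimal sketch of listing and removing the variable on the DaemonSet with kubectl set env. It assumes the DaemonSet is named weave-net in kube-system (inferred from the pod name weave-net-w6wpv) and that the router container is named weave, as in the kubectl logs command further down. It also assumes Weave only checks whether WEAVE_NO_FASTDP is set, so removing it restores the default. Because this edits the pod template, the pods get replaced, and on nodes whose weave bridge was created in sleeve mode the new pods can then hit the bridge-type error shown below.

```shell
#!/bin/sh
# Sketch only: the DaemonSet and container names are assumptions inferred
# from this issue (pod weave-net-w6wpv in kube-system, container "weave").
NAMESPACE=kube-system
DAEMONSET=daemonset/weave-net
CONTAINER=weave

# Print the weave container's environment, including WEAVE_NO_FASTDP if set.
show_weave_env() {
    kubectl -n "$NAMESPACE" set env "$DAEMONSET" -c "$CONTAINER" --list
}

# Remove WEAVE_NO_FASTDP (a trailing "-" unsets the variable). This changes
# the pod template, so with a RollingUpdate strategy the pods are replaced.
restore_fastdp() {
    kubectl -n "$NAMESPACE" set env "$DAEMONSET" -c "$CONTAINER" WEAVE_NO_FASTDP-
}
```

Run show_weave_env first to confirm the current value, then restore_fastdp.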

What happened?

INFO: 2024/04/26 08:46:52.444799 weave  2.8.1
FATA: 2024/04/26 08:46:52.816690 Existing bridge type "bridge" is different than requested "bridged_fastdp". Please do 'weave reset' and try again
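To see which kind of bridge a node already has, a rough check can look at the two interfaces the weave script names in the trace below (BRIDGE=weave, DATAPATH=datapath). This sketch assumes, without having checked the router source, that "bridge" means weave is a plain Linux bridge with no datapath device, while "bridged_fastdp" means weave is a Linux bridge and a separate datapath device also exists. A Linux bridge is recognized by its /sys/class/net/<name>/bridge directory. The function name and its optional sysfs-root argument are hypothetical, added so the logic can be tried without touching a real host.

```shell
#!/bin/sh
# Hypothetical helper: guess weave's existing bridge type from sysfs.
# Interface names come from the weave script trace (BRIDGE=weave,
# DATAPATH=datapath); the mapping to type names is an assumption.
guess_weave_bridge_type() {
    sysfs=${1:-/sys/class/net}
    if [ ! -e "$sysfs/weave" ] && [ ! -e "$sysfs/datapath" ]; then
        echo "none"            # no weave interfaces at all
    elif [ -d "$sysfs/weave/bridge" ] && [ ! -e "$sysfs/datapath" ]; then
        echo "bridge"          # plain Linux bridge, as left by sleeve-only mode
    elif [ -d "$sysfs/weave/bridge" ] && [ -e "$sysfs/datapath" ]; then
        echo "bridged_fastdp"  # Linux bridge plus a separate datapath device
    else
        echo "unknown"         # any other combination is not classified here
    fi
}
```

Called with no argument (for example from the debug container, which shares the pod's host network namespace), it inspects /sys/class/net; a node that produces the FATA message above would be expected to print "bridge".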

Using a debug pod, I tried to run weave reset, but it fails:

kubectl debug -it weave-net-w6wpv -n kube-system --image=weaveworks/weave-kube:2.8.1 -- /bin/sh
[snip]
/home/weave # WEAVE_DEBUG=1 ./weave --local reset
+ SCRIPT_VERSION=2.8.1
+ IMAGE_VERSION=latest
+ '[' 2.8.1 '=' unreleased ]
+ IMAGE_VERSION=2.8.1
+ IMAGE_VERSION=2.8.1
+ MIN_DOCKER_VERSION=1.10.0
+ DOCKERHUB_USER=weaveworks
+ BASE_EXEC_IMAGE=weaveworks/weaveexec
+ EXEC_IMAGE=weaveworks/weaveexec:2.8.1
+ WEAVEDB_IMAGE=weaveworks/weavedb:latest
+ BASE_IMAGE=weaveworks/weave
+ IMAGE=weaveworks/weave:2.8.1
+ echo 
+ cut -s -d: -f1
+ PROXY_HOST=
+ PROXY_HOST=127.0.0.1
+ DOCKER_CLIENT_HOST=
+ IP_REGEXP='[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}'
+ CIDR_REGEXP='[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}/[0-9]{1,2}'
+ '[' --local '=' --local ]
+ shift 1
+ IS_LOCAL=1
+ '['  '=' --help ]
+ '[' -z 1 ]
+ RESTART_POLICY='--restart always'
+ CONTAINER_NAME=weave
+ PLUGIN_NAME=weaveworks/net-plugin
+ OLD_PLUGIN_CONTAINER_NAME=weaveplugin
+ CNI_PLUGIN_NAME=weave-plugin-2.8.1
+ CNI_PLUGIN_DIR=/opt/cni/bin
+ VOLUMES_LABEL=weavevolumes
+ VOLUMES_CONTAINER_NAME=weavevolumes-2.8.1
+ DB_CONTAINER_NAME=weavedb
+ DOCKER_BRIDGE=docker0
+ BRIDGE=weave
+ DATAPATH=datapath
+ CONTAINER_IFNAME=ethwe
+ BRIDGE_IFNAME=vethwe-bridge
+ DATAPATH_IFNAME=vethwe-datapath
+ PORT=6783
+ HTTP_ADDR=127.0.0.1:6784
+ STATUS_ADDR=127.0.0.1:6782
+ PROXY_PORT=12375
+ OLD_PROXY_CONTAINER_NAME=weaveproxy
+ PROC_PATH=/proc
+ COVERAGE_ARGS=
+ '[' -n  ]
+ id -u
+ '[' 0 '=' 0 ]
+ uname -s -r
+ sed -n -e 's|^\([^ ]*\) \([0-9][0-9]*\)\.\([0-9][0-9]*\).*|\1 \2 \3|p'
+ read sys maj min
+ '[' Linux '!=' Linux ]
+ '[' '(' 4 -eq 3 -a 19 -ge 8 ')' -o 4 -gt 3 ]
+ command_exists ip
+ command -v ip
+ '[' 1 -gt 0 ]
+ COMMAND=reset
+ shift 1
+ '[' 0 -eq 0 ]
+ res=0
+ '['  '=' --force ]
+ check_running weave
+ res=1
+ stop
+ util_op remove-plugin-network weave
+ command_exists weaveutil
+ command -v weaveutil
+ weaveutil remove-plugin-network weave
unable to connect to docker: Get "http://unix.sock/v1.21/version": dial unix /var/run/docker.sock: connect: no such file or directory
+ true
+ warn_if_stopping_proxy_in_env
+ proxy_addr
+ PROXY_ADDR=
+ util_op stop-container weave
+ echo 'Weave is not running (ignore on Kubernetes).'
Weave is not running (ignore on Kubernetes).
+ util_op stop-container weaveplugin
+ true
+ util_op stop-container weaveproxy
+ true
+ conntrack -D -p udp --dport 6783
+ true
+ util_op remove-container -f weave
+ true
+ util_op remove-container -f weaveproxy
+ true
+ util_op remove-container -f weaveplugin
+ true
+ protect_against_docker_hang
+ rm -f /run/docker/plugins/weave.sock /run/docker/plugins/weavemesh.sock
+ util_op list-containers weavevolumes
+ command_exists weaveutil
+ command -v weaveutil
+ weaveutil list-containers weavevolumes
unable to list containers: Get "http://unix.sock/v1.18/version": dial unix /var/run/docker.sock: connect: no such file or directory
+ VOLUME_CONTAINERS=
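The docker.sock errors are expected on a containerd node, but the trace also stops right after VOLUME_CONTAINERS= with no further message. One plausible explanation, assuming the weave script runs under set -e (this is not visible in the trace): the earlier util_op calls are followed by + true, meaning their failures are guarded, whereas the list-containers call sits inside a command substitution with no fallback. In POSIX sh an assignment takes the exit status of its command substitution, so that failure ends the script before any later cleanup steps. The snippet reproduces this with fake_weaveutil, a hypothetical stand-in for weaveutil.

```shell
#!/bin/sh
# fake_weaveutil is a hypothetical stand-in for weaveutil failing without docker.sock.
fake_weaveutil() {
    echo "unable to list containers: no docker.sock" >&2
    return 1
}

# Subshell body so that set -e only applies inside this function.
run_reset_like_steps() (
    set -e
    fake_weaveutil stop-container weave || true   # guarded: execution continues
    echo "after guarded call"
    # Unguarded: the assignment inherits the substitution's status 1, so set -e exits.
    VOLUME_CONTAINERS=$(fake_weaveutil list-containers weavevolumes)
    echo "cleanup would run here: $VOLUME_CONTAINERS"   # never reached
)
```

Calling run_reset_like_steps prints only "after guarded call" and returns a non-zero status, mirroring how the real trace ends.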

How to reproduce it?

Anything else we need to know?

Versions:

$ weave version

        Version: 2.8.1 (failed to check latest version - see logs; next check at 2024/04/26 11:21:25)

        Service: router
       Protocol: weave 1..2
           Name: 5a:63:d7:f6:d1:3e(k8s-mod46-worker-1)
     Encryption: disabled
  PeerDiscovery: enabled
        Targets: 4
    Connections: 4 (4 established)
          Peers: 5 (with 20 established connections)
 TrustedSubnets: none

        Service: ipam
         Status: ready
          Range: 10.42.0.0/16
  DefaultSubnet: 10.42.0.0/16

$ docker version

n/a containerd

$ uname -a

Linux k8s-mod46-worker-1 4.19.0-21-amd64 #1 SMP Debian 4.19.249-2 (2022-06-30) x86_64 Linux

$ kubectl version

Client Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.3+rke2r1", GitCommit:"aef86a93758dc3cb2c658dd9657ab4ad4afc21cb", GitTreeState:"clean", BuildDate:"2022-07-13T20:19:26Z", GoVersion:"go1.18.3b7", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.4
Server Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.3+rke2r1", GitCommit:"aef86a93758dc3cb2c658dd9657ab4ad4afc21cb", GitTreeState:"clean", BuildDate:"2022-07-13T20:19:26Z", GoVersion:"go1.18.3b7", Compiler:"gc", Platform:"linux/amd64"}

Logs:

$ docker logs weave

or, if using Kubernetes:

$ kubectl logs -n kube-system <weave-net-pod> weave

Network:

$ ip route
$ ip -4 -o addr
$ sudo iptables-save

Thanks in advance to anyone reading this. I fully understand that Weave Net is no longer maintained, but I wanted to at least document this problem in case it helps others facing something similar.

arthurzenika commented 6 months ago

Some colleagues have worked out a workaround that seems to allow switching to fastdp (with some downtime):