projectcalico / vpp-dataplane

VPP dataplane implementation for Calico
Apache License 2.0

Calico-VPP pod claims IPv6 address of node but uses IPv4 address instead #651

Open nesselzzz opened 10 months ago

nesselzzz commented 10 months ago

**Environment**

**Issue description**
I'm setting up an IPv6 cluster. Each node in the cluster has two interfaces within ESXi: one is an IPv4 interface for OOBM, and the other is the main interface for Kubernetes and the uplink interface for VPP. Whenever I run "kubectl create -f calico-vpp.yaml", my node loses its IPv6 address (as the documentation states). If I understand the documentation properly, this should be hitless, but anything trying to reach that IP gets no response. As a result, all kubectl commands stop working, since the API was using that address.

I used nerdctl to exec into the container, and when I run "ip a", the uplink interface I configured shows no IPv6 address, only a link-local one. Surprisingly, the IPv4 address and its interface are listed in the container, and the node has not lost that IP at all.
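
For anyone repeating this check, here is a minimal sketch of telling a link-local-only interface apart from one that still holds another IPv6 address. It uses Python's standard `ipaddress` module. The function name and the sample addresses are hypothetical, not taken from the node in this report.

```python
import ipaddress

def non_link_local_ipv6(addresses):
    """Return the IPv6 addresses that are not link-local (fe80::/10)."""
    result = []
    for text in addresses:
        # Drop an optional prefix length such as "/64" before parsing.
        ip = ipaddress.ip_address(text.split("/")[0])
        if ip.version == 6 and not ip.is_link_local:
            result.append(str(ip))
    return result

# Hypothetical addresses, as they might be copied from "ip a" output.
seen_in_container = ["fe80::250:56ff:fe00:1/64", "192.0.2.10/24"]
print(non_link_local_ipv6(seen_in_container))  # prints []
```

An empty list corresponds to the situation described above: the interface has only a link-local IPv6 address.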

Is this a bug or am I doing something wrong?

**To Reproduce**
Steps to reproduce the behavior:


apiVersion: operator.tigera.io/v1
kind: APIServer
metadata:
  name: default
spec: {}



- curl -o calico-vpp.yaml https://raw.githubusercontent.com/projectcalico/vpp-dataplane/v3.26.0/yaml/generated/calico-vpp-nohuge.yaml
- edit calico-vpp.yaml to set the correct IPv6 service subnet and uplink interface, then apply it via kubectl create
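
Before applying the edited manifest, it can help to sanity-check the service subnet. The sketch below is only an illustration using Python's standard `ipaddress` module; the helper name is hypothetical, and the prefix is the SERVICE_PREFIX value that appears in the agent logs later in this thread.

```python
import ipaddress

def parse_service_prefix(text):
    """Parse an IPv6 service prefix, rejecting host bits and IPv4 input."""
    # strict=True raises ValueError if bits are set beyond the prefix length.
    net = ipaddress.ip_network(text, strict=True)
    if net.version != 6:
        raise ValueError(f"{text} is not an IPv6 prefix")
    return net

net = parse_service_prefix("2600:1700:3960:c71f:1::/108")
print(net.prefixlen, net.num_addresses)  # prints 108 1048576
```

A typo that leaves host bits set, or an IPv4 prefix pasted by mistake, fails here with a `ValueError` rather than surfacing later in the cluster.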

**Expected behavior**
The calico-vpp pod is created successfully, and the node keeps its IPv6 connectivity.

nesselzzz commented 10 months ago

I did a little more troubleshooting. Even before applying calico-vpp.yaml, when I apply only the base calico.yaml to get Calico instantiated, it crashes as soon as I set linuxDataplane to VPP. The logs show the following errors:

2023-11-07 13:17:43.430 [INFO][18] tunnel-ip-allocator/param_types.go 291: Looking for executable on path name="/usr/local/bin/felix-plugins/felix-api-proxy"
2023-11-07 13:17:43.431 [WARNING][18] tunnel-ip-allocator/param_types.go 295: Path lookup failed error=exec: "/usr/local/bin/felix-plugins/felix-api-proxy": stat /usr/local/bin/felix-plugins/felix-api-proxy: no such file or directory name="/usr/local/bin/felix-plugins/felix-api-proxy"
2023-11-07 13:17:43.431 [ERROR][18] tunnel-ip-allocator/config_params.go 636: Invalid (required) config value. error=Failed to parse config parameter DataplaneDriver; value "/usr/local/bin/felix-plugins/felix-api-proxy": missing file source=environment variable
2023-11-07 13:17:43.431 [PANIC][18] tunnel-ip-allocator/allocateip.go 836: Failed to parse Felix environments error=Failed to parse config parameter DataplaneDriver; value "/usr/local/bin/felix-plugins/felix-api-proxy": missing file
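
The errors above amount to "the file configured as DataplaneDriver does not exist". As a rough illustration of that kind of check (not Felix's actual code, which is written in Go), a small Python sketch with a hypothetical helper name could look like this:

```python
import os

def check_driver_path(path):
    """Return None if path is an executable regular file, else a short reason."""
    if not os.path.isfile(path):
        return "missing file"
    if not os.access(path, os.X_OK):
        return "not executable"
    return None

# Path copied from the log lines above; the result depends on the machine.
print(check_driver_path("/usr/local/bin/felix-plugins/felix-api-proxy"))
```

Running an equivalent check inside the failing container shows whether the binary is present at that path at all.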

As for applying calico-vpp.yaml, I managed to capture the logs before kubectl lost connectivity to the API. They are below:

time="2023-11-07T13:51:52Z" level=info msg="Version info\nImage tag                   : 20c50cfd71e32ab9c15d4632e2b4a9659993148d\nVPP-dataplane version       : 20c50cf yaml: build yamls to add bgpfilters\nVPP Version                 : 23.10-rc0~6-g892b7bce0\nBinapi-generator version    : v0.8.0-dev\nVPP Base commit             : 03304d1c6 gerrit:34726/3 interface: add buffer stats api\n------------------ Cherry picked commits --------------------\ninterface: Fix interface.api endianness\ncapo: Calico Policies plugin\nacl: acl-plugin custom policies\ncnat: [WIP] no k8s maglev from pods\npbl: Port based balancer\ngerrit:34726/3 interface: add buffer stats api\n-------------------------------------------------------------\n"
time="2023-11-07T13:51:52Z" level=info msg="Config:CALICOVPP_INTERFACES={\n  \"defaultPodIfSpec\": {\n    \"rx\": 1,\n    \"tx\": 1,\n    \"rxqsz\": 0,\n    \"txqsz\": 0,\n    \"isl3\": true,\n    \"rxMode\": 0\n  },\n  \"maxPodIfSpec\": {\n    \"rx\": 10,\n    \"tx\": 10,\n    \"rxqsz\": 1024,\n    \"txqsz\": 1024,\n    \"isl3\": null,\n    \"rxMode\": 0\n  },\n  \"vppHostTapSpec\": {\n    \"rx\": 1,\n    \"tx\": 1,\n    \"rxqsz\": 1024,\n    \"txqsz\": 1024,\n    \"isl3\": false,\n    \"rxMode\": 0\n  },\n  \"uplinkInterfaces\": [\n    {\n      \"rx\": 0,\n      \"tx\": 0,\n      \"rxqsz\": 0,\n      \"txqsz\": 0,\n      \"isl3\": null,\n      \"rxMode\": 0,\n      \"physicalNetworkName\": \"\",\n      \"interfaceName\": \"ens192\",\n      \"vppDriver\": \"af_packet\",\n      \"newDriver\": \"\",\n      \"annotations\": null,\n      \"mtu\": 0\n    }\n  ]\n}"
time="2023-11-07T13:51:52Z" level=info msg="Config:CALICOVPP_GRACEFUL_SHUTDOWN_TIMEOUT=10s"
time="2023-11-07T13:51:52Z" level=info msg="Config:CALICOVPP_SWAP_DRIVER="
time="2023-11-07T13:51:52Z" level=info msg="Config:CALICOVPP_CONFIG_EXEC_TEMPLATE="
time="2023-11-07T13:51:52Z" level=info msg="Config:CALICOVPP_HOOK_VPP_RUNNING=#!/bin/sh\n\nHOOK=\"$0\"\nchroot /host /bin/sh <<EOSCRIPT\n\nfix_dns () {\n    if systemctl status NetworkManager > /dev/null 2>&1; then\n        echo \"default_hook: system is using NetworkManager; fixing dns...\"\n        sed -i \"s/\\[main\\]/\\[main\\]\\ndns=none/\" /etc/NetworkManager/NetworkManager.conf\n        systemctl daemon-reload\n        systemctl restart NetworkManager\n    fi\n}\n\nundo_dns_fix () {\n    if systemctl status NetworkManager > /dev/null 2>&1; then\n        echo \"default_hook: system is using NetworkManager; undoing dns fix...\"\n        sed -i \"0,/dns=none/{/dns=none/d;}\" /etc/NetworkManager/NetworkManager.conf\n        systemctl daemon-reload\n        systemctl restart NetworkManager\n    fi\n}\n\nrestart_network () {\n    if systemctl status systemd-networkd > /dev/null 2>&1; then\n        echo \"default_hook: system is using systemd-networkd; restarting...\"\n        systemctl restart systemd-networkd\n    elif systemctl status NetworkManager > /dev/null 2>&1; then\n        echo \"default_hook: system is using NetworkManager; restarting...\"\n        systemctl restart NetworkManager\n    elif systemctl status networking > /dev/null 2>&1; then\n        echo \"default_hook: system is using networking service; restarting...\"\n        systemctl restart networking\n    elif systemctl status network > /dev/null 2>&1; then\n        echo \"default_hook: system is using network service; restarting...\"\n        systemctl restart network\n    else\n        echo \"default_hook: Networking backend not detected, network configuration may fail\"\n    fi\n}\n\nif which systemctl > /dev/null; then\n    echo \"default_hook: using systemctl...\"\nelse\n    echo \"default_hook: Init system not supported, network configuration may fail\"\n    exit 1\nfi\n\nif [ \"$HOOK\" = \"BEFORE_VPP_RUN\" ]; then\n    fix_dns\nelif [ \"$HOOK\" = \"VPP_RUNNING\" ]; then\n    
restart_network\nelif [ \"$HOOK\" = \"VPP_DONE_OK\" ]; then\n    undo_dns_fix\n    restart_network\nelif [ \"$HOOK\" = \"VPP_ERRORED\" ]; then\n    undo_dns_fix\n    restart_network\nfi\n\nEOSCRIPT\n"
time="2023-11-07T13:51:52Z" level=info msg="Config:CALICOVPP_LOG_LEVEL=info"
time="2023-11-07T13:51:52Z" level=info msg="Config:SERVICE_PREFIX=[2600:1700:3960:c71f:1::/108]"
time="2023-11-07T13:51:52Z" level=info msg="Config:CALICOVPP_INITIAL_CONFIG={\n  \"vppStartupSleepSeconds\": 1,\n  \"corePattern\": \"/var/lib/vpp/vppcore.%e.%p\",\n  \"extraAddrCount\": 0,\n  \"ifConfigSavePath\": \"\",\n  \"defaultGWs\": \"\"\n}"
time="2023-11-07T13:51:52Z" level=info msg="Config:CALICOVPP_LOG_FORMAT="
time="2023-11-07T13:51:52Z" level=info msg="Config:CALICOVPP_NATIVE_DRIVER="
time="2023-11-07T13:51:52Z" level=info msg="Config:CALICOVPP_HOOK_VPP_DONE_OK=#!/bin/sh\n\nHOOK=\"$0\"\nchroot /host /bin/sh <<EOSCRIPT\n\nfix_dns () {\n    if systemctl status NetworkManager > /dev/null 2>&1; then\n        echo \"default_hook: system is using NetworkManager; fixing dns...\"\n        sed -i \"s/\\[main\\]/\\[main\\]\\ndns=none/\" /etc/NetworkManager/NetworkManager.conf\n        systemctl daemon-reload\n        systemctl restart NetworkManager\n    fi\n}\n\nundo_dns_fix () {\n    if systemctl status NetworkManager > /dev/null 2>&1; then\n        echo \"default_hook: system is using NetworkManager; undoing dns fix...\"\n        sed -i \"0,/dns=none/{/dns=none/d;}\" /etc/NetworkManager/NetworkManager.conf\n        systemctl daemon-reload\n        systemctl restart NetworkManager\n    fi\n}\n\nrestart_network () {\n    if systemctl status systemd-networkd > /dev/null 2>&1; then\n        echo \"default_hook: system is using systemd-networkd; restarting...\"\n        systemctl restart systemd-networkd\n    elif systemctl status NetworkManager > /dev/null 2>&1; then\n        echo \"default_hook: system is using NetworkManager; restarting...\"\n        systemctl restart NetworkManager\n    elif systemctl status networking > /dev/null 2>&1; then\n        echo \"default_hook: system is using networking service; restarting...\"\n        systemctl restart networking\n    elif systemctl status network > /dev/null 2>&1; then\n        echo \"default_hook: system is using network service; restarting...\"\n        systemctl restart network\n    else\n        echo \"default_hook: Networking backend not detected, network configuration may fail\"\n    fi\n}\n\nif which systemctl > /dev/null; then\n    echo \"default_hook: using systemctl...\"\nelse\n    echo \"default_hook: Init system not supported, network configuration may fail\"\n    exit 1\nfi\n\nif [ \"$HOOK\" = \"BEFORE_VPP_RUN\" ]; then\n    fix_dns\nelif [ \"$HOOK\" = \"VPP_RUNNING\" ]; then\n    
restart_network\nelif [ \"$HOOK\" = \"VPP_DONE_OK\" ]; then\n    undo_dns_fix\n    restart_network\nelif [ \"$HOOK\" = \"VPP_ERRORED\" ]; then\n    undo_dns_fix\n    restart_network\nfi\n\nEOSCRIPT\n"
time="2023-11-07T13:51:52Z" level=info msg="Config:NODENAME=kube-master1"
time="2023-11-07T13:51:52Z" level=info msg="Config:CALICOVPP_INIT_SCRIPT_TEMPLATE="
time="2023-11-07T13:51:52Z" level=info msg="Config:CALICOVPP_HOOK_BEFORE_IF_READ=#!/bin/sh\n\nHOOK=\"$0\"\nchroot /host /bin/sh <<EOSCRIPT\n\nfix_dns () {\n    if systemctl status NetworkManager > /dev/null 2>&1; then\n        echo \"default_hook: system is using NetworkManager; fixing dns...\"\n        sed -i \"s/\\[main\\]/\\[main\\]\\ndns=none/\" /etc/NetworkManager/NetworkManager.conf\n        systemctl daemon-reload\n        systemctl restart NetworkManager\n    fi\n}\n\nundo_dns_fix () {\n    if systemctl status NetworkManager > /dev/null 2>&1; then\n        echo \"default_hook: system is using NetworkManager; undoing dns fix...\"\n        sed -i \"0,/dns=none/{/dns=none/d;}\" /etc/NetworkManager/NetworkManager.conf\n        systemctl daemon-reload\n        systemctl restart NetworkManager\n    fi\n}\n\nrestart_network () {\n    if systemctl status systemd-networkd > /dev/null 2>&1; then\n        echo \"default_hook: system is using systemd-networkd; restarting...\"\n        systemctl restart systemd-networkd\n    elif systemctl status NetworkManager > /dev/null 2>&1; then\n        echo \"default_hook: system is using NetworkManager; restarting...\"\n        systemctl restart NetworkManager\n    elif systemctl status networking > /dev/null 2>&1; then\n        echo \"default_hook: system is using networking service; restarting...\"\n        systemctl restart networking\n    elif systemctl status network > /dev/null 2>&1; then\n        echo \"default_hook: system is using network service; restarting...\"\n        systemctl restart network\n    else\n        echo \"default_hook: Networking backend not detected, network configuration may fail\"\n    fi\n}\n\nif which systemctl > /dev/null; then\n    echo \"default_hook: using systemctl...\"\nelse\n    echo \"default_hook: Init system not supported, network configuration may fail\"\n    exit 1\nfi\n\nif [ \"$HOOK\" = \"BEFORE_VPP_RUN\" ]; then\n    fix_dns\nelif [ \"$HOOK\" = \"VPP_RUNNING\" ]; then\n    
restart_network\nelif [ \"$HOOK\" = \"VPP_DONE_OK\" ]; then\n    undo_dns_fix\n    restart_network\nelif [ \"$HOOK\" = \"VPP_ERRORED\" ]; then\n    undo_dns_fix\n    restart_network\nfi\n\nEOSCRIPT\n"
time="2023-11-07T13:51:52Z" level=info msg="Config:CALICOVPP_BGP_LOG_LEVEL=INFO"
time="2023-11-07T13:51:52Z" level=info msg="Config:CALICOVPP_IPSEC_IKEV2_PSK="
time="2023-11-07T13:51:52Z" level=info msg="Config:CALICOVPP_DEBUG={}"
time="2023-11-07T13:51:52Z" level=info msg="Config:CALICOVPP_FEATURE_GATES={}"
time="2023-11-07T13:51:52Z" level=info msg="Config:CALICOVPP_IPSEC={\n  \"nbAsyncCryptoThreads\": 0,\n  \"extraAddresses\": 0\n}"
time="2023-11-07T13:51:52Z" level=info msg="Config:CALICOVPP_SRV6={\n  \"localsidPool\": \"\",\n  \"policyPool\": \"\"\n}"
time="2023-11-07T13:51:52Z" level=info msg="Config:CALICOVPP_INTERFACE="
time="2023-11-07T13:51:52Z" level=info msg="Config:CALICOVPP_CONFIG_TEMPLATE=unix {\n  nodaemon\n  full-coredump\n  cli-listen /var/run/vpp/cli.sock\n  pidfile /run/vpp/vpp.pid\n  exec /etc/vpp/startup.exec\n}\napi-trace { on }\ncpu {\n    workers 0\n}\nsocksvr {\n    socket-name /var/run/vpp/vpp-api.sock\n}\nplugins {\n    plugin default { enable }\n    plugin dpdk_plugin.so { disable }\n    plugin calico_plugin.so { enable }\n    plugin ping_plugin.so { disable }\n    plugin dispatch_trace_plugin.so { enable }\n}\nbuffers {\n  buffers-per-numa 131072\n}"
time="2023-11-07T13:51:52Z" level=info msg="Config:CALICOVPP_HOOK_BEFORE_VPP_RUN=#!/bin/sh\n\nHOOK=\"$0\"\nchroot /host /bin/sh <<EOSCRIPT\n\nfix_dns () {\n    if systemctl status NetworkManager > /dev/null 2>&1; then\n        echo \"default_hook: system is using NetworkManager; fixing dns...\"\n        sed -i \"s/\\[main\\]/\\[main\\]\\ndns=none/\" /etc/NetworkManager/NetworkManager.conf\n        systemctl daemon-reload\n        systemctl restart NetworkManager\n    fi\n}\n\nundo_dns_fix () {\n    if systemctl status NetworkManager > /dev/null 2>&1; then\n        echo \"default_hook: system is using NetworkManager; undoing dns fix...\"\n        sed -i \"0,/dns=none/{/dns=none/d;}\" /etc/NetworkManager/NetworkManager.conf\n        systemctl daemon-reload\n        systemctl restart NetworkManager\n    fi\n}\n\nrestart_network () {\n    if systemctl status systemd-networkd > /dev/null 2>&1; then\n        echo \"default_hook: system is using systemd-networkd; restarting...\"\n        systemctl restart systemd-networkd\n    elif systemctl status NetworkManager > /dev/null 2>&1; then\n        echo \"default_hook: system is using NetworkManager; restarting...\"\n        systemctl restart NetworkManager\n    elif systemctl status networking > /dev/null 2>&1; then\n        echo \"default_hook: system is using networking service; restarting...\"\n        systemctl restart networking\n    elif systemctl status network > /dev/null 2>&1; then\n        echo \"default_hook: system is using network service; restarting...\"\n        systemctl restart network\n    else\n        echo \"default_hook: Networking backend not detected, network configuration may fail\"\n    fi\n}\n\nif which systemctl > /dev/null; then\n    echo \"default_hook: using systemctl...\"\nelse\n    echo \"default_hook: Init system not supported, network configuration may fail\"\n    exit 1\nfi\n\nif [ \"$HOOK\" = \"BEFORE_VPP_RUN\" ]; then\n    fix_dns\nelif [ \"$HOOK\" = \"VPP_RUNNING\" ]; then\n    
restart_network\nelif [ \"$HOOK\" = \"VPP_DONE_OK\" ]; then\n    undo_dns_fix\n    restart_network\nelif [ \"$HOOK\" = \"VPP_ERRORED\" ]; then\n    undo_dns_fix\n    restart_network\nfi\n\nEOSCRIPT\n"
time="2023-11-07T13:51:52Z" level=info msg="Config:CALICOVPP_HOOK_VPP_ERRORED=#!/bin/sh\n\nHOOK=\"$0\"\nchroot /host /bin/sh <<EOSCRIPT\n\nfix_dns () {\n    if systemctl status NetworkManager > /dev/null 2>&1; then\n        echo \"default_hook: system is using NetworkManager; fixing dns...\"\n        sed -i \"s/\\[main\\]/\\[main\\]\\ndns=none/\" /etc/NetworkManager/NetworkManager.conf\n        systemctl daemon-reload\n        systemctl restart NetworkManager\n    fi\n}\n\nundo_dns_fix () {\n    if systemctl status NetworkManager > /dev/null 2>&1; then\n        echo \"default_hook: system is using NetworkManager; undoing dns fix...\"\n        sed -i \"0,/dns=none/{/dns=none/d;}\" /etc/NetworkManager/NetworkManager.conf\n        systemctl daemon-reload\n        systemctl restart NetworkManager\n    fi\n}\n\nrestart_network () {\n    if systemctl status systemd-networkd > /dev/null 2>&1; then\n        echo \"default_hook: system is using systemd-networkd; restarting...\"\n        systemctl restart systemd-networkd\n    elif systemctl status NetworkManager > /dev/null 2>&1; then\n        echo \"default_hook: system is using NetworkManager; restarting...\"\n        systemctl restart NetworkManager\n    elif systemctl status networking > /dev/null 2>&1; then\n        echo \"default_hook: system is using networking service; restarting...\"\n        systemctl restart networking\n    elif systemctl status network > /dev/null 2>&1; then\n        echo \"default_hook: system is using network service; restarting...\"\n        systemctl restart network\n    else\n        echo \"default_hook: Networking backend not detected, network configuration may fail\"\n    fi\n}\n\nif which systemctl > /dev/null; then\n    echo \"default_hook: using systemctl...\"\nelse\n    echo \"default_hook: Init system not supported, network configuration may fail\"\n    exit 1\nfi\n\nif [ \"$HOOK\" = \"BEFORE_VPP_RUN\" ]; then\n    fix_dns\nelif [ \"$HOOK\" = \"VPP_RUNNING\" ]; then\n    
restart_network\nelif [ \"$HOOK\" = \"VPP_DONE_OK\" ]; then\n    undo_dns_fix\n    restart_network\nelif [ \"$HOOK\" = \"VPP_ERRORED\" ]; then\n    undo_dns_fix\n    restart_network\nfi\n\nEOSCRIPT\n"
time="2023-11-07T13:51:52Z" level=info msg="Waiting for VPP... [0/10]" component=vpp-api
time="2023-11-07T13:51:54Z" level=info msg="Waiting for VPP... [1/10]" component=vpp-api
time="2023-11-07T13:51:56Z" level=info msg="Waiting for VPP... [2/10]" component=vpp-api
time="2023-11-07T13:51:58Z" level=info msg="Waiting for VPP... [3/10]" component=vpp-api
time="2023-11-07T13:52:00Z" level=info msg="Waiting for VPP... [4/10]" component=vpp-api
time="2023-11-07T13:52:02Z" level=warning msg="Waiting for VPP... [5/10] cannot connect to VPP on socket /var/run/vpp/vpp-api.sock: VPP API socket file /var/run/vpp/vpp-api.sock does not exist" component=vpp-api
time="2023-11-07T13:52:04Z" level=warning msg="Waiting for VPP... [6/10] cannot connect to VPP on socket /var/run/vpp/vpp-api.sock: VPP API socket file /var/run/vpp/vpp-api.sock does not exist" component=vpp-api
time="2023-11-07T13:52:06Z" level=warning msg="Waiting for VPP... [7/10] cannot connect to VPP on socket /var/run/vpp/vpp-api.sock: VPP API socket file /var/run/vpp/vpp-api.sock does not exist" component=vpp-api
time="2023-11-07T13:52:08Z" level=warning msg="Waiting for VPP... [8/10] cannot connect to VPP on socket /var/run/vpp/vpp-api.sock: VPP API socket file /var/run/vpp/vpp-api.sock does not exist" component=vpp-api
time="2023-11-07T13:52:10Z" level=warning msg="Waiting for VPP... [9/10] cannot connect to VPP on socket /var/run/vpp/vpp-api.sock: VPP API socket file /var/run/vpp/vpp-api.sock does not exist" component=vpp-api
time="2023-11-07T13:52:12Z" level=fatal msg="Cannot create VPP client: Cannot connect to VPP after 10 tries"
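
For readers skimming the log: the tail shows a bounded retry loop of 10 attempts, about 2 seconds apart, waiting for the VPP API socket file to appear, followed by a fatal exit. The Python sketch below only mirrors that observed behavior. It is not the agent's implementation, and the function name and defaults are assumptions read off the timestamps.

```python
import os
import time

def wait_for_socket(path, tries=10, delay=2.0):
    """Poll for a socket file; return the attempt index at which it appeared."""
    for attempt in range(tries):
        if os.path.exists(path):
            return attempt
        print(f"Waiting for VPP... [{attempt}/{tries}]")
        time.sleep(delay)
    # All attempts used up without the file appearing.
    raise TimeoutError(f"Cannot connect to VPP after {tries} tries")
```

Because the socket file never shows up in the log, the failure may be on the VPP side (the process that should create /var/run/vpp/vpp-api.sock) rather than in the agent that is waiting for it.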

Any help would be greatly appreciated.