siderolabs / talos

Talos Linux is a modern Linux distribution built for Kubernetes.
https://www.talos.dev
Mozilla Public License 2.0

Unable to update proxy mode to `ipvs` after cluster creation #8928

Closed amimof closed 4 months ago

amimof commented 4 months ago

Bug Report

After control plane/node provisioning and cluster bootstrapping, changing the kube-proxy proxy mode to ipvs does not actually apply the change to kube-proxy. It remains in iptables proxy mode. I am not sure whether this behavior is intended or a bug.

Description

I have a cluster that consists of 3 bare metal nodes. After I provisioned the nodes and bootstrapped the cluster I wanted to update the cluster, setting the proxy mode to ipvs.

...
  proxy:
    image: registry.k8s.io/kube-proxy:v1.30.0
    mode: ipvs
    extraArgs:
      proxy-mode: ipvs
      ipvs-strict-arp: true
...

Then I applied the configuration with talosctl apply-config -n talos-node1 -f controlplane.yaml -p @talos-node1.yaml. However, the changes do not seem to take effect. For example, I can see that kube-proxy is still in iptables mode here:

talosctl -n talos-node1 processes | grep proxy
talos-node1  17426  S      23       4.26      1.3 GB   59 MB   /usr/local/bin/kube-proxy --cluster-cidr=10.244.0.0/16 --conntrack-max-per-core=0 --hostname-override=talos-node1 --kubeconfig=/etc/kubernetes/kubeconfig --proxy-mode=iptables

And also the same is true in the kube-proxy daemonset:

kubectl describe daemonset -n kube-system kube-proxy | grep mode
      --proxy-mode=iptables

Only when I update the daemonset manually, setting --proxy-mode=ipvs, does kube-proxy switch to forwarding packets with ipvs.
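The check above can be scripted. The following is a minimal sketch, not part of the original report: `proxy_mode` is a hypothetical helper name, and it only assumes that the flag appears in the text as `--proxy-mode=<value>`, as it does in the output shown above.

```shell
#!/bin/sh
# Hypothetical helper: print the value of --proxy-mode found in text read
# from stdin (a kube-proxy command line or `kubectl describe` output).
# Prints nothing if the flag is absent.
proxy_mode() {
  sed -n 's/.*--proxy-mode=\([a-z]*\).*/\1/p' | head -n 1
}

# Example using part of the command line reported above.
echo '/usr/local/bin/kube-proxy --cluster-cidr=10.244.0.0/16 --proxy-mode=iptables' | proxy_mode
```

Against a live cluster, the output of kubectl describe daemonset -n kube-system kube-proxy could be piped into the same function.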

talos-node1.yaml:

machine:
  network:
    hostname: talos-node1.domain.se

controlplane.yaml:

version: v1alpha1 # Indicates the schema used to decode the contents.
debug: false # Enable verbose logging to the console.
persist: true
# Provides machine specific configuration options.
machine:
  type: controlplane # Defines the role of the machine within the cluster.
  token: s4v1ae.fcd50hg2xjfzge9r # The `token` is used by a machine to join the PKI of the cluster.
  # The root certificate authority of the PKI.
  ca:
    crt:
    key:
  # Extra certificate subject alternative names for the machine's certificate.
  certSANs:
    - api.lab.domain.se
    - 192.168.13.10
  #   # Uncomment this to enable SANs.
  #   - 10.0.0.10
  #   - 172.16.0.10
  #   - 192.168.0.10

  # Used to provide additional options to the kubelet.
  kubelet:
    image: ghcr.io/siderolabs/kubelet:v1.30.0 # The `image` field is an optional reference to an alternative kubelet image.
    defaultRuntimeSeccompProfileEnabled: true # Enable container runtime default Seccomp profile.
    disableManifestsDirectory: true # The `disableManifestsDirectory` field configures the kubelet to get static pod manifests from the /etc/kubernetes/manifests directory.

    # # The `ClusterDNS` field is an optional reference to an alternative kubelet clusterDNS ip list.
    # clusterDNS:
    #     - 10.96.0.10
    #     - 169.254.2.53

    # # The `extraArgs` field is used to provide additional flags to the kubelet.
    # extraArgs:
    #     key: value

    # # The `extraMounts` field is used to add additional mounts to the kubelet container.
    # extraMounts:
    #     - destination: /var/lib/example # Destination is the absolute path where the mount will be placed in the container.
    #       type: bind # Type specifies the mount kind.
    #       source: /var/lib/example # Source specifies the source path of the mount.
    #       # Options are fstab style mount options.
    #       options:
    #         - bind
    #         - rshared
    #         - rw

    # # The `extraConfig` field is used to provide kubelet configuration overrides.
    # extraConfig:
    #     serverTLSBootstrap: true

    # # The `KubeletCredentialProviderConfig` field is used to provide kubelet credential configuration.
    # credentialProviderConfig:
    #     apiVersion: kubelet.config.k8s.io/v1
    #     kind: CredentialProviderConfig
    #     providers:
    #         - apiVersion: credentialprovider.kubelet.k8s.io/v1
    #           defaultCacheDuration: 12h
    #           matchImages:
    #             - '*.dkr.ecr.*.amazonaws.com'
    #             - '*.dkr.ecr.*.amazonaws.com.cn'
    #             - '*.dkr.ecr-fips.*.amazonaws.com'
    #             - '*.dkr.ecr.us-iso-east-1.c2s.ic.gov'
    #             - '*.dkr.ecr.us-isob-east-1.sc2s.sgov.gov'
    #           name: ecr-credential-provider

    # # The `nodeIP` field is used to configure `--node-ip` flag for the kubelet.
    # nodeIP:
    #     # The `validSubnets` field configures the networks to pick kubelet node IP from.
    #     validSubnets:
    #         - 10.0.0.0/8
    #         - '!10.0.0.3/32'
    #         - fdc7::/16
  # Provides machine specific network configuration options.
  network:
    interfaces:
      - interface: eno1
        dhcp: true
        vip:
          ip: 192.168.13.10
  # # `interfaces` is used to define the network interface configuration.
  # interfaces:
  #     - interface: enp0s1 # The interface name.
  #       # Assigns static IP addresses to the interface.
  #       addresses:
  #         - 192.168.2.0/24
  #       # A list of routes associated with the interface.
  #       routes:
  #         - network: 0.0.0.0/0 # The route's network (destination).
  #           gateway: 192.168.2.1 # The route's gateway (if empty, creates link scope route).
  #           metric: 1024 # The optional metric for the route.
  #       mtu: 1500 # The interface's MTU.
  #
  #       # # Picks a network device using the selector.

  #       # # select a device with bus prefix 00:*.
  #       # deviceSelector:
  #       #     busPath: 00:* # PCI, USB bus prefix, supports matching by wildcard.
  #       # # select a device with mac address matching `*:f0:ab` and `virtio` kernel driver.
  #       # deviceSelector:
  #       #     hardwareAddr: '*:f0:ab' # Device hardware address, supports matching by wildcard.
  #       #     driver: virtio # Kernel driver, supports matching by wildcard.
  #       # # select a device with bus prefix 00:*, a device with mac address matching `*:f0:ab` and `virtio` kernel driver.
  #       # deviceSelector:
  #       #     - busPath: 00:* # PCI, USB bus prefix, supports matching by wildcard.
  #       #     - hardwareAddr: '*:f0:ab' # Device hardware address, supports matching by wildcard.
  #       #       driver: virtio # Kernel driver, supports matching by wildcard.

  #       # # Bond specific options.
  #       # bond:
  #       #     # The interfaces that make up the bond.
  #       #     interfaces:
  #       #         - enp2s0
  #       #         - enp2s1
  #       #     # Picks a network device using the selector.
  #       #     deviceSelectors:
  #       #         - busPath: 00:* # PCI, USB bus prefix, supports matching by wildcard.
  #       #         - hardwareAddr: '*:f0:ab' # Device hardware address, supports matching by wildcard.
  #       #           driver: virtio # Kernel driver, supports matching by wildcard.
  #       #     mode: 802.3ad # A bond option.
  #       #     lacpRate: fast # A bond option.

  #       # # Bridge specific options.
  #       # bridge:
  #       #     # The interfaces that make up the bridge.
  #       #     interfaces:
  #       #         - enxda4042ca9a51
  #       #         - enxae2a6774c259
  #       #     # A bridge option.
  #       #     stp:
  #       #         enabled: true # Whether Spanning Tree Protocol (STP) is enabled.

  #       # # Indicates if DHCP should be used to configure the interface.
  #       # dhcp: true

  #       # # DHCP specific options.
  #       # dhcpOptions:
  #       #     routeMetric: 1024 # The priority of all routes received via DHCP.

  #       # # Wireguard specific configuration.

  #       # # wireguard server example
  #       # wireguard:
  #       #     privateKey: ABCDEF... # Specifies a private key configuration (base64 encoded).
  #       #     listenPort: 51111 # Specifies a device's listening port.
  #       #     # Specifies a list of peer configurations to apply to a device.
  #       #     peers:
  #       #         - publicKey: ABCDEF... # Specifies the public key of this peer.
  #       #           endpoint: 192.168.1.3 # Specifies the endpoint of this peer entry.
  #       #           # AllowedIPs specifies a list of allowed IP addresses in CIDR notation for this peer.
  #       #           allowedIPs:
  #       #             - 192.168.1.0/24
  #       # # wireguard peer example
  #       # wireguard:
  #       #     privateKey: ABCDEF... # Specifies a private key configuration (base64 encoded).
  #       #     # Specifies a list of peer configurations to apply to a device.
  #       #     peers:
  #       #         - publicKey: ABCDEF... # Specifies the public key of this peer.
  #       #           endpoint: 192.168.1.2:51822 # Specifies the endpoint of this peer entry.
  #       #           persistentKeepaliveInterval: 10s # Specifies the persistent keepalive interval for this peer.
  #       #           # AllowedIPs specifies a list of allowed IP addresses in CIDR notation for this peer.
  #       #           allowedIPs:
  #       #             - 192.168.1.0/24

  #       # # Virtual (shared) IP address configuration.

  #       # # layer2 vip example
  #       # vip:
  #       #     ip: 172.16.199.55 # Specifies the IP address to be used.

  # # Used to statically set the nameservers for the machine.
  # nameservers:
  #     - 8.8.8.8
  #     - 1.1.1.1

  # # Allows for extra entries to be added to the `/etc/hosts` file
  # extraHostEntries:
  #     - ip: 192.168.1.100 # The IP of the host.
  #       # The host alias.
  #       aliases:
  #         - example
  #         - example.domain.tld

  # # Configures KubeSpan feature.
  # kubespan:
  #     enabled: true # Enable the KubeSpan feature.

  # Used to provide instructions for installations.
  install:
    disk: /dev/sdd # The disk used for installations.
    image: ghcr.io/siderolabs/installer:v1.7.0 # Allows for supplying the image used to perform the installation.
    wipe: true # Indicates if the installation disk should be wiped at installation time.

    # # Look up disk using disk attributes like model, size, serial and others.
    # diskSelector:
    #     size: 4GB # Disk size.
    #     model: WDC* # Disk model `/sys/block/<dev>/device/model`.
    #     busPath: /pci0000:00/0000:00:17.0/ata1/host0/target0:0:0/0:0:0:0 # Disk bus path.

    # # Allows for supplying extra kernel args via the bootloader.
    # extraKernelArgs:
    #     - talos.platform=metal
    #     - reboot=k

    # # Allows for supplying additional system extension images to install on top of base Talos image.
    # extensions:
    #     - image: ghcr.io/siderolabs/gvisor:20220117.0-v1.0.0 # System extension image.
  # Used to configure the machine's container image registry mirrors.
  registries: {}
  # # Specifies mirror configuration for each registry host namespace.
  # mirrors:
  #     ghcr.io:
  #         # List of endpoints (URLs) for registry mirrors to use.
  #         endpoints:
  #             - https://registry.insecure
  #             - https://ghcr.io/v2/

  # # Specifies TLS & auth configuration for HTTPS image registries.
  # config:
  #     registry.insecure:
  #         # The TLS configuration for the registry.
  #         tls:
  #             insecureSkipVerify: true # Skip TLS server certificate verification (not recommended).
  #
  #             # # Enable mutual TLS authentication with the registry.
  #             # clientIdentity:
  #             #     crt: LS0tIEVYQU1QTEUgQ0VSVElGSUNBVEUgLS0t
  #             #     key: LS0tIEVYQU1QTEUgS0VZIC0tLQ==
  #
  #         # # The auth configuration for this registry.
  #         # auth:
  #         #     username: username # Optional registry authentication.
  #         #     password: password # Optional registry authentication.

  # Features describe individual Talos features that can be switched on or off.
  features:
    rbac: true # Enable role-based access control (RBAC).
    stableHostname: true # Enable stable default hostname.
    apidCheckExtKeyUsage: true # Enable checks for extended key usage of client certificates in apid.
    diskQuotaSupport: true # Enable XFS project quota support for EPHEMERAL partition and user disks.
    # KubePrism - local proxy/load balancer on defined port that will distribute
    kubePrism:
      enabled: true # Enable KubePrism support - will start local load balancing proxy.
      port: 7445 # KubePrism port.
    # Configures host DNS caching resolver.
    hostDNS:
      enabled: true # Enable host DNS caching resolver.

    # # Configure Talos API access from Kubernetes pods.
    # kubernetesTalosAPIAccess:
    #     enabled: true # Enable Talos API access from Kubernetes pods.
    #     # The list of Talos API roles which can be granted for access from Kubernetes pods.
    #     allowedRoles:
    #         - os:reader
    #     # The list of Kubernetes namespaces Talos API access is available from.
    #     allowedKubernetesNamespaces:
    #         - kube-system

  # # Provides machine specific control plane configuration options.

  # # ControlPlane definition example.
  # controlPlane:
  #     # Controller manager machine specific configuration options.
  #     controllerManager:
  #         disabled: false # Disable kube-controller-manager on the node.
  #     # Scheduler machine specific configuration options.
  #     scheduler:
  #         disabled: true # Disable kube-scheduler on the node.

  # # Used to provide static pod definitions to be run by the kubelet directly bypassing the kube-apiserver.

  # # nginx static pod.
  # pods:
  #     - apiVersion: v1
  #       kind: pod
  #       metadata:
  #         name: nginx
  #       spec:
  #         containers:
  #             - image: nginx
  #               name: nginx

  # # Used to partition, format and mount additional disks.

  # # MachineDisks list example.
  # disks:
  #     - device: /dev/sdb # The name of the disk to use.
  #       # A list of partitions to create on the disk.
  #       partitions:
  #         - mountpoint: /var/mnt/extra # Where to mount the partition.
  #
  #           # # The size of partition: either bytes or human readable representation. If `size:` is omitted, the partition is sized to occupy the full disk.

  #           # # Human readable representation.
  #           # size: 100 MB
  #           # # Precise value in bytes.
  #           # size: 1073741824

  # # Allows the addition of user specified files.

  # # MachineFiles usage example.
  # files:
  #     - content: '...' # The contents of the file.
  #       permissions: 0o666 # The file's permissions in octal.
  #       path: /tmp/file.txt # The path of the file.
  #       op: append # The operation to use

  # # The `env` field allows for the addition of environment variables.

  # # Environment variables definition examples.
  # env:
  #     GRPC_GO_LOG_SEVERITY_LEVEL: info
  #     GRPC_GO_LOG_VERBOSITY_LEVEL: "99"
  #     https_proxy: http://SERVER:PORT/
  # env:
  #     GRPC_GO_LOG_SEVERITY_LEVEL: error
  #     https_proxy: https://USERNAME:PASSWORD@SERVER:PORT/
  # env:
  #     https_proxy: http://DOMAIN\USERNAME:PASSWORD@SERVER:PORT/

  # # Used to configure the machine's time settings.

  # # Example configuration for cloudflare ntp server.
  # time:
  #     disabled: false # Indicates if the time service is disabled for the machine.
  #     # description: |
  #     servers:
  #         - time.cloudflare.com
  #     bootTimeout: 2m0s # Specifies the timeout when the node time is considered to be in sync unlocking the boot sequence.

  # # Used to configure the machine's sysctls.

  # # MachineSysctls usage example.
  # sysctls:
  #     kernel.domainname: talos.dev
  #     net.ipv4.ip_forward: "0"
  #     net/ipv6/conf/eth0.100/disable_ipv6: "1"

  # # Used to configure the machine's sysfs.

  # # MachineSysfs usage example.
  # sysfs:
  #     devices.system.cpu.cpu0.cpufreq.scaling_governor: performance

  # # Machine system disk encryption configuration.
  # systemDiskEncryption:
  #     # Ephemeral partition encryption.
  #     ephemeral:
  #         provider: luks2 # Encryption provider to use for the encryption.
  #         # Defines the encryption keys generation and storage method.
  #         keys:
  #             - # Deterministically generated key from the node UUID and PartitionLabel.
  #               nodeID: {}
  #               slot: 0 # Key slot number for LUKS2 encryption.
  #
  #               # # KMS managed encryption key.
  #               # kms:
  #               #     endpoint: https://192.168.88.21:4443 # KMS endpoint to Seal/Unseal the key.
  #
  #         # # Cipher kind to use for the encryption. Depends on the encryption provider.
  #         # cipher: aes-xts-plain64

  #         # # Defines the encryption sector size.
  #         # blockSize: 4096

  #         # # Additional --perf parameters for the LUKS2 encryption.
  #         # options:
  #         #     - no_read_workqueue
  #         #     - no_write_workqueue

  # # Configures the udev system.
  # udev:
  #     # List of udev rules to apply to the udev system
  #     rules:
  #         - SUBSYSTEM=="drm", KERNEL=="renderD*", GROUP="44", MODE="0660"

  # # Configures the logging system.
  # logging:
  #     # Logging destination.
  #     destinations:
  #         - endpoint: tcp://1.2.3.4:12345 # Where to send logs. Supported protocols are "tcp" and "udp".
  #           format: json_lines # Logs format.

  # # Configures the kernel.
  # kernel:
  #     # Kernel modules to load.
  #     modules:
  #         - name: brtfs # Module name.

  # # Configures the seccomp profiles for the machine.
  # seccompProfiles:
  #     - name: audit.json # The `name` field is used to provide the file name of the seccomp profile.
  #       # The `value` field is used to provide the seccomp profile.
  #       value:
  #         defaultAction: SCMP_ACT_LOG

  # # Configures the node labels for the machine.

  # # node labels example.
  # nodeLabels:
  #     exampleLabel: exampleLabelValue

  # # Configures the node taints for the machine. Effect is optional.

  # # node taints example.
  # nodeTaints:
  #     exampleTaint: exampleTaintValue:NoSchedule
# Provides cluster specific configuration options.
cluster:
  id:
  secret:
  # Provides control plane specific configuration options.
  controlPlane:
    endpoint: https://api.lab.domain.se:6443 # Endpoint is the canonical controlplane endpoint, which can be an IP address or a DNS hostname.
  clusterName: lab # Configures the cluster's name.
  # Provides cluster specific network configuration options.
  network:
    dnsDomain: cluster.local # The domain used by Kubernetes DNS.
    # The pod subnet CIDR.
    podSubnets:
      - 10.244.0.0/16
    # The service subnet CIDR.
    serviceSubnets:
      - 10.96.0.0/12

    # # The CNI used.
    # cni:
    #     name: custom # Name of CNI to use.
    #     # URLs containing manifests to apply for the CNI.
    #     urls:
    #         - https://docs.projectcalico.org/archive/v3.20/manifests/canal.yaml
  token: giut7a.tl12wqjynjuy369h # The [bootstrap token](https://kubernetes.io/docs/reference/access-authn-authz/bootstrap-tokens/) used to join the cluster.
  secretboxEncryptionSecret: yJwgqvQzeShtuNcTRMaeoyL9V1cLPcfpjT+FzacwEz4= # A key used for the [encryption of secret data at rest](https://kubernetes.io/docs/tasks/administer-cluster/encrypt-data/).
  # The base64 encoded root certificate authority used by Kubernetes.
  ca:
    crt:
    key:
  # The base64 encoded aggregator certificate authority used by Kubernetes for front-proxy certificate generation.
  aggregatorCA:
    crt:
    key:
  # The base64 encoded private key for service account token generation.
  serviceAccount:
    key:
  # API server specific configuration options.
  apiServer:
    image: registry.k8s.io/kube-apiserver:v1.30.0 # The container image used in the API server manifest.
    # Extra certificate subject alternative names for the API server's certificate.
    certSANs:
      - api.lab.domain.se
    disablePodSecurityPolicy: true # Disable PodSecurityPolicy in the API server and default manifests.
    # Configure the API server admission plugins.
    admissionControl:
      - name: PodSecurity # Name is the name of the admission controller.
        # Configuration is an embedded configuration object to be used as the plugin's
        configuration:
          apiVersion: pod-security.admission.config.k8s.io/v1alpha1
          defaults:
            audit: restricted
            audit-version: latest
            enforce: baseline
            enforce-version: latest
            warn: restricted
            warn-version: latest
          exemptions:
            namespaces:
              - kube-system
            runtimeClasses: []
            usernames: []
          kind: PodSecurityConfiguration
    # Configure the API server audit policy.
    auditPolicy:
      apiVersion: audit.k8s.io/v1
      kind: Policy
      rules:
        - level: Metadata
  # Controller manager server specific configuration options.
  controllerManager:
    image: registry.k8s.io/kube-controller-manager:v1.30.0 # The container image used in the controller manager manifest.
  # Kube-proxy server-specific configuration options
  proxy:
    image: registry.k8s.io/kube-proxy:v1.30.0 # The container image used in the kube-proxy manifest.
    mode: ipvs
    extraArgs:
      proxy-mode: ipvs
      ipvs-strict-arp: true

    # # Disable kube-proxy deployment on cluster bootstrap.
    # disabled: false
  # Scheduler server specific configuration options.
  scheduler:
    image: registry.k8s.io/kube-scheduler:v1.30.0 # The container image used in the scheduler manifest.
  # Configures cluster member discovery.
  discovery:
    enabled: true # Enable the cluster membership discovery feature.
    # Configure registries used for cluster member discovery.
    registries:
      # Kubernetes registry uses Kubernetes API server to discover cluster members and stores additional information
      kubernetes:
        disabled: true # Disable Kubernetes discovery registry.
      # Service registry is using an external service to push and pull information about cluster members.
      service: {}
      # # External service endpoint.
      # endpoint: https://discovery.talos.dev/
  # Etcd specific configuration options.
  etcd:
    # The `ca` is the root certificate authority of the PKI.
    ca:
      crt:
      key:

    # # The container image used to create the etcd service.
    # image: gcr.io/etcd-development/etcd:v3.5.13-arm64

    # # The `advertisedSubnets` field configures the networks to pick etcd advertised IP from.
    # advertisedSubnets:
    #     - 10.0.0.0/8
  # A list of urls that point to additional manifests.
  extraManifests: []
  #   - https://www.example.com/manifest1.yaml
  #   - https://www.example.com/manifest2.yaml

  # A list of inline Kubernetes manifests.
  inlineManifests: []
  #   - name: namespace-ci # Name of the manifest.
  #     contents: |- # Manifest contents as a string.
  #       apiVersion: v1
  #       kind: Namespace
  #       metadata:
  #        name: ci

  # # A key used for the [encryption of secret data at rest](https://kubernetes.io/docs/tasks/administer-cluster/encrypt-data/).

  # # Decryption secret example (do not use in production!).
  # aescbcEncryptionSecret: z01mye6j16bspJYtTB/5SFX8j7Ph4JXxM2Xuu4vsBPM=

  # # Core DNS specific configuration options.
  # coreDNS:
  #     image: registry.k8s.io/coredns/coredns:v1.11.1 # The `image` field is an override to the default coredns image.

  # # External cloud provider configuration.
  # externalCloudProvider:
  #     enabled: true # Enable external cloud provider.
  #     # A list of urls that point to additional manifests for an external cloud provider.
  #     manifests:
  #         - https://raw.githubusercontent.com/kubernetes/cloud-provider-aws/v1.20.0-alpha.0/manifests/rbac.yaml
  #         - https://raw.githubusercontent.com/kubernetes/cloud-provider-aws/v1.20.0-alpha.0/manifests/aws-cloud-controller-manager-daemonset.yaml

  # # A map of key value pairs that will be added while fetching the extraManifests.
  # extraManifestHeaders:
  #     Token: "1234567"
  #     X-ExtraInfo: info

  # # Settings for admin kubeconfig generation.
  # adminKubeconfig:
  #     certLifetime: 1h0m0s # Admin kubeconfig certificate lifetime (default is 1 year).

  # Allows running workload on control-plane nodes.
  allowSchedulingOnControlPlanes: true

Logs

Environment

Client:
        Tag:         v1.7.0
        SHA:         70fb41ff
        Built:
        Go version:  go1.22.2
        OS/Arch:     darwin/arm64
Server:
        NODE:        talos-node1
        Tag:         v1.7.0
        SHA:         70fb41ff
        Built:
        Go version:  go1.22.2
        OS/Arch:     linux/amd64
        Enabled:     RBAC
Client Version: v1.29.0
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.30.0
Bare Metal amd64

sanmai-NL commented 4 months ago

Earlier, a team member indicated that IPVS support is quasi-deprecated: https://github.com/siderolabs/talos/issues/8332.

smira commented 4 months ago

Talos Linux never updates or removes bootstrap manifests (including kube-proxy) automatically, because doing so is considered dangerous.

You can still re-apply the manifests by doing a no-op talosctl upgrade-k8s --to <same-version>.
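As a rough sketch of that suggestion for the cluster in this report: the node name and Kubernetes version below are assumptions taken from the issue (talos-node1 is a control plane node, server version v1.30.0), and `reapply_command` is a hypothetical helper that only builds the command string so it can be reviewed before anything is run.

```shell
#!/bin/sh
# Assumed values from the report above; override via the environment.
NODE="${NODE:-talos-node1}"
K8S_VERSION="${K8S_VERSION:-1.30.0}"

# Hypothetical helper: build the same-version upgrade-k8s command as text.
reapply_command() {
  printf 'talosctl -n %s upgrade-k8s --to %s' "$1" "$2"
}

echo "Would run: $(reapply_command "$NODE" "$K8S_VERSION")"

# To re-apply for real and then check the daemonset, uncomment:
# talosctl -n "$NODE" upgrade-k8s --to "$K8S_VERSION"
# kubectl describe daemonset -n kube-system kube-proxy | grep mode
```

If the re-apply works as described, the final grep should report --proxy-mode=ipvs instead of iptables.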