harvester / harvester

Open source hyperconverged infrastructure (HCI) software
https://harvesterhci.io/
Apache License 2.0

[BUG] Harvester Management VIP not accessible (HTTPS/SSH) #3448

Open vrapcan opened 1 year ago

vrapcan commented 1 year ago

Describe the bug

The Harvester management VIP is not accessible, either through its URL in a web browser or over SSH. The web interface and SSH can be reached only through the IP address of an individual node, not through the VIP.

The issue occurred after our pfSense instances running in HA as VMs on the same cluster both rebooted at the same time.

Each node has a static IP address assigned; DHCP is NOT used for the nodes' IP address assignment.

The management URL shows the status 'Ready' on each node of the cluster.

[Screenshot: iKVM_capture]

To Reproduce

Steps to reproduce the behavior:

  1. Deploy a Harvester cluster with static IP address assignments.
  2. Run a pfSense VM on the same cluster that routes traffic for the same IP range the Harvester nodes use.
  3. Switch off/reboot the pfSense VM.
  4. Try accessing the Harvester VIP through a browser or over an SSH connection.
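Step 4 can also be checked from a shell. The following is a minimal sketch, assuming bash with `/dev/tcp` support and the coreutils `timeout` command are available on the client; `probe_port` is a hypothetical helper name, and the VIP and node addresses must be substituted by hand, since the VIP address is not stated in this report.

```shell
#!/usr/bin/env bash
# probe_port HOST PORT
# Hypothetical helper: prints "open" if a TCP connection to HOST:PORT
# succeeds within 3 seconds, otherwise prints "closed".
probe_port() {
  local host="$1" port="$2"
  if timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
    echo "open"
  else
    echo "closed"
  fi
}

# Example usage (addresses are placeholders, not taken from the report):
#   probe_port <harvester-vip> 443   # HTTPS on the VIP
#   probe_port <harvester-vip> 22    # SSH on the VIP
#   probe_port <node-ip> 443         # same check against a node IP
```

If the node IP reports `open` while the VIP reports `closed` on both ports, that matches the behavior described above.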

Expected behavior

It should be possible to access the Harvester cluster through the Harvester VIP.

Support bundle

File is too big to be attached here. I will post it on Slack.

Environment

2 x Baremetal Supermicro motherboard, 2x AMD EPYC 7352 24-Core Processor, 256 GB RAM, 8 TB SSD

2 x Baremetal ASUS ESC4000A-E10 Barebone; GPU: 4 x NVIDIA RTX 3090 Turbo 24GB; CPU: EPYC 7713 (64 cores, Milan, 2.0 / 3.6 GHz); Memory: 512 GB ECC DDR4 3200 MHz; SSD1: 2 TB M.2 NVMe PCIe 4.0 TLC; SSD2: 8 TB 2.5" U.2 NVMe TLC; Network: P2100G (2x100Gb/s)

Additional context

Collapsed code segments are below.

Kube VIP pods from each node.

describe pod kube-vip-cloud-provider-0

```
# kubectl -n harvester-system describe pod kube-vip-cloud-provider-0
Name:         kube-vip-cloud-provider-0
Namespace:    harvester-system
Priority:     0
Node:         inog01/10.40.0.18
Start Time:   Mon, 13 Feb 2023 07:13:16 +0000
Labels:       app.kubernetes.io/instance=harvester
              app.kubernetes.io/name=kube-vip-cloud-provider
              controller-revision-hash=kube-vip-cloud-provider-6b7c859586
              statefulset.kubernetes.io/pod-name=kube-vip-cloud-provider-0
Annotations:  cni.projectcalico.org/containerID: e1ee3a2e90ace20da5c5ec5a7850b48718ac635da22d311c0f401ec83d66b2ca
              cni.projectcalico.org/podIP: 10.52.3.123/32
              cni.projectcalico.org/podIPs: 10.52.3.123/32
              k8s.v1.cni.cncf.io/network-status:
                [{ "name": "k8s-pod-network", "ips": [ "10.52.3.123" ], "default": true, "dns": {} }]
              k8s.v1.cni.cncf.io/networks-status:
                [{ "name": "k8s-pod-network", "ips": [ "10.52.3.123" ], "default": true, "dns": {} }]
              kubernetes.io/psp: global-unrestricted-psp
Status:       Running
IP:           10.52.3.123
IPs:
  IP:           10.52.3.123
Controlled By:  StatefulSet/kube-vip-cloud-provider
Containers:
  kube-vip-cloud-provider:
    Container ID:  containerd://83f9fa2893b12b43bce49fa88d26aaeac32c2f77b069c168786186077c56b327
    Image:         kubevip/kube-vip-cloud-provider:v0.0.1
    Image ID:      sha256:5d5b371eade4bb540ec1fd7c41d67e6510d183a0cf8673e0f6c00ab4ecd1c757
    Port:
    Host Port:
    Command:
      /kube-vip-cloud-provider
      --leader-elect-resource-name=kube-vip-cloud-controller
    State:          Running
      Started:      Mon, 13 Feb 2023 07:13:18 +0000
    Ready:          True
    Restart Count:  0
    Environment:
      KUBEVIP_NAMESPACE:  harvester-system
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-r77xd (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  kube-api-access-r77xd:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
```

describe pod kube-vip-5swqz

```
# kubectl -n harvester-system describe pod kube-vip-5swqz
Name:         kube-vip-5swqz
Namespace:    harvester-system
Priority:     0
Node:         inog02/10.40.0.17
Start Time:   Mon, 13 Feb 2023 07:12:17 +0000
Labels:       app.kubernetes.io/instance=harvester
              app.kubernetes.io/name=kube-vip
              controller-revision-hash=54b9b4d66c
              pod-template-generation=1
Annotations:  kubernetes.io/psp: global-unrestricted-psp
Status:       Running
IP:           10.40.0.17
IPs:
  IP:           10.40.0.17
Controlled By:  DaemonSet/kube-vip
Containers:
  kube-vip:
    Container ID:  containerd://fe0d49081988ced9763ac54a766d08809979689cc61b88a3be6c2db92693814b
    Image:         ghcr.io/kube-vip/kube-vip:v0.4.4
    Image ID:      sha256:10a711aa9888db2d9173101453ad14aa0b9b4b256486620a6e3de90b1751365b
    Port:
    Host Port:
    Args:
      manager
    State:          Running
      Started:      Mon, 13 Feb 2023 07:12:17 +0000
    Ready:          True
    Restart Count:  0
    Environment:
      cp_enable:           false
      lb_enable:           true
      lb_port:             6443
      svc_enable:          true
      vip_arp:             true
      vip_cidr:            32
      vip_interface:
      vip_leaderelection:  false
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jm677 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  kube-api-access-jm677:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              node-role.kubernetes.io/control-plane=true
Tolerations:                 node-role.kubernetes.io/control-plane:NoSchedule op=Exists
                             node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/network-unavailable:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists
                             node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                             node.kubernetes.io/unreachable:NoExecute op=Exists
                             node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
```

describe pod kube-vip-dpg4l

```
# kubectl -n harvester-system describe pod kube-vip-dpg4l
Name:         kube-vip-dpg4l
Namespace:    harvester-system
Priority:     0
Node:         inos02/10.40.0.27
Start Time:   Mon, 13 Feb 2023 07:12:33 +0000
Labels:       app.kubernetes.io/instance=harvester
              app.kubernetes.io/name=kube-vip
              controller-revision-hash=54b9b4d66c
              pod-template-generation=1
Annotations:  kubernetes.io/psp: global-unrestricted-psp
Status:       Running
IP:           10.40.0.27
IPs:
  IP:           10.40.0.27
Controlled By:  DaemonSet/kube-vip
Containers:
  kube-vip:
    Container ID:  containerd://e780abb5a8bfaa5cc2e1bbfc2866a297516b4610fdc322b82ce5c9649a842eba
    Image:         ghcr.io/kube-vip/kube-vip:v0.4.4
    Image ID:      sha256:10a711aa9888db2d9173101453ad14aa0b9b4b256486620a6e3de90b1751365b
    Port:
    Host Port:
    Args:
      manager
    State:          Running
      Started:      Mon, 13 Feb 2023 07:12:34 +0000
    Ready:          True
    Restart Count:  0
    Environment:
      cp_enable:           false
      lb_enable:           true
      lb_port:             6443
      svc_enable:          true
      vip_arp:             true
      vip_cidr:            32
      vip_interface:
      vip_leaderelection:  false
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-lq9x8 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  kube-api-access-lq9x8:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              node-role.kubernetes.io/control-plane=true
Tolerations:                 node-role.kubernetes.io/control-plane:NoSchedule op=Exists
                             node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/network-unavailable:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists
                             node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                             node.kubernetes.io/unreachable:NoExecute op=Exists
                             node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
```

describe pod kube-vip-w4dzv

```
# kubectl -n harvester-system describe pod kube-vip-w4dzv
Name:         kube-vip-w4dzv
Namespace:    harvester-system
Priority:     0
Node:         inos01/10.40.0.21
Start Time:   Mon, 13 Feb 2023 07:12:59 +0000
Labels:       app.kubernetes.io/instance=harvester
              app.kubernetes.io/name=kube-vip
              controller-revision-hash=54b9b4d66c
              pod-template-generation=1
Annotations:  kubernetes.io/psp: global-unrestricted-psp
Status:       Running
IP:           10.40.0.21
IPs:
  IP:           10.40.0.21
Controlled By:  DaemonSet/kube-vip
Containers:
  kube-vip:
    Container ID:  containerd://127acbd1b67b55b19d719e455820623129de72e3a43a3ad0a5a0ab3dcbb3374c
    Image:         ghcr.io/kube-vip/kube-vip:v0.4.4
    Image ID:      sha256:10a711aa9888db2d9173101453ad14aa0b9b4b256486620a6e3de90b1751365b
    Port:
    Host Port:
    Args:
      manager
    State:          Running
      Started:      Mon, 13 Feb 2023 07:13:00 +0000
    Ready:          True
    Restart Count:  0
    Environment:
      cp_enable:           false
      lb_enable:           true
      lb_port:             6443
      svc_enable:          true
      vip_arp:             true
      vip_cidr:            32
      vip_interface:
      vip_leaderelection:  false
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-56h7s (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  kube-api-access-56h7s:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              node-role.kubernetes.io/control-plane=true
Tolerations:                 node-role.kubernetes.io/control-plane:NoSchedule op=Exists
                             node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/network-unavailable:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists
                             node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                             node.kubernetes.io/unreachable:NoExecute op=Exists
                             node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
```

Kube VIP logs

logs kube-vip-cloud-provider-0

```
# kubectl -n harvester-system logs kube-vip-cloud-provider-0
I0213 07:13:18.507805 1 serving.go:331] Generated self-signed cert in-memory
W0213 07:13:18.801412 1 client_config.go:608] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0213 07:13:18.804126 1 controllermanager.go:127] Version: v0.0.0-master+$Format:%h$
W0213 07:13:18.805043 1 controllermanager.go:139] detected a cluster without a ClusterID. A ClusterID will be required in the future. Please tag your cluster to avoid any future issues
I0213 07:13:18.806171 1 secure_serving.go:197] Serving securely on [::]:10258
I0213 07:13:18.806212 1 leaderelection.go:243] attempting to acquire leader lease kube-system/kube-vip-cloud-controller...
I0213 07:13:18.806251 1 tlsconfig.go:240] Starting DynamicServingCertificateController
I0213 07:13:34.442400 1 leaderelection.go:253] successfully acquired lease kube-system/kube-vip-cloud-controller
I0213 07:13:34.442501 1 event.go:291] "Event occurred" object="kube-system/kube-vip-cloud-controller" kind="Endpoints" apiVersion="v1" type="Normal" reason="LeaderElection" message="kube-vip-cloud-provider-0_79d8e993-ed04-43c4-8091-b822c963bdc7 became leader"
I0213 07:13:34.442529 1 event.go:291] "Event occurred" object="kube-system/kube-vip-cloud-controller" kind="Lease" apiVersion="coordination.k8s.io/v1" type="Normal" reason="LeaderElection" message="kube-vip-cloud-provider-0_79d8e993-ed04-43c4-8091-b822c963bdc7 became leader"
I0213 07:13:34.443739 1 node_controller.go:108] Sending events to api server.
W0213 07:13:34.443755 1 core.go:57] failed to start cloud node controller: cloud provider does not support instances
W0213 07:13:34.443761 1 controllermanager.go:251] Skipping "cloud-node"
I0213 07:13:34.444079 1 node_lifecycle_controller.go:77] Sending events to api server
W0213 07:13:34.444090 1 core.go:76] failed to start cloud node lifecycle controller: cloud provider does not support instances
W0213 07:13:34.444094 1 controllermanager.go:251] Skipping "cloud-node-lifecycle"
I0213 07:13:34.444475 1 controllermanager.go:254] Started "service"
I0213 07:13:34.444480 1 core.go:108] Will not configure cloud provider routes for allocate-node-cidrs: false, configure-cloud-routes: true.
W0213 07:13:34.444483 1 controllermanager.go:251] Skipping "route"
I0213 07:13:34.444523 1 controller.go:239] Starting service controller
I0213 07:13:34.444534 1 shared_informer.go:240] Waiting for caches to sync for service
I0213 07:13:34.545242 1 shared_informer.go:247] Caches are synced for service
I0213 07:13:34.545458 1 event.go:291] "Event occurred" object="kube-system/ingress-expose" kind="Service" apiVersion="v1" type="Normal" reason="EnsuringLoadBalancer" message="Ensuring load balancer"
I0213 07:13:34.555059 1 loadBalancer.go:149] syncing service 'ingress-expose' (398f068d-9c78-4244-a313-62d6a7fd8881)
I0213 07:13:34.555145 1 loadBalancer.go:164] found existing service 'ingress-expose' (398f068d-9c78-4244-a313-62d6a7fd8881) with vip 0.0.0.0
I0213 07:13:34.555245 1 event.go:291] "Event occurred" object="kube-system/ingress-expose" kind="Service" apiVersion="v1" type="Normal" reason="EnsuredLoadBalancer" message="Ensured load balancer"
```
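The kube-vip pod logs in this report are long and mostly consist of repeated informational lines, so a small filter can make the errors easier to see. The sketch below counts the two error messages that recur in those logs; `summarize_kube_vip_log` is a hypothetical helper name, and it assumes `awk` is available and that the log arrives one entry per line on standard input.

```shell
#!/usr/bin/env bash
# summarize_kube_vip_log
# Hypothetical helper: reads a kube-vip log on stdin and prints how many
# lines contain each of the two error messages seen in this report.
summarize_kube_vip_log() {
  awk '
    /timeout to request the IP from DHCP server/ { dhcp++ }    # DHCP lease request timed out
    /no matching response packet received/       { noresp++ }  # retry got no matching reply
    END { printf "dhcp_timeouts=%d no_response=%d\n", dhcp, noresp }
  '
}

# Example usage against one of the pods listed in this report:
#   kubectl -n harvester-system logs kube-vip-5swqz | summarize_kube_vip_log
```

Non-zero counts only indicate that the same messages are still being logged; they do not by themselves identify the cause.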

logs kube-vip-5swqz

``` # kubectl -n harvester-system logs kube-vip-5swqz time="2023-02-13T07:12:17Z" level=info msg="Starting kube-vip.io [v0.4.4]" time="2023-02-13T07:12:17Z" level=info msg="No interface is specified for VIP in config, auto-detecting default Interface" time="2023-02-13T07:12:17Z" level=info msg="kube-vip will bind to interface [mgmt-br.4000]" time="2023-02-13T07:12:17Z" level=info msg="server started" time="2023-02-13T07:12:17Z" level=info msg="Starting Kube-vip Manager with the ARP engine" time="2023-02-13T07:12:17Z" level=info msg="Namespace [kube-system], Hybrid mode [false]" time="2023-02-13T07:12:17Z" level=info msg="Beginning cluster membership, namespace [harvester-system], lock name [plndr-svcs-lock], id [inog02]" I0213 07:12:17.747576 1 leaderelection.go:248] attempting to acquire leader lease harvester-system/plndr-svcs-lock... time="2023-02-13T07:12:17Z" level=info msg="new leader elected: inos01" I0213 07:12:57.316807 1 leaderelection.go:258] successfully acquired lease harvester-system/plndr-svcs-lock time="2023-02-13T07:12:57Z" level=info msg="Beginning watching services for type: LoadBalancer in all namespaces" time="2023-02-13T07:12:57Z" level=info msg="Service [gitjob] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:12:57Z" level=info msg="Service [rancher-logging-root-fluentd] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:12:57Z" level=info msg="Service [prometheus-operated] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:12:57Z" level=info msg="Service [longhorn-engine-manager] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:12:57Z" level=info msg="Service [csi-attacher] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:12:57Z" level=info msg="Service [alertmanager-operated] has been added/modified it has an assigned external addresses []" 
time="2023-02-13T07:12:57Z" level=info msg="Service [rancher-monitoring-operator] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:12:57Z" level=info msg="Service [rancher-webhook] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:12:57Z" level=info msg="Service [virt-api] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:12:57Z" level=info msg="Service [rke2-ingress-nginx-controller-admission] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:12:57Z" level=info msg="Service [rke2-coredns-rke2-coredns] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:12:57Z" level=info msg="Service [longhorn-frontend] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:12:57Z" level=info msg="Service [rancher-logging] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:12:57Z" level=info msg="Service [rancher-logging-root-fluentd-headless] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:12:57Z" level=info msg="Service [rancher-monitoring-alertmanager] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:12:57Z" level=info msg="Service [webhook-service] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:12:57Z" level=info msg="Service [kubernetes] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:12:57Z" level=info msg="Service [harvester] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:12:57Z" level=info msg="Service [kubevirt-prometheus-metrics] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:12:57Z" level=info msg="Service [rancher-monitoring-kubelet] has been added/modified it has an assigned external addresses []" 
time="2023-02-13T07:12:57Z" level=info msg="Service [longhorn-admission-webhook] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:12:57Z" level=info msg="Service [longhorn-conversion-webhook] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:12:57Z" level=info msg="Service [pcidevices-webhook] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:12:57Z" level=info msg="Service [rancher-monitoring-coredns] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:12:57Z" level=info msg="Service [csi-provisioner] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:12:57Z" level=info msg="Service [rancher-logging-kube-audit-fluentd-headless] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:12:57Z" level=info msg="Service [rancher-monitoring-kube-state-metrics] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:12:57Z" level=info msg="Service [rancher-monitoring-prometheus] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:12:57Z" level=info msg="Service [rancher] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:12:57Z" level=info msg="Service [harvester-network-webhook] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:12:57Z" level=info msg="Service [rancher-monitoring-grafana] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:12:57Z" level=info msg="Service [harvester-cluster-repo] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:12:57Z" level=info msg="Service [rke2-metrics-server] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:12:57Z" level=info msg="Service [csi-resizer] has been added/modified it has an 
assigned external addresses []" time="2023-02-13T07:12:57Z" level=info msg="Service [rancher-logging-kube-audit-fluentd] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:12:57Z" level=info msg="Service [rancher-monitoring-prometheus-adapter] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:12:57Z" level=info msg="Service [cattle-cluster-agent] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:12:57Z" level=info msg="Service [kubevirt-operator-webhook] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:12:57Z" level=info msg="Service [longhorn-replica-manager] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:12:57Z" level=info msg="Service [longhorn-backend] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:12:57Z" level=info msg="Service [rancher-monitoring-prometheus-node-exporter] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:12:57Z" level=info msg="Service [harvester-vm-import-controller] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:12:57Z" level=info msg="Service [harvester-webhook] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:12:57Z" level=info msg="Service [ingress-expose] has been added/modified it has an assigned external addresses [0.0.0.0]" time="2023-02-13T07:12:57Z" level=info msg="add the service [kube-system/ingress-expose] with external address 0.0.0.0" time="2023-02-13T07:12:57Z" level=info msg="Creating new macvlan interface for DHCP [vip-398f068d]" time="2023-02-13T07:13:07Z" level=error msg="timeout to request the IP from DHCP server for service kube-system/ingress-expose" time="2023-02-13T07:13:07Z" level=info msg="Service [csi-snapshotter] has been added/modified it has an assigned external addresses []" 
time="2023-02-13T07:13:32Z" level=error msg="request failed, error: got an error while processing the request: no matching response packet received (waiting 10s)" time="2023-02-13T07:14:17Z" level=error msg="request failed, error: got an error while processing the request: no matching response packet received (waiting 18.65335013s)" time="2023-02-13T07:15:11Z" level=error msg="request failed, error: got an error while processing the request: no matching response packet received (waiting 30.901574972s)" time="2023-02-13T07:15:50Z" level=info msg="Service [rancher-logging-kube-audit-fluentd-headless] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:15:50Z" level=info msg="Service [rancher-logging-root-fluentd-headless] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:15:50Z" level=info msg="Service [rancher-webhook] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:15:50Z" level=info msg="Service [harvester-network-webhook] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:15:50Z" level=info msg="Service [csi-resizer] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:15:50Z" level=info msg="Service [longhorn-backend] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:15:50Z" level=info msg="Service [rancher-logging] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:15:50Z" level=info msg="Service [rancher-monitoring-prometheus] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:15:50Z" level=info msg="Service [harvester-cluster-repo] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:15:50Z" level=info msg="Service [rancher] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:15:50Z" level=info msg="Service 
[harvester-webhook] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:15:50Z" level=info msg="Service [ingress-expose] has been added/modified it has an assigned external addresses [0.0.0.0]" time="2023-02-13T07:15:50Z" level=info msg="add the service [kube-system/ingress-expose] with external address 0.0.0.0" time="2023-02-13T07:15:50Z" level=info msg="Using existing macvlan interface for DHCP [vip-398f068d]" time="2023-02-13T07:16:00Z" level=error msg="timeout to request the IP from DHCP server for service kube-system/ingress-expose" time="2023-02-13T07:16:00Z" level=info msg="Service [csi-attacher] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:16:00Z" level=info msg="Service [rancher-logging-root-fluentd] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:16:00Z" level=info msg="Service [kubevirt-operator-webhook] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:16:00Z" level=info msg="Service [rke2-coredns-rke2-coredns] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:16:00Z" level=info msg="Service [harvester-vm-import-controller] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:16:00Z" level=info msg="Service [kubernetes] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:16:00Z" level=info msg="Service [rancher-monitoring-grafana] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:16:00Z" level=info msg="Service [rancher-monitoring-operator] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:16:00Z" level=info msg="Service [rancher-monitoring-coredns] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:16:00Z" level=info msg="Service [longhorn-admission-webhook] has been added/modified it has an assigned 
external addresses []" time="2023-02-13T07:16:00Z" level=info msg="Service [longhorn-conversion-webhook] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:16:00Z" level=info msg="Service [longhorn-engine-manager] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:16:00Z" level=info msg="Service [gitjob] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:16:00Z" level=info msg="Service [rancher-monitoring-prometheus-adapter] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:16:00Z" level=info msg="Service [webhook-service] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:16:00Z" level=info msg="Service [rancher-monitoring-kubelet] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:16:00Z" level=info msg="Service [rke2-metrics-server] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:16:00Z" level=info msg="Service [rancher-monitoring-kube-state-metrics] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:16:00Z" level=info msg="Service [rancher-monitoring-prometheus-node-exporter] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:16:00Z" level=info msg="Service [kubevirt-prometheus-metrics] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:16:00Z" level=info msg="Service [csi-provisioner] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:16:00Z" level=info msg="Service [longhorn-frontend] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:16:00Z" level=info msg="Service [pcidevices-webhook] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:16:00Z" level=info msg="Service [prometheus-operated] has been 
added/modified it has an assigned external addresses []" time="2023-02-13T07:16:00Z" level=info msg="Service [rke2-ingress-nginx-controller-admission] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:16:00Z" level=info msg="Service [csi-snapshotter] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:16:00Z" level=info msg="Service [cattle-cluster-agent] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:16:00Z" level=info msg="Service [alertmanager-operated] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:16:00Z" level=info msg="Service [rancher-monitoring-alertmanager] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:16:00Z" level=info msg="Service [harvester] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:16:00Z" level=info msg="Service [virt-api] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:16:00Z" level=info msg="Service [longhorn-replica-manager] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:16:00Z" level=info msg="Service [rancher-logging-kube-audit-fluentd] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:16:17Z" level=error msg="request failed, error: got an error while processing the request: no matching response packet received (waiting 46.667421423s)" time="2023-02-13T07:16:25Z" level=error msg="request failed, error: got an error while processing the request: no matching response packet received (waiting 10s)" ... SAME MESSAGE MULTIPLE TIMES ... 
time="2023-02-13T07:51:01Z" level=error msg="request failed, error: got an error while processing the request: no matching response packet received (waiting 1m0s)" time="2023-02-13T07:51:30Z" level=info msg="Service [longhorn-admission-webhook] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:51:30Z" level=info msg="Service [longhorn-conversion-webhook] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:51:30Z" level=info msg="Service [longhorn-engine-manager] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:51:30Z" level=info msg="Service [gitjob] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:51:30Z" level=info msg="Service [rancher-monitoring-grafana] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:51:30Z" level=info msg="Service [rancher-monitoring-operator] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:51:30Z" level=info msg="Service [rancher-monitoring-coredns] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:51:30Z" level=info msg="Service [rke2-metrics-server] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:51:30Z" level=info msg="Service [rancher-monitoring-kube-state-metrics] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:51:30Z" level=info msg="Service [rancher-monitoring-prometheus-adapter] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:51:30Z" level=info msg="Service [webhook-service] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:51:30Z" level=info msg="Service [rancher-monitoring-kubelet] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:51:30Z" level=info msg="Service [longhorn-frontend] has been 
added/modified it has an assigned external addresses []" time="2023-02-13T07:51:30Z" level=info msg="Service [pcidevices-webhook] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:51:30Z" level=info msg="Service [prometheus-operated] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:51:30Z" level=info msg="Service [rancher-monitoring-prometheus-node-exporter] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:51:30Z" level=info msg="Service [kubevirt-prometheus-metrics] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:51:30Z" level=info msg="Service [csi-provisioner] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:51:30Z" level=info msg="Service [alertmanager-operated] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:51:30Z" level=info msg="Service [rke2-ingress-nginx-controller-admission] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:51:30Z" level=info msg="Service [csi-snapshotter] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:51:30Z" level=info msg="Service [cattle-cluster-agent] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:51:30Z" level=info msg="Service [longhorn-replica-manager] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:51:30Z" level=info msg="Service [rancher-logging-kube-audit-fluentd] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:51:30Z" level=info msg="Service [rancher-monitoring-alertmanager] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:51:30Z" level=info msg="Service [harvester] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:51:30Z" level=info msg="Service 
[virt-api] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:51:30Z" level=info msg="Service [harvester-network-webhook] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:51:30Z" level=info msg="Service [csi-resizer] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:51:30Z" level=info msg="Service [longhorn-backend] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:51:30Z" level=info msg="Service [rancher-logging] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:51:30Z" level=info msg="Service [rancher-logging-kube-audit-fluentd-headless] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:51:30Z" level=info msg="Service [rancher-logging-root-fluentd-headless] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:51:30Z" level=info msg="Service [rancher-webhook] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:51:30Z" level=info msg="Service [harvester-webhook] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:51:30Z" level=info msg="Service [ingress-expose] has been added/modified it has an assigned external addresses [0.0.0.0]" time="2023-02-13T07:51:30Z" level=info msg="add the service [kube-system/ingress-expose] with external address 0.0.0.0" time="2023-02-13T07:51:30Z" level=info msg="Using existing macvlan interface for DHCP [vip-398f068d]" time="2023-02-13T07:51:40Z" level=error msg="timeout to request the IP from DHCP server for service kube-system/ingress-expose" time="2023-02-13T07:51:40Z" level=info msg="Service [csi-attacher] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:51:40Z" level=info msg="Service [rancher-logging-root-fluentd] has been added/modified it has an assigned external addresses []" 
time="2023-02-13T07:51:40Z" level=info msg="Service [rancher-monitoring-prometheus] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:51:40Z" level=info msg="Service [harvester-cluster-repo] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:51:40Z" level=info msg="Service [rancher] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:51:40Z" level=info msg="Service [kubernetes] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:51:40Z" level=info msg="Service [kubevirt-operator-webhook] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:51:40Z" level=info msg="Service [rke2-coredns-rke2-coredns] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:51:40Z" level=info msg="Service [harvester-vm-import-controller] has been added/modified it has an assigned external addresses []" time="2023-02-13T07:52:05Z" level=error msg="request failed, error: got an error while processing the request: no matching response packet received (waiting 10s)" ... SAME MESSAGE MULTIPLE TIMES ... 
time="2023-02-13T08:46:27Z" level=error msg="request failed, error: got an error while processing the request: no matching response packet received (waiting 1m0s)" time="2023-02-13T08:46:51Z" level=info msg="Service [ingress-expose] has been added/modified it has an assigned external addresses [0.0.0.0]" time="2023-02-13T08:46:51Z" level=info msg="add the service [kube-system/ingress-expose] with external address 0.0.0.0" time="2023-02-13T08:46:51Z" level=info msg="Using existing macvlan interface for DHCP [vip-398f068d]" time="2023-02-13T08:47:01Z" level=error msg="timeout to request the IP from DHCP server for service kube-system/ingress-expose" time="2023-02-13T08:47:01Z" level=info msg="Service [csi-attacher] has been added/modified it has an assigned external addresses []" time="2023-02-13T08:47:01Z" level=info msg="Service [rancher-logging-root-fluentd] has been added/modified it has an assigned external addresses []" time="2023-02-13T08:47:01Z" level=info msg="Service [rancher-monitoring-prometheus] has been added/modified it has an assigned external addresses []" time="2023-02-13T08:47:01Z" level=info msg="Service [harvester-cluster-repo] has been added/modified it has an assigned external addresses []" time="2023-02-13T08:47:01Z" level=info msg="Service [rancher] has been added/modified it has an assigned external addresses []" time="2023-02-13T08:47:01Z" level=info msg="Service [harvester-webhook] has been added/modified it has an assigned external addresses []" time="2023-02-13T08:47:01Z" level=info msg="Service [kubernetes] has been added/modified it has an assigned external addresses []" time="2023-02-13T08:47:01Z" level=info msg="Service [kubevirt-operator-webhook] has been added/modified it has an assigned external addresses []" time="2023-02-13T08:47:01Z" level=info msg="Service [rke2-coredns-rke2-coredns] has been added/modified it has an assigned external addresses []" time="2023-02-13T08:47:01Z" level=info msg="Service 
[harvester-vm-import-controller] has been added/modified it has an assigned external addresses []" time="2023-02-13T08:47:01Z" level=info msg="Service [longhorn-conversion-webhook] has been added/modified it has an assigned external addresses []" time="2023-02-13T08:47:01Z" level=info msg="Service [longhorn-engine-manager] has been added/modified it has an assigned external addresses []" time="2023-02-13T08:47:01Z" level=info msg="Service [gitjob] has been added/modified it has an assigned external addresses []" time="2023-02-13T08:47:01Z" level=info msg="Service [rancher-monitoring-grafana] has been added/modified it has an assigned external addresses []" time="2023-02-13T08:47:01Z" level=info msg="Service [rancher-monitoring-operator] has been added/modified it has an assigned external addresses []" time="2023-02-13T08:47:01Z" level=info msg="Service [rancher-monitoring-coredns] has been added/modified it has an assigned external addresses []" time="2023-02-13T08:47:01Z" level=info msg="Service [longhorn-admission-webhook] has been added/modified it has an assigned external addresses []" time="2023-02-13T08:47:01Z" level=info msg="Service [rancher-monitoring-kube-state-metrics] has been added/modified it has an assigned external addresses []" time="2023-02-13T08:47:01Z" level=info msg="Service [rancher-monitoring-prometheus-adapter] has been added/modified it has an assigned external addresses []" time="2023-02-13T08:47:01Z" level=info msg="Service [webhook-service] has been added/modified it has an assigned external addresses []" time="2023-02-13T08:47:01Z" level=info msg="Service [rancher-monitoring-kubelet] has been added/modified it has an assigned external addresses []" time="2023-02-13T08:47:01Z" level=info msg="Service [rke2-metrics-server] has been added/modified it has an assigned external addresses []" time="2023-02-13T08:47:01Z" level=info msg="Service [pcidevices-webhook] has been added/modified it has an assigned external addresses []" 
time="2023-02-13T08:47:01Z" level=info msg="Service [prometheus-operated] has been added/modified it has an assigned external addresses []" time="2023-02-13T08:47:01Z" level=info msg="Service [rancher-monitoring-prometheus-node-exporter] has been added/modified it has an assigned external addresses []" time="2023-02-13T08:47:01Z" level=info msg="Service [kubevirt-prometheus-metrics] has been added/modified it has an assigned external addresses []" time="2023-02-13T08:47:01Z" level=info msg="Service [csi-provisioner] has been added/modified it has an assigned external addresses []" time="2023-02-13T08:47:01Z" level=info msg="Service [longhorn-frontend] has been added/modified it has an assigned external addresses []" time="2023-02-13T08:47:01Z" level=info msg="Service [alertmanager-operated] has been added/modified it has an assigned external addresses []" time="2023-02-13T08:47:01Z" level=info msg="Service [rke2-ingress-nginx-controller-admission] has been added/modified it has an assigned external addresses []" time="2023-02-13T08:47:01Z" level=info msg="Service [csi-snapshotter] has been added/modified it has an assigned external addresses []" time="2023-02-13T08:47:01Z" level=info msg="Service [cattle-cluster-agent] has been added/modified it has an assigned external addresses []" time="2023-02-13T08:47:01Z" level=info msg="Service [rancher-logging-kube-audit-fluentd] has been added/modified it has an assigned external addresses []" time="2023-02-13T08:47:01Z" level=info msg="Service [rancher-monitoring-alertmanager] has been added/modified it has an assigned external addresses []" time="2023-02-13T08:47:01Z" level=info msg="Service [harvester] has been added/modified it has an assigned external addresses []" time="2023-02-13T08:47:01Z" level=info msg="Service [virt-api] has been added/modified it has an assigned external addresses []" time="2023-02-13T08:47:01Z" level=info msg="Service [longhorn-replica-manager] has been added/modified it has an assigned 
external addresses []" time="2023-02-13T08:47:01Z" level=info msg="Service [csi-resizer] has been added/modified it has an assigned external addresses []" time="2023-02-13T08:47:01Z" level=info msg="Service [longhorn-backend] has been added/modified it has an assigned external addresses []" time="2023-02-13T08:47:01Z" level=info msg="Service [rancher-logging] has been added/modified it has an assigned external addresses []" time="2023-02-13T08:47:01Z" level=info msg="Service [rancher-logging-kube-audit-fluentd-headless] has been added/modified it has an assigned external addresses []" time="2023-02-13T08:47:01Z" level=info msg="Service [rancher-logging-root-fluentd-headless] has been added/modified it has an assigned external addresses []" time="2023-02-13T08:47:01Z" level=info msg="Service [rancher-webhook] has been added/modified it has an assigned external addresses []" time="2023-02-13T08:47:01Z" level=info msg="Service [harvester-network-webhook] has been added/modified it has an assigned external addresses []" time="2023-02-13T08:47:26Z" level=error msg="request failed, error: got an error while processing the request: no matching response packet received (waiting 10s)" ... SAME MESSAGE MULTIPLE TIMES ... 
time="2023-02-13T09:35:27Z" level=error msg="request failed, error: got an error while processing the request: no matching response packet received (waiting 1m0s)" time="2023-02-13T09:35:33Z" level=info msg="Service [rancher-monitoring-prometheus-adapter] has been added/modified it has an assigned external addresses []" time="2023-02-13T09:35:33Z" level=info msg="Service [webhook-service] has been added/modified it has an assigned external addresses []" time="2023-02-13T09:35:33Z" level=info msg="Service [rancher-monitoring-kubelet] has been added/modified it has an assigned external addresses []" time="2023-02-13T09:35:33Z" level=info msg="Service [rke2-metrics-server] has been added/modified it has an assigned external addresses []" time="2023-02-13T09:35:33Z" level=info msg="Service [rancher-monitoring-kube-state-metrics] has been added/modified it has an assigned external addresses []" time="2023-02-13T09:35:33Z" level=info msg="Service [rancher-monitoring-prometheus-node-exporter] has been added/modified it has an assigned external addresses []" time="2023-02-13T09:35:33Z" level=info msg="Service [kubevirt-prometheus-metrics] has been added/modified it has an assigned external addresses []" time="2023-02-13T09:35:33Z" level=info msg="Service [csi-provisioner] has been added/modified it has an assigned external addresses []" time="2023-02-13T09:35:33Z" level=info msg="Service [longhorn-frontend] has been added/modified it has an assigned external addresses []" time="2023-02-13T09:35:33Z" level=info msg="Service [pcidevices-webhook] has been added/modified it has an assigned external addresses []" time="2023-02-13T09:35:33Z" level=info msg="Service [prometheus-operated] has been added/modified it has an assigned external addresses []" time="2023-02-13T09:35:33Z" level=info msg="Service [rke2-ingress-nginx-controller-admission] has been added/modified it has an assigned external addresses []" time="2023-02-13T09:35:33Z" level=info msg="Service [csi-snapshotter] 
has been added/modified it has an assigned external addresses []" time="2023-02-13T09:35:33Z" level=info msg="Service [cattle-cluster-agent] has been added/modified it has an assigned external addresses []" time="2023-02-13T09:35:33Z" level=info msg="Service [alertmanager-operated] has been added/modified it has an assigned external addresses []" time="2023-02-13T09:35:33Z" level=info msg="Service [rancher-monitoring-alertmanager] has been added/modified it has an assigned external addresses []" time="2023-02-13T09:35:33Z" level=info msg="Service [harvester] has been added/modified it has an assigned external addresses []" time="2023-02-13T09:35:33Z" level=info msg="Service [virt-api] has been added/modified it has an assigned external addresses []" time="2023-02-13T09:35:33Z" level=info msg="Service [longhorn-replica-manager] has been added/modified it has an assigned external addresses []" time="2023-02-13T09:35:33Z" level=info msg="Service [rancher-logging-kube-audit-fluentd] has been added/modified it has an assigned external addresses []" time="2023-02-13T09:35:33Z" level=info msg="Service [rancher-logging-kube-audit-fluentd-headless] has been added/modified it has an assigned external addresses []" time="2023-02-13T09:35:33Z" level=info msg="Service [rancher-logging-root-fluentd-headless] has been added/modified it has an assigned external addresses []" time="2023-02-13T09:35:33Z" level=info msg="Service [rancher-webhook] has been added/modified it has an assigned external addresses []" time="2023-02-13T09:35:33Z" level=info msg="Service [harvester-network-webhook] has been added/modified it has an assigned external addresses []" time="2023-02-13T09:35:33Z" level=info msg="Service [csi-resizer] has been added/modified it has an assigned external addresses []" time="2023-02-13T09:35:33Z" level=info msg="Service [longhorn-backend] has been added/modified it has an assigned external addresses []" time="2023-02-13T09:35:33Z" level=info msg="Service 
[rancher-logging] has been added/modified it has an assigned external addresses []" time="2023-02-13T09:35:33Z" level=info msg="Service [rancher-monitoring-prometheus] has been added/modified it has an assigned external addresses []" time="2023-02-13T09:35:33Z" level=info msg="Service [harvester-cluster-repo] has been added/modified it has an assigned external addresses []" time="2023-02-13T09:35:33Z" level=info msg="Service [rancher] has been added/modified it has an assigned external addresses []" time="2023-02-13T09:35:33Z" level=info msg="Service [harvester-webhook] has been added/modified it has an assigned external addresses []" time="2023-02-13T09:35:33Z" level=info msg="Service [ingress-expose] has been added/modified it has an assigned external addresses [0.0.0.0]" time="2023-02-13T09:35:33Z" level=info msg="add the service [kube-system/ingress-expose] with external address 0.0.0.0" time="2023-02-13T09:35:33Z" level=info msg="Using existing macvlan interface for DHCP [vip-398f068d]" time="2023-02-13T09:35:33Z" level=error msg="request failed, error: got an error while processing the request: no matching response packet received (waiting 1m0s)" time="2023-02-13T09:35:43Z" level=error msg="timeout to request the IP from DHCP server for service kube-system/ingress-expose" time="2023-02-13T09:35:43Z" level=info msg="Service [csi-attacher] has been added/modified it has an assigned external addresses []" time="2023-02-13T09:35:43Z" level=info msg="Service [rancher-logging-root-fluentd] has been added/modified it has an assigned external addresses []" time="2023-02-13T09:35:43Z" level=info msg="Service [kubevirt-operator-webhook] has been added/modified it has an assigned external addresses []" time="2023-02-13T09:35:43Z" level=info msg="Service [rke2-coredns-rke2-coredns] has been added/modified it has an assigned external addresses []" time="2023-02-13T09:35:43Z" level=info msg="Service [harvester-vm-import-controller] has been added/modified it has an 
assigned external addresses []" time="2023-02-13T09:35:43Z" level=info msg="Service [kubernetes] has been added/modified it has an assigned external addresses []" time="2023-02-13T09:35:43Z" level=info msg="Service [rancher-monitoring-grafana] has been added/modified it has an assigned external addresses []" time="2023-02-13T09:35:43Z" level=info msg="Service [rancher-monitoring-operator] has been added/modified it has an assigned external addresses []" time="2023-02-13T09:35:43Z" level=info msg="Service [rancher-monitoring-coredns] has been added/modified it has an assigned external addresses []" time="2023-02-13T09:35:43Z" level=info msg="Service [longhorn-admission-webhook] has been added/modified it has an assigned external addresses []" time="2023-02-13T09:35:43Z" level=info msg="Service [longhorn-conversion-webhook] has been added/modified it has an assigned external addresses []" time="2023-02-13T09:35:43Z" level=info msg="Service [longhorn-engine-manager] has been added/modified it has an assigned external addresses []" time="2023-02-13T09:35:43Z" level=info msg="Service [gitjob] has been added/modified it has an assigned external addresses []" time="2023-02-13T09:36:08Z" level=error msg="request failed, error: got an error while processing the request: no matching response packet received (waiting 10s)" ... SAME MESSAGE MULTIPLE TIMES ... 
time="2023-02-13T10:34:13Z" level=error msg="request failed, error: got an error while processing the request: no matching response packet received (waiting 1m0s)" time="2023-02-13T10:35:05Z" level=info msg="Service [rancher-logging-kube-audit-fluentd] has been added/modified it has an assigned external addresses []" time="2023-02-13T10:35:05Z" level=info msg="Service [rancher-monitoring-alertmanager] has been added/modified it has an assigned external addresses []" time="2023-02-13T10:35:05Z" level=info msg="Service [harvester] has been added/modified it has an assigned external addresses []" time="2023-02-13T10:35:05Z" level=info msg="Service [virt-api] has been added/modified it has an assigned external addresses []" time="2023-02-13T10:35:05Z" level=info msg="Service [longhorn-replica-manager] has been added/modified it has an assigned external addresses []" time="2023-02-13T10:35:05Z" level=info msg="Service [longhorn-backend] has been added/modified it has an assigned external addresses []" time="2023-02-13T10:35:05Z" level=info msg="Service [rancher-logging] has been added/modified it has an assigned external addresses []" time="2023-02-13T10:35:05Z" level=info msg="Service [rancher-logging-kube-audit-fluentd-headless] has been added/modified it has an assigned external addresses []" time="2023-02-13T10:35:05Z" level=info msg="Service [rancher-logging-root-fluentd-headless] has been added/modified it has an assigned external addresses []" time="2023-02-13T10:35:05Z" level=info msg="Service [rancher-webhook] has been added/modified it has an assigned external addresses []" time="2023-02-13T10:35:05Z" level=info msg="Service [harvester-network-webhook] has been added/modified it has an assigned external addresses []" time="2023-02-13T10:35:05Z" level=info msg="Service [csi-resizer] has been added/modified it has an assigned external addresses []" time="2023-02-13T10:35:05Z" level=info msg="Service [csi-attacher] has been added/modified it has an assigned 
external addresses []" time="2023-02-13T10:35:05Z" level=info msg="Service [rancher-logging-root-fluentd] has been added/modified it has an assigned external addresses []" time="2023-02-13T10:35:05Z" level=info msg="Service [rancher-monitoring-prometheus] has been added/modified it has an assigned external addresses []" time="2023-02-13T10:35:05Z" level=info msg="Service [harvester-cluster-repo] has been added/modified it has an assigned external addresses []" time="2023-02-13T10:35:05Z" level=info msg="Service [rancher] has been added/modified it has an assigned external addresses []" time="2023-02-13T10:35:05Z" level=info msg="Service [harvester-webhook] has been added/modified it has an assigned external addresses []" time="2023-02-13T10:35:05Z" level=info msg="Service [ingress-expose] has been added/modified it has an assigned external addresses [0.0.0.0]" time="2023-02-13T10:35:05Z" level=info msg="add the service [kube-system/ingress-expose] with external address 0.0.0.0" time="2023-02-13T10:35:05Z" level=info msg="Using existing macvlan interface for DHCP [vip-398f068d]" time="2023-02-13T10:35:15Z" level=error msg="timeout to request the IP from DHCP server for service kube-system/ingress-expose" time="2023-02-13T10:35:15Z" level=info msg="Service [kubernetes] has been added/modified it has an assigned external addresses []" time="2023-02-13T10:35:15Z" level=info msg="Service [kubevirt-operator-webhook] has been added/modified it has an assigned external addresses []" time="2023-02-13T10:35:15Z" level=info msg="Service [rke2-coredns-rke2-coredns] has been added/modified it has an assigned external addresses []" time="2023-02-13T10:35:15Z" level=info msg="Service [harvester-vm-import-controller] has been added/modified it has an assigned external addresses []" time="2023-02-13T10:35:15Z" level=info msg="Service [longhorn-engine-manager] has been added/modified it has an assigned external addresses []" time="2023-02-13T10:35:15Z" level=info msg="Service 
[gitjob] has been added/modified it has an assigned external addresses []" time="2023-02-13T10:35:15Z" level=info msg="Service [rancher-monitoring-grafana] has been added/modified it has an assigned external addresses []" time="2023-02-13T10:35:15Z" level=info msg="Service [rancher-monitoring-operator] has been added/modified it has an assigned external addresses []" time="2023-02-13T10:35:15Z" level=info msg="Service [rancher-monitoring-coredns] has been added/modified it has an assigned external addresses []" time="2023-02-13T10:35:15Z" level=info msg="Service [longhorn-admission-webhook] has been added/modified it has an assigned external addresses []" time="2023-02-13T10:35:15Z" level=info msg="Service [longhorn-conversion-webhook] has been added/modified it has an assigned external addresses []" time="2023-02-13T10:35:15Z" level=info msg="Service [rancher-monitoring-kube-state-metrics] has been added/modified it has an assigned external addresses []" time="2023-02-13T10:35:15Z" level=info msg="Service [rancher-monitoring-prometheus-adapter] has been added/modified it has an assigned external addresses []" time="2023-02-13T10:35:15Z" level=info msg="Service [webhook-service] has been added/modified it has an assigned external addresses []" time="2023-02-13T10:35:15Z" level=info msg="Service [rancher-monitoring-kubelet] has been added/modified it has an assigned external addresses []" time="2023-02-13T10:35:15Z" level=info msg="Service [rke2-metrics-server] has been added/modified it has an assigned external addresses []" time="2023-02-13T10:35:15Z" level=info msg="Service [prometheus-operated] has been added/modified it has an assigned external addresses []" time="2023-02-13T10:35:15Z" level=info msg="Service [rancher-monitoring-prometheus-node-exporter] has been added/modified it has an assigned external addresses []" time="2023-02-13T10:35:15Z" level=info msg="Service [kubevirt-prometheus-metrics] has been added/modified it has an assigned external addresses 
[]" time="2023-02-13T10:35:15Z" level=info msg="Service [csi-provisioner] has been added/modified it has an assigned external addresses []" time="2023-02-13T10:35:15Z" level=info msg="Service [longhorn-frontend] has been added/modified it has an assigned external addresses []" time="2023-02-13T10:35:15Z" level=info msg="Service [pcidevices-webhook] has been added/modified it has an assigned external addresses []" time="2023-02-13T10:35:15Z" level=info msg="Service [alertmanager-operated] has been added/modified it has an assigned external addresses []" time="2023-02-13T10:35:15Z" level=info msg="Service [rke2-ingress-nginx-controller-admission] has been added/modified it has an assigned external addresses []" time="2023-02-13T10:35:15Z" level=info msg="Service [csi-snapshotter] has been added/modified it has an assigned external addresses []" time="2023-02-13T10:35:15Z" level=info msg="Service [cattle-cluster-agent] has been added/modified it has an assigned external addresses []" time="2023-02-13T10:35:32Z" level=error msg="request failed, error: got an error while processing the request: no matching response packet received (waiting 1m0s)" ... SAME MESSAGE MULTIPLE TIMES ... time="2023-02-13T10:59:36Z" level=error msg="request failed, error: got an error while processing the request: no matching response packet received (waiting 1m0s)" ```
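Most of the log above is informational lines for services that have no external address, which makes the relevant entries hard to spot. A small filter, sketched here on the assumption that the log is read one entry per line (as `kubectl logs` prints it), keeps only the error entries and the services that do carry an external address. The function name `filter_vip_log` and the `<kube-vip-pod>` placeholder are illustrative and not part of kube-vip or Harvester.

```shell
#!/bin/sh
# filter_vip_log (hypothetical helper): read kube-vip log text on stdin and keep
#   - every entry logged at level=error, and
#   - every "Service [...]" entry whose external address list is not empty,
#     i.e. anything other than "external addresses []".
filter_vip_log() {
    grep -E 'level=error|external addresses \[[^]]+\]'
}

# Usage (replace <kube-vip-pod> with the pod that produced the log):
#   kubectl -n harvester-system logs <kube-vip-pod> | filter_vip_log
```

On the log in this report, such a filter would leave the `ingress-expose` entries with address `0.0.0.0`, the DHCP timeouts, and the repeated "no matching response packet received" errors.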

logs kube-vip-dpg4l

```
# kubectl -n harvester-system logs kube-vip-dpg4l
time="2023-02-13T07:12:34Z" level=info msg="Starting kube-vip.io [v0.4.4]"
time="2023-02-13T07:12:34Z" level=info msg="No interface is specified for VIP in config, auto-detecting default Interface"
time="2023-02-13T07:12:34Z" level=info msg="kube-vip will bind to interface [mgmt-br.4000]"
time="2023-02-13T07:12:34Z" level=info msg="server started"
time="2023-02-13T07:12:34Z" level=info msg="Starting Kube-vip Manager with the ARP engine"
time="2023-02-13T07:12:34Z" level=info msg="Namespace [kube-system], Hybrid mode [false]"
time="2023-02-13T07:12:34Z" level=info msg="Beginning cluster membership, namespace [harvester-system], lock name [plndr-svcs-lock], id [inos02]"
I0213 07:12:34.425560 1 leaderelection.go:248] attempting to acquire leader lease harvester-system/plndr-svcs-lock...
time="2023-02-13T07:12:34Z" level=info msg="new leader elected: inos01"
time="2023-02-13T07:12:57Z" level=info msg="new leader elected: inog02"
```

logs kube-vip-w4dzv

```
# kubectl -n harvester-system logs kube-vip-w4dzv
time="2023-02-13T07:13:00Z" level=info msg="Starting kube-vip.io [v0.4.4]"
time="2023-02-13T07:13:00Z" level=info msg="No interface is specified for VIP in config, auto-detecting default Interface"
time="2023-02-13T07:13:00Z" level=info msg="kube-vip will bind to interface [mgmt-br.4000]"
time="2023-02-13T07:13:00Z" level=info msg="server started"
time="2023-02-13T07:13:00Z" level=info msg="Starting Kube-vip Manager with the ARP engine"
time="2023-02-13T07:13:00Z" level=info msg="Namespace [kube-system], Hybrid mode [false]"
time="2023-02-13T07:13:00Z" level=info msg="Beginning cluster membership, namespace [harvester-system], lock name [plndr-svcs-lock], id [inos01]"
I0213 07:13:00.587030 1 leaderelection.go:248] attempting to acquire leader lease harvester-system/plndr-svcs-lock...
time="2023-02-13T07:13:00Z" level=info msg="new leader elected: inog02"
```

w13915984028 commented 1 year ago

@vrapcan: One suspicious point is in the log of kube-vip-cloud-provider-0:

found existing service 'ingress-expose' (398f068d-9c78-4244-a313-62d6a7fd8881) with vip 0.0.0.0

This would make the real VIP (10.40.0.20) unreachable, because that IP is not exposed by the ingress-expose service.

Please post the output of kubectl get service -A.

Normally, the output looks like this:

NAMESPACE                  NAME                                          TYPE           CLUSTER-IP      EXTERNAL-IP       PORT(S)                        AGE
kube-system                ingress-expose                                LoadBalancer   10.53.181.21    192.168.122.199   443:32542/TCP,80:30684/TCP     2d20h

Did you ever try to change the VIP in any way?

Also, the VIP should be different from any node IP.

cc @yaocw2020
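The check requested above, comparing the EXTERNAL-IP of ingress-expose with the expected VIP, can be sketched as a small shell helper. This is only an illustration: the function names `vip_of_ingress_expose` and `check_vip` are hypothetical, and the sketch assumes the default column layout of `kubectl get service -A` (EXTERNAL-IP as the fifth column).

```shell
#!/bin/sh
# vip_of_ingress_expose (hypothetical helper): read `kubectl get service -A`
# output on stdin and print the EXTERNAL-IP column of kube-system/ingress-expose.
vip_of_ingress_expose() {
    awk '$1 == "kube-system" && $2 == "ingress-expose" { print $5 }'
}

# check_vip EXPECTED (hypothetical helper): succeed only if the service
# exposes exactly the EXPECTED address.
check_vip() {
    expected=$1
    actual=$(vip_of_ingress_expose)
    if [ "$actual" = "$expected" ]; then
        echo "ok: ingress-expose exposes $actual"
    else
        echo "mismatch: expected $expected, got ${actual:-<nothing>}"
        return 1
    fi
}

# Usage, with the VIP from this report:
#   kubectl get service -A | check_vip 10.40.0.20
```

A result of `0.0.0.0`, `<pending>`, or `<none>` in that column would point to the VIP not being assigned to the service.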

vrapcan commented 1 year ago

Hi Jian,

> Did you ever try to change the VIP in any way?

No, the VIP has been set and unchanged since we spun up the cluster.

> The VIP should be different from any node IP.

The VIP is 10.40.0.20. Nodes are on 10.40.0.17, 10.40.0.18, 10.40.0.21 and 10.40.0.27.

Output of kubectl get service -A

 # kubectl get service -A
NAMESPACE                  NAME                                          TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                        AGE
cattle-fleet-system        gitjob                                        ClusterIP      10.53.30.14     <none>        80/TCP                         59d
cattle-logging-system      rancher-logging                               ClusterIP      None            <none>        8080/TCP                       59d
cattle-logging-system      rancher-logging-kube-audit-fluentd            ClusterIP      10.53.69.238    <none>        24240/TCP,24240/UDP            59d
cattle-logging-system      rancher-logging-kube-audit-fluentd-headless   ClusterIP      None            <none>        24240/TCP,24240/UDP            59d
cattle-logging-system      rancher-logging-root-fluentd                  ClusterIP      10.53.142.1     <none>        24240/TCP,24240/UDP            59d
cattle-logging-system      rancher-logging-root-fluentd-headless         ClusterIP      None            <none>        24240/TCP,24240/UDP            59d
cattle-monitoring-system   alertmanager-operated                         ClusterIP      None            <none>        9093/TCP,9094/TCP,9094/UDP     17d
cattle-monitoring-system   prometheus-operated                           ClusterIP      None            <none>        9090/TCP                       59d
cattle-monitoring-system   rancher-monitoring-alertmanager               ClusterIP      10.53.58.74     <none>        9093/TCP                       17d
cattle-monitoring-system   rancher-monitoring-grafana                    ClusterIP      10.53.44.253    <none>        80/TCP                         59d
cattle-monitoring-system   rancher-monitoring-kube-state-metrics         ClusterIP      10.53.163.237   <none>        8080/TCP                       59d
cattle-monitoring-system   rancher-monitoring-operator                   ClusterIP      10.53.58.73     <none>        443/TCP                        59d
cattle-monitoring-system   rancher-monitoring-prometheus                 ClusterIP      10.53.184.104   <none>        9090/TCP                       59d
cattle-monitoring-system   rancher-monitoring-prometheus-adapter         ClusterIP      10.53.38.245    <none>        443/TCP                        59d
cattle-monitoring-system   rancher-monitoring-prometheus-node-exporter   ClusterIP      10.53.81.150    <none>        9796/TCP                       59d
cattle-system              cattle-cluster-agent                          ClusterIP      10.53.224.163   <none>        80/TCP,443/TCP                 19d
cattle-system              harvester-cluster-repo                        ClusterIP      10.53.237.30    <none>        80/TCP                         59d
cattle-system              rancher                                       ClusterIP      10.53.201.22    <none>        80/TCP,443/TCP                 59d
cattle-system              rancher-webhook                               ClusterIP      10.53.149.205   <none>        443/TCP                        59d
cattle-system              webhook-service                               ClusterIP      10.53.6.147     <none>        443/TCP                        59d
default                    kubernetes                                    ClusterIP      10.53.0.1       <none>        443/TCP                        59d
harvester-system           harvester                                     ClusterIP      10.53.251.157   <none>        8443/TCP                       59d
harvester-system           harvester-network-webhook                     ClusterIP      10.53.189.134   <none>        443/TCP                        59d
harvester-system           harvester-vm-import-controller                ClusterIP      10.53.169.75    <none>        8080/TCP                       21d
harvester-system           harvester-webhook                             ClusterIP      10.53.19.169    <none>        443/TCP                        59d
harvester-system           kubevirt-operator-webhook                     ClusterIP      10.53.236.76    <none>        443/TCP                        59d
harvester-system           kubevirt-prometheus-metrics                   ClusterIP      10.53.222.168   <none>        443/TCP                        59d
harvester-system           pcidevices-webhook                            ClusterIP      10.53.135.219   <none>        8443/TCP                       21d
harvester-system           virt-api                                      ClusterIP      10.53.133.146   <none>        443/TCP                        59d
kube-system                ingress-expose                                LoadBalancer   10.53.242.124   10.40.0.20    443:31746/TCP,80:30362/TCP     59d
kube-system                rancher-monitoring-coredns                    ClusterIP      None            <none>        9153/TCP                       59d
kube-system                rancher-monitoring-kubelet                    ClusterIP      None            <none>        10250/TCP,10255/TCP,4194/TCP   59d
kube-system                rke2-coredns-rke2-coredns                     ClusterIP      10.53.0.10      <none>        53/UDP,53/TCP                  59d
kube-system                rke2-ingress-nginx-controller-admission       ClusterIP      10.53.170.127   <none>        443/TCP                        59d
kube-system                rke2-metrics-server                           ClusterIP      10.53.186.15    <none>        443/TCP                        59d
longhorn-system            csi-attacher                                  ClusterIP      10.53.189.38    <none>        12345/TCP                      59d
longhorn-system            csi-provisioner                               ClusterIP      10.53.120.173   <none>        12345/TCP                      59d
longhorn-system            csi-resizer                                   ClusterIP      10.53.102.27    <none>        12345/TCP                      59d
longhorn-system            csi-snapshotter                               ClusterIP      10.53.227.175   <none>        12345/TCP                      59d
longhorn-system            longhorn-admission-webhook                    ClusterIP      10.53.99.1      <none>        9443/TCP                       59d
longhorn-system            longhorn-backend                              ClusterIP      10.53.37.27     <none>        9500/TCP                       59d
longhorn-system            longhorn-conversion-webhook                   ClusterIP      10.53.151.237   <none>        9443/TCP                       59d
longhorn-system            longhorn-engine-manager                       ClusterIP      None            <none>        <none>                         59d
longhorn-system            longhorn-frontend                             ClusterIP      10.53.118.226   <none>        80/TCP                         59d
longhorn-system            longhorn-replica-manager                      ClusterIP      None            <none>        <none>                         59d
w13915984028 commented 1 year ago

@vrapcan

kube-system ingress-expose LoadBalancer 10.53.242.124 10.40.0.20 443:31746/TCP,80:30362/TCP 59d

The ingress-expose service has the correct VIP, 10.40.0.20. Could you try deleting the pod kube-vip-cloud-provider-0? It will be replaced by a new one. Then get the log of the new pod and check what value it reports for ingress-expose.

When it shows the correct value, you may try the VIP again via SSH/HTTP. Thanks.

lbartok commented 1 year ago

Hi Jian,

We deleted the pod, and here is the result from the newly created one. The VIP that it picked up is 0.0.0.0 (not the desired 10.40.0.20). Can you tell us where the VIP comes from in this case?

inos01:/home/rancher # kubectl -n harvester-system logs kube-vip-cloud-provider-0 -f
I0213 17:03:56.052282       1 serving.go:331] Generated self-signed cert in-memory
W0213 17:03:56.336930       1 client_config.go:608] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0213 17:03:56.338312       1 controllermanager.go:127] Version: v0.0.0-master+$Format:%h$
W0213 17:03:56.338819       1 controllermanager.go:139] detected a cluster without a ClusterID.  A ClusterID will be required in the future.  Please tag your cluster to avoid any future issues
I0213 17:03:56.339573       1 secure_serving.go:197] Serving securely on [::]:10258
I0213 17:03:56.339603       1 leaderelection.go:243] attempting to acquire leader lease  kube-system/kube-vip-cloud-controller...
I0213 17:03:56.339608       1 tlsconfig.go:240] Starting DynamicServingCertificateController
I0213 17:04:13.440916       1 leaderelection.go:253] successfully acquired lease kube-system/kube-vip-cloud-controller
I0213 17:04:13.442117       1 event.go:291] "Event occurred" object="kube-system/kube-vip-cloud-controller" kind="Endpoints" apiVersion="v1" type="Normal" reason="LeaderElection" message="kube-vip-cloud-provider-0_88bf91b0-c7a8-470b-9808-c5afe76dc9e8 became leader"
I0213 17:04:13.442170       1 event.go:291] "Event occurred" object="kube-system/kube-vip-cloud-controller" kind="Lease" apiVersion="coordination.k8s.io/v1" type="Normal" reason="LeaderElection" message="kube-vip-cloud-provider-0_88bf91b0-c7a8-470b-9808-c5afe76dc9e8 became leader"
I0213 17:04:13.442858       1 node_controller.go:108] Sending events to api server.
W0213 17:04:13.442886       1 core.go:57] failed to start cloud node controller: cloud provider does not support instances
W0213 17:04:13.442896       1 controllermanager.go:251] Skipping "cloud-node"
I0213 17:04:13.443550       1 node_lifecycle_controller.go:77] Sending events to api server
W0213 17:04:13.443589       1 core.go:76] failed to start cloud node lifecycle controller: cloud provider does not support instances
W0213 17:04:13.443595       1 controllermanager.go:251] Skipping "cloud-node-lifecycle"
I0213 17:04:13.444234       1 controllermanager.go:254] Started "service"
I0213 17:04:13.444241       1 core.go:108] Will not configure cloud provider routes for allocate-node-cidrs: false, configure-cloud-routes: true.
W0213 17:04:13.444246       1 controllermanager.go:251] Skipping "route"
I0213 17:04:13.444331       1 controller.go:239] Starting service controller
I0213 17:04:13.444374       1 shared_informer.go:240] Waiting for caches to sync for service
I0213 17:04:13.544518       1 shared_informer.go:247] Caches are synced for service
I0213 17:04:13.544782       1 event.go:291] "Event occurred" object="kube-system/ingress-expose" kind="Service" apiVersion="v1" type="Normal" reason="EnsuringLoadBalancer" message="Ensuring load balancer"
I0213 17:04:13.559489       1 loadBalancer.go:149] syncing service 'ingress-expose' (398f068d-9c78-4244-a313-62d6a7fd8881)
I0213 17:04:13.559611       1 loadBalancer.go:164] found existing service 'ingress-expose' (398f068d-9c78-4244-a313-62d6a7fd8881) with vip 0.0.0.0
I0213 17:04:13.559807       1 event.go:291] "Event occurred" object="kube-system/ingress-expose" kind="Service" apiVersion="v1" type="Normal" reason="EnsuredLoadBalancer" message="Ensured load balancer"
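
A minimal sketch of how the VIP could be pulled out of a log like the one above, assuming the "found existing service ... with vip ..." line keeps the format shown here. The function names `vips_from_log` and `vip_is_assigned` are hypothetical helpers, not part of kube-vip.

```python
import re

# Matches the kube-vip-cloud-provider log line seen in this thread:
#   ... found existing service 'NAME' (UID) with vip ADDRESS
# The format is taken from the log above and may differ in other versions.
VIP_LINE = re.compile(
    r"found existing service '([^']+)' \(([0-9a-f-]+)\) with vip (\S+)"
)


def vips_from_log(log_text):
    """Return {service_name: vip}; if a service appears twice, the last line wins."""
    found = {}
    for line in log_text.splitlines():
        match = VIP_LINE.search(line)
        if match:
            name, _uid, vip = match.groups()
            found[name] = vip
    return found


def vip_is_assigned(vip):
    """Treat an empty or all-zero address as 'no VIP assigned' (an assumption)."""
    return vip not in ("", "0.0.0.0")


sample = (
    "I0213 17:04:13.559489       1 loadBalancer.go:149] syncing service "
    "'ingress-expose' (398f068d-9c78-4244-a313-62d6a7fd8881)\n"
    "I0213 17:04:13.559611       1 loadBalancer.go:164] found existing service "
    "'ingress-expose' (398f068d-9c78-4244-a313-62d6a7fd8881) with vip 0.0.0.0\n"
)

for service, vip in vips_from_log(sample).items():
    state = "assigned" if vip_is_assigned(vip) else "NOT assigned"
    print(f"{service}: vip {vip} ({state})")
```

The sample text could be replaced with the saved output of the `kubectl ... logs` command above.
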
w13915984028 commented 1 year ago

@vrapcan @lbartok Please add the following information, thanks.

(1) On the NODE that was installed in create mode (or on each NODE), run: grep vip -i /oem/90_custom.yaml -2

(2) The network topology: are the NODEs connected to a switch or a router? Currently, the VIP relies on ARP broadcast.

vrapcan commented 1 year ago

@w13915984028

(1) Running that command on all Harvester nodes of the cluster returns grep: /oem/90_custom.yaml: No such file or directory.

The /oem folder only contains a 99_custom.yaml file. Running that command against the 99_custom.yaml file returns nothing.

(2) Harvester nodes are connected to a managed L2 switch with SONiC OS. Routing is handled by the pfSense VMs (in HA mode) running on the same cluster, the ones I mentioned in my first post that both rebooted at the same time.

vrapcan commented 1 year ago

@w13915984028

Based on your comment regarding the ARP broadcast, I looked at the ARP table on the pfSense. The VIP's IP address shows up in the table when I try to access it, but the MAC address field shows an (Incomplete) message instead of a MAC address.

I can see an interface with the VIP's MAC address among the network interfaces on the first node, but it does not have the VIP IP address assigned.

221: vip-398f068d@mgmt-br.4000: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether d6:41:b6:bc:a9:df brd ff:ff:ff:ff:ff:ff
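
A rough sketch of the same check done programmatically, assuming the text comes from `ip addr show` and follows the usual layout (an indexed header line, then indented `link/ether` and `inet` lines). `parse_ip_addr` is a hypothetical helper name.

```python
import re

# Header line, e.g. "221: vip-398f068d@mgmt-br.4000: <BROADCAST,...> mtu 1500 ..."
HEADER = re.compile(r"^\d+:\s+([^:@\s]+)(?:@[^:\s]+)?:")
MAC = re.compile(r"^\s+link/ether\s+(\S+)")
INET = re.compile(r"^\s+inet6?\s+(\S+)")


def parse_ip_addr(text):
    """Return {interface: {"mac": str or None, "addresses": [str, ...]}}."""
    interfaces = {}
    current = None
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            current = interfaces.setdefault(
                header.group(1), {"mac": None, "addresses": []}
            )
            continue
        if current is None:
            continue  # ignore anything before the first header line
        mac = MAC.match(line)
        if mac:
            current["mac"] = mac.group(1)
            continue
        inet = INET.match(line)
        if inet:
            current["addresses"].append(inet.group(1))
    return interfaces


sample = (
    "221: vip-398f068d@mgmt-br.4000: <BROADCAST,MULTICAST,UP,LOWER_UP> "
    "mtu 1500 qdisc noqueue state UP group default\n"
    "    link/ether d6:41:b6:bc:a9:df brd ff:ff:ff:ff:ff:ff\n"
)

for name, info in parse_ip_addr(sample).items():
    if not info["addresses"]:
        print(f"{name} ({info['mac']}) has no IP address assigned")
```
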

Description of the ingress-expose service is below.

```
inog02:/home/rancher # kubectl describe service ingress-expose -n kube-system
Name:                     ingress-expose
Namespace:                kube-system
Labels:
Annotations:              kube-vip.io/hwaddr: d6:41:b6:bc:a9:df
                          kube-vip.io/requestedIP: 10.40.0.20
Selector:                 app.kubernetes.io/name=rke2-ingress-nginx
Type:                     LoadBalancer
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       10.53.242.124
IPs:                      10.53.242.124
IP:                       0.0.0.0
LoadBalancer Ingress:     10.40.0.20
Port:                     https-internal  443/TCP
TargetPort:               443/TCP
NodePort:                 https-internal  31746/TCP
Endpoints:                10.52.0.93:443,10.52.1.10:443,10.52.2.4:443 + 1 more...
Port:                     http  80/TCP
TargetPort:               80/TCP
NodePort:                 http  30362/TCP
Endpoints:                10.52.0.93:80,10.52.1.10:80,10.52.2.4:80 + 1 more...
Session Affinity:         None
External Traffic Policy:  Cluster
Events:
```

w13915984028 commented 1 year ago

@vrapcan Normally, when the VIP is set to a static IP, you should be able to grep it from /oem/99_custom.yaml.

Could you try to take one NODE from your current cluster, install it as a single-node cluster, and verify whether the VIP works? Thanks.

VIP mode is static image

Cluster is ready image

VIP mode is saved in oem file image

vrapcan commented 1 year ago

@w13915984028 Thank you for looking into this more deeply.

The nodes have static IP addresses, but it seems that the VIP has been set as DHCP during the installation (I did not do the installation).

Is there a way to force Harvester to request a new IP address for the VIP from the DHCP server? Or change the current VIP settings to a static IP address?

> Could you try to take one NODE from your current cluster, ...

We only have 4 nodes. This may not be possible at the moment. I'll try to figure out a way to remove one.

w13915984028 commented 1 year ago

@vrapcan That matches our assumption: the VIP is set to DHCP mode.

@yaocw2020 Do you have a workaround to change the VIP from DHCP mode to static mode, or another way to recover from the current situation? Thanks.

vrapcan commented 1 year ago

We managed to get the VIP working by creating a DHCP Static Mapping in the pfSense DHCP settings, linking the MAC address and the IP address (both) specified in the ingress-expose service.

inog02:/home/rancher # kubectl describe service ingress-expose -n kube-system
Name:                     ingress-expose
Namespace:                kube-system
Labels:                   <none>
Annotations:              kube-vip.io/hwaddr: d6:41:b6:bc:a9:df
                          kube-vip.io/requestedIP: 10.40.0.20
.....
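
As an illustration only, the MAC/IP pair used for the static mapping can be read from the two annotations above. This sketch assumes the `kubectl describe` text has been saved to a string; `kube_vip_annotations` is a hypothetical helper, and how the pair is then entered into pfSense is outside its scope.

```python
import re

# Picks up "kube-vip.io/<key>: <value>" pairs anywhere in the describe output.
ANNOTATION = re.compile(r"(kube-vip\.io/[\w-]+):\s*(\S+)")


def kube_vip_annotations(describe_text):
    """Return the kube-vip.io/* annotations found in the text as a dict."""
    return dict(ANNOTATION.findall(describe_text))


sample = """\
Name:                     ingress-expose
Namespace:                kube-system
Labels:                   <none>
Annotations:              kube-vip.io/hwaddr: d6:41:b6:bc:a9:df
                          kube-vip.io/requestedIP: 10.40.0.20
"""

annotations = kube_vip_annotations(sample)
mac = annotations.get("kube-vip.io/hwaddr")
ip = annotations.get("kube-vip.io/requestedIP")
print(f"static mapping candidate: {mac} -> {ip}")
```
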

While the VIP works at the moment, I am not confident that this is a good long-term solution. I believe it would be better to change Harvester's VIP settings to a static IP address, if that is possible.

yaocw2020 commented 1 year ago

Binding the VIP in a DHCP static mapping is our best practice, and we recommend that our customers do so. @vrapcan

mgazelle commented 1 year ago

I have the same issue and unfortunately I am not able to fix the VIP interface. I am using a static IP which should be bound to a specific MAC address. Is there any option to change the virtual MAC address of the VIP interface?

A few config details:

grep vip -i /oem/90_custom.yaml  -2
            values:
              service:
                vip:
                  enabled: true
                  mode: "static"
--
              harvester-network-controller:
                enabled: true
                vipEnabled: true
                image:
                  pullPolicy: "IfNotPresent"
--
              harvester-load-balancer:
                enabled: true
              kube-vip:
                enabled: true
              kube-vip-cloud-provider:
                enabled: true
        - apiVersion: management.cattle.io/v3
--
kubectl get service -A

NAMESPACE             NAME                                      TYPE           CLUSTER-IP      EXTERNAL-IP      PORT(S)                      AGE
cattle-fleet-system   gitjob                                    ClusterIP      10.53.170.58    <none>           80/TCP                       61m
cattle-system         harvester-cluster-repo                    ClusterIP      10.53.166.200   <none>           80/TCP                       60m
cattle-system         rancher                                   ClusterIP      10.53.125.43    <none>           80/TCP,443/TCP               62m
cattle-system         rancher-webhook                           ClusterIP      10.53.229.125   <none>           443/TCP                      61m
cattle-system         webhook-service                           ClusterIP      10.53.104.45    <none>           443/TCP                      61m
default               kubernetes                                ClusterIP      10.53.0.1       <none>           443/TCP                      62m
harvester-system      harvester                                 ClusterIP      10.53.140.87    <none>           8443/TCP                     59m
harvester-system      harvester-load-balancer-webhook           ClusterIP      10.53.172.87    <none>           443/TCP                      59m
harvester-system      harvester-network-webhook                 ClusterIP      10.53.195.32    <none>           443/TCP                      59m
harvester-system      harvester-webhook                         ClusterIP      10.53.138.244   <none>           443/TCP                      59m
harvester-system      kubevirt-operator-webhook                 ClusterIP      10.53.169.245   <none>           443/TCP                      58m
harvester-system      kubevirt-prometheus-metrics               ClusterIP      10.53.238.68    <none>           443/TCP                      58m
harvester-system      virt-api                                  ClusterIP      10.53.13.139    <none>           443/TCP                      58m
kube-system           harvester-snapshot-validation-webhook     ClusterIP      10.53.124.94    <none>           443/TCP                      59m
kube-system           ingress-expose                            LoadBalancer   10.53.52.152    xxx.xxx.xxx.xxx   443:30495/TCP,80:31351/TCP   57m
kube-system           rke2-coredns-rke2-coredns                 ClusterIP      10.53.0.10      <none>           53/UDP,53/TCP                62m
kube-system           rke2-ingress-nginx-controller-admission   ClusterIP      10.53.91.140    <none>           443/TCP                      61m
kube-system           rke2-metrics-server                       ClusterIP      10.53.98.100    <none>           443/TCP                      62m
longhorn-system       csi-attacher                              ClusterIP      10.53.23.224    <none>           12345/TCP                    58m
longhorn-system       csi-provisioner                           ClusterIP      10.53.162.130   <none>           12345/TCP                    58m
longhorn-system       csi-resizer                               ClusterIP      10.53.198.16    <none>           12345/TCP                    58m
longhorn-system       csi-snapshotter                           ClusterIP      10.53.166.57    <none>           12345/TCP                    58m
longhorn-system       longhorn-admission-webhook                ClusterIP      10.53.29.145    <none>           9443/TCP                     59m
longhorn-system       longhorn-backend                          ClusterIP      10.53.121.171   <none>           9500/TCP                     59m
longhorn-system       longhorn-conversion-webhook               ClusterIP      10.53.249.102   <none>           9443/TCP                     59m
longhorn-system       longhorn-engine-manager                   ClusterIP      None            <none>           <none>                       59m
longhorn-system       longhorn-frontend                         ClusterIP      10.53.199.123   <none>           80/TCP                       59m
longhorn-system       longhorn-recovery-backend                 ClusterIP      10.53.27.180    <none>           9600/TCP                     59m
longhorn-system       longhorn-replica-manager                  ClusterIP      None            <none>           <none>                       59m
kubectl get svc -n kube-system ingress-expose -ojsonpath='{.metadata.annotations}'
{"kube-vip.io/ignore-service-security":"true","kube-vip.io/loadbalancerIPs":"xxx.xxx.xxx.xxx","kube-vip.io/vipHost":"myclustername"}

Thank you

qrkourier commented 11 months ago

I may be experiencing the same underlying issue. I'm using rancher-vcluster. I can reach 22/tcp on the VIP from a VM pod scheduled on the same node where the VIP is bound, but not 443/tcp. From the same VM, I can reach 22,443/tcp on both of the other two Harvester nodes' IPs. Only 443/tcp on the VIP and the node IP where the VM is scheduled are failing. The SYN never arrives on the switch port, so it's apparently dropped somewhere in the Harvester networking.

The guest VM's veth device is a slave of the Harvester node's mgmt-br interface. Monitoring mgmt-br, I see the SYN is mangled to an rke2-ingress-nginx pod on a different node.

Update: the attempts to connect are balanced across the three running rke2-ingress-nginx pods. When capturing on the calico interface of the pod on the same host, I can see the SYN never arrives. It's last seen on mgmt-br with destination IP of the rke2-ingress-nginx pod IP.

joshuarestivo commented 10 months ago

I've run into the same issue. I have opnsense running as a VM within the cluster. My node IPs and VIP are all static. From the opnsense instance, or when traversing it, I can ping the VIP. SSH also responds on the VIP. From the opnsense VM, port 443 on the VIP reports as closed. From any host outside of the cluster, the VIP is accessible on 443. The cluster is connected to an L2 switch, and each node has a bond of two ports in an active/backup configuration.

Editing my original comment for additional clarity: When you first bring up a new Harvester cluster and install the IP gateway (opnsense, in my case) on it, everything works fine. The moment the gateway gets migrated to a new node, the VIP becomes unreachable on port 443 on the node that is actively holding the gateway. There is no going back after this. You are simply unable to reach the Harvester web server on the node that the gateway is on (even when accessing it by its direct IP). All other services (e.g., 6443/tcp) remain accessible; only port 443 is impacted.

tahvane1 commented 5 months ago

I'm having the same issue with the Rancher vcluster addon. The guest cluster can't be created when the DNS name points to the VIP, but creation succeeds when it points to a node that does not hold the VIP address.