harvester / harvester

Open source hyperconverged infrastructure (HCI) software
https://harvesterhci.io/
Apache License 2.0

[BUG] Single-node Harvester in Provo DAY0 KVM VM: network is not reachable after a few days #1668

Closed w13915984028 closed 2 years ago

w13915984028 commented 2 years ago

Describe the bug

Hardware: Provo DAY0, single-rack HP server
Host OS: SUSE SLE15 SP2 with KVM
VM1: Ubuntu server, VM2: Harvester (single node); both use NAT to access the internet

I installed Harvester in VM2 (single node). After 2 or 3 days, VM2 is no longer reachable via the VIP or the node IP, and pings from the host OS fail, until VM2 is shut down and restarted. Every few days VM2 loses its network again, while VM1 has never had this network issue.

I encounter this issue frequently in this specific environment.

To Reproduce
Steps to reproduce the behavior:

  1. Create a VM in KVM and boot it with the Harvester ISO (a sketch follows after this list)
  2. Install Harvester (single node) via the ISO guide
  3. Harvester starts up and is ready for use
  4. After a few days, the VM loses its network connectivity from the host OS and can no longer be pinged or reached via SSH
  5. On this specific server, the issue has occurred at least 3 times, across different ISO versions
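
For step 1, a minimal libvirt sketch (hypothetical memory/disk sizes and ISO path, not taken from this report; assumes the default NAT network):

# Hypothetical reproduction sketch: create the Harvester VM on the KVM host
virt-install \
  --name hmain2911 \
  --memory 16384 --vcpus 8 \
  --cdrom /var/lib/libvirt/images/harvester-master-amd64.iso \
  --disk size=250 \
  --network network=default \
  --os-variant generic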

Expected behavior

Harvester should keep running and its network should remain reachable.

Support bundle

ssh admin@10.84.132.14
Last login: Fri Dec 10 09:19:42 2021 from 10.163.24.30
admin@provoday0:~> sudo virsh list

 Id   Name          State
-----------------------------
 1    ubuntu20.04   running     ----- 192.168.122.85
 55   hmain2911     running     ---- 192.168.122.200

admin@provoday0:~> ping 192.168.122.200
PING 192.168.122.200 (192.168.122.200) 56(84) bytes of data.
From 192.168.122.1 icmp_seq=1 Destination Host Unreachable
From 192.168.122.1 icmp_seq=2 Destination Host Unreachable
From 192.168.122.1 icmp_seq=3 Destination Host Unreachable
From 192.168.122.1 icmp_seq=4 Destination Host Unreachable
From 192.168.122.1 icmp_seq=5 Destination Host Unreachable
From 192.168.122.1 icmp_seq=6 Destination Host Unreachable
^C

--- 192.168.122.200 ping statistics ---
9 packets transmitted, 0 received, +6 errors, 100% packet loss, time 8192ms
pipe 4
admin@provoday0:~>
admin@provoday0:~> ping 192.168.122.85
PING 192.168.122.85 (192.168.122.85) 56(84) bytes of data.
64 bytes from 192.168.122.85: icmp_seq=1 ttl=64 time=0.281 ms
64 bytes from 192.168.122.85: icmp_seq=2 ttl=64 time=0.210 ms
^C
--- 192.168.122.85 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.210/0.245/0.281/0.038 ms


bk201 commented 2 years ago

Could you log in to the VM via VNC and dump the output of ip a?

w13915984028 commented 2 years ago

Looks a bit tricky.

The F12 page shows the mgmt URL 192.168.122.200 as not ready,

but ip addr show dev harvester-mgmt reports 192.168.122.11, which is pingable from the host OS.


admin@provoday0:~> ping 192.168.122.11
PING 192.168.122.11 (192.168.122.11) 56(84) bytes of data.
64 bytes from 192.168.122.11: icmp_seq=1 ttl=64 time=0.347 ms
64 bytes from 192.168.122.11: icmp_seq=2 ttl=64 time=0.324 ms
64 bytes from 192.168.122.11: icmp_seq=3 ttl=64 time=0.305 ms
^C
--- 192.168.122.11 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2049ms
rtt min/avg/max/mdev = 0.305/0.325/0.347/0.022 ms
admin@provoday0:~>

The VM has not been restarted since the last installation:

rancher@harvmain0112:~> uptime
 18:01:45 up 12 days 0:40,  2 users,  load average: 11.03, 11.47, 11.40

w13915984028 commented 2 years ago

rancher@harvmain0112:~> ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens3: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master harvester-mgmt state UP group default qlen 1000
    link/ether 52:54:00:90:e2:26 brd ff:ff:ff:ff:ff:ff
    altname enp0s3
3: harvester-mgmt: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 52:54:00:90:e2:26 brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.11/24 brd 192.168.122.255 scope global harvester-mgmt
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe90:e226/64 scope link
       valid_lft forever preferred_lft forever
6: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
    link/ether 1e:6f:98:12:cd:76 brd ff:ff:ff:ff:ff:ff
    inet 10.52.0.0/32 brd 10.52.0.0 scope global flannel.1
       valid_lft forever preferred_lft forever
    inet6 fe80::1c6f:98ff:fe12:cd76/64 scope link
       valid_lft forever preferred_lft forever
7: calib76ee3d87dc@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-313ef4eb-b374-2433-7bde-772aa0ee20b1
    inet6 fe80::ecee:eeff:feee:eeee/64 scope link
       valid_lft forever preferred_lft forever
w13915984028 commented 2 years ago

The pods are in a terrible state:

rancher@harvmain0112:~> kubectl get pods -A
NAMESPACE                   NAME                                                     READY   STATUS              RESTARTS   AGE
cattle-fleet-local-system   fleet-agent-7cbd6946f9-4xghw                             1/1     Running             144        12d
cattle-fleet-system         fleet-controller-7765f46db-26gzw                         0/1     CrashLoopBackOff    1448       12d
cattle-fleet-system         gitjob-95bb5f685-8wgrj                                   1/1     Running             143        12d
cattle-monitoring-system    prometheus-rancher-monitoring-prometheus-0               0/3     ContainerCreating   0          5d20h
cattle-monitoring-system    rancher-monitoring-admission-create-nww66                0/1     Completed           0          6d15h
cattle-monitoring-system    rancher-monitoring-crd-create-ckjsz                      0/1     Completed           0          6d15h
cattle-monitoring-system    rancher-monitoring-grafana-7f54b7d8bc-jggld              0/3     Init:0/2            0          5d16h
cattle-monitoring-system    rancher-monitoring-kube-state-metrics-744b9448f4-bgdqc   1/1     Running             30         12d
cattle-monitoring-system    rancher-monitoring-operator-754bcd8cb4-j9lwc             1/1     Running             2          12d
cattle-monitoring-system    rancher-monitoring-prometheus-adapter-77568b975-jdn8z    0/1     Error               1477       12d
cattle-monitoring-system    rancher-monitoring-prometheus-node-exporter-dgdv7        1/1     Running             77         12d
cattle-system               harvester-cluster-repo-6d7777b9c7-mcwg8                  1/1     Running             2          12d
cattle-system               rancher-7b76fb5dd5-qknw7                                 0/1     Running             1504       12d
cattle-system               rancher-webhook-fcd8cdc88-g8pph                          1/1     Running             1          12d
cattle-system               system-upgrade-controller-7c878c4798-n2tp4               1/1     Running             0          12d
harvester-system            harvester-5bd4876c66-dzfsn                               1/1     Running             75         11d
harvester-system            harvester-load-balancer-5b4b949748-5nfg6                 1/1     Running             145        12d
harvester-system            harvester-network-controller-5nl26                       1/1     Running             1          12d
harvester-system            harvester-network-controller-manager-7769fd599d-dhqn7    1/1     Running             149        12d
harvester-system            harvester-network-controller-manager-7769fd599d-gmz2v    1/1     Running             143        12d
harvester-system            harvester-node-disk-manager-wtq4z                        1/1     Running             2          12d
harvester-system            harvester-webhook-7f568f68fb-vb9vn                       0/1     Pending             0          11d
harvester-system            harvester-webhook-98575b94b-xb6j9                        1/1     Running             0          11d
harvester-system            kube-vip-cloud-provider-0                                1/1     Running             157        12d
harvester-system            kube-vip-sjgt4                                           1/1     Running             194        12d
harvester-system            virt-api-86455cdb7d-8ch6b                                0/1     Running             0          12d
harvester-system            virt-api-86455cdb7d-vz6qc                                0/1     Running             1          12d
harvester-system            virt-controller-5f649999dd-bl7k5                         0/1     Running             1805       12d
harvester-system            virt-controller-5f649999dd-rp4tw                         0/1     CrashLoopBackOff    1801       12d
harvester-system            virt-handler-z49mh                                       0/1     Running             1387       12d
harvester-system            virt-operator-56c5bdc7b8-9v2tf                           0/1     CrashLoopBackOff    1442       12d
kube-system                 cloud-controller-manager-harvmain0112                    1/1     Running             189        12d
kube-system                 etcd-harvmain0112                                        1/1     Running             7          4d16h
kube-system                 helm-install-rke2-canal-8czdv                            0/1     Completed           0          12d
kube-system                 helm-install-rke2-coredns-kbvc7                          0/1     Completed           0          12d
kube-system                 helm-install-rke2-ingress-nginx-7nfdc                    0/1     Completed           0          12d
kube-system                 helm-install-rke2-metrics-server-tj9bq                   0/1     Completed           0          12d
kube-system                 helm-install-rke2-multus-bdzc4                           0/1     Completed           0          12d
kube-system                 kube-apiserver-harvmain0112                              1/1     Running             0          9d
kube-system                 kube-controller-manager-harvmain0112                     1/1     Running             190        12d
kube-system                 kube-multus-ds-hznds                                     1/1     Running             1          12d
kube-system                 kube-proxy-harvmain0112                                  1/1     Running             1          12d
kube-system                 kube-scheduler-harvmain0112                              1/1     Running             183        12d
kube-system                 rke2-canal-dscrn                                         1/2     CrashLoopBackOff    1929       12d
kube-system                 rke2-coredns-rke2-coredns-7bb4f446c-nbxcv                1/1     Running             1          12d
kube-system                 rke2-coredns-rke2-coredns-autoscaler-7c58bd5b6c-4xhh4    1/1     Running             52         12d
kube-system                 rke2-ingress-nginx-controller-2hbjc                      0/1     CrashLoopBackOff    2011       12d
kube-system                 rke2-metrics-server-5df7d77b5b-rx8v7                     0/1     CrashLoopBackOff    1473       12d
kube-system                 snapshot-controller-9f68fdd9-cc86j                       1/1     Running             154        12d
kube-system                 snapshot-controller-9f68fdd9-scdsk                       1/1     Running             143        12d
longhorn-system             backing-image-manager-c00e-ecd3                          0/1     Running             0          5d9h
longhorn-system             csi-attacher-66fcbbff5c-9mz49                            1/1     Running             151        12d
longhorn-system             csi-attacher-66fcbbff5c-hqh5d                            1/1     Running             140        12d
longhorn-system             csi-attacher-66fcbbff5c-xqmjz                            1/1     Running             145        12d
longhorn-system             csi-provisioner-84fcfbf785-6r5hl                         0/1     CrashLoopBackOff    1447       12d
longhorn-system             csi-provisioner-84fcfbf785-p4lxg                         0/1     CrashLoopBackOff    1434       12d
longhorn-system             csi-provisioner-84fcfbf785-swj2j                         0/1     CrashLoopBackOff    1440       12d
longhorn-system             csi-resizer-58ff455cdb-4wmsc                             1/1     Running             126        12d
longhorn-system             csi-resizer-58ff455cdb-gbrzw                             1/1     Running             122        12d
longhorn-system             csi-resizer-58ff455cdb-kcgfr                             1/1     Running             120        12d
longhorn-system             csi-snapshotter-59f5cd8b8c-4xfbl                         1/1     Running             135        12d
longhorn-system             csi-snapshotter-59f5cd8b8c-dmdm7                         1/1     Running             127        12d
longhorn-system             csi-snapshotter-59f5cd8b8c-kxqjr                         1/1     Running             125        12d
longhorn-system             engine-image-ei-a6c8003e-q74hj                           1/1     Running             0          12d
longhorn-system             longhorn-csi-plugin-nds9l                                2/2     Running             3          12d
longhorn-system             longhorn-driver-deployer-97d65ccb8-tt2dg                 1/1     Running             0          12d
longhorn-system             longhorn-manager-2wd5g                                   0/1     Running             9          12d
longhorn-system             longhorn-post-upgrade-77dq6                              0/1     Completed           0          6d16h
longhorn-system             longhorn-ui-55cb5cdc88-5q8mt                             1/1     Running             3          12d
rancher@harvmain0112:~>
rancher@harvmain0112:~> kubectl get service -A
NAMESPACE                  NAME                                          TYPE           CLUSTER-IP      EXTERNAL-IP       PORT(S)                        AGE
cattle-fleet-system        gitjob                                        ClusterIP      10.53.150.72    <none>            80/TCP                         12d
cattle-monitoring-system   prometheus-operated                           ClusterIP      None            <none>            9090/TCP                       12d
cattle-monitoring-system   rancher-monitoring-grafana                    ClusterIP      10.53.124.24    <none>            80/TCP                         12d
cattle-monitoring-system   rancher-monitoring-kube-state-metrics         ClusterIP      10.53.84.253    <none>            8080/TCP                       12d
cattle-monitoring-system   rancher-monitoring-operator                   ClusterIP      10.53.55.64     <none>            443/TCP                        12d
cattle-monitoring-system   rancher-monitoring-prometheus                 ClusterIP      10.53.62.99     <none>            9090/TCP                       12d
cattle-monitoring-system   rancher-monitoring-prometheus-adapter         ClusterIP      10.53.233.76    <none>            443/TCP                        12d
cattle-monitoring-system   rancher-monitoring-prometheus-node-exporter   ClusterIP      10.53.173.98    <none>            9796/TCP                       12d
cattle-system              harvester-cluster-repo                        ClusterIP      10.53.83.100    <none>            80/TCP                         12d
cattle-system              rancher                                       ClusterIP      10.53.128.146   <none>            80/TCP,443/TCP                 12d
cattle-system              rancher-webhook                               ClusterIP      10.53.191.69    <none>            443/TCP                        12d
cattle-system              webhook-service                               ClusterIP      10.53.41.133    <none>            443/TCP                        12d
default                    kubernetes                                    ClusterIP      10.53.0.1       <none>            443/TCP                        12d
harvester-system           harvester                                     ClusterIP      10.53.85.35     <none>            8443/TCP                       12d
harvester-system           harvester-webhook                             ClusterIP      10.53.120.75    <none>            443/TCP                        12d
harvester-system           kubevirt-operator-webhook                     ClusterIP      10.53.52.244    <none>            443/TCP                        12d
harvester-system           kubevirt-prometheus-metrics                   ClusterIP      10.53.111.120   <none>            443/TCP                        12d
harvester-system           virt-api                                      ClusterIP      10.53.219.64    <none>            443/TCP                        12d
kube-system                ingress-expose                                LoadBalancer   10.53.233.196   192.168.122.200   443:31255/TCP,80:32106/TCP     12d
kube-system                rancher-monitoring-coredns                    ClusterIP      None            <none>            9153/TCP                       12d
kube-system                rancher-monitoring-kubelet                    ClusterIP      None            <none>            10250/TCP,10255/TCP,4194/TCP   12d
kube-system                rke2-coredns-rke2-coredns                     ClusterIP      10.53.0.10      <none>            53/UDP,53/TCP                  12d
kube-system                rke2-ingress-nginx-controller-admission       ClusterIP      10.53.167.197   <none>            443/TCP                        12d
kube-system                rke2-metrics-server                           ClusterIP      10.53.50.165    <none>            443/TCP                        12d
longhorn-system            csi-attacher                                  ClusterIP      10.53.97.224    <none>            12345/TCP                      12d
longhorn-system            csi-provisioner                               ClusterIP      10.53.208.225   <none>            12345/TCP                      12d
longhorn-system            csi-resizer                                   ClusterIP      10.53.223.141   <none>            12345/TCP                      12d
longhorn-system            csi-snapshotter                               ClusterIP      10.53.52.108    <none>            12345/TCP                      12d
longhorn-system            longhorn-backend                              ClusterIP      10.53.3.200     <none>            9500/TCP                       12d
longhorn-system            longhorn-engine-manager                       ClusterIP      None            <none>            <none>                         12d
longhorn-system            longhorn-frontend                             ClusterIP      10.53.118.5     <none>            80/TCP                         12d
longhorn-system            longhorn-replica-manager                      ClusterIP      None            <none>            <none>                         12d
 kubectl logs deployment/fleet-controller -n cattle-fleet-system
Error: Get "https://10.53.0.1:443/apis/apiextensions.k8s.io/v1/customresourcedefinitions": dial tcp 10.53.0.1:443: connect: no route to host
..

time="2021-12-13T18:06:34Z" level=fatal msg="Get \"https://10.53.0.1:443/apis/apiextensions.k8s.io/v1/customresourcedefinitions\": dial tcp 10.53.0.1:443: connect: no route to
gitlawr commented 2 years ago

So the node IP is accessible but the VIP (192.168.122.200) is not. Logs from the kube-vip pod:

E1214 02:08:46.546969       1 leaderelection.go:325] error retrieving resource lock harvester-system/plndr-svcs-lock: Get "https://10.53.0.1:443/apis/coordination.k8s.io/v1/namespaces/harvester-system/leases/plndr-svcs-lock": dial tcp 10.53.0.1:443: connect: no route to host

Check the endpoint:

$ kubectl get ep
NAME         ENDPOINTS             AGE
kubernetes   192.168.122.13:6443   12d

Note that the registered node IP (192.168.122.13) is different from the current one (192.168.122.11). The cause of the problem is that the node IP has changed.
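
A quick way to cross-check this on the node (a rough sketch assuming kubectl works locally; not an official procedure):

# Compare the node IP registered in Kubernetes with the current address on harvester-mgmt
kubectl get endpoints kubernetes -n default   # apiserver endpoint registered by the control plane
kubectl get nodes -o wide                     # INTERNAL-IP column shows the node's registered IP
ip -4 addr show dev harvester-mgmt            # current address on the mgmt bridge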

w13915984028 commented 2 years ago

> Note that the registered node IP (192.168.122.13) is different from the current one (192.168.122.11). The cause of the problem is that the node IP has changed.

After the Harvester installation, the node (VM) has been running for 12+ days without rebooting. Which module may have changed the node IP?

The node (VM) runs on top of KVM and gets its IP address from KVM at boot; after booting, the IP normally stays unchanged.

gitlawr commented 2 years ago

I saw that the NIC uses DHCP mode. Could it be a misconfiguration or loss of leases on the DHCP server?

w13915984028 commented 2 years ago

> I saw that the NIC uses DHCP mode. Could it be a misconfiguration or loss of leases on the DHCP server?

The node VM is attached to the "default" network of KVM. DHCP is an embedded feature of that KVM network: it hands out 192.168.122.* addresses to guest VMs and also provides NAT for internet access.

Meanwhile, another VM (the Ubuntu server) is attached in the same way as the Harvester VM, and its IP has never changed since booting (30+ days).

The difference is that the Harvester VM creates a harvester-mgmt interface, removes the original node IP from ens3/enp0s3, and attaches it to the mgmt interface. Maybe this behavior triggers something tricky.

Harvester VM:

2: ens3: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master harvester-mgmt state UP group default qlen 1000
    link/ether 52:54:00:90:e2:26 brd ff:ff:ff:ff:ff:ff
    altname enp0s3
3: harvester-mgmt: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 52:54:00:90:e2:26 brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.11/24 brd 192.168.122.255 scope global harvester-mgmt
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe90:e226/64 scope link
       valid_lft forever preferred_lft forever
Ubuntu VM:

rancher@ubuntuvmday0:~$ ip addr
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:10:4f:08 brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.85/24 brd 192.168.122.255 scope global dynamic enp1s0
       valid_lft 2159sec preferred_lft 2159sec
    inet6 fe80::5054:ff:fe10:4f08/64 scope link
       valid_lft forever preferred_lft forever

Host OS KVM "default" network: NAT mode, with DHCP for guest IP provisioning:

admin@provoday0:~> sudo virsh net-dumpxml default
<network connections='2'>
  <name>default</name>
  <uuid>df36fd7c-e2f8-4910-b45a-d3bf1238c919</uuid>
  <forward mode='nat'>
    <nat>
      <port start='1024' end='65535'/>
    </nat>
  </forward>
  <bridge name='virbr0' stp='on' delay='0'/>
  <mac address='52:54:00:ea:e2:f8'/>
  <ip address='192.168.122.1' netmask='255.255.255.0'>
    <dhcp>
      <range start='192.168.122.2' end='192.168.122.254'/>
      <bootp file='ipxe-create' server='192.168.122.85'/>    --------> this config is for PXE test, normally there is no such line
    </dhcp>
  </ip>
</network>

admin@provoday0:~>

The Provo DAY0 KVM "default" network cannot use bridge mode. I tried it: the Provo DC DHCP server does not allocate IPs to VMs on top of Provo DAY0, because it runs in MAC-IP binding mode and the VM's virtually generated MAC is not recorded in the DC DHCP server.
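
As a possible mitigation in this setup (not verified here), the Harvester VM's MAC could be pinned to a fixed address in the libvirt "default" network, so that its lease can never move:

# Add a static DHCP host entry for the Harvester VM's MAC to the "default" network;
# MAC and IP match the harvester-mgmt interface shown earlier, hostname is the node name
virsh net-update default add ip-dhcp-host \
  "<host mac='52:54:00:90:e2:26' name='harvmain0112' ip='192.168.122.11'/>" \
  --live --config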

gitlawr commented 2 years ago

We might need to check whether the DHCP lease renewal for the mgmt bond proceeds as expected.
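
One way to observe whether renewals happen at all (a hedged sketch; assumes tcpdump is available on the node):

# Watch for DHCP traffic on the mgmt bridge; with a typical one-hour lease,
# a renewal request/ack should show up roughly every half hour
tcpdump -ni harvester-mgmt port 67 or port 68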

w13915984028 commented 2 years ago

KVM has the following debug info. The VIP is not in the DHCP guest list, and there is no DHCP history info.

My guess: the cluster components/network hit an issue first, then the VIP controller failed to renew the VIP via DHCP (from KVM), and finally the VIP was lost.

virsh # domifaddr hmain2911 --full
 Name       MAC address          Protocol     Address
-------------------------------------------------------------------------------
 vnet54     52:54:00:90:e2:26    ipv4         192.168.122.11/24

virsh # domifaddr ubuntu20.04 --full
 Name       MAC address          Protocol     Address
-------------------------------------------------------------------------------
 vnet0      52:54:00:10:4f:08    ipv4         192.168.122.85/24

virsh # net-dhcp-leases --network default
 Expiry Time           MAC address         Protocol   IP address          Hostname       Client ID or DUID
---------------------------------------------------------------------------------------------------------------------------------------------------
 2021-12-14 04:59:16   52:54:00:10:4f:08   ipv4       192.168.122.85/24   ubuntuvmday0   ff:56:50:4d:98:00:02:00:00:ab:11:b5:85:b2:62:9e:64:3c:a1
 2021-12-14 05:02:29   52:54:00:90:e2:26   ipv4       192.168.122.11/24   harvmain0112   ff:00:90:e2:26:00:01:00:01:29:3a:6b:a3:52:54:00:90:e2:26
provoday0:/home/admin # dmesg | grep vnet54
[2411830.673235] virbr0: port 2(vnet54) entered blocking state
[2411830.673239] virbr0: port 2(vnet54) entered disabled state
[2411830.673384] device vnet54 entered promiscuous mode
[2411830.673612] virbr0: port 2(vnet54) entered blocking state
[2411830.673614] virbr0: port 2(vnet54) entered listening state
[2411832.685161] virbr0: port 2(vnet54) entered learning state
[2411834.701128] virbr0: port 2(vnet54) entered forwarding state
provoday0:/home/admin #

[2411834.701139] virbr0: topology change detected, propagating
[2421438.859043] BTRFS info (device sda7): qgroup scan completed (inconsistency flag cleared)
[2507898.036183] BTRFS info (device sda7): qgroup scan completed (inconsistency flag cleared)
[2579281.008378] FS-Cache: Loaded
[2579281.088907] RPC: Registered named UNIX socket transport module.
[2579281.088911] RPC: Registered udp transport module.
[2579281.088912] RPC: Registered tcp transport module.
[2579281.088913] RPC: Registered tcp NFSv4.1 backchannel transport module.
[2579281.167505] FS-Cache: Netfs 'nfs' registered for caching
[2579281.275462] Key type dns_resolver registered
[2579281.535414] NFS: Registering the id_resolver key type
[2579281.535427] Key type id_resolver registered
[2579281.535428] Key type id_legacy registered
[2594356.922544] BTRFS info (device sda7): qgroup scan completed (inconsistency flag cleared)
[2680815.803031] BTRFS info (device sda7): qgroup scan completed (inconsistency flag cleared)
[2767274.681385] BTRFS info (device sda7): qgroup scan completed (inconsistency flag cleared)
[2853733.629418] BTRFS info (device sda7): qgroup scan completed (inconsistency flag cleared)
[2940192.577587] BTRFS info (device sda7): qgroup scan completed (inconsistency flag cleared)
[3026651.395416] BTRFS info (device sda7): qgroup scan completed (inconsistency flag cleared)
[3113083.657239] BTRFS info (device sda7): qgroup scan completed (inconsistency flag cleared)
[3199509.337159] BTRFS info (device sda7): qgroup scan completed (inconsistency flag cleared)
[3285968.328645] BTRFS info (device sda7): qgroup scan completed (inconsistency flag cleared)
[3372427.394127] BTRFS info (device sda7): qgroup scan completed (inconsistency flag cleared)
[3458886.201840] BTRFS info (device sda7): qgroup scan completed (inconsistency flag cleared)
provoday0:/home/admin #
w13915984028 commented 2 years ago

I tried to reproduce this issue and have some interesting findings: https://github.com/w13915984028/harvester-develop-summary/issues/1

The VIP is seen as aged out in the KVM DHCP lease list.

The VIP is allocated from KVM using the MAC 6a:10:d2:c2:6a:3b of vip-7e465ab1@harvester-mgmt, but the address itself is attached to harvester-mgmt.

It looks like the VIP is last leased from the DHCP server when Harvester goes into the "ready" state; after that, it is never renewed.

Which module is responsible for renewing the VIP? If it tries to renew the VIP (IPv4) on dev vip-7e465ab1@harvester-mgmt, it will fail, because the VIP is attached to dev harvester-mgmt.
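
A rough way to check what kube-vip itself reports about the lease (the pod name is taken from the earlier pod listing; the grep pattern is only a guess):

# Inspect kube-vip's view of the VIP / DHCP lease
kubectl -n harvester-system logs kube-vip-sjgt4 | grep -iE 'dhcp|lease|vip'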

w13915984028 commented 2 years ago

Update: in this environment, when the VIP is set as a static IP address, the issue is not encountered.

Another related issue: https://github.com/harvester/harvester/issues/1681
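
For reference, a hedged sketch of the static-VIP setting in a Harvester install config (assuming the documented install.vip / install.vip_mode fields; the address is the VIP used in this environment):

# Harvester install config snippet (YAML): static VIP instead of DHCP
install:
  mode: create
  vip: 192.168.122.200
  vip_mode: static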

w13915984028 commented 2 years ago

With the Harvester master-head ISO, I tested:

  1. VIP set in static mode;
  2. VIP allocated via a DHCP server with IP/MAC binding on the server side.

The issue is not encountered in either case.

Closing the issue now.