w13915984028 / harvester-develop-summary

Summary of Harvester develop.

Harvester to-be-investigated issue tracking #3

Open w13915984028 opened 2 years ago

w13915984028 commented 2 years ago

Track those issues.

w13915984028 commented 2 years ago

Remove a node from the cluster and rejoin it:

https://github.com/harvester/harvester/issues/2665 (root cause found)

https://github.com/harvester/harvester/issues/2470

At the moment, the restarted node fails at the bootstrap stage: rancherd waits for kubelet, but kubelet fails to start.

w13915984028 commented 2 years ago

A CentOS 7 vmdk file from VMware Workstation (using either an IDE or a SCSI disk), converted to a qcow2 QEMU image, boots with sudo qemu-system-x86_64 -m 4G -accel kvm -hda /home/jianwang/images/centos-ide.qcow2, but fails to start under KubeVirt.
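The conversion and boot steps above can be sketched as a small helper that only builds the command lines. This is a minimal sketch, not the exact procedure used here: the source vmdk path is hypothetical, and the qemu-img options shown (-f vmdk -O qcow2) are the usual ones for this conversion rather than something recorded in this comment.

```python
import shlex


def convert_cmd(src_vmdk, dst_qcow2):
    """Build a qemu-img command that converts a vmdk disk to qcow2."""
    return ["qemu-img", "convert", "-f", "vmdk", "-O", "qcow2", src_vmdk, dst_qcow2]


def boot_cmd(image, mem="4G"):
    """Build the plain QEMU boot command quoted in the comment above."""
    return ["sudo", "qemu-system-x86_64", "-m", mem, "-accel", "kvm", "-hda", image]


if __name__ == "__main__":
    # Hypothetical source path; only the qcow2 path appears in the comment.
    src = "/home/jianwang/images/centos-ide.vmdk"
    dst = "/home/jianwang/images/centos-ide.qcow2"
    print(shlex.join(convert_cmd(src, dst)))
    print(shlex.join(boot_cmd(dst)))
```

The printed commands can be pasted into a shell, or passed as lists to subprocess.run.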

https://github.com/kubevirt/kubevirt/issues/8175

https://github.com/harvester/harvester/issues/2561

Solved with a workaround.

w13915984028 commented 1 year ago

rancher-monitoring-prometheus-adapter seems busy; the cause is still to be determined (TBD).

I0831 14:50:04.753431       1 httplog.go:104] "HTTP" verb="GET" URI="/healthz" latency="89.139µs" userAgent="kube-probe/1.22" audit-ID="9ba55a6d-5ecd-4f92-967c-c8e87083933f" srcIP="192.168.122.131:54022" resp=200
I0831 14:50:04.753867       1 httplog.go:104] "HTTP" verb="GET" URI="/healthz" latency="972.854µs" userAgent="kube-probe/1.22" audit-ID="ef2d53bc-b80e-4138-a45f-0bb251b4903b" srcIP="192.168.122.131:54014" resp=200
I0831 14:50:06.607622       1 httplog.go:104] "HTTP" verb="GET" URI="/apis/custom.metrics.k8s.io/v1beta1?timeout=32s" latency="13.856765ms" userAgent="kubectl/v1.22.12+rke2r1 (linux/amd64) kubernetes/b058e17" audit-ID="ad3a1b20-14db-4311-8306-761234a3bba3" srcIP="192.168.122.131:36634" resp=200
I0831 14:50:06.619239       1 httplog.go:104] "HTTP" verb="GET" URI="/apis/custom.metrics.k8s.io/v1beta1?timeout=32s" latency="10.991853ms" userAgent="kubectl/v1.22.12+rke2r1 (linux/amd64) kubernetes/b058e17" audit-ID="03d0c6e1-e30d-4e60-8299-cc76e4166865" srcIP="192.168.122.131:36634" resp=200
I0831 14:50:06.635917       1 httplog.go:104] "HTTP" verb="GET" URI="/apis/custom.metrics.k8s.io/v1beta1?timeout=32s" latency="9.884385ms" userAgent="kubectl/v1.22.12+rke2r1 (linux/amd64) kubernetes/b058e17" audit-ID="bcebc2ad-41f9-4e0c-a85a-11ac07c1670e" srcIP="192.168.122.131:36634" resp=200
I0831 14:50:06.665654       1 httplog.go:104] "HTTP" verb="GET" URI="/apis/custom.metrics.k8s.io/v1beta1?timeout=32s" latency="13.042048ms" userAgent="kubectl/v1.22.12+rke2r1 (linux/amd64) kubernetes/b058e17" audit-ID="3801e8e4-84a6-4d58-8c53-7edfb50a02d8" srcIP="192.168.122.131:36634" resp=200
harv31:~ # kk logs -n cattle-monitoring-system rancher-monitoring-prometheus-adapter-8846d4757-qphc4
harv31:~ # kk get pods -n cattle-monitoring-system rancher-monitoring-prometheus-adapter-8846d4757-qphc4 -o JSON
{

        "containers": [
            {
                "args": [
                    "/adapter",
                    "--secure-port=6443",
                    "--cert-dir=/tmp/cert",
                    "--logtostderr=true",
                    "--prometheus-url=http://rancher-monitoring-prometheus.cattle-monitoring-system.svc:9090",
                    "--metrics-relist-interval=1m",
                    "--v=4",
                    "--config=/etc/adapter/config.yaml"
                ],
                "image": "rancher/mirrored-prometheus-adapter-prometheus-adapter:v0.9.0",
                "imagePullPolicy": "IfNotPresent",
harv31:~ # ps aux | grep adapter

10001    15803 11.0  0.9 1030400 200488 ?      Ssl  14:14   4:23 /adapter /adapter --secure-port=6443 --cert-dir=/tmp/cert --logtostderr=true --prometheus-url=http://rancher-monitoring-prometheus.cattle-monitoring-system.svc:9090 --metrics-relist-interval=1m --v=4 --config=/etc/adapter/config.yaml
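To quantify "busy", the httplog lines above could be grouped by client and request path. The following is a minimal sketch: it assumes the log format stays as shown in the sample (verb, URI, and userAgent as quoted key="value" pairs), and the function name summarize is made up for illustration.

```python
import re
from collections import Counter

# Matches the quoted fields of a klog "HTTP" line as seen in the sample above.
LINE_RE = re.compile(
    r'verb="(?P<verb>[^"]*)" URI="(?P<uri>[^"]*)".*?userAgent="(?P<agent>[^"]*)"'
)


def summarize(log_text):
    """Count requests per (client, path), ignoring query strings and versions."""
    counts = Counter()
    for line in log_text.splitlines():
        m = LINE_RE.search(line)
        if not m:
            continue  # skip lines that are not httplog entries
        path = m.group("uri").split("?", 1)[0]
        client = m.group("agent").split("/", 1)[0]
        counts[(client, path)] += 1
    return counts


SAMPLE = '''I0831 14:50:04.753431       1 httplog.go:104] "HTTP" verb="GET" URI="/healthz" latency="89.139µs" userAgent="kube-probe/1.22" audit-ID="9ba55a6d-5ecd-4f92-967c-c8e87083933f" srcIP="192.168.122.131:54022" resp=200
I0831 14:50:06.607622       1 httplog.go:104] "HTTP" verb="GET" URI="/apis/custom.metrics.k8s.io/v1beta1?timeout=32s" latency="13.856765ms" userAgent="kubectl/v1.22.12+rke2r1 (linux/amd64) kubernetes/b058e17" audit-ID="ad3a1b20-14db-4311-8306-761234a3bba3" srcIP="192.168.122.131:36634" resp=200'''

if __name__ == "__main__":
    for (client, path), n in summarize(SAMPLE).most_common():
        print(client, path, n)
```

Run against the full pod log, this would show whether the load comes from health probes or from repeated discovery calls to /apis/custom.metrics.k8s.io/v1beta1.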
w13915984028 commented 1 year ago

Node promotion:

https://github.com/harvester/harvester/issues/3039 [BUG] Adding a third node mades the second one to fail

And a similar one: https://github.com/harvester/harvester/issues/3091

w13915984028 commented 1 year ago

Disk pressure caused by tmp files: [BUG] Disk pressure caused by Rancher agent tmp files

w13915984028 commented 1 year ago

NVMe PCIe - Slow Virtual Machine Performance https://github.com/harvester/harvester/issues/3356

Uploading a large image fails at 99% due to checksum computation: https://github.com/harvester/harvester/issues/3450 https://github.com/longhorn/longhorn/issues/4865 #3555 [BUG] both upload and download may fail due to LH GetFileChecksum in last step

w13915984028 commented 1 year ago

Installation fails or times out when the network between the Harvester node and the ISO server performs poorly. FIXED: https://github.com/harvester/harvester/issues/2651 [BUG] Virtual Media Installation Hangs For 2+hrs With "containerd.sock" Connection Error

After a node reboots, it may:

Troubleshooting: stuck in 'Setting up node/Harvester': [BUG] Stuck in 'Setting up node/Harvester' after install harvester #3844 https://github.com/harvester/harvester/issues/3844#issuecomment-1545062052

Pod harvester-cluster-repo-67ddddf8d7-zc7zd is in ImagePullBackOff.

w13915984028 commented 1 year ago

Troubleshooting: upload/download of large images stuck at 99%: https://github.com/harvester/harvester/issues/3555 https://github.com/harvester/harvester/issues/3450 https://github.com/harvester/harvester/issues/3086
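The issues above attribute the stall at 99% to a checksum pass over the whole file in the last step. As a rough illustration of why that step scales with image size, here is a minimal sketch of a streaming file checksum. The SHA-512 digest and the 1 MiB chunk size are assumptions for illustration; this is not Longhorn's GetFileChecksum implementation.

```python
import hashlib
import os
import tempfile


def file_checksum(path, chunk_size=1024 * 1024):
    """Hash a file in fixed-size chunks so memory use stays constant."""
    digest = hashlib.sha512()  # assumed algorithm, see the note above
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Small stand-in for a large image file.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"harvester" * 1000)
    try:
        print(file_checksum(tmp.name))
    finally:
        os.remove(tmp.name)
```

Every byte has to be read again after the transfer finishes, so for a very large image this final pass can outlast a client or proxy timeout even though the data itself has already arrived.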

w13915984028 commented 1 year ago

Troubleshooting: misconfigured IP segment on a VLAN network:

[BUG]when the vm vlan network segment is the same as the host, host can not connection to vm https://github.com/harvester/harvester/issues/3414

[BUG] iptables on Harvester hosts prevents Vms network from working correctly (no access to internet) https://github.com/harvester/harvester/issues/3852

[BUG] After the weekend, VM on one node can't connect to outside network https://github.com/harvester/harvester/issues/3745
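For the first case above (VM VLAN segment equal to the host segment), a quick pre-check is to test whether the two ranges overlap. This is a minimal sketch using Python's standard ipaddress module; the function name and the example prefixes are hypothetical, with the host address borrowed from the log output earlier in this thread and a /24 mask assumed.

```python
import ipaddress


def segments_overlap(host_cidr, vm_cidr):
    """Return True if the host network and the VM VLAN network share addresses."""
    # strict=False lets callers pass an interface address such as 192.168.122.131/24.
    host = ipaddress.ip_network(host_cidr, strict=False)
    vm = ipaddress.ip_network(vm_cidr, strict=False)
    return host.overlaps(vm)


if __name__ == "__main__":
    # Assumed /24 prefixes, for illustration only.
    print(segments_overlap("192.168.122.131/24", "192.168.122.0/24"))
    print(segments_overlap("192.168.122.131/24", "10.0.0.0/24"))
```

An overlap does not prove the problem in the linked issue, but it is a cheap signal to check before digging into bridge and iptables state.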

w13915984028 commented 1 year ago

Future enhancement: keep the VM MAC address stable. Users should be able to supply a MAC address when creating a VM, and it should stay unchanged afterwards:

https://github.com/harvester/harvester/issues/3602 [FEATURE] Prefix Mac Address

[Question] How to ensure permanent static IP assigned to newly built VMs? https://github.com/harvester/harvester/issues/3682

[BUG] backup restore does not carry the same MAC address https://github.com/harvester/harvester/issues/3541

w13915984028 commented 1 year ago

Data corruption / inconsistency:

https://github.com/harvester/harvester/issues/2522 [BUG] Windows VMs crashing

https://github.com/harvester/harvester/issues/2448 [BUG] VM file system may be corrupted when Stopped via the UI


https://github.com/harvester/harvester/issues/1432 [BUG] monitoring not loading - invalid checksum; corrupted block


https://github.com/harvester/harvester/issues/2092 [BUG] Single-node Harvester rancher-monitoring-prometheus enter "CrashLoopBackOff" due to "reloadBlocks: corrupted block" #2092

w13915984028 commented 1 year ago

Recover VMs as soon as possible when a host is gone:

https://github.com/harvester/harvester/issues/3864 [Question] When a harvester host unexpected shutdown or reboot, the VMs on the host will not failover to the other nodes on

https://docs.harvesterhci.io/dev/advanced/settings/#vm-force-reset-policy