harvester / harvester

Open source hyperconverged infrastructure (HCI) software
https://harvesterhci.io/
Apache License 2.0

[BUG] Error upgrading Harvester Cluster #4554

Open AdamJSoftware opened 1 year ago

AdamJSoftware commented 1 year ago

Describe the bug

When trying to upgrade my cluster, I get the following error whenever it tries to create a new VM (the error occurs after the VM has been created and is running):

Failed creating server [fleet-default/apps-pool1-0c2d5923-4w8z8] of kind (HarvesterMachine) for machine apps-pool1-7b8ffb45c-5m2gh in infrastructure provider: CreateError: variables to set in the engine
--engine-insecure-registry [--engine-insecure-registry option --engine-insecure-registry option] Specify insecure registries to allow with the created engine
--engine-install-url "https://get.docker.com" Custom URL to use for engine installation [$MACHINE_DOCKER_INSTALL_URL]
--engine-label [--engine-label option --engine-label option] Specify labels for the created engine
--engine-opt [--engine-opt option --engine-opt option] Specify arbitrary flags to include with the created engine in the form flag=value
--engine-registry-mirror [--engine-registry-mirror option --engine-registry-mirror option] Specify registry mirrors to use [$ENGINE_REGISTRY_MIRROR]
--engine-storage-driver Specify a storage driver to use with the engine
--harvester-cloud-config just keep it empty, this value will be filled by rancher-machine [$HARVESTER_CLOUD_CONFIG]
--harvester-cluster-id harvester cluster id [$HARVESTER_CLUSTER_ID]
--harvester-cluster-type harvester cluster type [$HARVESTER_CLUSTER_TYPE]
--harvester-cpu-count "2" number of CPUs for machine [$HARVESTER_CPU_COUNT]
--harvester-disk-bus bus of disk for machine [$HARVESTER_DISK_BUS]
--harvester-disk-info harvester disk info [$HARVESTER_DISK_INFO]
--harvester-disk-size "0" size of disk for machine (in GiB) [$HARVESTER_DISK_SIZE]
--harvester-image-name harvester image name [$HARVESTER_IMAGE_NAME]
--harvester-key-pair-name harvester key pair name [$HARVESTER_KEY_PAIR_NAME]
--harvester-kubeconfig-content contents of kubeconfig file for harvester cluster, base64 is supported [$HARVESTER_KUBECONFIG_CONTENT]
--harvester-memory-size "4" size of memory for machine (in GiB) [$HARVESTER_MEMORY_SIZE]
--harvester-network-data networkData content of cloud-init for machine, base64 is supported [$HARVESTER_NETWORK_DATA]
--harvester-network-info harvester network info [$HARVESTER_NETWORK_INFO]
--harvester-network-model harvester network model [$HARVESTER_NETWORK_MODEL]
--harvester-network-name harvester network name [$HARVESTER_NETWORK_NAME]
--harvester-network-type harvester network type [$HARVESTER_NETWORK_TYPE]
--harvester-ssh-password SSH password [$HARVESTER_SSH_PASSWORD]
--harvester-ssh-port "22" SSH port [$HARVESTER_SSH_PORT]
--harvester-ssh-private-key-path SSH private key path [$HARVESTER_SSH_PRIVATE_KEY_PATH]
--harvester-ssh-user "root" SSH username [$HARVESTER_SSH_USER]
--harvester-user-data userData content of cloud-init for machine, base64 is supported [$HARVESTER_USER_DATA]
--harvester-vm-affinity harvester vm affinity, base64 is supported [$HARVESTER_VM_AFFINITY]
--harvester-vm-namespace "default" harvester vm namespace [$HARVESTER_VM_NAMESPACE]
--hostname-override Specify hostname to use during cloud-init instead of default generated hostname
--swarm Configure Machine to join a Swarm cluster
--swarm-addr addr to advertise for Swarm (default: detect and use the machine IP)
--swarm-discovery Discovery service to use with Swarm
--swarm-experimental Enable Swarm experimental features
--swarm-host "tcp://0.0.0.0:3376" ip/socket to listen on for Swarm master
--swarm-image "swarm:latest" Specify Docker image to use for Swarm [$MACHINE_SWARM_IMAGE]
--swarm-join-opt [--swarm-join-opt option --swarm-join-opt option] Define arbitrary flags for Swarm join
--swarm-master Configure Machine to be a Swarm master
--swarm-opt [--swarm-opt option --swarm-opt option] Define arbitrary flags for Swarm master
--swarm-strategy "spread" Define a default scheduling strategy for Swarm
--tls-san [--tls-san option --tls-san option] Support extra SANs for TLS certs
flag provided but not defined: -harvester-provider-id
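
The actual failure is the last message in the output, "flag provided but not defined: -harvester-provider-id". Everything before it is the usage text printed along with the error. That wording matches what Go's standard `flag` package returns when it is handed an option that was never registered. The sketch below reproduces the message in isolation. It is a hypothetical stand-in, not rancher-machine code: `parseDriverFlags` and its two registered flags are illustrative only, and whether the root cause here is a version mismatch between the caller and the driver binary is an assumption the thread does not confirm.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// parseDriverFlags mimics a driver binary that only knows the flags it was
// built with. Hypothetical helper: the flag names are copied from the usage
// text above, but this is not the real Harvester node driver.
func parseDriverFlags(args []string) error {
	fs := flag.NewFlagSet("create", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // suppress the automatic usage dump
	fs.String("harvester-cpu-count", "2", "number of CPUs for machine")
	fs.String("harvester-memory-size", "4", "size of memory for machine (in GiB)")
	return fs.Parse(args)
}

func main() {
	// The caller passes a flag that this flag set never registered.
	err := parseDriverFlags([]string{"--harvester-provider-id", "example"})
	fmt.Println(err) // prints: flag provided but not defined: -harvester-provider-id
}
```

If the same mechanism is at work in the reported error, the component building the command line knows about `--harvester-provider-id` while the binary parsing it does not, so comparing the versions of the two would be a reasonable first check.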

Environment

Vicente-Cheng commented 1 year ago

Hi @AdamJSoftware, could you generate a support bundle for investigation? Thanks!

w13915984028 commented 1 year ago

@AdamJSoftware From the attached error, we need to know:

(1) Is this related to upgrading the Harvester host cluster (the Harvester cluster itself) to v1.2.0?
(2) Or is it the Rancher downstream RKE2 guest cluster that is being upgraded?
(3) How many nodes are in your Harvester cluster?

There are few clues to troubleshoot with at the moment. Thanks.

AdamJSoftware commented 1 year ago

Sorry for the late reply.

1) No, it is not update related.
2) No, it is not. I was simply adding and removing nodes.
3) I have 3 machines in my Harvester cluster.

This happened when I was adding and removing nodes quickly. I think Rancher got confused by this and was no longer able to provision new nodes.

AdamJSoftware commented 8 months ago

Any news on this, or on how to fix it? It's happening again. Last time, I had to delete the previous cluster.

AdamJSoftware commented 8 months ago

Today's support bundle, if still relevant: https://drive.google.com/file/d/10wSX0UBRXyJ16ZDLAaADAT2eaqeLjo9B/view?usp=sharing