vmware-tanzu / vm-operator

Self-service manage your virtual infrastructure...

✨ Reconcile VMs on async signal #733

Closed akutz closed 1 month ago

akutz commented 1 month ago

What does this PR do, and why is it needed?

This patch adds support for reconciling VMs when their state has changed on the underlying platform, e.g. vSphere. There are three primary components, each described below: the watcher, the service, and the zone controller.

Please note, the environment variable ASYNC_SIGNAL_DISABLED may be set to a truthy string value, e.g. "true", to completely disable the async signal logic, regardless of the feature state switch.
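
As a rough illustration of how the override could combine with the feature state switch, here is a minimal sketch. It assumes "truthy" means any value that Go's strconv.ParseBool parses as true; the parsing actually used in this PR may differ, and the function name asyncSignalEnabled is hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// asyncSignalEnabled reports whether the async signal logic should run.
// featureEnabled stands in for the feature state switch and disabledEnv is
// the raw value of ASYNC_SIGNAL_DISABLED. Both names are illustrative.
func asyncSignalEnabled(featureEnabled bool, disabledEnv string) bool {
	// Assumption: a value is truthy if strconv.ParseBool parses it as true.
	// Empty or unparseable values are treated as "not disabled".
	if disabled, err := strconv.ParseBool(disabledEnv); err == nil && disabled {
		return false
	}
	return featureEnabled
}

func main() {
	// The environment variable wins over the feature switch when truthy.
	enabled := asyncSignalEnabled(true, os.Getenv("ASYNC_SIGNAL_DISABLED"))
	fmt.Println("async signal enabled:", enabled)
}
```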

Watcher

The watcher is located in pkg/util/vsphere/watcher and watches one or more vSphere entities that can contain VMs, e.g. a Folder, ClusterComputeResource, HostSystem, etc. The watcher is initialized with a set of these entities and creates a ContainerView for each. These are added to the watcher's ListView, which enables entities to be added/removed later while the watcher is running. The watcher is signaled when a VM enters the view of the watcher or when a VM has a change to one of the following properties:

Not all changes to extraConfig cause the watcher to emit a result. The following extraConfig keys are ignored:

Additional keys may be ignored as well, but these are ignored by default in order to prevent an infinite loop:

When the watcher notices a VM entering its view or a change to a VM, it must determine the namespace and name for the VM. This happens in one of three ways:

  1. The change itself includes the namespace and name in the extraConfig.
  2. The watcher looks up the namespace and name from the manager's cache, where the field status.uniqueID is now an indexed field.
  3. Finally, the watcher retrieves the property config.extraConfig["vmservice.namespacedName"] from the vSphere server.
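
The three-step lookup above can be sketched as an ordered fallback chain. This is only an illustration: the lookupFunc type, the resolveNamespacedName helper, and the assumption that the stored value has the form "namespace/name" are not taken from the PR.

```go
package main

import (
	"fmt"
	"strings"
)

// lookupFunc returns a "namespace/name" string for a VM, or "" when the
// source has nothing for it. The value format is an assumption.
type lookupFunc func(moID string) string

// resolveNamespacedName tries each source in order and returns the first
// value that splits into a non-empty namespace and name.
func resolveNamespacedName(moID string, sources ...lookupFunc) (namespace, name string, ok bool) {
	for _, lookup := range sources {
		ns, n, found := strings.Cut(lookup(moID), "/")
		if found && ns != "" && n != "" {
			return ns, n, true
		}
	}
	return "", "", false
}

func main() {
	fromChange := func(string) string { return "" }            // 1. not part of the change
	fromCache := func(string) string { return "" }             // 2. cache miss
	fromServer := func(string) string { return "my-ns/my-vm" } // 3. fetched from vSphere

	ns, name, ok := resolveNamespacedName("vm-42", fromChange, fromCache, fromServer)
	fmt.Println(ns, name, ok) // my-ns my-vm true
}
```

Ordering the sources from cheapest to most expensive means the round trip to the vSphere server only happens when the first two sources come up empty.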

If the namespace and name can be determined, the watcher checks if the VM already exists in Kubernetes with a status.uniqueID field and if the update type was the VM entering the view of the watcher. If these conditions are met, no result is emitted for this VM. This prevents double-reconciling VMs when the Controller-Manager starts up for the first time. During start-up, the Controller-Manager automatically reconciles all objects watched by controllers. Since all VMs would also be entering the view of watchers, this would cause a large-scale double-reconcile. Therefore, this logic skips emitting results on startup for VMs that are already deployed.

If the namespace and name are non-empty, and either the update type was Enter and the VM has an empty status.uniqueID field, or the update type was Modify, the watcher emits a result on a channel watched by the next component, the service.
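
Put together, the emit rules from the last two paragraphs reduce to a small predicate. The following is a sketch of that decision only, with hypothetical names (updateKind, shouldEmit); it is not the code from pkg/util/vsphere/watcher.

```go
package main

import "fmt"

// updateKind mirrors the two update types mentioned in the description.
type updateKind string

const (
	updateEnter  updateKind = "enter"
	updateModify updateKind = "modify"
)

// shouldEmit reports whether the watcher should emit a result for a VM.
// hasUniqueID says whether the VM already exists in Kubernetes with a
// non-empty status.uniqueID field.
func shouldEmit(namespace, name string, kind updateKind, hasUniqueID bool) bool {
	if namespace == "" || name == "" {
		return false // cannot map the vSphere VM to a Kubernetes object
	}
	switch kind {
	case updateEnter:
		// Already-deployed VMs are reconciled by the normal start-up
		// path, so an Enter for them would be a double reconcile.
		return !hasUniqueID
	case updateModify:
		return true
	default:
		return false
	}
}

func main() {
	fmt.Println(shouldEmit("ns", "vm", updateEnter, true))   // false
	fmt.Println(shouldEmit("ns", "vm", updateEnter, false))  // true
	fmt.Println(shouldEmit("ns", "vm", updateModify, true))  // true
	fmt.Println(shouldEmit("", "", updateModify, true))      // false
}
```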

Service

The service is located in services/vm-watcher and is responsible for:

The service starts a new instance of the watcher whenever the previous instance failed because of a login/auth error. This handles the case of credential rotation.
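
A minimal sketch of that restart behavior follows. The errAuth sentinel and runWatcher helper are hypothetical, and returning any non-auth outcome to the caller is an assumption, since the description only covers the auth case. A real loop would presumably also refresh credentials and back off between attempts.

```go
package main

import (
	"errors"
	"fmt"
)

// errAuth is a hypothetical sentinel for a login/auth failure.
var errAuth = errors.New("not authenticated")

// runWatcher keeps starting new watcher instances for as long as each one
// fails with an auth error. Any other outcome, including nil, is returned
// to the caller (an assumption made for this sketch).
func runWatcher(start func() error) error {
	for {
		err := start()
		if errors.Is(err, errAuth) {
			continue // credentials may have rotated; start a fresh instance
		}
		return err
	}
}

func main() {
	attempts := 0
	err := runWatcher(func() error {
		attempts++
		if attempts < 3 {
			return fmt.Errorf("watcher stopped: %w", errAuth)
		}
		return nil
	})
	fmt.Println(attempts, err) // 3 <nil>
}
```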

The service starts a watcher with an initial set of entities to watch that includes the ManagedObject ID for each Folder that can contain VM Service VMs. These folder IDs are gathered by listing all Zone resources on the cluster and collecting the value of spec.managedVMs.folderMoID.
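
For illustration, gathering the initial folder IDs might look like the sketch below. The zone struct is a pared-down stand-in for the real Zone resource, and skipping empty values and duplicates is an assumption rather than something stated here.

```go
package main

import "fmt"

// zone is a hypothetical, pared-down stand-in for a Zone resource.
type zone struct {
	Name       string
	FolderMoID string // corresponds to spec.managedVMs.folderMoID
}

// folderMoIDs returns the distinct, non-empty folder IDs across all zones,
// in the order they are first seen.
func folderMoIDs(zones []zone) []string {
	seen := map[string]struct{}{}
	var ids []string
	for _, z := range zones {
		if z.FolderMoID == "" {
			continue // assumption: zones without a folder are skipped
		}
		if _, ok := seen[z.FolderMoID]; ok {
			continue
		}
		seen[z.FolderMoID] = struct{}{}
		ids = append(ids, z.FolderMoID)
	}
	return ids
}

func main() {
	zones := []zone{
		{Name: "zone-a", FolderMoID: "group-v10"},
		{Name: "zone-b", FolderMoID: ""},
		{Name: "zone-c", FolderMoID: "group-v20"},
	}
	fmt.Println(folderMoIDs(zones)) // [group-v10 group-v20]
}
```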

The service monitors results from the watcher. Upon receiving a result, the service determines if the reported VM is valid, and if so, a reconcile request is enqueued.
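
The result-handling loop can be pictured roughly as follows. The result and reconcileRequest types, the isValid predicate, and the enqueue callback are all placeholders for this sketch; what counts as a valid VM is decided by the real service and is not reproduced here.

```go
package main

import "fmt"

// result is a hypothetical shape for what the watcher emits.
type result struct {
	Namespace, Name string
}

// reconcileRequest identifies the VM object to reconcile.
type reconcileRequest struct {
	Namespace, Name string
}

// forward drains results until the channel is closed and enqueues a
// reconcile request for every result that passes isValid.
func forward(results <-chan result, isValid func(result) bool, enqueue func(reconcileRequest)) {
	for r := range results {
		if !isValid(r) {
			continue
		}
		enqueue(reconcileRequest{Namespace: r.Namespace, Name: r.Name})
	}
}

func main() {
	results := make(chan result, 2)
	results <- result{Namespace: "ns-1", Name: "vm-a"}
	results <- result{Namespace: "", Name: "orphan"}
	close(results)

	var queue []reconcileRequest
	forward(results,
		func(r result) bool { return r.Namespace != "" && r.Name != "" },
		func(req reconcileRequest) { queue = append(queue, req) },
	)
	fmt.Println(queue) // [{ns-1 vm-a}]
}
```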

Zone controller

The zone controller is located in controllers/infra/zone and reconciles topologyv1.Zone resources.

When a zone resource without a deletion timestamp is reconciled, the controller adds a finalizer to it and adds the zone's VM Service folder to the list of the entities monitored by the watcher.

When a zone resource with a non-zero deletion timestamp is reconciled, the controller removes the zone's VM Service folder from the list of the entities monitored by the watcher and removes the finalizer.
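
The two reconcile paths above can be sketched without any Kubernetes dependencies. Everything named here (zoneFinalizer, the zone struct, folderWatcher, setWatcher) is hypothetical and only models the ordering described: on delete, stop watching the folder first and then drop the finalizer.

```go
package main

import "fmt"

// zoneFinalizer is a hypothetical finalizer name used only in this sketch.
const zoneFinalizer = "example.com/zone"

// zone is a pared-down stand-in for a topologyv1.Zone resource.
type zone struct {
	Deleting   bool // true when the deletion timestamp is non-zero
	Finalizers []string
	FolderMoID string // the zone's VM Service folder
}

// folderWatcher is the part of the watcher the controller needs.
type folderWatcher interface {
	Add(moID string)
	Remove(moID string)
}

// setWatcher is an in-memory folderWatcher for demonstration.
type setWatcher map[string]bool

func (s setWatcher) Add(moID string)    { s[moID] = true }
func (s setWatcher) Remove(moID string) { delete(s, moID) }

// reconcileZone applies the add or remove path depending on deletion state.
func reconcileZone(z *zone, w folderWatcher) {
	if z.Deleting {
		w.Remove(z.FolderMoID)
		z.Finalizers = without(z.Finalizers, zoneFinalizer)
		return
	}
	if !contains(z.Finalizers, zoneFinalizer) {
		z.Finalizers = append(z.Finalizers, zoneFinalizer)
	}
	w.Add(z.FolderMoID)
}

func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

func without(list []string, s string) []string {
	var out []string
	for _, v := range list {
		if v != s {
			out = append(out, v)
		}
	}
	return out
}

func main() {
	w := setWatcher{}
	z := &zone{FolderMoID: "group-v10"}

	reconcileZone(z, w)
	fmt.Println(z.Finalizers, w["group-v10"]) // [example.com/zone] true

	z.Deleting = true
	reconcileZone(z, w)
	fmt.Println(z.Finalizers, w["group-v10"]) // [] false
}
```

The finalizer is what guarantees the controller sees the deletion before the resource disappears, so the folder is always removed from the watcher.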

Which issue(s) is/are addressed by this PR? (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):

Fixes NA

Are there any special notes for your reviewer:

Please add a release note if necessary:

Reconcile VMs when their state changes on underlying platform

akutz commented 1 month ago

@dougm I am going to add a flake allowance to the tests that depend on vC Sim. See https://github.com/vmware-tanzu/vm-operator/actions/runs/11206817769/job/31148110424?pr=733#step:5:783 -- there's a race that pops up on occasion.

dougm commented 1 month ago

I've not reproduced myself yet, but looks like this should fix: https://github.com/vmware/govmomi/pull/3584

akutz commented 1 month ago

Thanks @dougm, it did in fact fix it. I already rebased this PR after pulling your patch. Thanks again!

github-actions[bot] commented 1 month ago

Code Coverage

Package Line Rate Health
github.com/vmware-tanzu/vm-operator/controllers/contentlibrary/clustercontentlibraryitem 82%
github.com/vmware-tanzu/vm-operator/controllers/contentlibrary/contentlibraryitem 85%
github.com/vmware-tanzu/vm-operator/controllers/contentlibrary/utils 97%
github.com/vmware-tanzu/vm-operator/controllers/infra/capability 86%
github.com/vmware-tanzu/vm-operator/controllers/infra/configmap 71%
github.com/vmware-tanzu/vm-operator/controllers/infra/node 77%
github.com/vmware-tanzu/vm-operator/controllers/infra/secret 77%
github.com/vmware-tanzu/vm-operator/controllers/infra/validatingwebhookconfiguration 85%
github.com/vmware-tanzu/vm-operator/controllers/infra/zone 81%
github.com/vmware-tanzu/vm-operator/controllers/storageclass 94%
github.com/vmware-tanzu/vm-operator/controllers/storagepolicyquota 97%
github.com/vmware-tanzu/vm-operator/controllers/util/encoding 73%
github.com/vmware-tanzu/vm-operator/controllers/virtualmachine/storagepolicyusage 99%
github.com/vmware-tanzu/vm-operator/controllers/virtualmachine/virtualmachine 78%
github.com/vmware-tanzu/vm-operator/controllers/virtualmachine/volume 87%
github.com/vmware-tanzu/vm-operator/controllers/virtualmachineclass 75%
github.com/vmware-tanzu/vm-operator/controllers/virtualmachinepublishrequest 81%
github.com/vmware-tanzu/vm-operator/controllers/virtualmachinereplicaset 68%
github.com/vmware-tanzu/vm-operator/controllers/virtualmachineservice 82%
github.com/vmware-tanzu/vm-operator/controllers/virtualmachineservice/providers 92%
github.com/vmware-tanzu/vm-operator/controllers/virtualmachinesetresourcepolicy 80%
github.com/vmware-tanzu/vm-operator/controllers/virtualmachinewebconsolerequest/v1alpha1 72%
github.com/vmware-tanzu/vm-operator/controllers/virtualmachinewebconsolerequest/v1alpha1/conditions 88%
github.com/vmware-tanzu/vm-operator/controllers/virtualmachinewebconsolerequest/v1alpha1/patch 78%
github.com/vmware-tanzu/vm-operator/controllers/virtualmachinewebconsolerequest/v1alpha2 73%
github.com/vmware-tanzu/vm-operator/pkg/bitmask 100%
github.com/vmware-tanzu/vm-operator/pkg/builder 95%
github.com/vmware-tanzu/vm-operator/pkg/conditions 88%
github.com/vmware-tanzu/vm-operator/pkg/config 100%
github.com/vmware-tanzu/vm-operator/pkg/config/capabilities 100%
github.com/vmware-tanzu/vm-operator/pkg/config/env 100%
github.com/vmware-tanzu/vm-operator/pkg/context/generic 100%
github.com/vmware-tanzu/vm-operator/pkg/context/operation 100%
github.com/vmware-tanzu/vm-operator/pkg/patch 78%
github.com/vmware-tanzu/vm-operator/pkg/prober 91%
github.com/vmware-tanzu/vm-operator/pkg/prober/probe 90%
github.com/vmware-tanzu/vm-operator/pkg/prober/worker 77%
github.com/vmware-tanzu/vm-operator/pkg/providers/vsphere 75%
github.com/vmware-tanzu/vm-operator/pkg/providers/vsphere/client 80%
github.com/vmware-tanzu/vm-operator/pkg/providers/vsphere/clustermodules 71%
github.com/vmware-tanzu/vm-operator/pkg/providers/vsphere/config 89%
github.com/vmware-tanzu/vm-operator/pkg/providers/vsphere/contentlibrary 74%
github.com/vmware-tanzu/vm-operator/pkg/providers/vsphere/credentials 100%
github.com/vmware-tanzu/vm-operator/pkg/providers/vsphere/network 80%
github.com/vmware-tanzu/vm-operator/pkg/providers/vsphere/placement 77%
github.com/vmware-tanzu/vm-operator/pkg/providers/vsphere/session 71%
github.com/vmware-tanzu/vm-operator/pkg/providers/vsphere/sysprep 100%
github.com/vmware-tanzu/vm-operator/pkg/providers/vsphere/vcenter 82%
github.com/vmware-tanzu/vm-operator/pkg/providers/vsphere/virtualmachine 83%
github.com/vmware-tanzu/vm-operator/pkg/providers/vsphere/vmlifecycle 67%
github.com/vmware-tanzu/vm-operator/pkg/record 78%
github.com/vmware-tanzu/vm-operator/pkg/topology 91%
github.com/vmware-tanzu/vm-operator/pkg/util 87%
github.com/vmware-tanzu/vm-operator/pkg/util/annotations 100%
github.com/vmware-tanzu/vm-operator/pkg/util/cloudinit 89%
github.com/vmware-tanzu/vm-operator/pkg/util/cloudinit/validate 91%
github.com/vmware-tanzu/vm-operator/pkg/util/image 100%
github.com/vmware-tanzu/vm-operator/pkg/util/kube 84%
github.com/vmware-tanzu/vm-operator/pkg/util/kube/cource 100%
github.com/vmware-tanzu/vm-operator/pkg/util/kube/internal 100%
github.com/vmware-tanzu/vm-operator/pkg/util/kube/spq 100%
github.com/vmware-tanzu/vm-operator/pkg/util/paused 100%
github.com/vmware-tanzu/vm-operator/pkg/util/ptr 100%
github.com/vmware-tanzu/vm-operator/pkg/util/resize 97%
github.com/vmware-tanzu/vm-operator/pkg/util/vmopv1 91%
github.com/vmware-tanzu/vm-operator/pkg/util/vsphere/client 64%
github.com/vmware-tanzu/vm-operator/pkg/util/vsphere/vm 79%
github.com/vmware-tanzu/vm-operator/pkg/util/vsphere/watcher 85%
github.com/vmware-tanzu/vm-operator/pkg/vmconfig 95%
github.com/vmware-tanzu/vm-operator/pkg/vmconfig/crypto 98%
github.com/vmware-tanzu/vm-operator/pkg/webconsolevalidation 100%
github.com/vmware-tanzu/vm-operator/services/vm-watcher 91%
github.com/vmware-tanzu/vm-operator/webhooks/common 100%
github.com/vmware-tanzu/vm-operator/webhooks/persistentvolumeclaim/validation 95%
github.com/vmware-tanzu/vm-operator/webhooks/unifiedstoragequota/validation 92%
github.com/vmware-tanzu/vm-operator/webhooks/virtualmachine/mutation 87%
github.com/vmware-tanzu/vm-operator/webhooks/virtualmachine/validation 95%
github.com/vmware-tanzu/vm-operator/webhooks/virtualmachineclass/mutation 62%
github.com/vmware-tanzu/vm-operator/webhooks/virtualmachineclass/validation 89%
github.com/vmware-tanzu/vm-operator/webhooks/virtualmachinepublishrequest/validation 92%
github.com/vmware-tanzu/vm-operator/webhooks/virtualmachinereplicaset/validation 90%
github.com/vmware-tanzu/vm-operator/webhooks/virtualmachineservice/mutation 67%
github.com/vmware-tanzu/vm-operator/webhooks/virtualmachineservice/validation 92%
github.com/vmware-tanzu/vm-operator/webhooks/virtualmachinesetresourcepolicy/validation 89%
github.com/vmware-tanzu/vm-operator/webhooks/virtualmachinewebconsolerequest/v1alpha1/validation 92%
github.com/vmware-tanzu/vm-operator/webhooks/virtualmachinewebconsolerequest/v1alpha2/validation 92%
Summary 83% (10216 / 12300)

Minimum allowed line rate is 79%