vmware / terraform-provider-nsxt

Terraform VMware NSX-T provider
https://www.terraform.io/docs/providers/nsxt/

data nsxt_policy_vm fails to find VM in large (1k+) VM environment #1193

Closed paulzerkel closed 4 months ago

paulzerkel commented 4 months ago

Describe the bug

A data "nsxt_policy_vm" block will intermittently fail to find VMs in a large (1k+) VM environment. Even though the VM exists, the lookup fails with:

Error: Error while reading Virtual Machine <GUID>: Could not find Virtual Machine with ID: <GUID>

The error happens intermittently across GUIDs.

Reproduction steps

  1. Have more than 1k VMs
  2. Run terraform plan in a loop against a configuration that contains a data "nsxt_policy_vm" block
  3. Eventually it will fail with Could not find Virtual Machine with ID: <GUID>

Expected behavior

An existing VM with the given ID is found.

Additional context

Tested with NSX-T 3.2.3.

The root cause appears to be an inconsistency in how VMs are returned from the /policy/api/v1/infra/realized-state/virtual-machines endpoint when the order is not specified with sort_order. When sort_order is not set, the results come back in a different order between calls, which breaks pagination: each page is cut from a different ordering, so pages can overlap or skip entries. This can be tested in a large environment with a (roughly) static set of VMs:

curl -u "$NSXT_USERNAME:$NSXT_PASSWORD" "https://server/policy/api/v1/infra/realized-state/virtual-machines?sort_ascending=false&enforcement_point_path=%2Finfra%2Fsites%2Fdefault%2Fenforcement-points%2Fdefault&include_mark_for_delete_objects=false" > a.json
sleep 5
curl -u "$NSXT_USERNAME:$NSXT_PASSWORD" "https://server/policy/api/v1/infra/realized-state/virtual-machines?sort_ascending=false&enforcement_point_path=%2Finfra%2Fsites%2Fdefault%2Fenforcement-points%2Fdefault&include_mark_for_delete_objects=false" > b.json
diff a.json b.json | less
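
A raw diff can be hard to read when the responses are large, so here is a minimal Python sketch that compares the two responses by VM identity instead. It assumes each response is a JSON object with a "results" list whose entries carry an "external_id" field; both field names are assumptions and should be checked against the actual payload. The sample data is invented for illustration.

```python
import json


def vm_ids(response, id_field="external_id"):
    """Return the VM identifiers of one list response, in response order.

    Assumes the payload looks like {"results": [{"external_id": ...}, ...]};
    both field names are assumptions, not taken from the issue.
    """
    return [vm[id_field] for vm in response.get("results", [])]


def compare_responses(first, second, id_field="external_id"):
    """Report whether two responses list the same VMs in the same order."""
    a = vm_ids(first, id_field)
    b = vm_ids(second, id_field)
    return {
        "same_order": a == b,
        "same_members": set(a) == set(b),
        "only_in_first": sorted(set(a) - set(b)),
        "only_in_second": sorted(set(b) - set(a)),
    }


# Invented sample payloads standing in for a.json and b.json.
sample_a = json.loads('{"results": [{"external_id": "vm-1"}, {"external_id": "vm-2"}]}')
sample_b = json.loads('{"results": [{"external_id": "vm-2"}, {"external_id": "vm-1"}]}')

report = compare_responses(sample_a, sample_b)
print(report)
```

To run it against the real files, the sample payloads could be replaced with json.load(open("a.json")) and json.load(open("b.json")). A result with the same members but a different order would match the behavior described here.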

a.json and b.json should contain the same VMs in the same order, but without sort_order that is not always the case. Note that it can take quite a few calls to trigger the inconsistency. As a result, the slice of VMs returned by listAllPolicyVirtualMachines can contain duplicated VMs and be missing others. Setting the sort order resolves the problem, and I will follow up with a PR to add that.
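
The effect of an unstable order on pagination can be illustrated with a small Python simulation. This is only a sketch and not the provider's code: fetch_page and list_all are hypothetical stand-ins, pages are addressed by a plain offset rather than the API's cursor, and the unsorted case reshuffles the whole list on every call, which exaggerates what a real server would do.

```python
import random


def fetch_page(items, offset, page_size, sort_key, rng):
    """Hypothetical stand-in for one paginated list call.

    Without a sort key, the "server" picks a new arbitrary order on each
    call (modelled here as a full shuffle, which is an exaggeration).
    """
    if sort_key is not None:
        ordered = sorted(items, key=sort_key)
    else:
        ordered = list(items)
        rng.shuffle(ordered)
    return ordered[offset:offset + page_size]


def list_all(items, page_size, sort_key=None, seed=0):
    """Collect every page, the way a client-side 'list all' helper might."""
    rng = random.Random(seed)
    collected = []
    offset = 0
    while offset < len(items):
        collected.extend(fetch_page(items, offset, page_size, sort_key, rng))
        offset += page_size
    return collected


all_vms = [f"vm-{i:04d}" for i in range(2500)]

unsorted_result = list_all(all_vms, page_size=1000)
sorted_result = list_all(all_vms, page_size=1000, sort_key=lambda vm: vm)

missing = set(all_vms) - set(unsorted_result)
duplicated = len(unsorted_result) - len(set(unsorted_result))

print(f"unsorted: {len(missing)} missing, {duplicated} duplicated")
print(f"sorted:   {len(set(all_vms) - set(sorted_result))} missing")
```

In the unsorted case the client still receives the expected number of entries, so the problem is not visible from the count alone: every duplicated VM takes the place of one that is never returned. With a stable sort key, every page is a slice of the same ordering and the full set is recovered.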

annakhm commented 4 months ago

Thanks @paulzerkel for the troubleshooting!