Open ddelnano opened 1 year ago
@4censord please review this to verify it will work for your use case. The only functional difference should be that this functionality must be opted into with a new `xenorchestra_vm` resource argument -- `use_graceful_termination`.
After discovering #220, making this the default has the potential to significantly slow down the tests and the general case. From running the test suite a few times against #212, the default 2 minute timeout for verifying that PV drivers are present caused more flakiness in the build and longer test times.
I think it should be expected that if terraform is to destroy resources that it may not be graceful. Therefore, I think making this opt in is the best solution.
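For reference, opting in would look something like the following in configuration. Only the `use_graceful_termination` attribute name comes from this PR; the other arguments are illustrative:

```hcl
resource "xenorchestra_vm" "vm" {
  # ... other required arguments (name_label, template, network, etc.) ...

  # Opt in to attempting a clean guest shutdown on destroy.
  # Requires the management agent in the guest; defaults to false.
  use_graceful_termination = true
}
```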
This would work for me.
I would still argue that it should be the default behavior.
> I think it should be expected that if terraform is to destroy resources that it may not be graceful. Therefore, I think making this opt in is the best solution.
Most other providers seem to work this way.
But I understand that it is not practical to do so right now.
When attempting to shut down a VM without the management agent installed, this now times out, but it does not clearly log the problem:
```
xenorchestra_vm.vm: Still destroying... [id=361d045e-c9cf-661d-4863-1d9f5d38681e, 2m0s elapsed]
╷
│ Error: failed to gracefully halt the vm with id: 361d045e-c9cf-661d-4863-1d9f5d38681e and error: timeout while waiting for state to become 'true' (last state: 'false', timeout: 2m0s)
│
│
╵
```
Changing the `use_graceful_termination` attribute takes almost 30s:
```
Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  ~ update in-place

Terraform will perform the following actions:

  # xenorchestra_vm.vm will be updated in-place
  ~ resource "xenorchestra_vm" "vm" {
        id                       = "361d045e-c9cf-661d-4863-1d9f5d38681e"
        tags                     = [
            "dev",
        ]
      ~ use_graceful_termination = true -> false
        # (20 unchanged attributes hidden)
        # (3 unchanged blocks hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.
xenorchestra_vm.vm: Modifying... [id=361d045e-c9cf-661d-4863-1d9f5d38681e]
xenorchestra_vm.vm: Still modifying... [id=361d045e-c9cf-661d-4863-1d9f5d38681e, 10s elapsed]
xenorchestra_vm.vm: Still modifying... [id=361d045e-c9cf-661d-4863-1d9f5d38681e, 20s elapsed]
xenorchestra_vm.vm: Modifications complete after 26s [id=361d045e-c9cf-661d-4863-1d9f5d38681e]

Apply complete! Resources: 0 added, 1 changed, 0 destroyed.
```
> Most other providers seem to work this way.
It seems vsphere does this, but uses a 3 min timeout by default. What other providers have you seen?
> When attempting to shut down a VM without the management agent installed, this now times out.

> Changing the `use_graceful_termination` attribute takes almost 30s
Modifying VMs unfortunately relies on a 25 second sleep (source). That's another area where there is a race condition; not being able to detect that the value has persisted causes problems for the test suite.
I don't think I can fix that in this PR, but I can look into that early next year.
> It seems vsphere does this, but uses a 3 min timeout by default. What other providers have you seen?
Does or attempts graceful termination:
Does not do graceful termination:
> Changing the `use_graceful_termination` attribute takes almost 30s

> Modifying VMs unfortunately relies on a 25 second sleep (source).
Oh, I see.
I think I'd like to take another pass at this and implement things closer to how hyperv works:
Having a forceful shutdown fallback should prevent the test suite flakiness that I experienced when setting graceful shutdown as the default. So hopefully that should be more in line with what you were initially advocating for.
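The graceful-then-forceful behavior described above could be sketched roughly as follows. The `gracefulHalt` and `forceStop` functions are stubs standing in for the real client calls, whose names and signatures may differ:

```go
package main

import (
	"errors"
	"fmt"
)

// Stubs standing in for the XO client. Here gracefulHalt always times out,
// simulating a guest without the management agent installed.
var errHaltTimeout = errors.New("timeout while waiting for state to become 'true'")

func gracefulHalt(vmId string) error { return errHaltTimeout }
func forceStop(vmId string) error    { return nil }

// stopVm attempts a clean guest shutdown first and falls back to a hard
// power-off when the graceful attempt fails, so destroys cannot hang on a
// missing management agent.
func stopVm(vmId string) error {
	if err := gracefulHalt(vmId); err != nil {
		fmt.Printf("graceful halt failed (%v); forcing shutdown of %s\n", err, vmId)
		return forceStop(vmId)
	}
	return nil
}

func main() {
	if err := stopVm("361d045e-c9cf-661d-4863-1d9f5d38681e"); err != nil {
		fmt.Println("stop failed:", err)
	}
}
```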
This is a continuation of #212.
I discovered that #220 needed to be addressed to avoid a race condition that existed when the XO client inside the provider tried to shut down VMs. In addition, I decided that this was better to provide as an opt-in rather than the default behavior.
Testing

- [x] `make testacc` passes
- [x] New tests pass
- [x] New tests fail with previous `xenorchestra_vm` delete behavior

```go
gracefulTermination := d.Get("use_graceful_termination").(bool)
vmId := d.Id()
if gracefulTermination {
	vm, err := c.GetVm(client.Vm{Id: vmId})
	if err != nil {
		return err
	}
	if vm.PowerState == "Running" {
		err = c.HaltVm(vmId)
		if err != nil {
			return err
		}
	}
}
```
To verify that the new tests fail with the previous delete behavior, the graceful termination logic was commented out:

```go
// gracefulTermination := d.Get("use_graceful_termination").(bool)
vmId := d.Id()
// if gracefulTermination {
// 	vm, err := c.GetVm(client.Vm{Id: vmId})
// 	if err != nil {
// 		return err
// 	}
// 	if vm.PowerState == "Running" {
// 		err = c.HaltVm(vmId)
// 		if err != nil {
// 			return err
// 		}
// 	}
// }
```
```
$ TEST=TestAccXenorchestraVm_gracefulTermination make testacc
=== RUN   TestAccXenorchestraVm_gracefulTermination
=== PAUSE TestAccXenorchestraVm_gracefulTermination
=== RUN   TestAccXenorchestraVm_gracefulTerminationForShutdownVm
=== PAUSE TestAccXenorchestraVm_gracefulTerminationForShutdownVm
=== CONT  TestAccXenorchestraVm_gracefulTermination
=== CONT  TestAccXenorchestraVm_gracefulTerminationForShutdownVm
=== CONT  TestAccXenorchestraVm_gracefulTermination
    resource_xenorchestra_vm_test.go:966: Step 2/2 error: Error running destroy: exit status 1
--- FAIL: TestAccXenorchestraVm_gracefulTermination (42.79s)
--- PASS: TestAccXenorchestraVm_gracefulTerminationForShutdownVm (109.80s)
FAIL
```