Closed rohityadavcloud closed 1 year ago
Thanks @rhtyd, I'll try to add this check
@rhtyd can you give an example of the failures you're getting? I see the current steps wait for the system VMs to reach 'Running' and then wait for the agent status to be 'Up': https://github.com/shapeblue/Trillian/blob/master/Ansible/roles/cloudstack-manager/tasks/wait-for-environment.yml
This is one of those cases where the agent is not in the Up state. I've hit this a couple of times; I'll ping you with a case the next time it happens, @Bobby.
But if you look at the check in the link, it actually retries up to 200 times to see whether the agent is Up on each system VM, so I think this check is already there. Or maybe the agent is Up and then goes away, which fails the template download. I think I've seen this as well.
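If the agent really does flap (Up, then gone), a single successful check is not enough. A rough sketch of a stricter check is below: it only succeeds after several consecutive Up readings. The function name, the `get_state` callable, and the default counts are all hypothetical; this is not what the Trillian playbook does today, and the real check would have to query the agent state from CloudStack.

```python
import time
from typing import Callable


def wait_until_stable_up(
    get_state: Callable[[], str],   # hypothetical: returns the agent state as a string
    required_consecutive: int = 5,  # assumed value, tune as needed
    max_checks: int = 200,          # mirrors the 200 retries mentioned above
    delay_seconds: float = 5.0,     # assumed value
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once the agent has been 'Up' for several checks in a row."""
    consecutive = 0
    for _ in range(max_checks):
        if get_state() == "Up":
            consecutive += 1
            if consecutive >= required_consecutive:
                return True
        else:
            # Any non-Up reading resets the streak, so a flapping agent never passes.
            consecutive = 0
        sleep(delay_seconds)
    return False
```

Passing `sleep` in as a parameter is only there so the loop can be exercised without real delays.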
Several environment deployment failures occur (especially on KVM) when the SSVM agent is not in the Up state and fails to set up the default built-in template. Without the template, the zone deployment fails. As a workaround, we can poll to ensure that the SSVM agent state is Up (after the VM state is Running), with a timeout/retries; on failure we try to destroy the SSVM so that perhaps a new SSVM can come up cleanly.
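A minimal sketch of the proposed workaround, assuming two hypothetical callables: `get_agent_state` (returns the current SSVM agent state) and `destroy_ssvm` (asks CloudStack to destroy the current SSVM so a replacement gets created). Neither is an existing Trillian or CloudStack function, and the retry and delay defaults are placeholders.

```python
import time
from typing import Callable


def ensure_ssvm_agent_up(
    get_agent_state: Callable[[], str],  # hypothetical: current SSVM agent state
    destroy_ssvm: Callable[[], None],    # hypothetical: destroys the current SSVM
    retries: int = 200,                  # assumed value
    delay_seconds: float = 5.0,          # assumed value
    max_recreations: int = 1,            # how many times to destroy and try again
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the agent is 'Up'; on timeout, destroy the SSVM and poll again."""
    for attempt in range(max_recreations + 1):
        for _ in range(retries):
            if get_agent_state() == "Up":
                return True
            sleep(delay_seconds)
        # Timed out: destroy the SSVM unless this was the last allowed attempt.
        if attempt < max_recreations:
            destroy_ssvm()
    return False
```

After a destroy, `get_agent_state` is expected to keep returning a non-Up value until the replacement SSVM is Running and its agent has connected, so the same polling loop covers the wait for the new VM.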
/cc @PaulAngus @DagSonsteboSB