We recently found a bug at one of our clients. They have the following setup:
- A claims repo, using firestartr
- A Virtual Machine Scale Set (VMSS) in Azure, which uses a custom extension script to set up and start its VMs. Both the VMSS and the script are managed by Terraform
The error happened when following these steps:
1. Create a PR that updates the custom extension script of a VMSS so that it looks for and downloads a non-existent image
2. Commit that change, apply it with Terraform and upload it to Azure. The VMSS should fail to start
3. Create another PR that fixes the previous error
4. Commit that change, apply it with Terraform and upload it to Azure. The VMSS should still fail to start, with the same error as in step 2
To fix this, you must manually upgrade each VM in the VMSS
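The manual upgrade can be scripted instead of being clicked through in the portal. The following is a minimal sketch, assuming the Azure CLI (`az`) is installed and logged in. The helper names `build_upgrade_command` and `upgrade_all_instances`, and the resource group and scale set names in the usage comment, are hypothetical. It relies on `az vmss update-instances`, which brings the selected instances up to the latest scale set model.

```python
import subprocess


def build_upgrade_command(resource_group: str, vmss_name: str) -> list[str]:
    """Build the az CLI call that upgrades every instance of a scale set.

    "--instance-ids *" selects all instances in the scale set.
    """
    return [
        "az", "vmss", "update-instances",
        "--resource-group", resource_group,
        "--name", vmss_name,
        "--instance-ids", "*",
    ]


def upgrade_all_instances(resource_group: str, vmss_name: str) -> None:
    """Run the upgrade and raise CalledProcessError if az exits non-zero."""
    subprocess.run(build_upgrade_command(resource_group, vmss_name), check=True)


# Hypothetical usage (names are placeholders, not the client's real ones):
# upgrade_all_instances("my-resource-group", "my-vmss")
```

Passing the arguments as a list avoids shell expansion of the `*` wildcard.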
It seems that Terraform/Azure only upgrades VMs when their custom extension script runs without errors (at least when that script is used to start the VM). We need to prove that this is the case, investigate why it happens, and find out how to fix it
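One way to gather evidence for this hypothesis is to check, after each apply, which instances are still running an old scale set model. The sketch below is only an aid for the investigation, not a confirmed diagnosis. It assumes the JSON comes from `az vmss list-instances --resource-group <rg> --name <vmss>` and that each entry exposes `instanceId` and `latestModelApplied`. The function name `outdated_instances` and the sample data are hypothetical. It may also be worth checking the scale set's upgrade policy, since with a manual policy Azure does not roll model changes out to existing instances on its own.

```python
import json


def outdated_instances(list_instances_json: str) -> list[str]:
    """Return the IDs of instances that are not on the latest scale set model.

    An instance is treated as outdated when "latestModelApplied" is false
    or missing from the entry.
    """
    instances = json.loads(list_instances_json)
    return [
        vm["instanceId"]
        for vm in instances
        if not vm.get("latestModelApplied", False)
    ]


# Made-up sample shaped like the assumed az output, for illustration only.
sample = json.dumps([
    {"instanceId": "0", "latestModelApplied": False},
    {"instanceId": "1", "latestModelApplied": True},
])
stale = outdated_instances(sample)
```

If the instances still report an old model after step 4 of the reproduction, that would support the idea that the fixed script never reached them.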
Acceptance criteria
- [x] The cause of the problem is identified and can be reproduced