hashicorp / terraform-provider-azurerm

Terraform provider for Azure Resource Manager
https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs
Mozilla Public License 2.0
4.62k stars 4.65k forks

azurerm_virtual_machine_run_command timeout bug #27428

Open vangork opened 2 months ago

vangork commented 2 months ago

Terraform Version

1.7.5

AzureRM Provider Version

3.116.0

Affected Resource(s)/Data Source(s)

azurerm_virtual_machine_run_command

Terraform Configuration Files

resource "azurerm_linux_virtual_machine" "installer" {
  custom_data = filebase64("${path.module}/any_job_longer_than_90mins.sh")
}

resource "azurerm_virtual_machine_run_command" "wait_pfmp_installation_status" {
  location           = var.location
  name               = "wait_pfmp_installation_status"
  virtual_machine_id = azurerm_linux_virtual_machine.installer.id
  source {
    script = "cloud-init status --wait"
  }
  timeouts {
    create = "180m"
  }
}

Debug Output/Panic Output

│ Error: running the command: polling failed: the Azure API returned the following error:
│
│ Status: "VMExtensionProvisioningTimeout"
│ Code: ""
│ Message: "Provisioning of VM extension wait_pfmp_installation_status has timed out. Extension provisioning has taken too long to complete. The extension last reported \"Plugin enabled\".\r\n\r\nMore information on troubleshooting is available at https://aka.ms/RunCommandManagedLinux"
│ Activity Id: ""
│
│ ---
│
│ API Response:
│
│ ----[start]----
│ {
│   "startTime": "2024-09-18T14:29:48.690995+00:00",
│   "endTime": "2024-09-18T15:59:58.6403596+00:00",
│   "status": "Failed",
│   "error": {
│     "code": "VMExtensionProvisioningTimeout",
│     "message": "Provisioning of VM extension wait_pfmp_installation_status has timed out. Extension provisioning has taken too long to complete. The extension last reported \"Plugin enabled\".\r\n\r\nMore information on troubleshooting is available at https://aka.ms/RunCommandManagedLinux"
│   },
│   "name": "4d63493f-b3c9-403d-a969-9831042ba6d2"
│ }
│ -----[end]-----
│
│
│   with module.installer_node.azurerm_virtual_machine_run_command.wait_pfmp_installation_status,
│   on ..\..\modules\installer\main.tf line 79, in resource "azurerm_virtual_machine_run_command" "wait_pfmp_installation_status":
│   79: resource "azurerm_virtual_machine_run_command" "wait_pfmp_installation_status" {

Expected Behaviour

My VM runs a "pfmp installation" job, set in custom_data, that takes around 2 to 2.5 hours. I want to use a managed run command, "cloud-init status --wait", to check whether the job is done before moving on. According to https://learn.microsoft.com/en-us/azure/virtual-machines/linux/run-command-managed, managed run commands should support long-running (hours/days) scripts.

Actual Behaviour

However, the managed run command times out after about 90 minutes (the API response above shows startTime 14:29:48 and endTime 15:59:58), even though the create timeout is set to 180m.

Chambras commented 2 months ago

@vangork Interesting. Have you tried with version 4.2.0?

vangork commented 2 months ago

@Chambras I've checked the code; there is no change between 4.2.0 and 3.116.0 for azurerm_virtual_machine_run_command. My guess is that extensions_time_budget on azurerm_linux_virtual_machine, which defaults to 90 minutes, causes the limitation, but this value can only be set within the range of 15 to 120 minutes.

Shall we allow a wider range for extensions_time_budget?
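For reference, a minimal sketch of where this setting sits, assuming the provider expects an ISO 8601 duration string for extensions_time_budget (so the 120-minute upper bound would be written as "PT2H"). The other required arguments of the VM resource are omitted, as in the configuration at the top of this issue, so this is a fragment and not a complete configuration:

```hcl
resource "azurerm_linux_virtual_machine" "installer" {
  # Other required arguments (name, size, network_interface_ids, ...) are
  # omitted here, as in the original report.
  custom_data = filebase64("${path.module}/any_job_longer_than_90mins.sh")

  # Assumption: the value is an ISO 8601 duration. "PT2H" is the 120-minute
  # upper bound; it is still shorter than the 2 to 2.5 hour job.
  extensions_time_budget = "PT2H"
}
```

Even at its maximum, this budget is below the job's expected duration, which is why a wider range would be needed for this use case.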

teowa commented 2 months ago

Hi @vangork, there is a blog post that might help. For now, the azurerm_virtual_machine_run_command resource only supports synchronous mode, per a review comment. As for azurerm_linux_virtual_machine.extensions_time_budget, this seems to be an API limitation: the value must be between 15 and 120 minutes, and sending 180 returns this error message:

performing CreateOrUpdate: unexpected status 400 (400 Bad Request) with error: InvalidParameter: The value 180 of parameter 'extensionsTimeBudget' is out of range. The value must be between '15' and '120', inclusive.

vangork commented 2 months ago

@teowa Thanks for the info. Even if azurerm_virtual_machine_run_command supports async mode later, that doesn't solve my problem. The custom-data of azurerm_linux_virtual_machine, which is processed by the cloud-init agent, already lets me start a long-running job asynchronously after VM creation. I just need a way to track the successful execution of the job, get its output for downstream resources, and move forward. So far I have only found custom-data, user-data, CustomScript and RunCommand for executing a command inside the VM, but unfortunately none of them supports a timeout of 3 hours. Do you see any other way to do this?