hashicorp / terraform-provider-azurerm

Terraform provider for Azure Resource Manager
https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs
Mozilla Public License 2.0

azurerm_linux_virtual_machine_scale_set rolling upgrade_mode not possible using azurerm_virtual_machine_scale_set_extension as HealthExtension #24946

Open mKamleiter opened 6 months ago

mKamleiter commented 6 months ago


Terraform Version

1.7.3

AzureRM Provider Version

3.92.0

Affected Resource(s)/Data Source(s)

azurerm_linux_virtual_machine_scale_set, azurerm_virtual_machine_scale_set_extension

Terraform Configuration Files

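# Hypothetical: ID of an existing subnet (not part of the original report)
variable "subnet_id" {
  type = string
}

resource "azurerm_linux_virtual_machine_scale_set" "linux" {
  name                            = "vmss-rolling-upgrade"
  location                        = "westeurope"
  resource_group_name             = "testing"
  sku                             = "Standard_D2s_v5"
  admin_username                  = "test"
  admin_password                  = "test123"
  disable_password_authentication = false # required when admin_password is used
  upgrade_mode                    = "Rolling"

  source_image_reference {
    publisher = "Canonical"
    offer     = "0001-com-ubuntu-server-jammy"
    sku       = "22_04-lts"
    version   = "latest"
  }

  os_disk {
    caching              = "ReadWrite"
    storage_account_type = "Standard_LRS"
  }

  network_interface {
    name    = "nic"
    primary = true

    ip_configuration {
      name      = "internal"
      primary   = true
      subnet_id = var.subnet_id
    }
  }

  automatic_os_upgrade_policy {
    disable_automatic_rollback  = false
    enable_automatic_os_upgrade = true
  }

  rolling_upgrade_policy {
    max_batch_instance_percent              = 20
    max_unhealthy_instance_percent          = 20
    max_unhealthy_upgraded_instance_percent = 20
    pause_time_between_batches              = "PT10M" # ISO 8601 duration (here: 10 minutes)
  }
}

resource "azurerm_virtual_machine_scale_set_extension" "health_extension" {
  name                         = "HealthExtension"
  virtual_machine_scale_set_id = azurerm_linux_virtual_machine_scale_set.linux.id
  publisher                    = "Microsoft.ManagedServices"
  type                         = "ApplicationHealthLinux"
  type_handler_version         = "1.0"

  # Probe SSH over TCP; jsonencode produces valid JSON without manual escaping
  settings = jsonencode({
    protocol = "TCP"
    port     = 22
  })
}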

Debug Output/Panic Output

╷
│ Error: `health_probe_id` must be set or a health extension must be specified when `upgrade_mode` is set to "Rolling"
│ 
│   with module.vmss-test.azurerm_linux_virtual_machine_scale_set.linux,
│   on ../../main.tf line 24, in resource "azurerm_linux_virtual_machine_scale_set" "linux":
│   24: resource "azurerm_linux_virtual_machine_scale_set" "linux" {

Expected Behaviour

It should be possible to create an azurerm_linux_virtual_machine_scale_set using upgrade_mode = "Rolling" while attaching the HealthExtension through azurerm_virtual_machine_scale_set_extension.

Currently it's only possible to attach the HealthExtension using an inline extension block. However, scale sets can be used as Azure DevOps agents, and Azure DevOps attaches an additional scale set extension of its own, which gets deleted when an inline extension block is used.
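For comparison, the inline variant that currently passes the Rolling check looks roughly like the sketch below. It reuses the HealthExtension values from the configuration above; the remaining scale set arguments are elided in the comment. Because the inline blocks describe every extension on the scale set, an extension added later by Azure DevOps shows up as drift and is planned for removal.

```hcl
resource "azurerm_linux_virtual_machine_scale_set" "linux" {
  # ... name, location, sku, credentials, source_image_reference, os_disk,
  # network_interface and upgrade policies as in the configuration above ...
  upgrade_mode = "Rolling"

  # Inline health extension: sent as part of the scale set create request,
  # so the Rolling upgrade mode requirement is met. Terraform now treats the
  # inline extension blocks as the complete list of extensions.
  extension {
    name                 = "HealthExtension"
    publisher            = "Microsoft.ManagedServices"
    type                 = "ApplicationHealthLinux"
    type_handler_version = "1.0"
    settings = jsonencode({
      protocol = "TCP"
      port     = 22
    })
  }
}
```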

Actual Behaviour

Terraform fails with the health_probe_id error shown above, even though a HealthExtension is configured using azurerm_virtual_machine_scale_set_extension. This should work as well.

Steps to Reproduce

terraform apply

Important Factoids

No response

References

No response

ms-zhenhua commented 6 months ago

Hi @mKamleiter, thank you for reaching out. Azure requires the HealthExtension to be included inline when an azurerm_linux_virtual_machine_scale_set is created with upgrade_mode = "Rolling". If the health extension configuration is not part of the create request, the Azure service returns an error.
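The provider error quoted in the report also accepts health_probe_id in place of a health extension. A minimal sketch of that route follows, assuming a load balancer (azurerm_lb.example), a backend pool (azurerm_lb_backend_address_pool.example) and a subnet variable (var.subnet_id), all hypothetical and not part of this thread. It avoids inline extension blocks altogether. Whether a load-balanced scale set suits Azure DevOps scale set agents has not been verified here.

```hcl
# Hypothetical TCP probe on an existing load balancer (azurerm_lb.example)
resource "azurerm_lb_probe" "ssh" {
  name            = "ssh-probe"
  loadbalancer_id = azurerm_lb.example.id
  protocol        = "Tcp"
  port            = 22
}

resource "azurerm_linux_virtual_machine_scale_set" "linux" {
  # ... name, location, sku, credentials, source_image_reference, os_disk
  # and upgrade policies as in the original configuration ...
  upgrade_mode    = "Rolling"
  health_probe_id = azurerm_lb_probe.ssh.id

  network_interface {
    name    = "nic"
    primary = true

    ip_configuration {
      name      = "internal"
      primary   = true
      subnet_id = var.subnet_id # hypothetical subnet ID
      # The instances are placed behind the load balancer that owns the probe
      load_balancer_backend_address_pool_ids = [azurerm_lb_backend_address_pool.example.id]
    }
  }
}
```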

mKamleiter commented 6 months ago

Hi @ms-zhenhua, thanks for your answer. I already suspected that.

Any idea how to integrate rolling scale sets with Azure DevOps scale set agents? We need to pre-provision the virtual machine scale set and would like to switch it to Rolling so that Azure Update Manager works properly. Unfortunately, once the DevOps team connects the scale set to an agent pool, Azure DevOps creates another scale set extension, which Terraform then removes. We already tried adding a lifecycle block with ignore_changes, but unfortunately it is not possible to ignore only one specific extension.
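The closest available lifecycle setting is sketched below. As far as we can tell, extension is a set of blocks, so a single element (for example the DevOps agent extension) cannot be addressed by name in ignore_changes. The only option is to ignore the whole attribute, which also hides any intended changes to the HealthExtension.

```hcl
resource "azurerm_linux_virtual_machine_scale_set" "linux" {
  # ... arguments and the inline HealthExtension block as before ...

  lifecycle {
    # Ignores drift on every extension block, not just the one added by
    # Azure DevOps; edits to the HealthExtension are ignored as well.
    ignore_changes = [extension]
  }
}
```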

Thanks for the support

ms-zhenhua commented 6 months ago

Hi @mKamleiter, could you confirm why the Azure DevOps extension is removed by Terraform? Is it possible to define an additional extension block for the Azure DevOps extension to avoid the deletion?

mKamleiter commented 6 months ago

Hi @ms-zhenhua,

Our setup involves two core platform teams responsible for maintaining and providing application landing zones, as suggested by the Landing Zone Framework (https://learn.microsoft.com/en-us/azure/cloud-adoption-framework/ready/landing-zone/#platform-landing-zones-vs-application-landing-zones).

The first team is responsible for infrastructure provisioning, including subscriptions, core services such as virtual networks and the virtual machine scale set mentioned earlier, which will be used later with Azure DevOps.

Once the first team has completed their work, the second team, DevOps, takes over. Their responsibilities include provisioning an Azure DevOps project, multiple service connections, and agent pools within the project, while also ensuring proper permissions are set up. They then connect the pre-provisioned scale set to the corresponding agent pools. This process generates a new virtual machine scale set extension named 'Microsoft.Azure.Devops.Pipelines.Agent'. This extension provides details about the agent pool, including the registration token required for authentication.

If the platform team configures an inline extension block for the 'ApplicationHealthLinux' extension, Terraform treats the inline blocks as the complete set of extensions for the virtual machine scale set. It then deletes the DevOps extension instead of leaving it in place alongside its own extension, which is what the standalone azurerm_virtual_machine_scale_set_extension resource does.

We attempted to pre-provision an empty extension for 'Microsoft.Azure.Devops.Pipelines.Agent', but unfortunately the VM entered a failed state due to incorrect configuration.

It would be good to be able to target specific extensions with ignore_changes in a lifecycle block, but this is currently not possible.