test-kitchen / kitchen-azurerm

A driver for Test Kitchen that works with Azure Resource Manager
Apache License 2.0

InvalidTemplate error on Azure VM creation—"zone" parameter is invalid #232

Status: Closed (closed by decoyjoe 2 years ago)

decoyjoe commented 2 years ago

Version:

1.9.0

Environment:

Windows 10, Cinc Workstation 22.2.807

Driver config:

---
driver:
  name: azurerm
  subscription_id: '<id>'
  location: 'West US 2'
  machine_size: 'Standard_DS2_v2'

platforms:
- name: centos-7-7
  driver:
    image_urn: OpenLogic:CentOS:7.7:latest
    vm_name: cookbook-cent7

Scenario:

I've configured my Test Kitchen environment to use the azurerm driver as above and am attempting to create the VM with the command "kitchen create default-centos-7-7".

Expected Result:

Test Kitchen/azurerm creates the VM in Azure.

Actual Result:

The creation fails with the following error:

>>>>>> ------Exception-------
>>>>>> Class: Kitchen::ActionFailed
>>>>>> Message: 1 actions failed.
>>>>>>     Failed to complete #create action: [{
  "message": "MsRestAzure::AzureOperationError: InvalidTemplate: Deployment template validation failed: 'The template
  parameters 'zone' in the parameters file are not valid; they are not present in the original template and can
  therefore not be provided at deployment time. The only supported parameters for this template are 'location, vmSize,
  newStorageAccountName, adminUsername, adminPassword, dnsNameForPublicIP, secretUrl, vaultName, vaultResourceGroup,
  existingStorageAccountBlobContainer, imagePublisher, imageOffer, imageSku, imageVersion, osDiskNameSuffix, vmName,
  nicName, publicIPSKU, publicIPAddressType, storageAccountType, systemAssignedIdentity, userAssignedIdentities,
  bootDiagnosticsEnabled'. Please see https://aka.ms/arm-deploy/#parameter-file for usage details.'."

This issue is not present if I downgrade to version 1.8.0 of the gem. It appears the issue was introduced in PR #228 ("Support vm availability zone").
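For readers unfamiliar with this class of error: the message says the deployment supplied a parameter ("zone") that the template does not declare. The check can be illustrated with a minimal Ruby sketch. The helper name undeclared_parameters and the trimmed sample data are hypothetical, chosen only for illustration; this is not the driver's actual code.

```ruby
# Hypothetical helper (not part of kitchen-azurerm): returns the names of
# parameters supplied at deployment time that the template does not declare.
def undeclared_parameters(template, supplied)
  declared = template.fetch("parameters", {}).keys
  supplied.keys.map(&:to_s) - declared
end

# Trimmed stand-in for the template; the real one declares many more
# parameters (see the list in the error message above).
template = {
  "parameters" => {
    "location" => { "type" => "string" },
    "vmSize"   => { "type" => "string" },
    "vmName"   => { "type" => "string" },
  },
}

# Parameters as a 1.9.0-style deployment might send them, including "zone".
supplied = {
  location: "West US 2",
  vmSize: "Standard_DS2_v2",
  vmName: "cookbook-cent7",
  zone: "1",
}

extra = undeclared_parameters(template, supplied)
puts "Not declared in template: #{extra.join(', ')}" unless extra.empty?
```

With this sample data the only undeclared name is "zone", which mirrors the mismatch reported in the error.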

jasonwbarnett commented 2 years ago

For what it's worth, the implementation is missing some critical details. I think #228 was merged too soon. In Azure you can specify a zone or not, but it definitely shouldn't default to 1. Pinning a zone makes it harder for Azure to place the VM. This is especially true when certain SKUs are only available in certain zones.

The zone should default to nil, and everything #228 added should be excluded when it is nil.
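A minimal sketch of the "default to nil, exclude when nil" behavior, assuming a plain hash of driver settings. The method name deployment_parameters and the config keys are hypothetical and are not taken from the driver's source.

```ruby
# Hypothetical helper (not the driver's actual code): build the deployment
# parameters and include "zone" only when the user configured one.
def deployment_parameters(config)
  params = {
    "location" => { "value" => config.fetch(:location) },
    "vmSize"   => { "value" => config.fetch(:machine_size) },
  }

  zone = config[:zone] # nil unless explicitly set, so there is no default of 1
  params["zone"] = { "value" => zone.to_s } unless zone.nil?
  params
end

# No zone configured: the parameter is omitted and Azure picks the placement.
p deployment_parameters({ location: "West US 2", machine_size: "Standard_DS2_v2" })

# Zone configured: the parameter is passed through.
p deployment_parameters({ location: "West US 2", machine_size: "Standard_DS2_v2", zone: 2 })
```

The template would need the matching change: declare the parameter, and emit the zone-related properties only when a zone is actually supplied.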

jasonwbarnett commented 2 years ago

@tas50 as I started to dig into this, I realized it's going to be a much bigger effort than I originally anticipated (to establish solid unit tests). I think the actual "fix" is done, but I don't have much confidence in it. I would recommend yanking #228 and then re-releasing if you think this is impactful enough.

jasonwbarnett commented 2 years ago

@tas50 funnily enough, on further inspection, the way #228 was implemented leaves the zone config param completely disconnected from the template. In other words, zone 1 was hard-coded into the template.

tas50 commented 2 years ago

@jasonwbarnett might as well roll forward if the config isn't doing anything at this point.

jasonwbarnett commented 2 years ago

> @jasonwbarnett might as well roll forward if the config isn't doing anything at this point.

@tas50 Well, it's hard-coding folks to zone 1. I'm not sure whether you mean rolling out #233 without tests or leaving the zone feature as-is.

jasonwbarnett commented 2 years ago

This is causing significant issues for our testing pipelines. The feature is broken and everything is being pinned to zone 1 in whatever region you're using, which makes it more difficult for Azure to place and provision the VM.

>>>>>>     Failed to complete #create action: [#<Azure::Resources::Mgmt::V2020_06_01::Models::StatusMessage:0x00007fe03ce13858 @status="Failed", @error=#<Azure::Resources::Mgmt::V2020_06_01::Models::ErrorResponse:0x00007fe03ce13128 @code="ZonalAllocationFailed", @message="Allocation failed. We do not have sufficient capacity for the requested VM size in this zone. Read more about improving likelihood of allocation success at http://aka.ms/allocation-guidance">>] on default-windows-2019-core

I consider this a serious bug in two ways:

  1. The actual feature doesn't work as intended. The template is completely disconnected from the config.
  2. It reduces the probability that VM creation succeeds.

I'm moving forward with a straight-up rollback of the feature and am looking for support from Progress Chef to help get this merged and released.