VNet-injected azurerm_databricks_workspace creation fails with: SubnetIsNotWithinVnetError Subnet CIDR range '<null>' is not within the Virtual Network CIDR range

dehouwerd commented 3 months ago

Is there an existing issue for this?

[X] I have searched the existing issues

Community Note

Please vote on this issue by adding a :thumbsup: reaction to the original issue to help the community and maintainers prioritize this request
Please do not leave comments along the lines of "+1", "me too" or "any updates", they generate extra noise for issue followers and do not help prioritize the request
If you are interested in working on this issue or have submitted a pull request, please leave a comment and review the contribution guide to help.

Terraform Version

1.6.2

AzureRM Provider Version

3.115.0

Affected Resource(s)/Data Source(s)

azurerm_databricks_workspace

Terraform Configuration Files

// ----------------------------------------
// Main file
// ----------------------------------------

terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = ">=3.115.0"
    }
  }
}

provider "azurerm" {
  tenant_id                  = "xxxx"
  subscription_id            = "xxxx"
  skip_provider_registration = true

  features {}
}

module "dbx" {
  source                        = "./Modules/databricks_workspace"
  location                      = "westEurope"
  dbx_worskpace_name            = "testdbxworkspace"
  dbx_workspace_managed_rg_name = "testdbxworkspacemrg"
  resource_group                = "xxxxx"

  dbx_custom_parameters = {
    private_subnet_nsg_id = "/subscriptions/xxxx/resourceGroups/xxxx/providers/Microsoft.Network/networkSecurityGroups/nsgpriv"
    public_subnet_nsg_id  = "/subscriptions/xxxx/resourceGroups/xxxx/providers/Microsoft.Network/networkSecurityGroups/nsgpub"
    virtual_network_id    = "/subscriptions/xxxx/resourceGroups/xxxx/providers/Microsoft.Network/virtualNetworks/mydbx-vnet"
    private_subnet_name   = "privdbx"
    public_subnet_name    = "pubdbx"
    no_public_ip          = true
  }
}

// ----------------------------------------
// Modules/databricks_workspace/main.tf
// ----------------------------------------
resource "azurerm_databricks_workspace" "dbx_worskpace" {
  name                        = var.dbx_worskpace_name
  location                    = var.location
  sku                         = var.dbx_workspace_sku

  resource_group_name         = var.resource_group
  managed_resource_group_name = var.dbx_workspace_managed_rg_name

  custom_parameters {
    no_public_ip                                         = var.dbx_custom_parameters.no_public_ip
    virtual_network_id                                   = var.dbx_custom_parameters.virtual_network_id
    private_subnet_name                                  = var.dbx_custom_parameters.private_subnet_name
    public_subnet_name                                   = var.dbx_custom_parameters.public_subnet_name
    private_subnet_network_security_group_association_id = var.dbx_custom_parameters.private_subnet_nsg_id
    public_subnet_network_security_group_association_id  = var.dbx_custom_parameters.public_subnet_nsg_id
  }

  network_security_group_rules_required = var.databricks_nsg_rules
  public_network_access_enabled         = var.public_network_access

  tags = var.tags
}

// ----------------------------------------
// Modules/databricks_workspace/variables.tf
// ----------------------------------------

variable "dbx_custom_parameters" {
  type = object({
    public_subnet_name    = string
    private_subnet_name   = string
    virtual_network_id    = string
    private_subnet_nsg_id = string
    public_subnet_nsg_id  = string
    no_public_ip          = bool
  })
}

variable "databricks_nsg_rules" {
  type    = string
  default = "AllRules"

  validation {
    # Require an exact match against the allowed values.
    condition     = contains(["AllRules", "NoAzureDatabricksRules"], var.databricks_nsg_rules)
    error_message = "Allowed values are 'AllRules' or 'NoAzureDatabricksRules'."
  }
}

variable "dbx_workspace_managed_rg_name" {
  type        = string
}

variable "dbx_worskpace_name" {
  type        = string
}

variable "dbx_workspace_sku" {
  type    = string
  default = "premium"

  validation {
    # Require an exact match against the allowed values.
    condition     = contains(["premium", "standard"], var.dbx_workspace_sku)
    error_message = "The allowed values of the SKU are 'premium' or 'standard'."
  }
}

variable "location" {
  type        = string
}

variable "public_network_access" {
  type        = bool
  default     = false
}

variable "resource_group" {
  type        = string
}

variable "tags" {
  type        = map(string)
  default     = null
}

Debug Output/Panic Output

https://gist.github.com/dehouwerd/5752e709f8a38c93539b894662b61020

Expected Behaviour

Successful resource deployment

Actual Behaviour

Terraform fails to deploy the resource. The subnet name in the error message has an extra 's' appended (privdbxs instead of privdbx). The resource in the Azure portal (in a failed state) has the correct Custom public subnet name JSON value.

╷
│ Error: creating/updating Workspace (Subscription: "xxxx"
│ Resource Group Name: "xxxx"
│ Workspace Name: "xxxx"): polling after CreateOrUpdate: polling failed: the Azure API returned the following error:
│
│ Status: "SubnetIsNotWithinVnetError"
│ Code: ""
│ Message: "The subnet privdbxs CIDR range '<null>' is not within the Virtual Network CIDR range '192.168.2.0/24'"
│ Activity Id: ""
│
│ ---
│
│ API Response:
│
│ ----[start]----
│ {
│   "status": "Failed",
│   "error": {
│     "code": "SubnetIsNotWithinVnetError",
│     "message": "The subnet privdbxs CIDR range '<null>' is not within the Virtual Network CIDR range '192.168.2.0/24'"
│   }
│ }
│ -----[end]-----
│
│
│   with module.dbx.azurerm_databricks_workspace.dbx_worskpace,
│   on .\Modules\databricks_workspace\main.tf line 2, in resource "azurerm_databricks_workspace" "dbx_worskpace":
│    2: resource "azurerm_databricks_workspace" "dbx_worskpace" {
│

Steps to Reproduce

terraform init
terraform apply --auto-approve

Important Factoids

No response

References

No response

gerrytan commented 3 months ago

Hi @dehouwerd, I am unable to reproduce your problem. I can deploy the workspace fine with the configuration you posted above, with some modifications related to the virtual network / subnet config (refer to Modules/databricks_workspace/main.tf).

I recommend checking your virtual network / subnet configuration, as the error message suggests: "The subnet privdbxs CIDR range '<null>' is not within the Virtual Network CIDR range '192.168.2.0/24'". The workspace is probably referencing the wrong subnet, or the subnet is misconfigured.

Example of a valid subnet config that is "within virtual network CIDR range":

Virtual network CIDR range: 10.179.0.0/16
Subnet CIDR range: 10.179.1.0/24
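
As a rough illustration of the same pair of ranges in Terraform, the built-in cidrsubnet function can derive the subnet range from the virtual network range, so the result falls inside it by construction. The local names here are hypothetical and not part of the reporter's module:

```hcl
locals {
  # Virtual network range from the example above.
  vnet_cidr = "10.179.0.0/16"

  # Extend the /16 prefix by 8 bits and select network number 1.
  subnet_cidr = cidrsubnet(local.vnet_cidr, 8, 1) # "10.179.1.0/24"
}
```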

Note that you can also use Terraform data sources to refer to the existing virtual network, subnets, etc., which is less brittle than referring to them by string ID.
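
A minimal sketch of the data-source approach, assuming the virtual network and subnet names from the configuration posted in this issue. The resource group is a placeholder, and the data source and output labels (dbx, private, public, and the output names) are hypothetical:

```hcl
# Look up the existing virtual network instead of hard-coding its ID.
data "azurerm_virtual_network" "dbx" {
  name                = "mydbx-vnet"
  resource_group_name = "xxxx" # placeholder
}

# Look up both subnets by name within that virtual network.
data "azurerm_subnet" "private" {
  name                 = "privdbx"
  virtual_network_name = data.azurerm_virtual_network.dbx.name
  resource_group_name  = data.azurerm_virtual_network.dbx.resource_group_name
}

data "azurerm_subnet" "public" {
  name                 = "pubdbx"
  virtual_network_name = data.azurerm_virtual_network.dbx.name
  resource_group_name  = data.azurerm_virtual_network.dbx.resource_group_name
}

# Expose the address ranges so they can be compared by eye.
output "vnet_address_space" {
  value = data.azurerm_virtual_network.dbx.address_space
}

output "private_subnet_prefixes" {
  value = data.azurerm_subnet.private.address_prefixes
}

output "public_subnet_prefixes" {
  value = data.azurerm_subnet.public.address_prefixes
}
```

With lookups like these, a misnamed or missing subnet should fail when the data source is read rather than during the workspace deployment, and data.azurerm_virtual_network.dbx.id can be passed as virtual_network_id instead of a hand-written string.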

dehouwerd commented 3 months ago

Hi @gerrytan, the provided value is correct. The apply command fails when a non-existent subnet name is provided. An API call appears to be made, but the subnet name in the response has an 's' appended.

API Response:
│
│ ----[start]----
│ {
│   "status": "Failed",
│   "error": {
│     "code": "SubnetIsNotWithinVnetError",
│     "message": "The subnet privdbxs CIDR range '<null>' is not within the Virtual Network CIDR range '192.168.2.0/24'"
│   }
│ }

gerrytan commented 3 months ago

@dehouwerd there's definitely no logic in the code that appends an s suffix to the subnet name. I suspect either a stray Terraform variable override, or that the NSG association ID you set references a different subnet with an s suffix.

Can you please double-check the list of subnets currently available in the virtual network?

If you still have no luck, can you please try again with a different subnet (with a different name and CIDR range)?

gerrytan commented 3 months ago

@dehouwerd can you please also refer to this doc: https://learn.microsoft.com/en-us/azure/databricks/security/network/classic/vnet-inject

The subnet <subnet-id> is already in use by workspace <workspace-id>

Possible cause: you are creating a workspace in a VNet with host and container subnets that are already being used by an existing Azure Databricks workspace. You cannot share multiple workspaces across a single subnet. You must have a new pair of host and container subnets for each workspace you deploy.

You did not encounter that exact error, but isolating your problem by deploying your workspace against a new pair of subnets will give us more clues about the root cause.

Anton-Kalashnik88 commented 2 months ago

Hi everyone, I'm facing the same issue, but with a Bicep template deployment. The relevant part of the template is:

resource workspace 'Microsoft.Databricks/workspaces@2023-02-01' = {
  name: workspaceName
  location: location
  tags: tags
  sku: {
    name: pricingTier
  }
  properties: {
    managedResourceGroupId: managedResourceGroupId
    parameters: {
      customVirtualNetworkId: {
        value: virtualNetworkId
      }
      customPublicSubnetName: {
        value: publicSubnetName
      }
      customPrivateSubnetName: {
        value: privateSubnetName
      }

The parameters for the subnet names in .parameters.json are the following:

"privateSubnetName": {
    "value": "...-dbr-snet01"
},
"publicSubnetName": {
    "value": "...-dbr-snet02"
},

During today's deployment I got the following error:

Status Message: The subnet ...-dbr-snet01s CIDR range '<null>' is not within the Virtual Network CIDR range
'10.141.0.0/20,10.141.16.0/20' (Code:SubnetIsNotWithinVnetError)

The template is quite simple, and there is no code where an 's' suffix could have been appended to the subnet name.

Both subnets exist in the virtual network with the correct CIDR ranges.

gerrytan commented 1 month ago

Hi @Anton-Kalashnik88, can you please post the virtual network configuration from az network vnet show -n [virtual-network-name] -g [resource-group-name]? Please redact any sensitive information.

Alternatively you can also raise a support request through the portal to have us investigate a specific Databricks workspace deployment.

jarteagaf commented 1 month ago

This is not a Terraform or Databricks issue; it is a Microsoft Azure bug that occurs with VNet injection. When a subnet is created from the portal, its range is stored under "addressPrefixes" instead of "addressPrefix"; the difference is the plural. The Terraform or ARM deployment then fails because it looks for the singular property, does not find it, and returns the error. I raised the issue with Microsoft. In the meantime, a temporary workaround is to create the subnets using the Azure CLI, which creates them with the correct property name, and you'll then be able to deploy.

hashicorp / terraform-provider-azurerm