hashicorp / terraform-provider-azurerm

Terraform provider for Azure Resource Manager
https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs
Mozilla Public License 2.0
4.52k stars 4.6k forks source link

Proposal: Declaration of Private Endpoints as part of a resource #23724

Open istairbn opened 10 months ago

istairbn commented 10 months ago

Is there an existing issue for this?

Community Note

Description

Problem Statement:

  1. Azure is composed of two planes: Management Plane & the Data plane. Whilst the Management plane is generally accessible, in a secure network the Data plane should not be accessible outside the network.
  2. Certain AzureRM resources (key_vault and storage_account) now require both kinds of access to successfully complete.
  3. To connect these resources to a network, a private_endpoint resource would normally be used. However, it is impossible using our current setup to consistently create a "private endpoint only" setup: the key_vault or storage_account cannot be "completed" in Terraform until the data plane is reachable and invariably the endpoint reference is an output of that resource. Even if you try hardcoding what the account endpoint will be, generally Terraform fails since an endpoint cannot be created until the strorage account exists.

This problem is not an "Azure" issue - becaus it is possible today to create a resource using only the Management plane client. The issue is that Terraform is treating Resources and Endpoints as two separate things. Whilst this is strictly true from an API perspective, it makes much more sense to treat a Private Endpoint as an extension of a resource: since it cannot exist without it.

This becomes a real problem in environments heavily using Azure policy. If there is a Deny on "creating Storage/Vaults with public firewall exposed" it is impossible to work around this issue.

Proposal: Private_Endpoint Blocks

I would propose that any resource using a Data Plane Client should include "private_endpoint" blocks: allowing the user to declare Private Endpoints as part of the Resource configuration. From an implementation perspective, this would entail:

  1. Using the Management plane client to create the resource.
  2. For every "private_endpoint" block, creating and validating the private endpoint has been created.
  3. Continue using the data plane client to complete data plane operations.

Advantages of private_endpoint blocks

  1. Means that we can actually build Key Vaults/Storage Accounts etc. in a secured environment.
  2. Makes the actual reading of the Terraform far more efficient, since the resource and it's network are in one place.
  3. Type safety. Where the endpoint has a single Private Endpoint type (i.e. Vault) we can infer the entry and set it as default. Where an endpoint supports multiple (Storage has Blob/DFS/Files) then we can validate the inputs.

It would make sense to make this block easily extendable to every resource eventually, because every time a resource wants to hit the Data Plane layer it will be broken for those of us who are mandated to use Private Endpoint connectivity only.

Disadvantages of private_endpoint blocks

The only thing I can think of is that it's moving slightly away from the purist view of a tf resource being as granular as possible. However, it stems from the fact that Private Endpoints are at the moment treated as optional (both by TF and Azure) whereas in some environments they are a core dependency.

Alternative Suggestions (i.e. what else hasn't or won't work)

  1. Two separate resources (one with Data plane and one without) - poor quality experience, maintenance hell
  2. azapi provider call to close off the data plane after creation of endpoint: Can't fight Azure deny policies, very ugly experience
  3. Do nothing - our current solution, which means either ARM templates or AZAPI calls to build two of our most common resources

New or Affected Resource(s)/Data Source(s)

azurerm_key_vault, azurerm_storage_account

Potential Terraform Configuration

My proposal would be to use blocsk - that way for Storage we can do a for_each over each of the endpoints we want attached. 

resource "azurerm_key_vault" "example" {
  name                        = "examplekeyvault"
  location                    = azurerm_resource_group.example.location
  resource_group_name         = azurerm_resource_group.example.name
  enabled_for_disk_encryption = true
  tenant_id                   = data.azurerm_client_config.current.tenant_id

  private_endpoint {
    name                = "example-endpoint"
    location            = data.azurerm_resource_group.example.location
    resource_group_name = data.azurerm_resource_group.example.name
    subnet_id           = data.azurerm_subnet.subnet.id

    private_service_connection {
      name                           = "example-privateserviceconnection"
      #private_connection_resource_id = azurerm_storage_account.example.id #THIS CAN BE INFERRED
      subresource_names              = ["blob"]  # THIS CAN BE EITHER INFERRED OR VALIDATED
      is_manual_connection           = false     # THIS CAN DEFAULT TO FALSE 
     }

    private_dns_zone_group {
      name                 = "example-dns-zone-group"
      private_dns_zone_ids = [azurerm_private_dns_zone.example.id]
    }
  }
}


### References

Related issues
https://github.com/hashicorp/terraform-provider-azurerm/issues/19730
https://github.com/hashicorp/terraform-provider-azurerm/issues/22792
https://github.com/hashicorp/terraform-provider-azurerm/issues/17863
magodo commented 10 months ago

@istairbn Thank you for raising this request!

We are aware of this issue, especially for its existance in storage account and key vault, where during their creation, they'll reach to data plane APIs. This will indeed block users who always block public access for those data plane endpoint, which means they won't be able to create the resource at the first place, and won't have a chance to create the depended resources (e.g. the PE).

In order to fix this, for the storage account, we are reaching out to the Azure storage team to see whether they can provide the management plane API that fulfills the data plane APIs we are using. But we didn't get active response from them for a while.

That's why we are suggesting users are blocked by this issue to use azapi provider to workaround the issue as azapi only surface the management plane APIs. Regarding your statement below:

azapi provider call to close off the data plane after creation of endpoint: Can't fight Azure deny policies, very ugly experience

Could you please elaborate more on the detail?

For the proposal you raised, as you've mentioned, it combines the PE and the actual resource together, which seems a bit of weired. Meanwhile, the implementation is quite inconsistent for different resources, as the developer have to figure out which part is data plane related, and postpone that call untill the PE is created. This is a maintainance burden.

Based on above, I'd defer to others to comment more on this, while I still tag this issue as "enhancement" for now.

istairbn commented 10 months ago

Apologies for the essay - but I'm not convinced of the arguments at the moment.

This will indeed block users who always block public access for those data plane endpoin

Yes - which is should be every Financial & Regulated customer. It's not a small pool of users, nor is it low value data we're protecting. Everyone with a VNET Integration is at risk of being affected.

we are reaching out to the Azure storage team to see whether they can provide the management plane API that fulfills the data plane APIs we are using.

Unwise idea - the entire point of the split responsibilities between Management and Data planes is to allow for segregation and the protection of data. Microsoft are never going to do that, so we should drop that entirely as an avenue of solving the issue.

That's why we are suggesting users are blocked by this issue to use azapi provider to workaround the issue

I'd say that's not the right approach. This is meant to be the flagship Terraform provider for one of the largest public clouds. Storage & Key Vault aren't fringe services: they are core elements to using the Cloud. Saying "nah, we're just going to leave it unusable for any customers in regulated industries" isn't the right approach here - we should fix it.

azapi provider call to close off the data plane after creation of endpoint: Can't fight Azure deny policies, very ugly experience Could you please elaborate more on the detail?

AZAPI provider has an update resource facility. In some environments, I could build Storage/Key Vault with a public endpoint, then use AZAPI to amend the storage account. However it has two main issues:

  1. In an environment with Azure Policies applied, they can deny if the firewall is open at all (i.e. prevent anyone from ever trying to create with a public endpoint
  2. It's an ugly hack and makes my Terraform bloated and brittle: going against the core principles of declarative code.

For the proposal you raised, as you've mentioned, it combines the PE and the actual resource together, which seems a bit of weired.

This is the point I raised at the end: and it comes down to the question of what a AzureRM primitive could or should be.

We already have examples in the code base where the same resource is amended in multiple methods. For example Network Security Groups have the ability to declare the rules "inline" as part of the resource OR as a separate resource. In this case, the AzureRM provider has decided that, even though Azure declares NSG's as one thing, Terraform will allow us to treat them as two.

This is the inverse principle: Terraform explicitly saying "we will treat a resource and it's private networking as a unified resource from a Terraform perspective. This IS unusual, I admit. However, the reasons for doing so are logical.

The base assumption of any Terraform resource is that is atomic: it stands or falls on it's own. Then the graph can be built to discover dependencies. However, this is broken for these two resource types. In secure environments, it is impossible to create the correct graph, because we have a circular dependency.

If I can't deploy storage/keyvault without the endpoint - and I can't deploy the endpoint without the storage/keyvault - then logically they must be deployed together to make the resource atomic.

Meanwhile, the implementation is quite inconsistent for different resources, as the developer have to figure out which part is data plane related, and postpone that call untill the PE is created. This is a maintainance burden.

As far as I can tell, that is already happening Storage account creates clients for Blob / File / Queue etc and Key Vaults create a specific Data Plane Client. This means that the maintenance burden is already present - AzureRM provider is already explicitly saying "this is the part where we go beyond Management and into the world of Data.

If we ignore "what APIs we're using" and focus on higher level questions, it becomes simple.

Is a Storage Account completely built if it doesn't have a usable endpoint?

I argue no - and therefore to make the resouce complete, we need to enable the resource to correctly configure the required networking.

istairbn commented 10 months ago

It's been brought to my attention I didn't reference the "core issue" properly - #2977

michasacuer commented 4 months ago

@magodo any updates on that request?