hashicorp / terraform-provider-azurerm

Terraform provider for Azure Resource Manager
https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs
Mozilla Public License 2.0
4.59k stars 4.63k forks source link

Breaking change - AzureRM provider 3.65 forces `Microsoft.Kubernetes` and `Microsoft.KubernetesConfiguration` resource provider registration #22515

Closed bubbletroubles closed 1 year ago

bubbletroubles commented 1 year ago

Is there an existing issue for this?

Community Note

Terraform Version

1.x

AzureRM Provider Version

3.65

Affected Resource(s)/Data Source(s)

Any

Terraform Configuration Files

Any

Debug Output/Panic Output


│ Original Error: Cannot register providers: Microsoft.Kubernetes, Microsoft.KubernetesConfiguration. Errors were: Cannot register provider Microsoft.Kubernetes with Azure Resource Manager: providers.ProvidersClient#Register: Failure responding to request: StatusCode=403 -- Original Error: autorest/azure: Service returned an error. Status=403 Code="AuthorizationFailed" Message="The client 'redacted' with object id 'redacted' does not have authorization to perform action 'Microsoft.Kubernetes/register/action' over scope '/subscriptions/redacted' or the scope is invalid. If access was recently granted, please refresh your credentials.".
│ Cannot register provider Microsoft.KubernetesConfiguration with Azure Resource Manager: providers.ProvidersClient#Register: Failure responding to request: StatusCode=403 -- Original Error: autorest/azure: Service returned an error. Status=403 Code="AuthorizationFailed" Message="The client 'redacted' with object id 'redacted' does not have authorization to perform action 'Microsoft.KubernetesConfiguration/register/action' over scope '/subscriptions/redacted' or the scope is invalid. If access was recently granted, please refresh your credentials.".
│ 
│   with provider["registry.terraform.io/hashicorp/azurerm"],
│   on provider.tf line 3, in provider "azurerm":
│    3: provider "azurerm" {

Expected Behaviour

Resource providers should not be force registered.

Actual Behaviour

Microsoft.Kubernetes and Microsoft.KubernetesConfiguration were added as a required resource provider in AzureRM 3.65.0. However, it's not a mandatory resource, and is a breaking change.

Microsoft's recommendation is to "Only register a resource provider when you're ready to use it. The registration step enables you to maintain least privileges within your subscription. A malicious user can't use resource providers that aren't registered.".

The AzureRM provider requiring registration of a non-mainstream resource is not best practice.

The release notes for the AzureRM provider release 3.65.0 do not mention that a new resource registration is required.

Steps to Reproduce

Deploy any AzureRM resource (except Azure Kubernetes) to a subscription without the Kubernetes resource provider registered, using a service connection scoped at any level other than the subscription. The deployment fails with the error.

Important Factoids

This also happened in provider version 3.51 - see this 21363

References

https://github.com/hashicorp/terraform-provider-azurerm/pull/22463 https://github.com/hashicorp/terraform-provider-azurerm/issues/22462 https://github.com/hashicorp/terraform-provider-azurerm/issues/21363

tombuildsstuff commented 1 year ago

hi @bubbletroubles

Thanks for opening this issue.

Azure's concept of a Resource Provider requires that the Resource Provider is registered before any API's within that Resource Provider can be called. As such, rather than failing during an API call / to improve the user experience Terraform automatically attempts to register any Resource Providers that it supports by default - to ensure that the API's are available when they're needed; and since this list is stored within Terraform, we extend this list from time to time as Terraform is updated to support new functionality (provided in new Resource Providers).

If you'd prefer to manage Resource Provider Registration outside of Terraform (for example you're running in a restricted environment) - it's possible to opt-out of this behaviour by setting this field in the Provider block.

Since this is working as intended I'm going to close this issue for the moment - but please let us know if disabling this functionality doesn't work for you and we'll take another look.

Thanks!

davisowb commented 1 year ago

@tombuildsstuff - What is the minimum level of permissions the azurerm provider now needs to work before we have to disable auto registration?

Previously this worked for us with a service principal that only had permissions to read secrets from keyvault but now the Kubernetes resource provider auto registration wants Microsoft.Kubernetes/register/action & Microsoft.KubernetesConfiguration/register/action at the subscription level despite not even using them?

tony-harverson-moonpig commented 1 year ago

Good Morning, @tombuildsstuff

According to https://learn.microsoft.com/en-us/azure/azure-resource-manager/management/azure-services-resource-providers#registration, those providers aren't registered by default. As @bubbletroubles above noted, the recommendation by Microsoft is not to enable them unless they're going to be in use. This will probably result in those providers being disabled by default in a noticeable portion of accounts, and users running in pipelines with least-privilege permissions are likely to see the registration error.

It's probably worth at least a breaking change notification in the release notes for v3.65.0 mentioning the error message, and pointing people to the skip_provider_registration flag on the provider.

ivan-zaitsev commented 1 year ago

Is there a way to register them if the components are really in use? Or at least skip provider registration for specific resources, but not all of them?

anwarnk commented 1 year ago

FYI - Via GitHub Actions we are seeing that it is registering "Microsoft.Kubernetes" "Microsoft.KubernetesConfiguration" and successfully registering in the azore portal, however the actions is still failing even re-running the job.

bubbletroubles commented 1 year ago

Also in the required.go file it says:

// new core Resource Providers should be added to this list as they're used in the Provider

I wouldn't call Kubernetes a core resource provider. Many organizations don't require or use Kubernetes.

bubbletroubles commented 1 year ago

FYI - Via GitHub Actions we are seeing that it is registering "Microsoft.Kubernetes" "Microsoft.KubernetesConfiguration" and successfully registering in the azore portal, however the actions is still failing even re-running the job.

It may take time for the provider to fully register. Try again a bit later.

davisowb commented 1 year ago

So to answer my own question above....

The only built-in roles that have the permission to run the now required actions (Microsoft.Kubernetes/register/action & Microsoft.KubernetesConfiguration/register/action) are:

Which being that they have to be applied at the subscription level for this to work certainly goes against the best practice of providing least privilege access considering both of these essentially provide full access to the subscription.

tombuildsstuff commented 1 year ago

@davisowb @ivan909020 @tony-harverson-moonpig @bubbletroubles

As mentioned above, the Provider automatically registers Resource Providers as is needed - this is by design - and this list can be added to as needed. As such, this is not a breaking change since we assume the credentials used to launch the Provider have access to register Resource Providers by default.

If the credentials used to launch the Provider don't have permission to register Resource Providers, then as mentioned above (and in the documentation) you'll need to ensure the skip_provider_registration flag is set. This allows the Resource Provider registration to be managed outside of Terraform (for example, using a more privileged account) and have the Provider use only the permissions it needs.

Whilst it's unfortunate that this has broken your environments, since the Provider is being launched without the permissions needed to register Resource Providers, this is ultimately a configuration/permissions issue - and as such is not considered a breaking change on the Provider side (this is also documented in the changelog).

Based on the discussion above, it sounds like you may not be pinning the version of the Provider being used and so this change has rolled out unexpectedly - as such we'd recommend ensuring that you're pinning the version of the Provider being used, which would have avoided this issue.

As such whilst it's unfortunate that this has broken your pipelines, since this is not a breaking change we have no plans to revert this PR and you'll need to update the provider block to account for the fact that the Service Principal being used has a limited set of permissions.

Thanks!

tombuildsstuff commented 1 year ago

@ivan909020

Is there a way to register them if the components are really in use? Or at least skip provider registration for specific resources, but not all of them?

Whilst that's something we'd like to support in the future, unfortunately Terraform Core doesn't expose the functionality needed to do that at this time - implementing this would require a change to the Terraform Protocol (which would require bumping the minimum of Core to account for that) - so that's not possible in the short-term.

At this point in time the "least-bad" thing we can do is to work through this defined list of Resource Providers - since unfortunately the error messages returned from Azure are unclear (e.g. API Version 2022-02-01 was not found) when the associated Resource Provider is not registered, which has caused a large amount of (new) user confusion over the years.

On the flip side, if a user is opting to run Terraform with a more limited set of permissions, we have to assume they're more familiar with Azure and so will be aware of Resource Providers and the need to register these - which is why this flag in the provider block exists.

All that to say, whilst we'd like to conditionally register the Resource Providers being used in advance, unfortunately this isn't possible at this time with the current Terraform Protocol design.

Thanks!

mloskot commented 1 year ago

My Azure Pipelines have just been hit by this issue and I second https://github.com/hashicorp/terraform-provider-azurerm/issues/22515#issuecomment-1635653005 about the new requirement being against the best practices.

For the records, here is full error dump I'm getting from Terraform 1.5.2

Error: Error ensuring Resource Providers are registered.

Terraform automatically attempts to register the Resource Providers it supports to
ensure it's able to provision resources.

If you don't have permission to register Resource Providers you may wish to use the
"skip_provider_registration" flag in the Provider block to disable this functionality.

Please note that if you opt out of Resource Provider Registration and Terraform tries
to provision a resource from a Resource Provider which is unregistered, then the errors
may appear misleading - for example:

> API version 2019-XX-XX was not found for Microsoft.Foo

Could indicate either that the Resource Provider "Microsoft.Foo" requires registration,
but this could also indicate that this Azure Region doesn't support this API version.

More information on the "skip_provider_registration" flag can be found here:
https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs#skip_provider_registration

Original Error: Cannot register providers: Microsoft.KubernetesConfiguration, Microsoft.Kubernetes.
Errors were: Cannot register provider Microsoft.KubernetesConfiguration with Azure Resource Manager:
providers.ProvidersClient#Register: Failure responding to request:
StatusCode=403 -- Original Error: autorest/azure: Service returned an error.
Status=403 Code="AuthorizationFailed"
Message="The client 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' with object id 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'
does not have authorization to perform action 'Microsoft.KubernetesConfiguration/register/action'
over scope '/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' or the scope is invalid.
If access was recently granted, please refresh your credentials.".

Cannot register provider Microsoft.Kubernetes with Azure Resource Manager: providers.ProvidersClient#Register:
Failure responding to request: StatusCode=403 -- Original Error: autorest/azure: Service returned an error.
Status=403 Code="AuthorizationFailed"
Message="The client 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' with object id 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'
does not have authorization to perform action 'Microsoft.Kubernetes/register/action' over scope '/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'
or the scope is invalid. If access was recently granted, please refresh your credentials.".

  with provider["registry.terraform.io/hashicorp/azurerm"].backend,
  on provider.tf line 25, in provider "azurerm":
  25: provider "azurerm" {
rw-hardin commented 1 year ago

100% a breaking change.

To clarify, this breaks existing pipelines in environments where the identity executing the run does not have permission to register providers. This should only be required if kubernetes is actually being used in the context of the provider.

stuydnz commented 1 year ago

hi @bubbletroubles

Thanks for opening this issue.

Azure's concept of a Resource Provider requires that the Resource Provider is registered before any API's within that Resource Provider can be called. As such, rather than failing during an API call / to improve the user experience Terraform automatically attempts to register any Resource Providers that it supports by default - to ensure that the API's are available when they're needed; and since this list is stored within Terraform, we extend this list from time to time as Terraform is updated to support new functionality (provided in new Resource Providers).

If you'd prefer to manage Resource Provider Registration outside of Terraform (for example you're running in a restricted environment) - it's possible to opt-out of this behaviour by setting this field in the Provider block.

Since this is working as intended I'm going to close this issue for the moment - but please let us know if disabling this functionality doesn't work for you and we'll take another look.

Thanks!

I have also just been hit with this issue and agree that this is a breaking change. I am trying to run a pipeline at a resource group level E.g. the SPN doesn't have subscription level access nor should it... least privilege access and all.

What you are saying now is I need to go and add the Kubernetes providers to all our subscriptions even when they are not needed or have the "skip_provider_registration" flag set in the provider block of all the pipelines that were working.

Note: the error from Terraform implies that setting the flag above isn't a good idea:

If you don't have permission to register Resource Providers you may wish to use the "skip_provider_registration" flag in the Provider block to disable this functionality.

Please note that if you opt out of Resource Provider Registration and Terraform tries to provision a resource from a Resource Provider which is unregistered, then the errors may appear misleading - for example:

API version 2019-XX-XX was not found for Microsoft.Foo

Could indicate either that the Resource Provider "Microsoft.Foo" requires registration, but this could also indicate that this Azure Region doesn't support this API version.

Someone has been very narrow minded in the approach taken here. I sure as hell don't want every developer in the organisation to have the level of rights expected here.

Also, closing the ticket so quickly (the same day it was raised) before understanding the impact, well lets just say its wrong.

bubbletroubles commented 1 year ago

@tombuildsstuff hoping you can read this (and the other comments in this bug) and see the impact it has on organisations large and small.

The AzureRM Terraform provider is used in many different organisations with various levels of cloud maturity and governance. I have no doubt its used in highly regulated environments such as banking and government.

I understand the stance that this is not a breaking change, however the other side of this is organisations have configured their Azure environments with Microsoft best practices (e.g., only register the providers required) and rely on timely updates to the AzureRM provider (as cloud is always evolving - if you pin to an older version for too long, it's a big leap to get up to the latest, more testing, more coordination, a bigger release). I don't think it's unreasonable for an organisation to make these decisions.

For many customers, they are now having to:

The option of disabling provider registration in the provider block may work, but in some organisations this may have to be done in 10's or 100's of configs, and requires communication with impacted development teams. Then there needs to be a way to register providers when teams do need them. Azure Infrastructure is not the focus of many of these development teams (they just want their app to work), and the work of core infrastructure teams to guide the wider developer community within an organisation through this could be large. Granted all of this is not the Hashicorp AzureRM provider problem, but the breaking change brings this issue to the forefront.

Great products don't just technically deliver, they also understand how their product is used and build trust within their userbase and community. This change is doing the opposite. Major changes can happen, but its how they are communicated and implemented that makes a difference.

Again, I understand that you don't see this as a breaking change, however the impact this has on customers who have configured their Azure clouds in good faith are now faced with a mountain of work to keep their platforms running and maintained, with no advance notice, without a major version update.

I don't expect you to change your decision (although it would be welcomed if you did), however I hope it helps realise how these decisions can have a downstream impact on 100's of organisations, potentially costing 100's of hours in large organisations, impacting productivity, creating noise around "Terraform problems" and taking away from delivering core business.

rw-hardin commented 1 year ago

'creating noise around "Terraform problems"'

And coming from someone that deals with these issues on a daily basis, this extra noise is bad for an organization's relationship with Hashi. Many see these issues occur and wonder why Terraform is required.

Support requests are raised for something that could have been avoided, and as @bubbletroubles mentioned above, teams are having to set version constraints to 3.64.0 to avoid pipeline disruption over something that isn't even related to their code/infrastructure.

Kubernetes may be widely used, but in a lot of environments, Kubernetes is isolated/limited to particular subscriptions. Not allowing the registration of this provider globally is actually seen as a protection against random Kubernetes clusters being built outside of the appropriate space.

A better option would have been to allow an "opt-in" switch for Kubernetes.

EDIT:

@bubbletroubles, in reference to the use of the provider within regulated environments, I can speak with 100% confidence that it is, as I have direct knowledge/experience with this use case.

anwarnk commented 1 year ago

Can you re-open this issue, we are running as 'Owner' and it is still failing.

Original Error: Cannot register providers: Microsoft.Kubernetes, Microsoft.KubernetesConfiguration. Errors were: Cannot register provider Microsoft.Kubernetes with Azure Resource Manager: providers.ProvidersClient#Register: Failure responding to request: StatusCode=403 -- Original Error: autorest/azure: Service returned an error. Status=403 Code="AuthorizationFailed" image

kaovd commented 1 year ago

I'm not exactly sure what warrants this issue to be closed and shuffled off when #21363 was addressed? Not sure if you want to add any commentary @stephybun as you approved the rollback on the last occurrence

stephybun commented 1 year ago

Thank you all for your input and concerns on this issue.

It's unfortunate that these two RPs made it into the list of auto-registered providers since I agree that Kubernetes and KubernetesConfiguration should not be considered core RPs.

I do however agree with @tombuildsstuff in that it is documented that we expect/recommend the provider be run with a user that has sufficient permissions to be able to register RPs (albeit there is improvement to be made in the documentation to call that out) and so I would not consider this a breaking change.

Whether this is the right approach or requirement to have for AzureRM is something that we should probably revisit but to reiterate we are sadly bound by the current functionality available in core.

That said I can understand the disruption that this change has caused and I can empathise with the frustration. We have corrected oversights like this in the past and I believe we should do so here as well.

@bubbletroubles I hope you don't mind - instead of reopening your PR I will open a new one since I would like to lump the removal of these RPs from the list with an update to the documentation to call out the permissions more explicitly.

stevehipwell commented 1 year ago

FWIW this is 100% a breaking change and blaming the poor UX on top of this on the AzureRM API is missing the point that Terraform wraps the API and has plenty of options to improve the UX.

rw-hardin commented 1 year ago

I'm just glad someone listened. Thanks @stephybun.

I would like to reiterate a point that @bubbletroubles made previously. Microsoft recommends a least privileges approach for the resource providers. It seems like Hashi should hear that and consider use cases such as regulated environments.

If my pipeline needs to do one single thing in another subscription (let's say register Private DNS), then it shouldn't need permission to register a resource provider in that subscription....especially if that subscription is intentionally locked down to specific use cases and something like Kubernetes isn't deployed there at all.

To be honest, auto registering the providers has never made sense in these environments and actually strikes me as a lazy way to avoid logic that checks to see if a given provider is even needed during that run.

Anyway, thanks again for listening and getting this reverted.

stephybun commented 1 year ago

@rw-hardin I'm glad we could resolve this, yours as well as the community's feedback is valuable to us.

From the discussions here it sounds like everyone is striving for the same outcome. The improvement we want to make isn't possible currently but I've linked the issue raised on core down below for those that want to understand the limitation.

It could help to get a 👍 on both the issue raised on core, as well as the feature request on our provider for improvement on how provider registration is done.

Core issue: https://github.com/hashicorp/terraform/issues/25568 AzureRM feature request: https://github.com/hashicorp/terraform-provider-azurerm/issues/20673

rw-hardin commented 1 year ago

Done and done. I like those ideas. Thanks!

buddhamangler-cbre commented 1 year ago

Yes, this was a breaking change. My question is why was this allowed to occur on a minor version update???

NOT COOL

buddhamangler-cbre commented 1 year ago

@davisowb @ivan909020 @tony-harverson-moonpig @bubbletroubles

As mentioned above, the Provider automatically registers Resource Providers as is needed - this is by design - and this list can be added to as needed. As such, this is not a breaking change since we assume the credentials used to launch the Provider have access to register Resource Providers by default.

If the credentials used to launch the Provider don't have permission to register Resource Providers, then as mentioned above (and in the documentation) you'll need to ensure the skip_provider_registration flag is set. This allows the Resource Provider registration to be managed outside of Terraform (for example, using a more privileged account) and have the Provider use only the permissions it needs.

Whilst it's unfortunate that this has broken your environments, since the Provider is being launched without the permissions needed to register Resource Providers, this is ultimately a configuration/permissions issue - and as such is not considered a breaking change on the Provider side (this is also documented in the changelog).

Based on the discussion above, it sounds like you may not be pinning the version of the Provider being used and so this change has rolled out unexpectedly - as such we'd recommend ensuring that you're pinning the version of the Provider being used, which would have avoided this issue.

As such whilst it's unfortunate that this has broken your pipelines, since this is not a breaking change we have no plans to revert this PR and you'll need to update the provider block to account for the fact that the Service Principal being used has a limited set of permissions.

Thanks!

This is a complete misunderstanding of what constitutes a breaking change. Yes, the credentials used don't have the permissions required to register the provider, but the list of providers required to be registered changed and changed in a breaking way.

Sorry, but simply listing this as an enhancement in the release notes is not enough here. This should have been a new major version and the breaking change explicitly pointed out...or should we grow accustomed to the idea that breaking changes will continue to be released in minor version updates of the provider?

Juan4SAS commented 8 months ago

@tombuildsstuff & @stephybun ,

there is something I really cannot understand from here and I could not find in the documentation either. All of this is clear when the access is through a User (Service Principal). But, what about when the Terraform is launched and using an Azure client ID through an azure App Registration? What permissions should be granted when finding this kind of errors and how to grant them exactly? Without it, Terraform scripts will never be able to have all required permissions. Any guidance would be appreciated.

Thanks in advance for any supporting reply

github-actions[bot] commented 6 months ago

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.