hashicorp / terraform-provider-azurerm

Terraform provider for Azure Resource Manager
https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs
Mozilla Public License 2.0
4.52k stars 4.6k forks source link

known-issue: `ExpiredAuthenticationToken` when running a long-running operation #20867

Closed op7ic closed 1 year ago

op7ic commented 1 year ago

Is there an existing issue for this?

Community Note

Terraform Version

1.3.9

AzureRM Provider Version

3.46.0

Affected Resource(s)/Data Source(s)

azurerm_active_directory_domain_service

Terraform Configuration Files

[...]
resource "azurerm_active_directory_domain_service" "test" {
  name                  = "ad"
  location              = azurerm_resource_group.aadds.location
  resource_group_name   = azurerm_resource_group.aadds.name
  filtered_sync_enabled = false
  domain_name           = "test.test.com"
  sku                   = "Standard"

  initial_replica_set {
    subnet_id = azurerm_subnet.aadds.id
  }

  notifications {
    notify_dc_admins      = true
    notify_global_admins  = true
  }

  security {
    sync_kerberos_passwords = true
    sync_ntlm_passwords     = true
    sync_on_prem_passwords  = true
  }

  timeouts {
    create = "12h"
    update = "12h"
    read = "120m"
    delete = "12h"
  }

  tags = {
    environment = "prod"
  }
}
[...]

### Debug Output/Panic Output

```shell
Error: creating/updating Domain Service (Name: "ad", Resource Group: "test"): polling after CreateOrUpdate: Future#WaitForCompletion: the number of retries has been exceeded: StatusCode=401 -- Original Error: Code="ExpiredAuthenticationToken" Message="The access token expiry UTC time '3/9/2023 11:01:31 AM' is earlier than current UTC time '3/9/2023 11:05:09 AM'.


### Expected Behaviour

Build process should finish successfully despite long build time.

### Actual Behaviour

The build process for the azurerm_active_directory_domain_service resource takes over 1.5h which means that, before build is completed, the authentication token expires. As the polling for completion is done every 10 seconds, eventually the authentication token will expire mid-way through build process and whole resource creation will fail resulting in the above error. 

### Steps to Reproduce

terraform apply

### Important Factoids

No

### References

None
magodo commented 1 year ago

@op7ic From the error code, it seems the token is expired and somehow not refreshed. Would you please provide the debug log so that we can further investigate?

op7ic commented 1 year ago

hi @magodo that's correct - it exactly what happens mid-way through build. I'll get the log and upload it here.

tombuildsstuff commented 1 year ago

@op7ic I believe the root cause (the auth token not renewing correctly in all cases) has been fixed in v3.47.0, would you mind updating to the latest version and seeing if that works for you?

op7ic commented 1 year ago

@tombuildsstuff sure let me try. In the meantime here is the last error from debug log:

{"error":{"code":"ExpiredAuthenticationToken","message":"The access token expiry UTC time '3/13/2023 6:48:43 PM' is earlier than current UTC time '3/13/2023 6:48:59 PM'."}}: timestamp=2023-03-13T21:49:01.030+0300
2023-03-13T21:49:01.030+0300 [DEBUG] provider.terraform-provider-azurerm_v3.46.0_x5: Unlocking "azurerm_active_directory_domain_service.ad": timestamp=2023-03-13T21:49:01.030+0300
2023-03-13T21:49:01.030+0300 [DEBUG] provider.terraform-provider-azurerm_v3.46.0_x5: Unlocked "azurerm_active_directory_domain_service.ad": timestamp=2023-03-13T21:49:01.030+0300
2023-03-13T21:49:01.032+0300 [ERROR] provider.terraform-provider-azurerm_v3.46.0_x5: Response contains error diagnostic: diagnostic_severity=ERROR tf_proto_version=5.3 tf_req_id=xxxxxxxxxxx tf_resource_type=azurerm_active_directory_domain_service @caller=github.com/hashicorp/terraform-plugin-go@v0.14.3/tfprotov5/internal/diag/diagnostics.go:55 @module=sdk.proto diagnostic_detail= diagnostic_summary="waiting for both domain controllers to become available in initial replica set for Domain Service: (Initial Replica Set Id Name "xxxxxxxx" / Name "xxxxx" / Resource Group "xxxxxxxx"): domainservices.DomainServicesClient#Get: Failure responding to request: StatusCode=401 -- Original Error: autorest/azure: Service returned an error. Status=401 Code="ExpiredAuthenticationToken" Message="The access token expiry UTC time '3/13/2023 6:48:43 PM' is earlier than current UTC time '3/13/2023 6:48:59 PM'."" tf_provider_addr=provider tf_rpc=ApplyResourceChange timestamp=2023-03-13T21:49:01.032+0300
2023-03-13T21:49:01.141+0300 [ERROR] vertex "azurerm_active_directory_domain_service.aadds" error: waiting for both domain controllers to become available in initial replica set for Domain Service: (Initial Replica Set Id Name "xxxxx" / Name "xxxxx" / Resource Group "xxxxx"): domainservices.DomainServicesClient#Get: Failure responding to request: StatusCode=401 -- Original Error: autorest/azure: Service returned an error. Status=401 Code="ExpiredAuthenticationToken" Message="The access token expiry UTC time '3/13/2023 6:48:43 PM' is earlier than current UTC time '3/13/2023 6:48:59 PM'."
2023-03-13T22:19:02.969+0300 [DEBUG] provider.stdio: received EOF, stopping recv loop: err="rpc error: code = Unavailable desc = error reading from server: EOF"
2023-03-13T22:19:02.969+0300 [DEBUG] provider.stdio: received EOF, stopping recv loop: err="rpc error: code = Unavailable desc = error reading from server: EOF"
2023-03-13T22:19:02.969+0300 [DEBUG] provider.stdio: received EOF, stopping recv loop: err="rpc error: code = Unavailable desc = error reading from server: EOF"
2023-03-13T22:19:02.970+0300 [DEBUG] provider: plugin process exited: path=.terraform/providers/registry.terraform.io/hashicorp/random/3.4.3/linux_amd64/terraform-provider-random_v3.4.3_x5 pid=27702
2023-03-13T22:19:02.970+0300 [DEBUG] provider: plugin exited
2023-03-13T22:19:02.970+0300 [DEBUG] provider: plugin process exited: path=.terraform/providers/registry.terraform.io/hashicorp/null/3.2.1/linux_amd64/terraform-provider-null_v3.2.1_x5 pid=27694
2023-03-13T22:19:02.970+0300 [DEBUG] provider: plugin exited
2023-03-13T22:19:02.974+0300 [DEBUG] provider: plugin process exited: path=.terraform/providers/registry.terraform.io/hashicorp/azurerm/3.46.0/linux_amd64/terraform-provider-azurerm_v3.46.0_x5 pid=27711
2023-03-13T22:19:02.974+0300 [DEBUG] provider: plugin exited
magodo commented 1 year ago

@tombuildsstuff Probabaly it is v3.48.0, per https://github.com/hashicorp/terraform-provider-azurerm/pull/20894?

op7ic commented 1 year ago

@magodo I believe 3.48.0 is not yet published but 3.47.0 is as per https://registry.terraform.io/providers/hashicorp/azurerm/latest. I'll try 3.47 first and report back to see if there is an improvement.

tombuildsstuff commented 1 year ago

@op7ic 🀦 apologies for the confusion, I thought that the fix (#20894) went into 3.47.0 last week, but @magodo is right, that'll ship in 3.48.0 later this week

op7ic commented 1 year ago

@tombuildsstuff / @magodo I can confirm that 3.47.0 is still the same. I'll test 3.48 once it arrives but looking at commitment code for the release it seems that it should work ;)

tombuildsstuff commented 1 year ago

Sounds good - since #20894 has been merged and this'll go out in this weeks release - I'm going to close this issue for the moment and assign this to this weeks milestone, once the release is out then a comment will be posted - but if this still an issue in v3.48 then please let us know and we can take another look πŸ‘

github-actions[bot] commented 1 year ago

This functionality has been released in v3.48.0 of the Terraform Provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template. Thank you!

garbology commented 1 year ago

@tombuildsstuff This is still an issue with v3.48.0, experienced when building the SQL Managed Instance using azurerm_mssql_managed_instance.
At the 60 min mark:

waiting for creation of Managed Instance: (Name "[SQLMI-name]" / Resource Group "[RG-name]"): Future#WaitForCompletion: the number of retries has  
been exceeded: StatusCode=401 -- Original Error: Code="ExpiredAuthenticationToken" Message="The access token expiry UTC time '3/17/2023 2:24:05 AM' is earlier  
than current UTC time '3/17/2023 2:27:38 AM'.
tombuildsstuff commented 1 year ago

@manicminer mind taking a look into this one?

manicminer commented 1 year ago

@garbology Would you be able to provide a debug log showing the error and requests that led to it? Also, can you confirm your system clock is accurate where Terraform is running? Thanks.

op7ic commented 1 year ago

@manicminer / @tombuildsstuff Just for completness, I can confirm that the same symptom appears on v3.48 for AD creation as noted above. Around an hour mark the session expires. Here is debug log from this:

{"error":{"code":"ExpiredAuthenticationToken","message":"The access token expiry UTC time '3/19/2023 11:33:34 AM' is earlier than current UTC time '3/19/2023 11:33:39 AM'."}}: timestamp=2023-03-19T14:33:39.637+0300
2023-03-19T14:33:40.270+0300 [DEBUG] provider.terraform-provider-azurerm_v3.48.0_x5: AzureRM Request: 
GET /subscriptions/xxxxxxx/providers/Microsoft.Compute/locations/uksouth/operations/xxxxxxx&api-version=2022-08-01 HTTP/1.1
Host: management.azure.com
User-Agent: Go/go1.19.3 (amd64-linux) go-autorest/v14.2.1 tombuildsstuff/kermit/v0.20230307.1105329 compute/2022-08-01 HashiCorp Terraform/1.3.9 (+https://www.terraform.io) Terraform Plugin SDK/2.10.1 terraform-provider-azurerm/dev pid-222c6c49-1b0a-5959-a213-6608f9eb8820
X-Ms-Correlation-Request-Id: xxxxxxx
Accept-Encoding: gzip: timestamp=2023-03-19T14:33:40.270+0300
2023-03-19T14:33:40.505+0300 [DEBUG] provider.terraform-provider-azurerm_v3.48.0_x5: AzureRM Response for https://management.azure.com/subscriptions/xxxxxxx/providers/Microsoft.Compute/locations/uksouth/operations/xxxxxxx?p=xxxxxxx&api-version=2022-08-01: 
HTTP/2.0 401 Unauthorized
Content-Length: 174
Cache-Control: no-cache
Content-Type: application/json; charset=utf-8
Date: Sun, 19 Mar 2023 11:33:39 GMT
Expires: -1
Pragma: no-cache
Strict-Transport-Security: max-age=31536000; includeSubDomains
Www-Authenticate: Bearer authorization_uri="https://login.windows.net/xxxxxxx", error="invalid_token", error_description="The access token has expired."
X-Content-Type-Options: nosniff
X-Ms-Correlation-Request-Id: xxxxxxx
X-Ms-Failure-Cause: gateway
X-Ms-Request-Id: xxxxxxx
X-Ms-Routing-Request-Id: EASTUS:20230319T113340Z:xxxxxxx

It appears that, so long creation of the resource, is less than an hour there are no problems. If resource creation takes over an hour then ExpiredAuthenticationToken error will appear.

hcevensen commented 1 year ago

Can confirm still happening for me when creating sqlmi around the hour mark also with 3.48 , using 3.43 it has no issues

garbology commented 1 year ago

@manicminer System clock is accurate.

2023-03-19T19:54:56.120+1100 [DEBUG] provider.terraform-provider-azurerm_v3.48.0_x5.exe: AzureRM Response for https://management.azure.com/subscriptions/<subscription>/resourceGroups/test/providers/Microsoft.Sql/managedInstances/<SQL MI Name>?api-version=2020-11-01-preview: 
HTTP/2.0 200 OK
Cache-Control: no-cache
Content-Type: application/json; charset=utf-8
Date: Sun, 19 Mar 2023 08:54:56 GMT
Expires: -1
Pragma: no-cache
Server: Microsoft-HTTPAPI/2.0
Strict-Transport-Security: max-age=31536000; includeSubDomains
Vary: Accept-Encoding
X-Content-Type-Options: nosniff
X-Ms-Correlation-Request-Id: <Correlation-Request-Id>
X-Ms-Ratelimit-Remaining-Subscription-Reads: 11884
X-Ms-Request-Id: <Request-Id 1>
X-Ms-Routing-Request-Id: AUSTRALIASOUTHEAST:20230319T085456Z:x

{"identity":{"principalId":"<service principal>","type":"SystemAssigned","tenantId":"<tenant>"},"sku":{"name":"GP_Gen5","tier":"GeneralPurpose","family":"Gen5","capacity":4},"properties":{"provisioningState":"Creating","administratorLogin":"<Adminuser>","subnetId":"/subscriptions/<subscription>/resourceGroups/test/providers/Microsoft.Network/virtualNetworks/vnet-test-2/subnets/sql_mi_subnet_2","state":"Creating","licenseType":"BasePrice","vCores":4,"storageSizeInGB":32,"collation":"SQL_Latin1_General_CP1_CI_AS","publicDataEndpointEnabled":false,"proxyOverride":"Default","timezoneId":"UTC","maintenanceConfigurationId":"/subscriptions/<subscription>/providers/Microsoft.Maintenance/publicMaintenanceConfigurations/SQL_Default","privateEndpointConnections":[],"minimalTlsVersion":"1.2","storageAccountType":"LRS","zoneRedundant":false},"location":"eastus","tags":{},"id":"/subscriptions/<subscription>/resourceGroups/test/providers/Microsoft.Sql/managedInstances/<SQL MI Name>","name":"<SQL MI Name>","type":"Microsoft.Sql/managedInstances"}: timestamp=2023-03-19T19:54:56.120+1100

2023-03-19T19:55:26.128+1100 [DEBUG] provider.terraform-provider-azurerm_v3.48.0_x5.exe: AzureRM Request: 
GET /subscriptions/<subscription>/resourceGroups/test/providers/Microsoft.Sql/managedInstances/<SQL MI Name>?api-version=2020-11-01-preview HTTP/1.1
Host: management.azure.com
User-Agent: Go/go1.19.3 (amd64-windows) go-autorest/v14.2.1 Azure-SDK-For-Go/v66.0.0 sql/v5.0 HashiCorp Terraform/1.4.1 (+https://www.terraform.io) Terraform Plugin SDK/2.10.1 terraform-provider-azurerm/dev pid-222c6c49-1b0a-5959-a213-6608f9eb8820
X-Ms-Correlation-Request-Id: <Correlation-Request-Id>
Accept-Encoding: gzip: timestamp=2023-03-19T19:55:26.128+1100

2023-03-19T19:55:27.423+1100 [DEBUG] provider.terraform-provider-azurerm_v3.48.0_x5.exe: AzureRM Response for https://management.azure.com/subscriptions/<subscription>/resourceGroups/test/providers/Microsoft.Sql/managedInstances/<SQL MI Name>?api-version=2020-11-01-preview: 
HTTP/2.0 200 OK
Cache-Control: no-cache
Content-Type: application/json; charset=utf-8
Date: Sun, 19 Mar 2023 08:55:27 GMT
Expires: -1
Pragma: no-cache
Server: Microsoft-HTTPAPI/2.0
Strict-Transport-Security: max-age=31536000; includeSubDomains
Vary: Accept-Encoding
X-Content-Type-Options: nosniff
X-Ms-Correlation-Request-Id: <Correlation-Request-Id>
X-Ms-Ratelimit-Remaining-Subscription-Reads: 11883
X-Ms-Request-Id: <Request-Id 2>
X-Ms-Routing-Request-Id: AUSTRALIASOUTHEAST:20230319T085527Z:x

{"identity":{"principalId":"<service principal>","type":"SystemAssigned","tenantId":"<tenant>"},"sku":{"name":"GP_Gen5","tier":"GeneralPurpose","family":"Gen5","capacity":4},"properties":{"provisioningState":"Creating","administratorLogin":"<Adminuser>","subnetId":"/subscriptions/<subscription>/resourceGroups/test/providers/Microsoft.Network/virtualNetworks/vnet-test-2/subnets/sql_mi_subnet_2","state":"Creating","licenseType":"BasePrice","vCores":4,"storageSizeInGB":32,"collation":"SQL_Latin1_General_CP1_CI_AS","publicDataEndpointEnabled":false,"proxyOverride":"Default","timezoneId":"UTC","maintenanceConfigurationId":"/subscriptions/<subscription>/providers/Microsoft.Maintenance/publicMaintenanceConfigurations/SQL_Default","privateEndpointConnections":[],"minimalTlsVersion":"1.2","storageAccountType":"LRS","zoneRedundant":false},"location":"eastus","tags":{},"id":"/subscriptions/<subscription>/resourceGroups/test/providers/Microsoft.Sql/managedInstances/<SQL MI Name>","name":"<SQL MI Name>","type":"Microsoft.Sql/managedInstances"}: timestamp=2023-03-19T19:55:27.423+1100

2023-03-19T19:55:57.427+1100 [DEBUG] provider.terraform-provider-azurerm_v3.48.0_x5.exe: AzureRM Request: 
GET /subscriptions/<subscription>/resourceGroups/test/providers/Microsoft.Sql/managedInstances/<SQL MI Name>?api-version=2020-11-01-preview HTTP/1.1
Host: management.azure.com
User-Agent: Go/go1.19.3 (amd64-windows) go-autorest/v14.2.1 Azure-SDK-For-Go/v66.0.0 sql/v5.0 HashiCorp Terraform/1.4.1 (+https://www.terraform.io) Terraform Plugin SDK/2.10.1 terraform-provider-azurerm/dev pid-222c6c49-1b0a-5959-a213-6608f9eb8820
X-Ms-Correlation-Request-Id: <Correlation-Request-Id>
Accept-Encoding: gzip: timestamp=2023-03-19T19:55:57.427+1100

2023-03-19T19:55:57.444+1100 [DEBUG] provider.terraform-provider-azurerm_v3.48.0_x5.exe: AzureRM Response for https://management.azure.com/subscriptions/<subscription>/resourceGroups/test/providers/Microsoft.Sql/managedInstances/<SQL MI Name>?api-version=2020-11-01-preview: 
HTTP/2.0 401 Unauthorized
Content-Length: 172
Cache-Control: no-cache
Content-Type: application/json; charset=utf-8
Date: Sun, 19 Mar 2023 08:55:57 GMT
Expires: -1
Pragma: no-cache
Strict-Transport-Security: max-age=31536000; includeSubDomains
Www-Authenticate: Bearer authorization_uri="https://login.windows.net/<tenant>", error="invalid_token", error_description="The access token has expired."
X-Content-Type-Options: nosniff
X-Ms-Correlation-Request-Id: <Correlation-Request-Id>
X-Ms-Failure-Cause: gateway
X-Ms-Request-Id: <Request-Id 3>
X-Ms-Routing-Request-Id: AUSTRALIASOUTHEAST:20230319T085557Z:<Request-Id 3>

{"error":{"code":"ExpiredAuthenticationToken","message":"The access token expiry UTC time '3/19/2023 8:55:46 AM' is earlier than current UTC time '3/19/2023 8:55:57 AM'."}}: timestamp=2023-03-19T19:55:57.444+1100

2023-03-19T19:56:27.455+1100 [DEBUG] provider.terraform-provider-azurerm_v3.48.0_x5.exe: AzureRM Request: 
GET /subscriptions/<subscription>/resourceGroups/test/providers/Microsoft.Sql/managedInstances/<SQL MI Name>?api-version=2020-11-01-preview HTTP/1.1
Host: management.azure.com
User-Agent: Go/go1.19.3 (amd64-windows) go-autorest/v14.2.1 Azure-SDK-For-Go/v66.0.0 sql/v5.0 HashiCorp Terraform/1.4.1 (+https://www.terraform.io) Terraform Plugin SDK/2.10.1 terraform-provider-azurerm/dev pid-222c6c49-1b0a-5959-a213-6608f9eb8820
X-Ms-Correlation-Request-Id: <Correlation-Request-Id>
Accept-Encoding: gzip: timestamp=2023-03-19T19:56:27.455+1100

2023-03-19T19:56:27.474+1100 [DEBUG] provider.terraform-provider-azurerm_v3.48.0_x5.exe: AzureRM Response for https://management.azure.com/subscriptions/<subscription>/resourceGroups/test/providers/Microsoft.Sql/managedInstances/<SQL MI Name>?api-version=2020-11-01-preview: 
HTTP/2.0 401 Unauthorized
Content-Length: 172
Cache-Control: no-cache
Content-Type: application/json; charset=utf-8
Date: Sun, 19 Mar 2023 08:56:27 GMT
Expires: -1
Pragma: no-cache
Strict-Transport-Security: max-age=31536000; includeSubDomains
Www-Authenticate: Bearer authorization_uri="https://login.windows.net/<tenant>", error="invalid_token", error_description="The access token has expired."
X-Content-Type-Options: nosniff
X-Ms-Correlation-Request-Id: <Correlation-Request-Id>
X-Ms-Failure-Cause: gateway
X-Ms-Request-Id: <Request-Id 4>
X-Ms-Routing-Request-Id: AUSTRALIASOUTHEAST:20230319T085627Z:<Request-Id 4>

{"error":{"code":"ExpiredAuthenticationToken","message":"The access token expiry UTC time '3/19/2023 8:55:46 AM' is earlier than current UTC time '3/19/2023 8:56:27 AM'."}}: timestamp=2023-03-19T19:56:27.474+1100

2023-03-19T19:57:27.486+1100 [DEBUG] provider.terraform-provider-azurerm_v3.48.0_x5.exe: AzureRM Request: 
GET /subscriptions/<subscription>/resourceGroups/test/providers/Microsoft.Sql/managedInstances/<SQL MI Name>?api-version=2020-11-01-preview HTTP/1.1
Host: management.azure.com
User-Agent: Go/go1.19.3 (amd64-windows) go-autorest/v14.2.1 Azure-SDK-For-Go/v66.0.0 sql/v5.0 HashiCorp Terraform/1.4.1 (+https://www.terraform.io) Terraform Plugin SDK/2.10.1 terraform-provider-azurerm/dev pid-222c6c49-1b0a-5959-a213-6608f9eb8820
X-Ms-Correlation-Request-Id: <Correlation-Request-Id>
Accept-Encoding: gzip: timestamp=2023-03-19T19:57:27.486+1100

2023-03-19T19:57:27.502+1100 [DEBUG] provider.terraform-provider-azurerm_v3.48.0_x5.exe: AzureRM Response for https://management.azure.com/subscriptions/<subscription>/resourceGroups/test/providers/Microsoft.Sql/managedInstances/<SQL MI Name>?api-version=2020-11-01-preview: 
HTTP/2.0 401 Unauthorized
Content-Length: 172
Cache-Control: no-cache
Content-Type: application/json; charset=utf-8
Date: Sun, 19 Mar 2023 08:57:27 GMT
Expires: -1
Pragma: no-cache
Strict-Transport-Security: max-age=31536000; includeSubDomains
Www-Authenticate: Bearer authorization_uri="https://login.windows.net/<tenant>", error="invalid_token", error_description="The access token has expired."
X-Content-Type-Options: nosniff
X-Ms-Correlation-Request-Id: <Correlation-Request-Id>
X-Ms-Failure-Cause: gateway
X-Ms-Request-Id: <Request-Id 5>
X-Ms-Routing-Request-Id: AUSTRALIASOUTHEAST:20230319T085727Z:<Request-Id 5>

{"error":{"code":"ExpiredAuthenticationToken","message":"The access token expiry UTC time '3/19/2023 8:55:46 AM' is earlier than current UTC time '3/19/2023 8:57:27 AM'."}}: timestamp=2023-03-19T19:57:27.502+1100

2023-03-19T19:59:27.513+1100 [DEBUG] provider.terraform-provider-azurerm_v3.48.0_x5.exe: AzureRM Request: 
GET /subscriptions/<subscription>/resourceGroups/test/providers/Microsoft.Sql/managedInstances/<SQL MI Name>?api-version=2020-11-01-preview HTTP/1.1
Host: management.azure.com
User-Agent: Go/go1.19.3 (amd64-windows) go-autorest/v14.2.1 Azure-SDK-For-Go/v66.0.0 sql/v5.0 HashiCorp Terraform/1.4.1 (+https://www.terraform.io) Terraform Plugin SDK/2.10.1 terraform-provider-azurerm/dev pid-222c6c49-1b0a-5959-a213-6608f9eb8820
X-Ms-Correlation-Request-Id: <Correlation-Request-Id>
Accept-Encoding: gzip: timestamp=2023-03-19T19:59:27.513+1100
2023-03-19T19:59:27.528+1100 [DEBUG] provider.terraform-provider-azurerm_v3.48.0_x5.exe: AzureRM Response for https://management.azure.com/subscriptions/<subscription>/resourceGroups/test/providers/Microsoft.Sql/managedInstances/<SQL MI Name>?api-version=2020-11-01-preview: 
HTTP/2.0 401 Unauthorized
Content-Length: 172
Cache-Control: no-cache
Content-Type: application/json; charset=utf-8
Date: Sun, 19 Mar 2023 08:59:27 GMT
Expires: -1
Pragma: no-cache
Strict-Transport-Security: max-age=31536000; includeSubDomains
Www-Authenticate: Bearer authorization_uri="https://login.windows.net/<tenant>", error="invalid_token", error_description="The access token has expired."
X-Content-Type-Options: nosniff
X-Ms-Correlation-Request-Id: <Correlation-Request-Id>
X-Ms-Failure-Cause: gateway
X-Ms-Request-Id: <Request-Id 6>
X-Ms-Routing-Request-Id: AUSTRALIASOUTHEAST:20230319T085927Z:<Request-Id 6>

{"error":{"code":"ExpiredAuthenticationToken","message":"The access token expiry UTC time '3/19/2023 8:55:46 AM' is earlier than current UTC time '3/19/2023 8:59:27 AM'."}}: timestamp=2023-03-19T19:59:27.528+1100

2023-03-19T19:59:27.528+1100 [ERROR] provider.terraform-provider-azurerm_v3.48.0_x5.exe: Response contains error diagnostic: diagnostic_detail="waiting for creation of Managed Instance: (Name "<SQL MI Name>" / Resource Group "test"): Future#WaitForCompletion: the number of retries has been exceeded: StatusCode=401 -- Original Error: Code="ExpiredAuthenticationToken" Message="The access token expiry UTC time '3/19/2023 8:55:46 AM' is earlier than current UTC time '3/19/2023 8:59:27 AM'."" diagnostic_severity=ERROR diagnostic_summary="waiting for creation of Managed Instance: (Name "<SQL MI Name>" / Resource Group "test"): Future#WaitForCompletion: the number of retries has been exceeded: StatusCode=401 -- Original Error: Code="ExpiredAuthenticationToken" Message="The access token expiry UTC time '3/19/2023 8:55:46 AM' is earlier than current UTC time '3/19/2023 8:59:27 AM'."" tf_req_id=6a4ecbc0-83cf-0097-09fe-4afe278cd697 tf_rpc=ApplyResourceChange @caller=github.com/hashicorp/terraform-plugin-go@v0.14.3/tfprotov5/internal/diag/diagnostics.go:55 @module=sdk.proto tf_provider_addr=provider tf_resource_type=azurerm_mssql_managed_instance tf_proto_version=5.3 timestamp=2023-03-19T19:59:27.528+1100
2023-03-19T19:59:27.557+1100 [ERROR] vertex "module.sql_mi.azurerm_mssql_managed_instance.sql_mi" error: waiting for creation of Managed Instance: (Name "<SQL MI Name>" / Resource Group "test"): Future#WaitForCompletion: the number of retries has been exceeded: StatusCode=401 -- Original Error: Code="ExpiredAuthenticationToken" Message="The access token expiry UTC time '3/19/2023 8:55:46 AM' is earlier than current UTC time '3/19/2023 8:59:27 AM'."
2023-03-19T19:59:27.593+1100 [DEBUG] provider.stdio: received EOF, stopping recv loop: err="rpc error: code = Unavailable desc = error reading from server: EOF"
2023-03-19T19:59:27.621+1100 [DEBUG] provider: plugin process exited: path=.terraform/providers/registry.terraform.io/hashicorp/azurerm/3.48.0/windows_amd64/terraform-provider-azurerm_v3.48.0_x5.exe pid=30412
2023-03-19T19:59:27.621+1100 [DEBUG] provider: plugin exited
garbology commented 1 year ago

Can confirm still happening for me when creating sqlmi around the hour mark also with 3.48 , using 3.43 it has no issues

Agree with @verius-code that the issue does not occur with v3.43. Deployment succeeded after approx 3 hours.

manicminer commented 1 year ago

Thanks @op7ic, @verius-code, @garbology for the additional context and log data, I'll keep digging into this.

manicminer commented 1 year ago

I unfortunately haven't yet been able to uncover the cause of, or reproduce this error nor the circumstance of tokens not renewing when they should. Noting that everyone above is using mixed platforms and are in different locations, I've tested with the mentioned services on macOS, Linux and Windows 11, whilst modifying the system timezone around the world from Tokyo to New York (once even during a terraform apply), to no avail.

We have now merged #21100 which extends the buffer time prior to token renewal to 20 minutes, which means that from the next release, tokens will be renewed earlier. I also spotted a bug in handling tokens with shorter issuance validity than 40 minutes, so in that case the token will be refreshed after 50% validity has elapsed - see the following table for examples.

Token Validity Renewed after
60 mins 40 mins
50 mins 35 mins
40 mins 20 mins
30 mins 15 mins
20 mins 10 mins
10 mins 5 mins

Please do report back if you continue to experience this after upgrading to v3.49.0 (to be released later today). Please also include the authentication method you're using, your platform and architecture, and any other pertinent information. Thanks.

garbology commented 1 year ago

Success with v3.49.0

azurerm_mssql_managed_instance.sql_mi:  Creation complete after 2h36m12s 

Thanks @manicminer & @tombuildsstuff

tombuildsstuff commented 1 year ago

Since it sounds like this issue's been resolved, I'm going to close this issue for the moment - but if you're still seeing this issue on the latest version of the Provider then let us know and we'll take another look.

op7ic commented 1 year ago

hi @tombuildsstuff / @manicminer so got a bit of bad news. I started to get timeouts much earlier in the lifecycle of AD creation. While previously it was around an hour mark, I managed to get the same error at around 18min mark now (on v 3.49.0):

azurerm_active_directory_domain_service.aadds: Still creating... [14m40s elapsed]
azurerm_active_directory_domain_service.aadds: Still creating... [14m50s elapsed]
azurerm_active_directory_domain_service.aadds: Still creating... [15m0s elapsed]
azurerm_active_directory_domain_service.aadds: Still creating... [15m10s elapsed]
azurerm_active_directory_domain_service.aadds: Still creating... [15m20s elapsed]
azurerm_active_directory_domain_service.aadds: Still creating... [15m30s elapsed]
azurerm_active_directory_domain_service.aadds: Still creating... [15m40s elapsed]
azurerm_active_directory_domain_service.aadds: Still creating... [15m50s elapsed]
azurerm_active_directory_domain_service.aadds: Still creating... [16m0s elapsed]
azurerm_active_directory_domain_service.aadds: Still creating... [16m10s elapsed]
azurerm_active_directory_domain_service.aadds: Still creating... [16m20s elapsed]
azurerm_active_directory_domain_service.aadds: Still creating... [16m30s elapsed]
azurerm_active_directory_domain_service.aadds: Still creating... [16m40s elapsed]
azurerm_active_directory_domain_service.aadds: Still creating... [16m50s elapsed]
azurerm_active_directory_domain_service.aadds: Still creating... [17m0s elapsed]
azurerm_active_directory_domain_service.aadds: Still creating... [17m10s elapsed]
azurerm_active_directory_domain_service.aadds: Still creating... [17m20s elapsed]
azurerm_active_directory_domain_service.aadds: Still creating... [17m30s elapsed]
azurerm_active_directory_domain_service.aadds: Still creating... [17m40s elapsed]
azurerm_active_directory_domain_service.aadds: Still creating... [17m50s elapsed]
azurerm_active_directory_domain_service.aadds: Still creating... [18m0s elapsed]
azurerm_active_directory_domain_service.aadds: Still creating... [18m10s elapsed]

β”‚ Error: creating/updating Domain Service (Name: "xxx", Resource Group: "xxxxx"): polling after CreateOrUpdate: Future#WaitForCompletion: the number of retries has been exceeded: StatusCode=401 -- Original Error: Code="ExpiredAuthenticationToken" Message="The access token expiry UTC time '3/26/2023 12:21:33 PM' is earlier than current UTC time '3/26/2023 12:25:04 PM'."

Clock is synced, and debug log is pretty much the same as last time.

hcevensen commented 1 year ago

I am pretty much experiencing the same as @op7ic , still failing on 3.49.0 now sometimes even earlier it seems, had to revert back to 3.43.0 to get SQLMI provisioned.

iokonto commented 1 year ago

Also observing the same error with azurerm_linux_virtual_machine, azure_backup_protected_vm and many other resource types with latest azure_rm provider 3.49.0. Token expires sometimes after 27 minutes, in other cases after 6 minutes of waiting for provisioning, there is no standard waiting time.

manicminer commented 1 year ago

Thanks for reporting back. This is quite strange behavior, the expiry time should not be shorter. The error must relate to the replaced token, perhaps something is wrong with the second token acquisition πŸ€”

@op7ic, @verius-code, @iokonto, or anyone else affected, please can you confirm the following details about how you are authenticating?

This looks like a lot of data but every scrap of information helps when trying to reproduce this, as so far I haven't been able to do so, and have had to work with some assumptions.

Thanks!

iokonto commented 1 year ago

Terraform v1.4.2 on windows_amd64: native windows 10 pro workstation

Authentication: Azure CLI (2.39) through browser window Only one tenant and one subscription visible to my user with az account list No proxy server (but transparent proxy in corporate network at firewall level)

I hope this information helps. Terraform based automation has become almost 100% problematic due to this issue since upgrade from 3.48.0 but not sure if the issue was not present at 3.48.0 because it seems to get worse day by day. Resources are created normally while Terraform thinks they are not created due to the issue, we are forced to import in state manually a lot of resources upon every execution. Do you know what is the max version we can downgrade to avoid the blocking issue ?

Update: started happening on random resources e.g.:

Error: Future#WaitForCompletion: the number of retries has been exceeded: StatusCode=401 -- Original Error: Code="ExpiredAuthenticationToken" Message="The access
token expiry UTC time '3/29/2023 11:37:45 AM' is earlier than current UTC time '3/29/2023 11:41:41 AM'."
β”‚
β”‚   with azurerm_virtual_machine_extension.controlplane[0],
manicminer commented 1 year ago

@iokonto Many thanks for the info. When signing in to Azure CLI, do you use the default interactive flow, device code flow, or perhaps the Web Auth Manager preview (e.g. az config get core.allow_broker). Do you configure Terraform with a subscription ID or tenant ID, or do you leave this to autodetection?

The most recent version before we switched the auth library is v3.43.0, others have reported that this issue affects 3.44 and onwards.

iokonto commented 1 year ago

Hi @manicminer, Thank you for the version tip. I use the default interactive flow which opens a brower window. One important note: My account is connected to multiple tenants but only one tenant has visible subscriptions. The account is also connected to a tenant that has disabled the account and I get the error "Failed to authenticate xxxx-yyyy-zzzz-wwww 'XYZ' due to error 'AADSTS500571: The guest user account is disabled." So finally I login with: az login --tenant xxxx-yyyy and I only see one tenant and subscription. This tenant has 2FA enabled, code supplied by authenticator app.

hcevensen commented 1 year ago

Thanks for reporting back. This is quite strange behavior, the expiry time should not be shorter. The error must relate to the replaced token, perhaps something is wrong with the second token acquisition πŸ€”

@op7ic, @verius-code, @iokonto, or anyone else affected, please can you confirm the following details about how you are authenticating?

  • the environment you're using - (macOS, windows, linux), running on native hardware or VM (or container on either of those), host OS & guest OS combo if applicable
  • if windows, native or WSL2 (or WSL1), powershell or bash or other?
  • any proxy server in your environment? what type and how is that configured?
  • method of authentication - native service principal via client_id / ARM_CLIENT_ID, Azure CLI, managed identity or OIDC
  • if service principal, with client secret or client certificate? configured via environment variable or provider block?
  • if azure CLI, are you a guest/invited user from another tenant, or working in a sub linked to your home tenant? how many subs & tenants can you see with az account list -o table? are you providing the subscription ID and/or the tenant ID or relying on autodetection? how are you providing this/these, environment variable or provider block?
  • if OIDC, github actions or manual (providing the ID token)? if the latter, via env var or input variable?
  • any other pertinent information I haven't thought of.

This looks like a lot of data but every scrap of information helps when trying to reproduce this, as so far I haven't been able to do so, and have had to work with some assumptions.

Thanks!

MacOS when running locally , also fails when running on Linux vm hosted agent in pipeline. No proxy. Service principal with client secret, provider block.

iokonto commented 1 year ago

Update: I installed the newest Azure CLI 2.46.0, performed logout and login again with 2FA directly to the correct tenant. Now ExpiredAuthenticationToken error appears on resources taking more than 50 minutes to provision. Is there any case it has to do with the time 2FA was enforced by organization while Azure CLI/Terraform was probably already logged in ?

op7ic commented 1 year ago

Thanks for reporting back. This is quite strange behavior, the expiry time should not be shorter. The error must relate to the replaced token, perhaps something is wrong with the second token acquisition πŸ€”

@op7ic, @verius-code, @iokonto, or anyone else affected, please can you confirm the following details about how you are authenticating?

  • the environment you're using - (macOS, windows, linux), running on native hardware or VM (or container on either of those), host OS & guest OS combo if applicable
  • if windows, native or WSL2 (or WSL1), powershell or bash or other?
  • any proxy server in your environment? what type and how is that configured?
  • method of authentication - native service principal via client_id / ARM_CLIENT_ID, Azure CLI, managed identity or OIDC
  • if service principal, with client secret or client certificate? configured via environment variable or provider block?
  • if azure CLI, are you a guest/invited user from another tenant, or working in a sub linked to your home tenant? how many subs & tenants can you see with az account list -o table? are you providing the subscription ID and/or the tenant ID or relying on autodetection? how are you providing this/these, environment variable or provider block?
  • if OIDC, github actions or manual (providing the ID token)? if the latter, via env var or input variable?
  • any other pertinent information I haven't thought of.

This looks like a lot of data but every scrap of information helps when trying to reproduce this, as so far I haven't been able to do so, and have had to work with some assumptions.

Thanks!

CSanches commented 1 year ago

provider "azurerm" { skip_provider_registration = "true" subscription_id = "XXXXXXXXXXXXXx" disable_correlation_request_id = true features {} }

The server has correct date/time configure. The same happens if I execute the comamnd locally, not using a Server under VM.

Terraform -v: Terraform v1.4.2 on windows_amd64

garbology commented 1 year ago

For my successful deployment I executed Terraform from the PowerShelll console (PS v7.2.9) within VSCode on Win10 Pro. SQL MI was deployed by Service Principal, supplying creds as environment variables and using a SP client secret.

ie.
$Env:ARM_SUBSCRIPTION_ID=""
$Env:ARM_TENANT_ID=""
$Env:ARM_CLIENT_ID=""
$Env:ARM_CLIENT_SECRET=""
Terraform v1.4.1
on windows_amd64
+ provider registry.terraform.io/hashicorp/azurerm v3.49.0
magodo commented 1 year ago

One thing worth mentioning that azurerm_active_directory_domain_service (reported at the begining of this issue) is using the go-azure-sdk client, hence using the new transport layer. Whilst there are also reports for the same expiry issue for different resources like SQLMI (https://github.com/hashicorp/terraform-provider-azurerm/issues/20867#issuecomment-1484118607, and #21361), which are using the Azure track1 SDK, hence the go-autorest transport layer.

LaurentLesle commented 1 year ago

on the error #213261 I reported I am using a service principal in an ubuntu container amd64. I have also the az cli connected with the sp

export ARM_CLIENT_ID=xxx
# and all other export variable for azurerm
az login --service-principal -u $ARM_CLIENT_ID -p $ARM_CLIENT_SECRET -t $ARM_TENANT_ID
terraform apply
manicminer commented 1 year ago

@magodo That's true, however both client types are using the go-azure-sdk/sdk/auth package for authentication.

magodo commented 1 year ago

@manicminer Sorry to correct my previous comment: though the ADDS is using the go-azure-sdk, it is still using the go-autorest transport layer.

Meanwhile, I've noticed almost all the reports were about using azure cli to auth. Since the go-autorest's async polling retries on token expiry error (though the new auth package's default policy seems not, but not related here), so I don't think the race condition fix in https://github.com/hashicorp/go-azure-sdk/pull/362 matters too much here. When I look at the tokenDueForRenewal function, it assumes the oauth2.Token has set the Expiry: https://github.com/hashicorp/go-azure-sdk/blob/63408294321d6e958d45999dbed0105e35150e6e/sdk/auth/token.go#L21-L24

    // Some tokens may never expire
    if token.Expiry.IsZero() {
        return false
    }

This is correctly set for client credentials: https://github.com/hashicorp/go-azure-sdk/blob/63408294321d6e958d45999dbed0105e35150e6e/sdk/auth/client_credentials.go#L343-L360

    token := &oauth2.Token{
        AccessToken: tokenRes.AccessToken,
        TokenType:   tokenRes.TokenType,
    }

    var secs time.Duration
    if exp, ok := tokenRes.ExpiresIn.(string); ok && exp != "" {
        if v, err := strconv.Atoi(exp); err == nil {
            secs = time.Duration(v)
        }
    } else if exp, ok := tokenRes.ExpiresIn.(int64); ok {
        secs = time.Duration(exp)
    } else if exp, ok := tokenRes.ExpiresIn.(float64); ok {
        secs = time.Duration(exp)
    }
    if secs > 0 {
        token.Expiry = time.Now().Add(secs * time.Second)
    }

While seems not for the CLI auth: https://github.com/hashicorp/go-azure-sdk/blob/63408294321d6e958d45999dbed0105e35150e6e/sdk/auth/azure_cli_authorizer.go#L84-L87

    return &oauth2.Token{
        AccessToken: token.AccessToken,
        TokenType:   token.TokenType,
    }, nil

So that is probably the reason why token by CLI auth always expire (and the time that users reports are a bit of random)?

manicminer commented 1 year ago

@magodo Thanks that's a great catch, and might explain this issue for those using Azure CLI. Either way, that's definitely a bug and one I'll fix shortly.

stevehipwell commented 1 year ago

@manicminer what's the status of the changes to the SDK and this issue?

manicminer commented 1 year ago

We'll be merging a fix for Azure CLI authentication in today's release which we're hoping will resolve this for those using that authentication method. Further testing for other authentication methods is still ongoing.

tombuildsstuff commented 1 year ago

x-ref: https://github.com/hashicorp/terraform-provider-azurerm/pull/21583

saujanyasoni commented 1 year ago

@katbyte I can see you have update milestones, since v.3.54.0 is released; Is ExpiredAuthenticationToken issue resolved ? I have some production deployments are on hold because of this. @manicminer - Any update from your end ?

manicminer commented 1 year ago

@saujanyasoni This is a complex issue which may have more than one root cause. As mentioned earlier, we released a fix for Azure CLI authentication yesterday in v3.54.0 which may resolve the issue for those using that authentication method. However, other potential causes remain under investigation and so this issue is being kept open until we are confident it has been fixed. Please continue to test and report back, as this is the only way we can get meaningful feedback. Thanks!

saujanyasoni commented 1 year ago

image

@manicminer - It seems issue resolved for me after latest release. However other testers should also test this to be confirmed.

cveld commented 1 year ago

on the error #213261 I reported I am using a service principal in an ubuntu container amd64. I have also the az cli connected with the sp

@laurentlesle It seems you are authenticating with az cli through service principal -- I didn't know that was supported?

https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/guides/azure_cli

Authenticating via the Azure CLI is only supported when using a User Account. If you're using a Service Principal (for example via az login --service-principal) you should instead authenticate via the Service Principal directly (either using a Client Secret or a Client Certificate).

Would be great if the team could enable this 😊 This remediates the requirement for setting credentials through environment variables.

LaurentLesle commented 1 year ago

Hi @cveld no I use the ARM_CLIENT_ID, ARM_CLIENT_SECRET to deploy with service principal.

manicminer commented 1 year ago

@LaurentLesle Are you still seeing this issue with long-running operations? In an earlier comment you stated you were authenticating via Azure CLI - would you be able to confirm whether you are seeing this and if you are authenticating via az-cli or using the provider's native support for service principal authentication? Thanks!

LaurentLesle commented 1 year ago

I can confirm it is now working with the version 3.55.0. Thanks all for the fix.

tombuildsstuff commented 1 year ago

πŸ‘ thanks for the confirmation @LaurentLesle - since this appears to be resolved I'm going to close this issue for the moment, if you're still seeing this issue with the latest version of the Provider then please feel free to open a new issue and we can take another look.

Thanks!

SunnyOswal commented 1 year ago

getting same issue with 3.56.0 Error: waiting for enabling HTTPS: Future#WaitForCompletion: the number of retries has been exceeded: StatusCode=401 -- Original Error: Code="ExpiredAuthenticationToken" Message="The access token expiry UTC time '5/16/2023 5:03:53 PM' is earlier than current UTC time '5/16/2023 5:07:17 PM'."