kubernetes-sigs / external-dns

Configure external DNS servers (AWS Route53, Google CloudDNS and others) for Kubernetes Ingresses and Services
Apache License 2.0

Azure Managed Identity - Failed to refresh the Token for request #2489

Closed: 1stewart closed this issue 2 years ago

1stewart commented 2 years ago

What happened:

I receive the error below when running external-dns with a managed identity:

level=error msg="azure.BearerAuthorizer#WithAuthorization: Failed to refresh the Token for request to https://management.azure.com/subscriptions/**REDACTED**/resourceGroups/**REDACTED**/providers/Microsoft.Network/dnsZones?api-version=2018-05-01: StatusCode=404 -- Original Error: adal: Refresh request failed. Status Code = '404'. Response body: getting assigned identities for pod external-dns/external-dns-79784f8fc5-vh8ph in CREATED state failed after 16 attempts, retry duration [5]s, error: <nil>\n Endpoint http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&client_id=b8f3f**REDACTED**00d7a&resource=https%3A%2F%2Fmanagement.core.windows.net%2F"

What you expected to happen:

external-dns retrieves the DNS records from the Azure DNS zone.

How to reproduce it (as minimally and precisely as possible):

Get the client ID of the managed identity used as the kubelet identity. For testing, this identity was given the Contributor role on the whole subscription, which contains the AKS cluster/VMSS and the DNS zone:

az aks show -n **REDACTED** -g **REDACTED**  --query "identityProfile.kubeletidentity.clientId" -o tsv
The behavior of this command has been altered by the following extension: aks-preview
b8f3f**REDACTED**00d7a

Set up the secret content as below (the resource group is redacted because it could potentially give away the company):

{
  "tenantId": "d481b**REDACTED**d1a50",
  "subscriptionId": "9c3bd**REDACTED**6669fa",
  "resourceGroup": "**REDACTED**",
  "useManagedIdentityExtension": true,
  "userAssignedIdentityID": "b8f3f**REDACTED**00d7a"
}

Deploy via Helm with the log level set to debug, then gather the logs:

PS C:\Users\**REDACTED**\Downloads> stern -n external-dns external -t --since 1h -o raw --tail 10
+ external-dns-79784f8fc5-vh8ph › external-dns
2021-12-21T14:42:50.424355143Z time="2021-12-21T14:42:50Z" level=info msg="config: {APIServerURL: KubeConfig: RequestTimeout:30s DefaultTargets:[] ContourLoadBalancerService:heptio-contour/contour GlooNamespace:gloo-system SkipperRouteGroupVersion:zalando.org/v1 Sources:[ingress] Namespace: AnnotationFilter: LabelFilter: FQDNTemplate: CombineFQDNAndAnnotation:false IgnoreHostnameAnnotation:false IgnoreIngressTLSSpec:false IgnoreIngressRulesSpec:false Compatibility: PublishInternal:false PublishHostIP:false AlwaysPublishNotReadyAddresses:false ConnectorSourceServer:localhost:8080 Provider:azure GoogleProject: GoogleBatchChangeSize:1000 GoogleBatchChangeInterval:1s GoogleZoneVisibility: DomainFilter:[] ExcludeDomains:[] RegexDomainFilter: RegexDomainExclusion: ZoneNameFilter:[] ZoneIDFilter:[] AlibabaCloudConfigFile:/etc/kubernetes/alibaba-cloud.json AlibabaCloudZoneType: AWSZoneType: AWSZoneTagFilter:[] AWSAssumeRole: AWSBatchChangeSize:1000 AWSBatchChangeInterval:1s AWSEvaluateTargetHealth:true AWSAPIRetries:3 AWSPreferCNAME:false AWSZoneCacheDuration:0s AzureConfigFile:/etc/kubernetes/azure.json AzureResourceGroup: AzureSubscriptionID: AzureUserAssignedIdentityClientID: BluecatConfigFile:/etc/kubernetes/bluecat.json CloudflareProxied:false CloudflareZonesPerPage:50 CoreDNSPrefix:/skydns/ RcodezeroTXTEncrypt:false AkamaiServiceConsumerDomain: AkamaiClientToken: AkamaiClientSecret: AkamaiAccessToken: AkamaiEdgercPath: AkamaiEdgercSection: InfobloxGridHost: InfobloxWapiPort:443 InfobloxWapiUsername:admin InfobloxWapiPassword: InfobloxWapiVersion:2.3.1 InfobloxSSLVerify:true InfobloxView: InfobloxMaxResults:0 InfobloxFQDNRegEx: InfobloxCreatePTR:false DynCustomerName: DynUsername: DynPassword: DynMinTTLSeconds:0 OCIConfigFile:/etc/kubernetes/oci.yaml InMemoryZones:[] OVHEndpoint:ovh-eu OVHApiRateLimit:20 PDNSServer:http://localhost:8081 PDNSAPIKey: PDNSTLSEnabled:false TLSCA: TLSClientCert: TLSClientCertKey: Policy:upsert-only Registry:txt TXTOwnerID:default 
TXTPrefix: TXTSuffix: Interval:1m0s MinEventSyncInterval:5s Once:false DryRun:false UpdateEvents:false LogFormat:text MetricsAddress::7979 LogLevel:debug TXTCacheInterval:0s TXTWildcardReplacement: ExoscaleEndpoint:https://api.exoscale.ch/dns ExoscaleAPIKey: ExoscaleAPISecret: CRDSourceAPIVersion:externaldns.k8s.io/v1alpha1 CRDSourceKind:DNSEndpoint ServiceTypeFilter:[] CFAPIEndpoint: CFUsername: CFPassword: RFC2136Host: RFC2136Port:0 RFC2136Zone: RFC2136Insecure:false RFC2136GSSTSIG:false RFC2136KerberosRealm: RFC2136KerberosUsername: RFC2136KerberosPassword: RFC2136TSIGKeyName: RFC2136TSIGSecret: RFC2136TSIGSecretAlg: RFC2136TAXFR:false RFC2136MinTTL:0s RFC2136BatchChangeSize:50 NS1Endpoint: NS1IgnoreSSL:false NS1MinTTLSeconds:0 TransIPAccountName: TransIPPrivateKeyFile: DigitalOceanAPIPageSize:50 ManagedDNSRecordTypes:[A CNAME] GoDaddyAPIKey: GoDaddySecretKey: GoDaddyTTL:0 GoDaddyOTE:false OCPRouterName:}"
2021-12-21T14:42:50.424426044Z time="2021-12-21T14:42:50Z" level=info msg="Instantiating new Kubernetes client"
2021-12-21T14:42:50.424436844Z time="2021-12-21T14:42:50Z" level=debug msg="apiServerURL: "
2021-12-21T14:42:50.424633246Z time="2021-12-21T14:42:50Z" level=debug msg="kubeConfig: "
2021-12-21T14:42:50.424651147Z time="2021-12-21T14:42:50Z" level=info msg="Using inCluster-config based on serviceaccount-token"
2021-12-21T14:42:50.428767798Z time="2021-12-21T14:42:50Z" level=info msg="Created Kubernetes client https://192.168.0.1:443"
2021-12-21T14:42:50.529605066Z time="2021-12-21T14:42:50Z" level=info msg="Using managed identity extension to retrieve access token for Azure API."
2021-12-21T14:42:50.529635666Z time="2021-12-21T14:42:50Z" level=info msg="Resolving to user assigned identity, client id is b8f3f**REDACTED**00d7a."
2021-12-21T14:42:55.531194984Z time="2021-12-21T14:42:55Z" level=debug msg="Retrieving Azure DNS zones for resource group: **REDACTED**."
2021-12-21T14:51:27.605529895Z time="2021-12-21T14:51:27Z" level=error msg="azure.BearerAuthorizer#WithAuthorization: Failed to refresh the Token for request to https://management.azure.com/subscriptions/9c3bd**REDACTED**6669fa/resourceGroups/**REDACTED**/providers/Microsoft.Network/dnsZones?api-version=2018-05-01: StatusCode=404 -- Original Error: adal: Refresh request failed. Status Code = '404'. Response body: getting assigned identities for pod external-dns/external-dns-79784f8fc5-vh8ph in CREATED state failed after 16 attempts, retry duration [5]s, error: <nil>\n Endpoint http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&client_id=b8f3f**REDACTED**00d7a&resource=https%3A%2F%2Fmanagement.core.windows.net%2F"
2021-12-21T14:51:27.605599096Z time="2021-12-21T14:51:27Z" level=debug msg="Retrieving Azure DNS zones for resource group: **REDACTED**."

The setup works as expected when using a service principal in the same tenant, with the same permissions on the subscription, and a config like the one below:

{
  "tenantId": "d481b**REDACTED**d1a50",
  "subscriptionId": "9c3bd**REDACTED**669fa",
  "resourceGroup": "**REDACTED**",
  "aadClientId": "3c1c2**REDACTED**2e93b",
  "aadClientSecret": "**REDACTED**"
}

Anything else we need to know?:

We have basic connectivity from the pod to the metadata endpoint IP (169.254.169.254):

~ $ hostname
external-dns-79784f8fc5-vh8ph
~ $ nc 169.254.169.254 80

HTTP/1.1 400 Bad Request
Content-Type: text/plain; charset=utf-8
Connection: close
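
The `nc` check above only shows that something answers on that address. A closer reproduction of the failing call is to request a token from the endpoint shown in the error log. This is a minimal sketch, not something taken from the external-dns docs: it assumes `curl` is available in the pod (it may not be in the stock image), and the function names `token_url` and `probe_token` are made up for illustration. The `Metadata: true` header is required by the Azure Instance Metadata Service.

```shell
#!/bin/sh
# Build the token URL that appears in the external-dns error log.
# $1: client ID of the user-assigned identity (placeholder, supply your own).
token_url() {
  printf 'http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&client_id=%s&resource=https%%3A%%2F%%2Fmanagement.core.windows.net%%2F' "$1"
}

# Request a token from that URL and print only the HTTP status code.
# The timeout is generous because the log shows the endpoint retrying
# for over a minute before it answers with an error.
probe_token() {
  curl -s -o /dev/null -w '%{http_code}\n' --max-time 120 \
    -H 'Metadata: true' "$(token_url "$1")"
}
```

Calling `probe_token <client-id>` from inside the pod should print `200` once a token can be issued. In the failing setup described in this issue it would be expected to print `404`, matching the status code in the log.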

Environment:

1stewart commented 2 years ago

Never mind, I got it working. I was missing the basic aad-pod-identity setup (the binding label, the AzureIdentity, and the AzureIdentityBinding) because I assumed the config in azure.json was sufficient. I could see the same error on the nmi pod. After reviewing https://github.com/kubernetes-sigs/external-dns/issues/1456 and the required steps for an MSI in https://azure.github.io/aad-pod-identity/docs/demo/standard_walkthrough/, it works as expected.
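
For anyone hitting the same error, the three missing pieces (binding label, identity, identity binding) look roughly like the fragment below. This is a sketch based on the aad-pod-identity walkthrough linked above, not the exact manifests used here: the resource names and the selector value `external-dns` are illustrative assumptions, and the angle-bracket values are placeholders to fill in.

```yaml
# Assumed names throughout; only the namespace comes from the logs above.
apiVersion: aadpodidentity.k8s.io/v1
kind: AzureIdentity
metadata:
  name: external-dns            # hypothetical name
  namespace: external-dns
spec:
  type: 0                       # 0 = user-assigned managed identity
  resourceID: /subscriptions/<subscription-id>/resourceGroups/<identity-resource-group>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<identity-name>
  clientID: <client-id>
---
apiVersion: aadpodidentity.k8s.io/v1
kind: AzureIdentityBinding
metadata:
  name: external-dns-binding    # hypothetical name
  namespace: external-dns
spec:
  azureIdentity: external-dns   # must match the AzureIdentity name
  selector: external-dns        # must match the pod label value below
# The external-dns pod template also needs the matching label:
#   metadata:
#     labels:
#       aadpodidbinding: external-dns
```

Without the `aadpodidbinding` label on the pod, the nmi pod has no identity to assign to it, which is consistent with the "getting assigned identities for pod ... failed" message in the log.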

Is this a gap in the documentation (https://github.com/kubernetes-sigs/external-dns/blob/master/docs/tutorials/azure.md), which omits the aad-pod-identity parts (there is no label in the deployment example)? Or would those examples work as written on newer AKS versions or via the extension? If so, the guide could make clearer how to verify that the environment is suitable.