Open dks0296586 opened 1 year ago
I was able to find a workaround to get Workload Identity working by using these helm values:
podLabels:
  azure.workload.identity/use: "true"
rbac:
  serviceAccount:
    create: false
    name: workload-identity-sa
azureAuthentication:
  mode: "UserAssignedManagedIdentity"
  identity:
    id: <Client ID>
azureAuthentication.identity.id needs to be set to the client ID associated with the workload-identity-sa service account. It has to be set so that Promitor's startup validation passes, and it has to be the correct client ID because the value is passed to ManagedIdentityCredential, where a wrong value would override the correct one:
tokenCredential = new ManagedIdentityCredential(authenticationInfo.IdentityId, tokenCredentialOptions);
I have made some small changes to the AzureAuthenticationFactory to make this a little more streamlined. @tomkerkhove would you like me to submit a PR for review?
100%, thanks a ton!
I made an assumption that the changes/workaround config would also work for the Scraper. I was testing the Scraper and am now getting different Identity issues.
Will spend some time looking at the Azure Monitor auth flow and see if I can find a resolution.
[20:56:50 FTL] Failed to scrape resource for metric 'azure_storageaccount_success_server_latency_api_name_average'
System.Net.Http.HttpRequestException: Code: 400 ReasonReasonPhrase: Bad Request Body: {"error":"invalid_request","error_description":"Identity not found"}
at Microsoft.Azure.Management.ResourceManager.Fluent.Authentication.MSITokenProvider.RetrieveTokenFromIMDSWithRetryAsync(String resource, CancellationToken cancellationToken)
at Microsoft.Azure.Management.ResourceManager.Fluent.Authentication.MSITokenProvider.GetTokenFromIMDSEndpointAsync(String resource, CancellationToken cancellationToken)
at Microsoft.Azure.Management.ResourceManager.Fluent.Authentication.MSITokenProvider.GetAuthenticationHeaderForVirtualMachineAsync(String resource, CancellationToken cancellationToken)
at Microsoft.Azure.Management.ResourceManager.Fluent.Authentication.MSITokenProvider.GetAuthenticationHeaderAsync(CancellationToken cancellationToken)
at Microsoft.Rest.TokenCredentials.ProcessHttpRequestAsync(HttpRequestMessage request, CancellationToken cancellationToken)
at Microsoft.Azure.Management.ResourceManager.Fluent.Authentication.AzureCredentials.ProcessHttpRequestAsync(HttpRequestMessage request, CancellationToken cancellationToken)
at Microsoft.Azure.Management.Monitor.Fluent.MetricDefinitionsOperations.ListWithHttpMessagesAsync(String resourceUri, String metricnamespace, Dictionary`2 customHeaders, CancellationToken cancellationToken)
at Microsoft.Azure.Management.Monitor.Fluent.MetricDefinitionsOperationsExtensions.ListAsync(IMetricDefinitionsOperations operations, String resourceUri, String metricnamespace, CancellationToken cancellationToken)
at Microsoft.Azure.Management.Monitor.Fluent.MetricDefinitionsImpl.ListByResourceAsync(String resourceId, CancellationToken cancellationToken)
at Microsoft.Azure.Management.Monitor.Fluent.MetricDefinitionsImpl.Microsoft.Azure.Management.Monitor.Fluent.IMetricDefinitions.ListByResourceAsync(String resourceId, CancellationToken cancellationToken)
at Promitor.Integrations.AzureMonitor.AzureMonitorClient.GetMetricDefinitionsAsync(String resourceId) in /src/Promitor.Integrations.AzureMonitor/AzureMonitorClient.cs:line 128
at Promitor.Integrations.AzureMonitor.AzureMonitorClient.QueryMetricAsync(String metricName, String metricDimension, AggregationType aggregationType, TimeSpan aggregationInterval, String resourceId, String metricFilter, Nullable`1 metricLimit) in /src/Promitor.Integrations.AzureMonitor/AzureMonitorClient.cs:line 84
at Promitor.Core.Scraping.AzureMonitorScraper`1.ScrapeResourceAsync(String subscriptionId, ScrapeDefinition`1 scrapeDefinition, TResourceDefinition resourceDefinition, AggregationType aggregationType, TimeSpan aggregationInterval) in /src/Promitor.Core.Scraping/AzureMonitorScraper.cs:line 54
at Promitor.Core.Scraping.Scraper`1.ScrapeAsync(ScrapeDefinition`1 scrapeDefinition) in /src/Promitor.Core.Scraping/Scraper.cs:line 78
Maybe this is also because of https://github.com/tomkerkhove/promitor/issues/2160 & https://github.com/tomkerkhove/promitor/issues/2209
It looks to be due to the Azure Monitor integration using Microsoft.Azure.Management.ResourceManager.Fluent.Authentication instead of the Azure SDK/Azure Identity library that Resource Discovery uses.
There is a way to get it working with the Scraper by using the Azure Workload Identity proxy sidecar.
Interesting, thanks for sharing!
I'm having the same issue - workload-identity with resource-discovery works great using:
podLabels:
  azure.workload.identity/use: "true"
rbac:
  serviceAccount:
    create: false
    name: workload-identity-sa
azureAuthentication:
  mode: UserAssignedManagedIdentity
but similar settings for scraper don't work (note that I have to also explicitly add the clientId for the workload identity, and provide tenantId and a default subscriptionId, or startup validation will fail):
podLabels:
  azure.workload.identity/use: "true"
rbac:
  serviceAccount:
    create: false
    name: workload-identity-sa
azureAuthentication:
  mode: UserAssignedManagedIdentity
  identity:
    id: 00000000-0000-0000-0000-000000000000
azureMetadata:
  tenantId: 00000000-0000-0000-0000-000000000000
  subscriptionId: 00000000-0000-0000-0000-000000000000
and I get "Multiple user assigned identities exist, please specify the clientId / resourceId of the identity in the token request":
[14:13:16 FTL] Failed to scrape resource for metric 'azure_service_bus_active_messages'
System.Net.Http.HttpRequestException: Code: 400 ReasonReasonPhrase: Bad Request Body: {"error":"invalid_request","error_description":"Multiple user assigned identities exist, please specify the clientId / resourceId of the identity in the token request"}
at Microsoft.Azure.Management.ResourceManager.Fluent.Authentication.MSITokenProvider.RetrieveTokenFromIMDSWithRetryAsync(String resource, CancellationToken cancellationToken)
at Microsoft.Azure.Management.ResourceManager.Fluent.Authentication.MSITokenProvider.GetTokenFromIMDSEndpointAsync(String resource, CancellationToken cancellationToken)
at Microsoft.Azure.Management.ResourceManager.Fluent.Authentication.MSITokenProvider.GetAuthenticationHeaderForVirtualMachineAsync(String resource, CancellationToken cancellationToken)
at Microsoft.Azure.Management.ResourceManager.Fluent.Authentication.MSITokenProvider.GetAuthenticationHeaderAsync(CancellationToken cancellationToken)
at Microsoft.Rest.TokenCredentials.ProcessHttpRequestAsync(HttpRequestMessage request, CancellationToken cancellationToken)
at Microsoft.Azure.Management.ResourceManager.Fluent.Authentication.AzureCredentials.ProcessHttpRequestAsync(HttpRequestMessage request, CancellationToken cancellationToken)
at Microsoft.Azure.Management.Monitor.Fluent.MetricDefinitionsOperations.ListWithHttpMessagesAsync(String resourceUri, String metricnamespace, Dictionary`2 customHeaders, CancellationToken cancellationToken)
at Microsoft.Azure.Management.Monitor.Fluent.MetricDefinitionsOperationsExtensions.ListAsync(IMetricDefinitionsOperations operations, String resourceUri, String metricnamespace, CancellationToken cancellationToken)
at Microsoft.Azure.Management.Monitor.Fluent.MetricDefinitionsImpl.ListByResourceAsync(String resourceId, CancellationToken cancellationToken)
at Microsoft.Azure.Management.Monitor.Fluent.MetricDefinitionsImpl.Microsoft.Azure.Management.Monitor.Fluent.IMetricDefinitions.ListByResourceAsync(String resourceId, CancellationToken cancellationToken)
at Promitor.Integrations.AzureMonitor.AzureMonitorClient.GetMetricDefinitionsAsync(String resourceId) in /src/Promitor.Integrations.AzureMonitor/AzureMonitorClient.cs:line 128
at Promitor.Integrations.AzureMonitor.AzureMonitorClient.QueryMetricAsync(String metricName, String metricDimension, AggregationType aggregationType, TimeSpan aggregationInterval, String resourceId, String metricFilter, Nullable`1 metricLimit) in /src/Promitor.Integrations.AzureMonitor/AzureMonitorClient.cs:line 84
at Promitor.Core.Scraping.AzureMonitorScraper`1.ScrapeResourceAsync(String subscriptionId, ScrapeDefinition`1 scrapeDefinition, TResourceDefinition resourceDefinition, AggregationType aggregationType, TimeSpan aggregationInterval) in /src/Promitor.Core.Scraping/AzureMonitorScraper.cs:line 54
at Promitor.Core.Scraping.Scraper`1.ScrapeAsync(ScrapeDefinition`1 scrapeDefinition) in /src/Promitor.Core.Scraping/Scraper.cs:line 78
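For context on the error above: the message suggests the token request reached the Azure Instance Metadata Service (IMDS) without naming an identity. The following is a minimal Python sketch of how such a request URL is formed, based on the documented IMDS token endpoint; it only builds the URL and sends nothing. The function name build_imds_token_url is hypothetical, and this is not a claim about how the Fluent SDK's MSITokenProvider builds its request internally.

```python
from urllib.parse import urlencode

# Documented IMDS managed identity token endpoint (link-local address).
IMDS_TOKEN_ENDPOINT = "http://169.254.169.254/metadata/identity/oauth2/token"


def build_imds_token_url(resource, client_id=None, api_version="2018-02-01"):
    """Build an IMDS token request URL (hypothetical helper, illustration only).

    When several user-assigned identities are attached, IMDS cannot pick one
    on its own, so the request has to carry client_id (or a resource ID).
    """
    params = {"api-version": api_version, "resource": resource}
    if client_id:
        # Selects one specific user-assigned identity.
        params["client_id"] = client_id
    return IMDS_TOKEN_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    # Without client_id: ambiguous when multiple identities exist.
    print(build_imds_token_url("https://management.azure.com/"))
    # With client_id: the identity is named explicitly.
    print(build_imds_token_url(
        "https://management.azure.com/",
        client_id="00000000-0000-0000-0000-000000000000",
    ))
```

A real request would also need the "Metadata: true" header. The sketch is only meant to show where the client ID would have to appear for IMDS to resolve the ambiguity.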
Unfortunately I can't get the workload identity proxy sidecar workaround to work (I think because I don't have the legacy pod identity support in my AKS cluster).
I'm not a C# developer at all, but I suspect that if it were possible to reach the default case in this switch statement and rely on DefaultAzureCredential(), then it might "just work" thanks to the environment variables that Workload Identity injects (AZURE_TENANT_ID, AZURE_CLIENT_ID, AZURE_FEDERATED_TOKEN_FILE, and AZURE_AUTHORITY_HOST):
switch (authenticationInfo.Mode)
{
    case AuthenticationMode.ServicePrincipal:
        tokenCredential = new ClientSecretCredential(tenantId, authenticationInfo.IdentityId, authenticationInfo.Secret, tokenCredentialOptions);
        break;
    case AuthenticationMode.UserAssignedManagedIdentity:
        var clientId = authenticationInfo.GetIdentityIdOrDefault();
        tokenCredential = new ManagedIdentityCredential(clientId, tokenCredentialOptions);
        break;
    case AuthenticationMode.SystemAssignedManagedIdentity:
        tokenCredential = new ManagedIdentityCredential(options:tokenCredentialOptions);
        break;
    default:
        tokenCredential = new DefaultAzureCredential(); // <-- doesn't appear to be possible to get here, but I think this might work
        break;
}
(The reason I think this might work is that I've used Azure Identity's Python SDK with a user-assigned managed identity and DefaultAzureCredential() with no problems.)
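Before relying on that idea, it can help to confirm that the injected variables are actually present in the pod. Below is a minimal Python sketch (standard library only) that reports which of the four variables named above are missing. The function name missing_workload_identity_vars is hypothetical, and the check says nothing about whether the values are valid.

```python
import os

# Variables that, per the comment above, Workload Identity injects into the pod.
WORKLOAD_IDENTITY_ENV_VARS = (
    "AZURE_TENANT_ID",
    "AZURE_CLIENT_ID",
    "AZURE_FEDERATED_TOKEN_FILE",
    "AZURE_AUTHORITY_HOST",
)


def missing_workload_identity_vars(environ=None):
    """Return the expected variable names that are unset or empty.

    Hypothetical helper: `environ` defaults to the process environment and
    can be replaced with a plain dict for testing.
    """
    env = os.environ if environ is None else environ
    return [name for name in WORKLOAD_IDENTITY_ENV_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_workload_identity_vars()
    if missing:
        print("Missing Workload Identity variables: " + ", ".join(missing))
    else:
        print("All Workload Identity variables are present.")
```

Running this inside the pod (for example via kubectl exec, assuming Python is available in the image) would show whether the webhook injected the variables at all, which separates an injection problem from a credential-selection problem.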
Based on the configuration, though, it should use ManagedIdentityCredential, which is reflected in the logs above:
at Microsoft.Azure.Management.ResourceManager.Fluent.Authentication.MSITokenProvider.RetrieveTokenFromIMDSWithRetryAsync(String resource, CancellationToken cancellationToken)
at Microsoft.Azure.Management.ResourceManager.Fluent.Authentication.MSITokenProvider.GetTokenFromIMDSEndpointAsync(String resource, CancellationToken cancellationToken)
at Microsoft.Azure.Management.ResourceManager.Fluent.Authentication.MSITokenProvider.GetAuthenticationHeaderForVirtualMachineAsync(String resource, CancellationToken cancellationToken)
at Microsoft.Azure.Management.ResourceManager.Fluent.Authentication.MSITokenProvider.GetAuthenticationHeaderAsync(CancellationToken cancellationToken)
at Microsoft.Rest.TokenCredentials.ProcessHttpRequestAsync(HttpRequestMessage request, CancellationToken cancellationToken)
at Microsoft.Azure.Management.ResourceManager.Fluent.Authentication.AzureCredentials.ProcessHttpRequestAsync(HttpRequestMessage request, CancellationToken cancellationToken)
at Microsoft.Azure.Management.Monitor.Fluent.MetricDefinitionsOperations.ListWithHttpMessagesAsync(String resourceUri, String metricnamespace, Dictionary`2 customHeaders, CancellationToken cancellationToken)
This is using AAD identity, though, not Workload Identity. I haven't used the proxy myself; maybe @dks0296586 can help a bit on this?
@davecaplinger
Here are the labels and annotations I needed to use for the sidecar proxy to work
securityContext:
  runAsNonRoot: false # Required for Azure Workload Identity proxy injection
podLabels:
  azure.workload.identity/use: "true"
annotations:
  azure.workload.identity/inject-proxy-sidecar: "true"
  azure.workload.identity/proxy-sidecar-port: "8080"
I don't believe you need to have aad-pod-identity configured to make use of the Workload Identity sidecar. Since Resource Discovery is working with Workload Identity, I think it's just a matter of getting the sidecar proxy working correctly.
I was missing the securityContext setting; I'll give that a shot. Thanks!
Hi, we have the same problem, but it works when using securityContext.runAsNonRoot = false.
Unfortunately, allowing the container to run as root is not a safe solution. We have Azure Defender on AKS firing alerts (recommendations) when this setting is used.
Is there any way to avoid it? Why does agent-scraper need root privileges on the node?
@tomkerkhove @dks0296586 any news on that?
No, unfortunately I am not actively contributing code anymore but happy to review PRs.
Learn more on https://blog.tomkerkhove.be/2023/12/09/the-future-of-promitor/
Proposal
With aad-pod-identity being deprecated in favor of Azure Workload Identity, Promitor should support Workload Identity.
In my testing using the current version of Resource Discovery, attempting to use Workload Identity results in the following error:
AADSTS70021: No matching federated identity record found for presented assertion.
I don't believe this is a configuration issue on my end, as I have verified the configuration using the azwi quick-start guide and got that working as expected.
Component
Resource Discovery, Scraper
Contact Details
benjamin.lawson@dcsg.com