microsoft / azure-pipelines-agent

Azure Pipelines Agent 🚀

[BUG]: ContainerOperationsProvider is unauthorized to convert AADToken to ACR AccessToken #4742

Open WilliamXieMSFT opened 3 months ago

WilliamXieMSFT commented 3 months ago

What happened?

Given the recent push towards Managed Service Identity, we've updated our service connection from a DockerRegistry connection backed by a ServicePrincipal to one backed by a User-Assigned Managed Service Identity.

Our YAML relies on jobs.job.container, and we pass the service connection in as the endpoint:

jobs:
- job:
  displayName: ... 
  pool: ...
  container:
    image: ${{ parameters.ContainerImage }}
    endpoint: ${{ parameters.Endpoint }}
    options: ${{ parameters.ContainerOptions }}

However, the Initialize containers step fails with Unauthorized. According to the build logs, this is the line of code that throws: https://github.com/microsoft/azure-pipelines-agent/blob/ce65205f5cd516d293753f502eaad6a93672c716/src/Agent.Worker/ContainerOperationProvider.cs#L228C31-L228C52

I've verified that the managed identity has both the AcrPull and AcrPush roles on our ACR. This is either a configuration issue on our side (which I'd happily fix once I understand what to do) or a bug in ContainerOperationProvider. Looking at the code, I think the cause might be that the AADToken returned from GetMSIAccessToken was created using a ManagedIdentityCredential with a null clientId.
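
One way to check that hypothesis would be to decode the AAD token and see which identity it was actually issued to. Below is a minimal Python sketch, assuming the token is a standard JWT. The helper names are hypothetical, and the claim names (`appid`, `azp`, `oid`, `xms_mirid`) are what I'd expect to find in a managed identity token, not something taken from the agent code. The signature is deliberately not verified, so this is for diagnostics only.

```python
import base64
import json


def decode_jwt_claims(token: str) -> dict:
    """Return the unverified claims of a JWT (diagnostics only)."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a JWT with three dot-separated segments")
    payload = parts[1]
    # base64url segments omit padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def summarize_identity(claims: dict) -> dict:
    """Keep only the claims that say who the token was issued to.

    The claim names are assumptions about what an MSI-issued token carries.
    """
    keys = ("aud", "tid", "oid", "appid", "azp", "xms_mirid")
    return {k: claims[k] for k in keys if k in claims}
```

If the client ID reported here does not match the user-assigned identity that holds the AcrPull and AcrPush roles, that would support the null clientId theory.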

Build Logs:

[debug]Evaluating condition for step: 'Initialize containers'

[debug]Evaluating: SucceededNode()

[debug]Evaluating SucceededNode:

[debug]=> True

[debug]Result: True

Starting: Initialize containers

DockerActionRetries variable value: True

/usr/bin/docker version --format '{{.Server.APIVersion}}'

'1.43'

Docker daemon API version: '1.43'

/usr/bin/docker version --format '{{.Client.APIVersion}}'

'1.43'

Docker client API version: '1.43'

[debug]Delete stale containers from previous jobs

/usr/bin/docker ps --all --quiet --no-trunc --filter "label=0bb77a"

[debug]Delete stale container networks from previous jobs

/usr/bin/docker network prune --force --filter "label=0bb77a"

[debug]Attempting to get endpoint authorization scheme...

[debug]Retrieving AAD token using MSI authentication...

[debug]Successfully retrieved AAD token using the MSI authentication scheme.

[debug]Attempting to convert AAD token to an ACR token

[debug]Status Code: Unauthorized

[error]Could not fetch access token for ACR. Please configure Managed Service Identity (MSI) for Azure Container Registry with the appropriate permissions - https://docs.microsoft.com/en-us/azure/app-service/tutorial-custom-container?pivots=container-linux#configure-app-service-to-deploy-the-image-from-the-registry.

[debug]System.NotSupportedException: Could not fetch access token for ACR. Please configure Managed Service Identity (MSI) for Azure Container Registry with the appropriate permissions - https://docs.microsoft.com/en-us/azure/app-service/tutorial-custom-container?pivots=container-linux#configure-app-service-to-deploy-the-image-from-the-registry.

   at Microsoft.VisualStudio.Services.Agent.Worker.ContainerOperationProvider.GetAcrPasswordFromAADToken(IExecutionContext executionContext, String AADToken, String tenantId, String registryServer, String loginServer) in /mnt/vss/_work/1/s/src/Agent.Worker/ContainerOperationProvider.cs:line 228
   at Microsoft.VisualStudio.Services.Agent.Worker.ContainerOperationProvider.PullContainerAsync(IExecutionContext executionContext, ContainerInfo container) in /mnt/vss/_work/1/s/src/Agent.Worker/ContainerOperationProvider.cs:line 303
   at Microsoft.VisualStudio.Services.Agent.Worker.ContainerOperationProvider.StartContainersAsync(IExecutionContext executionContext, Object data) in /mnt/vss/_work/1/s/src/Agent.Worker/ContainerOperationProvider.cs:line 116
   at Microsoft.VisualStudio.Services.Agent.Worker.JobExtensionRunner.RunAsync() in /mnt/vss/_work/1/s/src/Agent.Worker/JobExtensionRunner.cs:line 38
   at Microsoft.VisualStudio.Services.Agent.Worker.StepsRunner.RunStepAsync(IStep step, CancellationToken jobCancellationToken) in /mnt/vss/_work/1/s/src/Agent.Worker/StepsRunner.cs:line 264

Finishing: Initialize containers

Versions

Agent name: 'AzurePipelines-EO 10'

Agent machine name: '80d8db40c000000'

Current agent version: '3.236.1'

Operating System Image: ubuntu-20.04 LTS Version: 20240324.1.0

Environment type (Please select at least one environment where you face this issue)

Azure DevOps Server type

dev.azure.com (formerly visualstudio.com)

Azure DevOps Server Version (if applicable)

No response

Operating system

No response

Version control system

No response

Relevant log output

##[debug]Attempting to get endpoint authorization scheme...
##[debug]Retrieving AAD token using MSI authentication...
##[debug]Successfully retrieved AAD token using the MSI authentication scheme.
##[debug]Attempting to convert AAD token to an ACR token
##[debug]Status Code: Unauthorized
##[error]Could not fetch access token for ACR. Please configure Managed Service Identity (MSI) for Azure Container Registry with the appropriate permissions - https://docs.microsoft.com/en-us/azure/app-service/tutorial-custom-container?pivots=container-linux#configure-app-service-to-deploy-the-image-from-the-registry.
##[debug]System.NotSupportedException: Could not fetch access token for ACR. Please configure Managed Service Identity (MSI) for Azure Container Registry with the appropriate permissions - https://docs.microsoft.com/en-us/azure/app-service/tutorial-custom-container?pivots=container-linux#configure-app-service-to-deploy-the-image-from-the-registry.
   at Microsoft.VisualStudio.Services.Agent.Worker.ContainerOperationProvider.GetAcrPasswordFromAADToken(IExecutionContext executionContext, String AADToken, String tenantId, String registryServer, String loginServer) in /mnt/vss/_work/1/s/src/Agent.Worker/ContainerOperationProvider.cs:line 228
   at Microsoft.VisualStudio.Services.Agent.Worker.ContainerOperationProvider.PullContainerAsync(IExecutionContext executionContext, ContainerInfo container) in /mnt/vss/_work/1/s/src/Agent.Worker/ContainerOperationProvider.cs:line 303
   at Microsoft.VisualStudio.Services.Agent.Worker.ContainerOperationProvider.StartContainersAsync(IExecutionContext executionContext, Object data) in /mnt/vss/_work/1/s/src/Agent.Worker/ContainerOperationProvider.cs:line 116
   at Microsoft.VisualStudio.Services.Agent.Worker.JobExtensionRunner.RunAsync() in /mnt/vss/_work/1/s/src/Agent.Worker/JobExtensionRunner.cs:line 38
   at Microsoft.VisualStudio.Services.Agent.Worker.StepsRunner.RunStepAsync(IStep step, CancellationToken jobCancellationToken) in /mnt/vss/_work/1/s/src/Agent.Worker/StepsRunner.cs:line 264
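
To reproduce the failing "convert AAD token to an ACR token" step outside the agent, something like the following Python sketch might help. It assumes the exchange goes through ACR's documented `POST /oauth2/exchange` endpoint with `grant_type=access_token`; I have not confirmed that the agent sends exactly this request. The function names and the example registry and tenant values are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_acr_exchange_request(login_server: str, tenant_id: str,
                               aad_token: str) -> urllib.request.Request:
    """Build (but do not send) the AAD-to-ACR token exchange request."""
    body = urllib.parse.urlencode({
        "grant_type": "access_token",
        "service": login_server,      # e.g. myregistry.azurecr.io (hypothetical)
        "tenant": tenant_id,
        "access_token": aad_token,
    }).encode("ascii")
    return urllib.request.Request(
        url=f"https://{login_server}/oauth2/exchange",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def exchange_for_refresh_token(req: urllib.request.Request) -> str:
    """Send the request; a 401 surfaces as urllib.error.HTTPError."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["refresh_token"]
```

Running the exchange once with a token obtained for the user-assigned identity's explicit client ID, and once with a token obtained without a client ID, should show whether the Unauthorized response depends on which identity the token represents.
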
vmapetr commented 3 months ago

Hi @WilliamXieMSFT, thanks for reporting! We are working on higher-priority issues at the moment, but will get back to this one soon.