microsoft / azure-pipelines-agent

Azure Pipelines Agent 🚀
MIT License

[BUG]: v3.230.2 not starting jobs | _apis/FeatureFlags/DistributedTask.Agent.UseMaskingPerformanceEnhancements is not authorized #4562

Open jensheidrich-acn opened 7 months ago

jensheidrich-acn commented 7 months ago

What happened?

Hi, I installed a new agent (v3.230.2) on a new build machine (macOS Sonoma) and configured it against our Azure DevOps Server 2020 Update 1.2.

When I start an existing build pipeline, the process gets stuck at the very beginning.

The agent logs show:

ERR  VisualStudioServices] GET request to https://xxx.com/_apis/FeatureFlags/DistributedTask.Agent.UseMaskingPerformanceEnhancements is not authorized. Details: TF400813: The user 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' is not authorized to access this resource.
INFO MessageListener] Sleeping for 13.746 seconds before retrying.
[2023-12-11 14:53:27Z INFO JobDispatcher] Successfully renew job request 894423, job is valid till 11.12.2023 15:03:27
...
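If you need to check several build machines for this symptom, one option is to scan the agent's diagnostic log lines for the TF400813 authorization error. The sketch below assumes the line format quoted above (an "ERR  VisualStudioServices]" prefix followed by the HTTP method and the URL). The regular expression and the function name find_unauthorized_requests are hypothetical helpers, not part of the agent. The lines would typically come from the Agent_*.log files in the agent's _diag folder.

```python
import re

# Hypothetical pattern, assuming the log format quoted in this issue:
# "ERR  VisualStudioServices] GET request to <url> is not authorized. Details: TF400813: ..."
UNAUTHORIZED = re.compile(
    r"ERR\s+VisualStudioServices\]\s+(?P<method>[A-Z]+) request to (?P<url>\S+) "
    r"is not authorized\. Details: (?P<code>TF\d+):"
)


def find_unauthorized_requests(lines):
    """Return (method, url, error_code) tuples for every matching log line."""
    hits = []
    for line in lines:
        match = UNAUTHORIZED.search(line)
        if match:
            hits.append((match.group("method"), match.group("url"), match.group("code")))
    return hits


if __name__ == "__main__":
    sample = [
        "ERR  VisualStudioServices] GET request to https://xxx.com/_apis/FeatureFlags/"
        "DistributedTask.Agent.UseMaskingPerformanceEnhancements is not authorized. "
        "Details: TF400813: The user 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' is not authorized "
        "to access this resource.",
        "INFO MessageListener] Sleeping for 13.746 seconds before retrying.",
    ]
    for method, url, code in find_unauthorized_requests(sample):
        print(method, code, url)
```

Because only the method and URL vary, the same pattern also matches the POST to _apis/CustomerIntelligence/Events reported later in this thread.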

Potentially introduced here: https://github.com/microsoft/azure-pipelines-agent/pull/4415

Versions

Azure Pipelines Agent v3.230.2, Azure DevOps Server 2020 Update 1.2, macOS Sonoma

Environment type (Please select at least one environment where you face this issue)

Azure DevOps Server type

Azure DevOps Server (Please specify exact version in the textbox below)

Azure DevOps Server Version (if applicable)

Azure DevOps Server 2020 Update 1.2

Operation system

No response

Version control system

No response

DmitriiBobreshev commented 7 months ago

Hi @jensheidrich-acn! Thank you for the feedback. We'll try to fix it in the next release. For now, could you please try the previous version?

jensheidrich-acn commented 7 months ago

Hi, already done: I downgraded to v3.226.3 and everything works again.

DmitriiBobreshev commented 7 months ago

Could you please share the agent's diagnostic logs with us? If you don't want to share them in the ticket, you could send them directly to me at v-bobreshevd@microsoft.com.

jensheidrich-acn commented 7 months ago

Sent via email.

tinylabspace commented 7 months ago

I can confirm that this also happens for me with both the Linux and Windows agents, version 3.230.2, on Azure DevOps Server 2022.0.1 Patch 4 and 2022.1 RC1. The pre-release version 3.232.0 behaves the same way. Jobs are assigned to the agent and hang with a "job is already running or has completed" message in the UI, while the agent logs show the renew job request entries seen in @jensheidrich-acn's initial post. Versions 3.230.0 and below work.

I don't believe it's related to the URL error: requesting the following endpoint returns the same exception whether it is pointed at Azure DevOps Server or Azure DevOps Services:

/_apis/FeatureFlags/DistributedTask.Agent.UseMaskingPerformanceEnhancements

{ "$id": "1", "innerException": null, "message": "DistributedTask.Agent.UseMaskingPerformanceEnhancements", "typeName": "Microsoft.TeamFoundation.Framework.Server.MissingFeatureException, Microsoft.TeamFoundation.Framework.Server", "typeKey": "MissingFeatureException", "errorCode": 0, "eventId": 3000 }
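The response body above can be distinguished from a plain authorization failure by its typeKey field. The following is a minimal sketch, assuming the server returns JSON shaped like the body quoted above. The function classify_feature_flag_error and its return labels are hypothetical, not an agent or Azure DevOps API.

```python
import json


def classify_feature_flag_error(body: str) -> str:
    """Classify an error body returned for a feature-flag request.

    Field names ("typeKey") follow the response quoted in this issue;
    the returned labels are illustrative only.
    """
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return "unparseable"
    if not isinstance(payload, dict):
        return "unparseable"
    if payload.get("typeKey") == "MissingFeatureException":
        # The server does not know this feature flag at all.
        return "missing-flag"
    return "other"
```

A caller would most naturally treat "missing-flag" as "flag disabled" instead of as an error, since the server simply has no such feature defined.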

ismayilov-ismayil commented 7 months ago

@tinylabspace thanks for reporting. Could you please try reconfiguring the agent? I mean running 'config.cmd remove' and then 'config.cmd' again.

tinylabspace commented 7 months ago

I've done that a number of times, in more than one Azure DevOps Server environment, and I can reproduce the symptom easily. With the Server versions I currently have running, a job assigned to a 3.230.2 agent never fully starts. Canceling a job in that state can take a long time to show as canceled in the pool UI, although it cancels quickly from the pipeline UI. An agent in the same pool running 3.230.0 or below executes the job normally. My org has an open support case about this.

kirill-ivlev commented 6 months ago

Hi @jensheidrich-acn, @tinylabspace. The issue should be resolved in the latest agent (which can be downloaded from this page). Could you please confirm that now everything is working as expected?

imami777 commented 6 months ago

Hello @kirill-ivlev, I just tried deploying version 3.232.1 and I am still getting this error: ERR VisualStudioServices] POST request to https://donthackmebro.com/tfs/_apis/CustomerIntelligence/Events is not authorized. Details: TF400813: The user 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' is not authorized to access this resource.

This is on a self-hosted Ubuntu 20.04 VM for use with Azure DevOps Server 2020.

HT-Jens commented 6 months ago

Upgrading the pipeline agent to version 3.232.1 worked for me. We have an on-prem Azure DevOps Server 2022.1.

jensheidrich-acn commented 5 months ago


Yes, it works fine on our systems! Thanks for the quick support.

Lotti commented 5 months ago

The bug is still present in the 3.232.1 agent. Let me know how I can provide details to help you.

mkriventsev commented 5 months ago

I can confirm it occurs on Windows agent v3.232.3.

donikatz commented 4 months ago

The issue still exists in the 3.236.1 agent on Linux; downgrading to 3.225.3 resolves it. This is on-prem ADO Server 2020 with the latest patches.
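Several commenters worked around the problem by pinning an older agent: 3.226.3 and 3.225.3 were reported to work, and one report says 3.230.0 and below work while 3.230.2 and 3.232.0 hang. A small helper for comparing an installed version against that reported boundary might look like the following sketch. The 3.230.0 cutoff comes only from the reports in this thread, and later releases were reported as both fixed and still affected, so treat it as a rough guide. The function names are hypothetical.

```python
def parse_version(text: str) -> tuple:
    """Turn "3.230.2", "v3.230.2" or "v.3.232.3" into a tuple of ints."""
    return tuple(int(part) for part in text.strip().lstrip("vV.").split("."))


# Last version reported as working in this thread (an assumption drawn from the comments).
LAST_REPORTED_WORKING = parse_version("3.230.0")


def at_or_below_reported_working(version: str) -> bool:
    """True if the version is not newer than the last release reported to work here."""
    return parse_version(version) <= LAST_REPORTED_WORKING


if __name__ == "__main__":
    for v in ("3.226.3", "3.225.3", "3.230.0", "3.230.2", "v.3.232.3"):
        print(v, at_or_below_reported_working(v))
```

Comparing integer tuples rather than raw strings avoids the usual pitfall where "3.99.0" would sort after "3.230.0" lexically.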

imami777 commented 3 months ago

@donikatz Hello, do you mind if I ask what version of Linux you are using?

donikatz commented 3 months ago

@imami777 Amazon Linux 2023

imami777 commented 3 months ago

@donikatz Ah, interesting; it hadn't occurred to me to try a different Linux distro. I had been assuming that I could not get v3 agents to work because we're still on Azure DevOps Server 2020, so I was planning to upgrade to 2022 as soon as possible. However, one of the devs wanted to experiment with his own build agent and got the latest version working instantly without any problems. The only obvious difference between his VM and mine is that he is using Ubuntu 22.04, while I'm still on 20.04 for compatibility with the v2 agent. I'm going to try installing a v3 agent on a 22.04 VM to see if it works. Incidentally, I am starting to wonder whether the problem is related in some way to OpenSSL versions, even though, at least superficially, the way the issue manifests does not seem related.
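To compare the two VMs along the lines suggested above (distribution release and OpenSSL version), a quick diagnostic could be run on each machine. This sketch assumes Python 3 is available on both VMs. The openssl CLI is queried only if it is on the PATH. Neither the OpenSSL build Python links against nor the CLI's version is necessarily the exact library the agent loads, so a difference is a hint, not proof. The function name describe_environment is hypothetical.

```python
import platform
import shutil
import ssl
import subprocess


def describe_environment() -> dict:
    """Collect OS and OpenSSL details for side-by-side comparison of build VMs."""
    info = {
        "platform": platform.platform(),
        # OpenSSL (or LibreSSL) build this Python interpreter is linked against.
        "python_ssl": ssl.OPENSSL_VERSION,
    }
    # /etc/os-release details: Linux only, and freedesktop_os_release needs Python 3.10+.
    try:
        info["os_release"] = platform.freedesktop_os_release().get("PRETTY_NAME", "")
    except (AttributeError, OSError):
        info["os_release"] = "unavailable"
    # Version reported by the openssl command-line tool, if installed.
    openssl = shutil.which("openssl")
    if openssl:
        result = subprocess.run([openssl, "version"], capture_output=True, text=True, check=False)
        info["openssl_cli"] = result.stdout.strip()
    else:
        info["openssl_cli"] = "not found"
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```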

joperator commented 3 months ago

I have the same issue when trying to register an agent:

Microsoft.VisualStudio.Services.Common.VssUnauthorizedException: TF400813: Resource not available for anonymous access. Client authentication required.
  at Microsoft.VisualStudio.Services.Common.VssHttpMessageHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
  at Microsoft.VisualStudio.Services.Common.VssHttpRetryMessageHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
  at System.Net.Http.HttpClient.<SendAsync>g__Core|83_0(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationTokenSource cts, Boolean disposeCts, CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellationToken)
  at Microsoft.VisualStudio.Services.WebApi.VssHttpClientBase.SendAsync(HttpRequestMessage message, HttpCompletionOption completionOption, Object userState, CancellationToken cancellationToken)
  at Microsoft.VisualStudio.Services.WebApi.VssHttpClientBase.SendAsync[T](HttpRequestMessage message, Object userState, CancellationToken cancellationToken)
  at Microsoft.VisualStudio.Services.Location.Client.LocationHttpClient.GetConnectionDataAsync(ConnectOptions connectOptions, Int64 lastChangeId, CancellationToken cancellationToken, Object userState)
  at Microsoft.VisualStudio.Services.WebApi.Location.VssServerDataProvider.GetConnectionDataAsync(ConnectOptions connectOptions, Int32 lastChangeId, CancellationToken cancellationToken)
  at Microsoft.VisualStudio.Services.WebApi.Location.VssServerDataProvider.ConnectAsync(ConnectOptions connectOptions, CancellationToken cancellationToken)
  at Microsoft.VisualStudio.Services.WebApi.Location.VssServerDataProvider.EnsureConnectedAsync(ConnectOptions optionsNeeded, CancellationToken cancellationToken)
  at Microsoft.VisualStudio.Services.WebApi.Location.VssServerDataProvider.CheckForServerUpdatesAsync(CancellationToken cancellationToken)
  at Microsoft.VisualStudio.Services.WebApi.Location.VssServerDataProvider.FindServiceDefinitionsAsync(String serviceType, CancellationToken cancellationToken)
  at Microsoft.VisualStudio.Services.WebApi.Location.VssServerDataProvider.GetResourceLocationsAsync(CancellationToken cancellationToken)
  at Microsoft.VisualStudio.Services.WebApi.VssConnection.GetClientInstanceAsync(Type managedType, Guid serviceIdentifier, CancellationToken cancellationToken, VssHttpRequestSettings settings, DelegatingHandler[] handlers)
  at Microsoft.VisualStudio.Services.WebApi.VssConnection.GetClientServiceImplAsync(Type requestedType, Guid serviceIdentifier, Func`4 getInstanceAsync, CancellationToken cancellationToken)
  at Microsoft.VisualStudio.Services.WebApi.VssConnection.GetClientAsync[T](CancellationToken cancellationToken)
  at Microsoft.VisualStudio.Services.WebApi.TaskExtensions.SyncResult[T](Task`1 task)
  at Microsoft.VisualStudio.Services.WebApi.VssConnection.GetClient[T]()
  at Agent.Listener.Configuration.FeatureFlagProvider.GetFeatureFlagWithCred(IHostContext context, String featureFlagName, ITraceWriter traceWriter, AgentSettings settings, VssCredentials creds, CancellationToken ctk) in src/Agent.Listener/Configuration/FeatureFlagProvider.cs:line 59

As the stack trace indicates, the exception is thrown when vssConnection.GetClient<FeatureAvailabilityHttpClient>() is called. Is there any reason not to move this line into the following try block? I'd guess that this is what @kirill-ivlev was already trying to accomplish with #4568.
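The suggestion above is to move the GetClient call into the existing try block, so that a failure while creating the client is handled the same way as a failure while querying the flag. The Python sketch below only models that control-flow difference. UnauthorizedError, get_client, get_flag and the default value are hypothetical stand-ins, not the agent's actual C# FeatureFlagProvider code. It also assumes the try block falls back to a default, which the thread does not confirm.

```python
class UnauthorizedError(Exception):
    """Hypothetical stand-in for VssUnauthorizedException."""


def flag_with_client_outside_try(get_client, flag_name, default=False):
    # Client creation happens before the try block, so a failure here
    # propagates to the caller instead of falling back to the default.
    client = get_client()
    try:
        return client.get_flag(flag_name)
    except UnauthorizedError:
        return default


def flag_with_client_inside_try(get_client, flag_name, default=False):
    # Client creation is covered by the same handler, so an authorization
    # failure at either step yields the default value.
    try:
        client = get_client()
        return client.get_flag(flag_name)
    except UnauthorizedError:
        return default


if __name__ == "__main__":
    def failing_get_client():
        raise UnauthorizedError("TF400813: Resource not available for anonymous access.")

    name = "DistributedTask.Agent.UseMaskingPerformanceEnhancements"
    print(flag_with_client_inside_try(failing_get_client, name))  # prints False
    try:
        flag_with_client_outside_try(failing_get_client, name)
    except UnauthorizedError as exc:
        print("escaped:", exc)
```

Whether a silent fallback is the right behavior for a feature-flag lookup is a design decision for the maintainers. The sketch only shows why the placement of the client-creation call matters.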

kirill-ivlev commented 3 months ago

@Lotti @mkriventsev, could you please share details about your environment and attach the agent diagnostic logs? We will take a look.

Thanks!

joperator commented 3 months ago

I've found a workaround: set the environment variable STORE_AGENT_KEY_IN_CSP_CONTAINER to true before configuring the agent. I'd guess the handling of the option to store tokens in named containers is inconsistent, so that enabling it has effectively become mandatory.
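In practice, this workaround means running the agent configuration with STORE_AGENT_KEY_IN_CSP_CONTAINER=true in its environment. The following sketch shows how a provisioning script might do that. It assumes the Windows config.cmd sits in the agent directory. The helper names are hypothetical, and any configuration arguments you pass through extra_args are your own.

```python
import os
import subprocess
from pathlib import Path


def config_environment(base=None) -> dict:
    """Copy the given (or current) environment and enable the CSP-container option."""
    env = dict(os.environ if base is None else base)
    env["STORE_AGENT_KEY_IN_CSP_CONTAINER"] = "true"
    return env


def configure_agent(agent_dir: str, extra_args=()) -> int:
    """Run config.cmd from agent_dir with the variable set; returns its exit code."""
    script = Path(agent_dir) / "config.cmd"
    result = subprocess.run([str(script), *extra_args], env=config_environment(), check=False)
    return result.returncode
```

Copying the environment, rather than setting the variable machine-wide, keeps the change scoped to the configuration run.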

Lotti commented 3 months ago

@kirill-ivlev I'll ask our sysadmin. Meanwhile, I can add that the problem occurred while we were having VPN problems between Azure and our datacenter. Once that was solved, the error never came back. By the way, we are running agents on Ubuntu 18/22 and RHEL 7/8 machines.