microsoft / azure-pipelines-agent

Azure Pipelines Agent 🚀
MIT License

Multiple agents on the same machine using tf.exe can lead to getting sources failure #898

Closed JamesNK closed 6 years ago

JamesNK commented 7 years ago

A build is intermittently getting this error when fetching source from TFS:

2017-03-23T23:49:31.0591599Z ##[section]Starting: Build solution Roster.WebService.sln
2017-03-23T23:49:31.0591599Z ==============================================================================
2017-03-23T23:49:31.0591599Z Task         : Visual Studio Build
2017-03-23T23:49:31.0591599Z Description  : Build with MSBuild and set the Visual Studio version property
2017-03-23T23:49:31.0591599Z Version      : 1.113.0
2017-03-23T23:49:31.0591599Z Author       : Microsoft Corporation
2017-03-23T23:49:31.0591599Z Help         : [More Information](https://go.microsoft.com/fwlink/?LinkID=613727)
2017-03-23T23:49:31.0591599Z ==============================================================================
2017-03-23T23:49:31.6529233Z Unable to determine the workspace. You may be able to correct this by running 'tf workspaces /collection:TeamProjectCollectionUrl'.
2017-03-23T23:49:31.8872843Z ##[error]Exit code 100 returned from process: file name 'tf', arguments 'vc resolvePath "$\My Development\Trunk\src\Rostering\trunk\Roster.WebService.sln" /loginType:OAuth /login:.,******** /noprompt'.
2017-03-23T23:49:31.8872843Z ##[section]Finishing: Build solution Roster.WebService.sln

Copied from https://github.com/Microsoft/vsts-tasks/issues/3853

ericsciple commented 7 years ago

If you can catch it while system.debug is set, the logs will have more details.

ericsciple commented 7 years ago

@JamesNK Does this only happen when using the hosted build agent? And do you have more than one hosted build agent?

bootrider commented 7 years ago

I first installed agent1, and it showed this error. I had to manually perform a "Get Latest Version" operation in the workspace created by the agent, then I changed the workspace to Server (it has a lot of projects and a lot of files). Then I ran the following script on the build server:

Add-type -AssemblyName "Microsoft.TeamFoundation.Client, Version=12.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a"
Add-type -AssemblyName "Microsoft.TeamFoundation.Common, Version=12.0.0.0, Culture=Neutral, PublicKeyToken=b03f5f7f11d50a3a"
Add-type -AssemblyName "Microsoft.TeamFoundation.VersionControl.Client, Version=12.0.0.0, Culture=Neutral, PublicKeyToken=b03f5f7f11d50a3a"
Add-type -AssemblyName "Microsoft.TeamFoundation.VersionControl.Common, Version=12.0.0.0, Culture=Neutral, PublicKeyToken=b03f5f7f11d50a3a"

# work folder
Push-Location C:\_Agent1\1\s\

$teamProjectCollection="http://tfs.roseninspection.net:8080/tfs/Bogcollection"

$tfsTeamProjectCollection= New-Object Microsoft.TeamFoundation.Client.TfsTeamProjectCollection($teamProjectCollection)

$versioncontrolServer = $tfsTeamProjectCollection.GetService([Microsoft.TeamFoundation.VersionControl.Client.VersionControlServer])

# A folder or a file in the workspace
$localReference = "C:\_Agent1\1\s\Environment"

[Microsoft.TeamFoundation.VersionControl.Client.Workstation]::Current.EnsureUpdateWorkspaceInfoCache($versionControlServer, $env:USERNAME); 

$workspace = $versioncontrolServer.GetWorkspace($localReference)
[Microsoft.TeamFoundation.VersionControl.Client.Workstation]::Current.GetLocalWorkspaceInfo($localReference)
$workspace.GetServerItemForLocalItem($localReference)

Pop-Location

Now it is working perfectly.

ericsciple commented 7 years ago

@bootrider it sounds like you are hitting a different issue (probably the "Get sources not downloading some files" authz problem). If that doesn't resolve it, please open a separate issue.

ericsciple commented 7 years ago

@JamesNK I just merged a fix for a TFVC issue that affects folks using multiple hosted build agents in parallel. I suspect this may be related to the error you are experiencing.

It would be great if you can confirm whether the bug you are seeing only manifests when using multiple hosted agents in parallel.

The fix will roll out to all scale units within the next couple weeks.

Details about the bug: there was a change to the hosted images so the machine names are not always unique. This introduces a race condition for TFVC when using multiple hosted agents. In your case my suspicion (untested hypothesis) is that a workspace is being deleted by a parallel build, and that is causing the issue.

JamesNK commented 7 years ago

Hi

In our case we are running two agents on the same Azure VM (with different Agent and working directories so they don't conflict with each other).

Is it the same bug?

ericsciple commented 7 years ago

@JamesNK thanks for confirmation. Nope this sounds like a different bug.

I haven't heard of anyone else running into this issue. So my guess is that it has something to do with multiple agents on the same machine (running TFVC builds). I'll talk with folks from the version control team next week to see if they have heard of this issue.

Assuming the issue is related to multiple agents on the same machine, there are a couple of changes we want to make for TFVC that might mitigate the issue you are seeing:

  1. We want to redirect the local cache folder away from C:\programdata and into the _work folder. Mainly we want to do this for better cleanup. I'll poke through the TFVC code a little bit next week to see if this is low-hanging fruit.
  2. We want to offer an option on the build to instruct the agent to create a server workspace instead of a local one. Server workspaces make more sense for CI, but we ran into issues when we tried to switch (some tasks and user scripts assumed files were readonly). So it needs to be opt-in. My thinking is that server workspaces may not have the bug, although they may as well.

Again, I'll talk with TFVC folks next week to see if they know anything about this issue.

bootrider commented 7 years ago

Unfortunately my issue is related to server workspaces (at least I think so). So far I have installed 3 agents on the same machine. Every agent has its own workspace folder, and they do not overlap each other.

ericsciple commented 7 years ago

@JamesNK I'm trying to think of some workarounds...

Assuming you run the agents as a Windows service (and not interactively from command prompt), one thing that could mitigate is running the agents as different service accounts. My best guess right now is that some race condition is stomping on some local file.

If you don't have complex mappings, another thing you could try is specify the file name as a relative path. The agent will prepend the build sources directory (e.g. C:\Agent1\_work\1\s). Or use $(Build.SourcesDirectory)/relative/path/to/file.ext.
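As a rough illustration of why the relative-path workaround can sidestep the failing call, here is a minimal Python sketch. It is not the agent's actual code. It assumes that inputs starting with `$/` or `$\` are treated as TFVC server paths (the kind passed to `tf vc resolvePath` in the original report) and that anything else is rooted against the build sources directory. The function name `to_local_path` is hypothetical.

```python
import ntpath


def to_local_path(value, sources_dir):
    """Classify a file-path input and root it if it is relative.

    Returns (kind, path). In this sketch only the "server" kind would
    need a `tf vc resolvePath` call; the other kinds never invoke tf.
    """
    # Assumption: TFVC server paths start with "$/" or "$\".
    if value.startswith(("$/", "$\\")):
        return ("server", value)
    # Already rooted on the local disk: just normalize the separators.
    if ntpath.isabs(value):
        return ("absolute", ntpath.normpath(value))
    # Relative input: prepend the build sources directory.
    return ("relative", ntpath.normpath(ntpath.join(sources_dir, value)))
```

For example, `to_local_path("relative/path/to/file.ext", r"C:\Agent1\_work\1\s")` yields a path under the sources directory without any workspace lookup.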

Also I'm planning to add support to opt-in to server workspaces, by setting a variable on the definition (something like Build.TfvcWorkspaceType=Server). I can add that to the next agent release if you want to try it, and see if the issue goes away. My thinking is XAML build supported side-by-side agents and this issue afaik wasn't a problem.

I'll talk with VC folks again to see if they have ideas how to get to the bottom of this issue.

michha commented 7 years ago

We are facing the same error in our on-premise environment. 4 agents (v2.112.0) are running on the same machine. We have some scheduled builds, and two of them are set to run at the same time (6 am). Although they build the same TFS project, each agent has a different working directory. About twice a week, one of these builds (it's not the same build every time) fails with the messages below (system.debug = true).

2017-04-26T04:01:43.0575150Z ##[section]Starting: Apply transforms $(build.sourcesdirectory)\.configs\XXXDev.config => $(build.artifactstagingdirectory)\website\web.config
2017-04-26T04:01:43.0575150Z ==============================================================================
2017-04-26T04:01:43.0575150Z Task         : XDT Transform
2017-04-26T04:01:43.0575150Z Description  : Apply XDT transforms on XML files
2017-04-26T04:01:43.0575150Z Version      : 2.0.0
2017-04-26T04:01:43.0575150Z Author       : Guillaume Rouchon
2017-04-26T04:01:43.0575150Z Help         : v2.0.0, [More Information](https://github.com/qetza/vsts-xdttransform-task#readme)
2017-04-26T04:01:43.0575150Z ==============================================================================
2017-04-26T04:01:43.0575150Z ##[debug]tf vc resolvePath $\XXX\Source /loginType:OAuth /login:.,******** /noprompt
2017-04-26T04:01:43.6043790Z ##[debug]Unable to determine the workspace. You may be able to correct this by running 'tf workspaces /collection:TeamProjectCollectionUrl'.
2017-04-26T04:01:43.6199909Z Unable to determine the workspace. You may be able to correct this by running 'tf workspaces /collection:TeamProjectCollectionUrl'.
2017-04-26T04:01:43.6512403Z ##[error]Exit code 100 returned from process: file name 'tf', arguments 'vc resolvePath $\XXX\Source /loginType:OAuth /login:.,******** /noprompt'.
2017-04-26T04:01:43.6512403Z ##[debug]Microsoft.VisualStudio.Services.Agent.ProcessExitCodeException: Exit code 100 returned from process: file name 'tf', arguments 'vc resolvePath $\XXX\Source /loginType:OAuth /login:.,******** /noprompt'.
   at Microsoft.VisualStudio.Services.Agent.Worker.Build.TfsVCCommandManager.<RunPorcelainCommandAsync>d__28.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.VisualStudio.Services.Agent.Worker.Build.TFCommandManager.ResolvePath(String serverPath)
   at Microsoft.VisualStudio.Services.Agent.Worker.Build.TfsVCSourceProvider.GetLocalPath(IExecutionContext executionContext, ServiceEndpoint endpoint, String path)
   at Microsoft.VisualStudio.Services.Agent.Worker.Build.BuildJobExtension.GetRootedPath(IExecutionContext context, String path)
   at Microsoft.VisualStudio.Services.Agent.Worker.TaskRunner.TranslateFilePathInput(String inputValue)
   at Microsoft.VisualStudio.Services.Agent.Worker.TaskRunner.<RunAsync>d__20.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.VisualStudio.Services.Agent.Worker.StepsRunner.<RunAsync>d__0.MoveNext()
2017-04-26T04:01:43.6512403Z ##[section]Finishing: Apply transforms $(build.sourcesdirectory)\.configs\XXXDev.config => $(build.artifactstagingdirectory)\website\web.config

ericsciple commented 7 years ago

@michha thanks, that gives more evidence that this is likely a TFVC issue due to multiple agents on the same machine. A quick workaround would be to stagger the times.

michha commented 7 years ago

@ericsciple will this be fixed in the next v2 Agent version?

TingluoHuang commented 7 years ago

The agent with the fix should be deployed everywhere.

TingluoHuang commented 7 years ago

This hasn't been fixed; I thought it was a different issue. :(

axelheer commented 7 years ago

Will this get fixed with TFS 2017 Update 2?

bryanmacfarlane commented 7 years ago

@ericsciple @TingluoHuang Is there a QU2 bug logged for TFSVC? This is an issue in tf.exe and I want to make sure it gets some traction. (tfsvc issues aren't open but at least we can ref progress)

ericsciple commented 7 years ago

@bryanmacfarlane 987895

ericsciple commented 7 years ago

@JamesNK @michha @axelheer if you have a failed build, please send a build log to ersciple (at microsoft com). I'm working with the VC team and we have an app to run the commands in a loop, but so far we are not getting a repro. We tried create/get/delete/resolvePath in a loop in two separate processes for an hour.

What I'm wondering is what combination of tf commands is running during your builds. For instance, is the workspace created new every time? If not, are you running a clean build (i.e. is scorch running)?
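For anyone who wants to try a similar looped repro locally, a minimal Python sketch might look like the following. It is an assumption-laden stand-in, not the app mentioned above: the function names are hypothetical, the caller supplies the exact command lines (none are hard-coded here), and each worker is a thread that launches every command as its own child process.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_loop(worker_id, commands, iterations):
    """Run every command in order, `iterations` times, and collect failures."""
    failures = []
    for iteration in range(iterations):
        for command in commands:
            # Each command is an argument list, e.g. ["tf", "vc", ...].
            result = subprocess.run(command, capture_output=True, text=True)
            if result.returncode != 0:
                failures.append(
                    (worker_id, iteration, command, result.returncode,
                     result.stdout + result.stderr)
                )
    return failures


def run_parallel(commands_per_worker, iterations):
    """Run one loop per worker concurrently and return all failures."""
    with ThreadPoolExecutor(max_workers=len(commands_per_worker)) as pool:
        futures = [
            pool.submit(run_loop, worker_id, commands, iterations)
            for worker_id, commands in enumerate(commands_per_worker)
        ]
        return [failure for future in futures for failure in future.result()]
```

Each entry in `commands_per_worker` would hold the tf command lines copied from one agent's build log, so that two workers mimic two agents working on the same machine at the same time.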

michha commented 7 years ago

I will try to reproduce it during the next days (we worked around this issue by spreading our build start times) and I will report here again, when the e-mail is sent.

ericsciple commented 7 years ago

@michha sorry i thought i updated this issue. we found the bug. right now i'm waiting until it makes it into a stable branch so i can bundle a new copy of tf and all dependencies. probably a couple weeks before it goes out with an agent release.

michha commented 7 years ago

Thank you for the update. When the new agent is available I will deploy it to our environment and confirm if this bug is fixed.

jmsvl commented 7 years ago

Hello,

I'm also on a project with several agents on the same build server, and with the previous versions (116, 117) and this new version (119) I'm still getting the messages below every once in a while.

2017-06-29T12:44:20.3873620Z ##[command]tf vc scorch D:\b\a4_work\8\s /recursive /diff /unmapped /loginType:OAuth /login:.,**** /noprompt
2017-06-29T12:44:22.6494345Z No appropriate mapping exists for D:\b\a4_work\8\s.

On this build server there are also other, older agents that execute XAML builds.

Can you help?

Thank you

ericsciple commented 7 years ago

Work is in progress to get a new agent that carries the updated tf.exe and dependency DLLs (pr https://github.com/Microsoft/vsts-agent/pull/1056)

jmsvl commented 7 years ago

Thanks for the update

michha commented 7 years ago

After installing TFS 2017 Update 2 and updating the agent from version 2.112.0 to 2.117.2 the problem persists.

@ericsciple: could we have a minimum TFS agent version or TFS Update number where this fix will be available?

ericsciple commented 7 years ago

@michha The fix is in master if you want to build master and try it. One more small change (not a fix, command line interface change) is needed before we release the next agent. The fix will be in the 2.120.0 agent which will be released early next week.

michha commented 7 years ago

Thanks, I will wait for the 120 release.

P.S.: make more use of https://github.com/Microsoft/vsts-agent/milestones to spend less time answering those little questions ;)

rikkigouda commented 7 years ago

+1 We are having the exact same issue - more of it recently. Is there a solution around these errors? We keep having random build failures, it's frustrating for the team. Thanks.

Adding more information: We are using VSTS Online (So the on-prem solution might not apply here.)

ericsciple commented 7 years ago

https://github.com/Microsoft/vsts-agent/releases/tag/v2.120.0

ZeroNull42 commented 7 years ago

Currently, release 2.120.0 shows as a Pre-release, when will it be considered as a full release i.e. ready for use in production environments?

ericsciple commented 7 years ago

It's being used in production environments now. It just rolled to the first ring. It will continue rolling out to all of the rings over the next several days.

ericsciple commented 7 years ago

...investigating a bug in the 2.120.0 agent. Wait for 2.120.1.

nero2001 commented 7 years ago

Hi everyone,

I still have occasional errors when getting sources while using agent 2.120.1.

" Exit code 100 returned from process: file name 'tf', arguments 'vc workspace /new /location:local /permission:Public ws_4_69 /collection:https://mediahub.visualstudio.com/ /loginType:OAuth /login:.,**** /noprompt'. "

Thanks!

ericsciple commented 7 years ago

@nero2001 the output (stdout/stderr) from tf workspace /new ... should be in the log. Did it output anything?

nero2001 commented 7 years ago

Hi @ericsciple, is this what you want?

2017-08-10T10:46:07.7176994Z ##[section]Starting: Get Sources
2017-08-10T10:46:08.0141146Z Prepending Path environment variable with directory containing 'tf.exe'.
2017-08-10T10:46:08.0141146Z Setting environment variable TFVC_BUILDAGENT_POLICYPATH
2017-08-10T10:46:08.0141146Z Querying workspace information.
2017-08-10T10:51:49.5936306Z ##[command]tf vc scorch D:\b\ag2_work\4\s /recursive /diff /unmapped /loginType:OAuth /login:.,**** /noprompt
2017-08-10T10:51:52.1521618Z No appropriate mapping exists for D:\b\ag2_work\4\s.
2017-08-10T10:51:52.1677626Z ##[warning]Exit code 100 returned from process: file name 'tf', arguments 'vc scorch D:\b\ag2_work\4\s /recursive /diff /unmapped /loginType:OAuth /login:.,**** /noprompt'.
2017-08-10T10:51:52.1989642Z ##[command]tf vc workspace /delete ws_4_69;93855f07-66c0-4079-9936-550abd63d390 /collection:https://mediahub.visualstudio.com/ /loginType:OAuth /login:.,**** /noprompt
2017-08-10T10:56:10.6418170Z ##[command]tf vc workspace /new /location:local /permission:Public ws_4_69 /collection:https://mediahub.visualstudio.com/ /loginType:OAuth /login:.,**** /noprompt
2017-08-10T10:57:23.1699362Z The path D:\b\ag2_work\4\s\dev\BIZ\Components\Bam is already mapped in workspace ws_4_69.
2017-08-10T10:57:23.2323394Z ##[error]Exit code 100 returned from process: file name 'tf', arguments 'vc workspace /new /location:local /permission:Public ws_4_69 /collection:https://mediahub.visualstudio.com/ /loginType:OAuth /login:.,**** /noprompt'.
2017-08-10T10:57:23.2323394Z ##[section]Finishing: Get Sources
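Because every agent log entry starts with a UTC timestamp, a log like this one can be split into entries and scanned for long pauses between steps. The following is a minimal Python sketch under that assumption. The helper names `split_entries` and `slow_steps` and the 60-second default threshold are illustrative choices, not part of any agent tooling.

```python
import re
from datetime import datetime

# Matches the leading timestamp of an entry, e.g. "2017-08-10T10:46:08.0141146Z ".
# Only the part up to whole seconds is captured.
TIMESTAMP = re.compile(r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\.\d+Z ")


def split_entries(log_text):
    """Split agent log text into (timestamp, message) pairs."""
    # re.split with one capture group yields [prefix, ts1, msg1, ts2, msg2, ...].
    parts = TIMESTAMP.split(log_text)
    return [
        (datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S"), msg.strip())
        for ts, msg in zip(parts[1::2], parts[2::2])
    ]


def slow_steps(entries, threshold_seconds=60):
    """Return (gap_seconds, message) for entries followed by a long pause."""
    slow = []
    for (start, message), (end, _) in zip(entries, entries[1:]):
        gap = (end - start).total_seconds()
        if gap >= threshold_seconds:
            slow.append((gap, message))
    return slow
```

Applied to the log above with the default threshold, this would flag the pauses after "Querying workspace information.", after the `tf vc workspace /delete` command, and after the `tf vc workspace /new` command.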

ericsciple commented 7 years ago

Did the computer name change? tf workspace /delete was successful, so server registration should be gone. It additionally should remove the workspace metadata on your local computer.

nero2001 commented 7 years ago

No, it did not change.. And the error only occurs sometimes..

ericsciple commented 7 years ago

@nero2001 thanks I just discussed with a coworker. We think we know where to look for the specific issue you hit (sounds like another race condition since you are running multiple agents on the same box). You are running multiple agents on the same machine, correct?

nero2001 commented 7 years ago

Yes indeed.

ericsciple commented 7 years ago

@nero2001 did you update all agents on your machine to 2.120.1?

greadtm commented 7 years ago

Does 2.120.2 address all known issues with running multiple agents on a single server?

ericsciple commented 7 years ago

@greadtm v2.120.2 carries a new version of git. I still have an outstanding question about whether all agents on the machine were updated to 2.120.1 or not (if not, that could explain the issue).

nero2001 commented 7 years ago

Yes, I did.

greadtm commented 7 years ago

Is this issue still being investigated?

ericsciple commented 7 years ago

@garimasi

GarimaSi commented 7 years ago

The tf workspaces issue for multiple agents is being looked into. We will get back here once we know more and/or if we have any follow up questions. Thanks for your patience!

greadtm commented 7 years ago

I wanted to follow up and see if there is any progress on this fix? We are interested in getting this fix to our development user base.

jmsvl commented 7 years ago

Hi, we're also waiting for follow up on this matter. Thanks

syedhassanabbas commented 7 years ago

Last Saturday I upgraded to TFS 2017 Update 2 with new agents on the build server, and the dev teams reported the exact same issue, which occurs randomly. I will wait for this fix, as this is a major productivity impediment for our development teams.

GarimaSi commented 7 years ago

@greadtm, @jmsvl, @syedhassanabbas My apologies if you have already answered the questions below, but this thread has gotten a little too convoluted for us to keep track of.

Can you please reply with the following:

  1. Have you updated all your agents to 2.120.1?
  2. What version of VS is installed on the agents?
  3. Are all agents at the same version, with the same VS version?
  4. Build step details: How many agents are you running on the same machine? How often do these agents run in parallel?
  5. If this is a regression, what was the last working version #?
  6. Latest build logs that carry the error

Also, would you be willing to run debug bits to help us find the root cause? Thanks!