microsoft / azure-pipelines-agent

Azure Pipelines Agent 🚀
MIT License

Multiple agents on the same machine using tf.exe can lead to getting sources failure #898

Closed JamesNK closed 6 years ago

JamesNK commented 7 years ago

A build is intermittently getting this error when fetching source from TFS:

2017-03-23T23:49:31.0591599Z ##[section]Starting: Build solution Roster.WebService.sln
2017-03-23T23:49:31.0591599Z ==============================================================================
2017-03-23T23:49:31.0591599Z Task         : Visual Studio Build
2017-03-23T23:49:31.0591599Z Description  : Build with MSBuild and set the Visual Studio version property
2017-03-23T23:49:31.0591599Z Version      : 1.113.0
2017-03-23T23:49:31.0591599Z Author       : Microsoft Corporation
2017-03-23T23:49:31.0591599Z Help         : [More Information](https://go.microsoft.com/fwlink/?LinkID=613727)
2017-03-23T23:49:31.0591599Z ==============================================================================
2017-03-23T23:49:31.6529233Z Unable to determine the workspace. You may be able to correct this by running 'tf workspaces /collection:TeamProjectCollectionUrl'.
2017-03-23T23:49:31.8872843Z ##[error]Exit code 100 returned from process: file name 'tf', arguments 'vc resolvePath "$\My Development\Trunk\src\Rostering\trunk\Roster.WebService.sln" /loginType:OAuth /login:.,******** /noprompt'.
2017-03-23T23:49:31.8872843Z ##[section]Finishing: Build solution Roster.WebService.sln

Copied from https://github.com/Microsoft/vsts-tasks/issues/3853

jmsvl commented 7 years ago

Reply inline

  1. Have you updated all your agents to 2.120.1? [jmsvl] All agents updated to 2.120.2
  2. What version of VS is installed on the agents? [jmsvl] VS 2010, 2012 and 2015
  3. Are all agents at the same version, with the same VS version? [jmsvl] All agents on the server are the same version with multiple VS versions
  4. Build step details: How many agents are you running on the same machine? How often do these agents run in parallel? [jmsvl] 4 agents. All checkins trigger multiple builds.
  5. If this is a regression, what was the last working version #? [jmsvl] Certainly a version previous to 2.117 but I cannot be sure of the number
  6. Latest build logs that carry the error [jmsvl] logs with different errors attached

On Tue, Aug 29, 2017 at 4:09 PM, Garima Singh notifications@github.com wrote:

@greadtm, @jmsvl, @syedhassanabbas My apologies if you have already answered the questions below, but this thread has gotten a little too convoluted for us to keep track.

Can you please reply with the following:

  1. Have you updated all your agents to 2.120.1?
  2. What version of VS is installed on the agents?
  3. Are all agents at the same version, with the same VS version?
  4. Build step details: How many agents are you running on the same machine? How often do these agents run in parallel?
  5. If this is a regression, what was the last working version #?
  6. Latest build logs that carry the error

Also, would you be willing to potentially run debug bits to help us root cause. Thanks!


GarimaSi commented 7 years ago

@jmsvl Can you try reattaching the logs please. It looks like they didn't get through the first time.

Cristie commented 7 years ago

Deployment Groups are not available in Team Foundation. Learn more, https://www.visualstudio.com/en-us/docs/build/concepts/definitions/release/deployment-groups/#deploy-agents-to-a-deployment-group

jmsvl commented 7 years ago

I sent via email reply and it didn't work. Via site now.

logs_57650.zip logs_57664.zip

GarimaSi commented 7 years ago

@Cristie Wrong thread?

Cristie commented 7 years ago

@GarimaSi No that link is valid

GarimaSi commented 7 years ago

@Cristie I know the link is valid. I was just wondering if it is on the wrong thread. If it isn't, can you please expand on how it pertains to the issue being discussed here.

Cristie commented 7 years ago

Build Agent Pools, Azure VMs, Team Foundation Servers, etc. are experiencing the same related issues. Related threads #1159 #1141 #1163

ericsciple commented 7 years ago

those are separate issues. let's keep this thread focused.

Cristie commented 7 years ago

@ericsciple I strongly disagree. A build failing to deploy on a Team Foundation server is a server side issue wherein Deployment Groups are not available (i.e. supported) by the server.

bryanmacfarlane commented 7 years ago

@Cristie - this issue is only tracking the error mentioned at the beginning of this issue

From a build, get sources with multiple agents can cause:

Unable to determine the workspace. You may be able to correct this by running 'tf workspaces /collection:TeamProjectCollectionUrl'.
2017-03-23T23:49:31.8872843Z ##[error]Exit code 100 returned from process

That is an issue in tf.exe shipped with the agent

syedhassanabbas commented 7 years ago

@GarimaSi

Sorry for the late response. Below are the answers to your questions about the issue "/loginType:OAuth /login:.,**** /noprompt'" while building. We have also noticed that we are not able to deploy, as the Windows copy file task is unable to find the network path (it says the path does not exist), while running the same release with an old agent works OK. FYI, we have enabled the old agents (shipped with TFS 2017 Update 1 R2) and are currently only using the old ones.

Have you updated all your agents to 2.120.1? Yes

What version of VS is installed on the agents? 2013 and 2015

Are all agents at the same version, with the same VS version? No, the latest one does not work but all old agents are working.

Build step details: How many agents are you running on the same machine? How often do these agents run in parallel? We have 3 machines. One machine is running 2 old agents, one machine has 2 new agents (disabled because of the issue), and the main build server has 9 agents installed (2 are new and disabled, the other 7 are old).

If this is a regression, what was the last working version #? Please find attached. logs_8215.zip ReleaseLogs_1693.zip

Latest build logs that carry the error Not sure how can I share as I don't see attachment option here. Do you want me to upload it somewhere?

Also, would you be willing to potentially run debug bits to help us root cause. Yes, based on my and my team's availability.

syedhassanabbas commented 7 years ago

FYI, the version I see in the file names of the old agents is vsts-agent-win7-x64-2.111.1, and the version of the new agents (that have this issue) is vsts-agent-win7-x64-2.117.2.

greadtm commented 7 years ago

We have 2.120.2 but have yet to deploy it to our production environment. We are still on agent 2.105. We held off on pushing the newest agent out because user nero2001 claimed they were still experiencing the issue with the newest agent. GarimaSi then responded that the issue was still being researched. I have yet to see a definitive answer on whether this specific bug was resolved:

From a build, get sources with multiple agents can cause:
Unable to determine the workspace. You may be able to correct this by running 'tf workspaces /collection:TeamProjectCollectionUrl'.
2017-03-23T23:49:31.8872843Z ##[error]Exit code 100 returned from process

when a server has multiple agents.

GarimaSi commented 7 years ago

@syedhassanabbas, @greadtm Thanks for letting us know the state the agents are in. The old agents (prior to 2.120.1) have a bug that causes this problem in some cases, which is why we need you to upgrade to a newer agent. When I said we are still researching and investigating the issue, I meant we are trying to root cause a different bug that may be causing similar symptoms. My apologies for the confusion. Please upgrade at your earliest convenience and let us know if you are still seeing issues.

syedhassanabbas commented 7 years ago

@GarimaSi I have installed version 2.120.2 of the agents and it looks like the workspace error no longer occurs, but we still have issues with the publish artefacts task. Error below. Not sure if this is the right thread to discuss this new issue though.

System.IO.DirectoryNotFoundException: Could not find a part of the path 'G:\agent3_work\20\a_PublishedWebsites'.

System.Management.Automation.ParameterBindingValidationException: Cannot bind argument to parameter 'Path' because it is null. PowerShell script completed with 2 errors.

ericsciple commented 7 years ago

@syedhassanabbas please open a separate issue

syedhassanabbas commented 7 years ago

Please ignore the previous error as it was executed with an old agent. Sorry, my mistake. I just executed the build on the new agents and got the same original error.

Starting: Get Sources


Prepending Path environment variable with directory containing 'tf.exe'.
Setting environment variable TFVC_BUILDAGENT_POLICYPATH
Querying workspace information.
tf vc workspace /new /location:local /permission:Public ws_2_56 /collection:http://wcs-tfs.comfin.ge.com:8080/tfs/Factorlink/ /loginType:OAuth /login:.,**** /noprompt
The path F:\agent2_work\2\s is already mapped in workspace ws_2_53;Build\ecf431a2-806c-44bb-8caf-79f0f4b87699 [http://wcs-tfs.comfin.ge.com:8080/tfs/WCSOnline].
Exit code 100 returned from process: file name 'tf', arguments 'vc workspace /new /location:local /permission:Public ws_2_56 /collection:http://wcs-tfs.comfin.ge.com:8080/tfs/Factorlink/ /loginType:OAuth /login:.,**** /noprompt'.

ericsciple commented 7 years ago

2.120.2 fixes a couple of the code paths that can cause the race condition. One more code path was identified today and the version control team is aware and will provide a fix.

syedhassanabbas commented 7 years ago

Thanks Eric, I will keep an eye on this thread.

syedhassanabbas commented 7 years ago

Hi Eric,

Any idea when can we expect this fix?

ericsciple commented 7 years ago

I'm hoping to have a build ready and available for download tomorrow. I'm planning to start rolling the agent out through the rings tomorrow and will continue to roll it out over the next week or so.

metaphysico commented 7 years ago

On version 123 I am still having the issue, but I can add some details and a workaround that may be helpful to find this problem and get people back up and running again.

So here is the scenario as I have pinned down.

  1. Remove all agents
  2. Remove all workspaces with tf workspaces /remove:*
  3. Run a build, get the error
  4. Run Developer command Prompt
  5. Navigate to folder inside of workspace
  6. Run tf workfold > get same error
     8a. Run tf workspaces /s:serverurl > a workspace shows up
     8b. Alternative to 8a, open visual studio and open manage workspaces, this makes tf workfold show details as well. I think it calls the same workspaces call when it opens.
  7. Run tf workfold > now the workspace resolves.
  8. From this point the builds will work when they run in that workspace, unless the build has clean enabled, which will delete the workspace and then add it back with the same issue.

I too thought it was random until I tracked the issue. Here is what mine did.

  1. Build1 ran on Agent1 and failed for error, I went to build server and looked at the workspaces which fixed that build.
  2. Build2 ran on Agent1 with same problem and I went to VS and opened workspaces again
  3. at this point build1 and build2 had "Fixed" workspaces on that agent.
  4. Now Build1 and Build2 run fine as they always get queued on agent1.
  5. Run Build1 and Build2 in parallel; now agent1 does build1 and agent2 takes build2, at which time build2 gets the error.
  6. Opening workspaces will fix Agent2's workspace for build2, or if you run again after build1 finishes it will probably start on Agent1 again, which was "Fixed" before.

So my assumption here is that the workspace is being created but not attached to something. I think it's the cache, based on an MSDN Blog post which states: "If the workspace mapping exists on the server, then your mappings cache file may need to be refreshed. Try running this command to populate/refresh the cache: tf workspaces /s:http://tfs-server:8080". That is why I tried 8a mentioned earlier, and it is the workaround if implemented as below.

WORKAROUND:

In the PowerShell script, call "tf workspaces /s:http://tfs-server.com" before calling any other tf command.
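The workaround above could be scripted roughly as follows. This is only a minimal sketch, not what the agent itself does: the helper names `build_refresh_command` and `run_tf_with_refresh`, the injectable `runner` parameter, and the server URL in the usage note are assumptions made for illustration. Only the `tf workspaces /s:<url>` call and the TF.exe path come from this thread.

```python
import subprocess
from typing import List, Sequence

# Path to the VS 2015 copy of TF.exe mentioned in this thread; adjust as needed.
DEFAULT_TF = r"C:\Program Files (x86)\Microsoft Visual Studio 14.0\Common7\IDE\TF.exe"


def build_refresh_command(server_url: str, tf_path: str = DEFAULT_TF) -> List[str]:
    """Return the argv for `tf workspaces /s:<server_url>` (cache refresh)."""
    return [tf_path, "workspaces", f"/s:{server_url}"]


def run_tf_with_refresh(tf_args: Sequence[str], server_url: str, cwd: str,
                        tf_path: str = DEFAULT_TF,
                        runner=subprocess.run) -> int:
    """Refresh the local workspace cache, then run the real tf command.

    `runner` is a hypothetical injection point so the call sequence can be
    exercised without tf.exe installed. Returns the second command's exit code.
    """
    # Step 1: populate/refresh the workspace cache (its result is ignored).
    runner(build_refresh_command(server_url, tf_path), cwd=cwd)
    # Step 2: run the command that previously failed with exit code 100.
    result = runner([tf_path, *tf_args], cwd=cwd)
    return result.returncode
```

A call such as `run_tf_with_refresh(["checkout", "SolutionInfo.cs"], "http://tfs-server:8080", cwd=r"C:\Builds\Agent1_work\1\s\mycode")` would then mirror the manual sequence described in step 8a.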

The only other current "Workaround" we have is,

  1. Disable clean workspace in all build definitions.
  2. Every time a new workspace is created (because a build runs on an agent it has not run on yet), manually run what is mentioned in 8a. From there you have to do this for every workspace, which in my case is 42 worst case, which is not feasible.

Here are the details for the questions asked.

  1. Have you updated all your agents to 2.120.1? on 123
  2. What version of VS is installed on the agents? 2015.2
  3. Are all agents at the same version, with the same VS version? Not sure how the versions relate. Looks like agents are using DLLs for 15 and VS is 14, so I assume they are different. But VS15 compared to 2.123 do not seem to relate well.
  4. Build step details: How many agents are you running on the same machine? How often do these agents run in parallel? 6 agents 7 builds, 27 releases, so 204 workspaces max. Run parallel a lot.
  5. If this is a regression, what was the last working version #? from old agents 14.102...
  6. Latest build logs that carry the error supplied. I would have to go through all of the files and details to not violate our strict NDA and security agreements. But needed info is:
     VERBOSE: setting location to: C:\Builds\Agent1_work\1\s\mycode
     VERBOSE: exe: C:\Program Files (x86)\Microsoft Visual Studio 14.0\common7\ide\tf.exe
     VERBOSE: args: checkout SolutionInfo.cs
     Calling Cmd: C:\Program Files (x86)\Microsoft Visual Studio 14.0\common7\ide\tf.exe checkout SolutionInfo.cs
     VERBOSE: stdout: stderr: Unable to determine the workspace. You may be able to correct this by running 'tf workspaces /collection:TeamProjectCollectionUrl'.
     VERBOSE: exit code 100
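The workspace counts quoted above (42 for builds alone, 204 in total) appear to follow from one workspace per agent per definition. The sketch below just reproduces that arithmetic; the function name and the one-workspace-per-pair rule are assumptions inferred from those figures, not something the agent documents here.

```python
def max_workspaces(agents: int, build_definitions: int, release_definitions: int) -> int:
    """Upper bound, assuming every definition eventually runs on every agent
    and each (agent, definition) pair gets its own workspace."""
    return agents * (build_definitions + release_definitions)


# Figures from this comment: 6 agents, 7 builds, 27 releases.
print(max_workspaces(agents=6, build_definitions=7, release_definitions=27))  # 204
print(max_workspaces(agents=6, build_definitions=7, release_definitions=0))   # 42
```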

Now I am not sure if the issue is caused by having VS2015 installed while using agents for 2017, looking at the prior messages, it looks like everyone having this issue is using this scenario, so that may be the root problem. Several people have said they have VS2015, and I am running tf from version 14 (which is VS2015, so confusing). That may be the key to replicate the issue.

This is my setup for tf.exe $tfsTool = "C:\Program Files (x86)\Microsoft Visual Studio 14.0\Common7\IDE\TF.exe" so directly using 14.

Sorry for the long message, but I hope this is detailed enough and contains enough information to fix or workaround the problem. Personally I think if this is an issue using 2015 TF and 2017 agents, then there may need to be a workaround posted and not necessarily a fix, but a fix would be nice.

jmsvl commented 7 years ago

Any idea when the 123 version will be released? Thank you

TingluoHuang commented 7 years ago

@jmsvl it should already show up on your VSTS account's download agent page. https://github.com/Microsoft/vsts-agent/releases/tag/v2.123.0

jmsvl commented 7 years ago

Thanks. Already updated and I will be on the lookout for the usual "get sources" error.

jmsvl commented 7 years ago

Now, with all agents updated on our 2 build servers, the number of errors is almost zero. But now and again there is still one error 100 on 'vc workfold /map'. Is anyone else getting this as well?

fiveshotsofespresso commented 6 years ago

We also experience this issue -- TFS 2017 U2 w/ TFS Agent 2.124.0.

Interestingly, the collisions only occur cross-collection, so far as we can tell. Still grasping at straws for how to fix this or help our devs who experience this issue.

Prachiti9 commented 6 years ago

@jmsvl Good to hear that the error count has gone down for you. Can you share logs for the error 100 you are still getting? @fiveshotsofespresso The fixes we have made so far are supposed to work cross-collection as well. Are you getting the exact same error? If yes, it would be great if you could share logs.

fiveshotsofespresso commented 6 years ago

@Prachiti9 Sure.

Quick overview: TFS 2017 Update 2, Running vsts-agent version 2.124.0 on Windows 2012 R2 machines. 4 agents per machine, separate service names. Happens on both VS2015 and VS2017 agents.

2017-11-27T16:12:42.4031263Z ##[section]Starting: Get Sources
2017-11-27T16:12:42.4500063Z Prepending Path environment variable with directory containing 'tf.exe'.
2017-11-27T16:12:42.4500063Z Setting environment variable TFVC_BUILDAGENT_POLICYPATH
2017-11-27T16:12:42.4500063Z Querying workspace information.
2017-11-27T16:12:47.0751037Z ##[command]tf vc workspace /new /location:local /permission:Public ws_5_268 /collection:https://tfs.company.com/tfs/DefaultCollection2/ /loginType:OAuth /login:.,**** /noprompt
2017-11-27T16:12:47.9501230Z The path F:\tfs\1_work\5\s is already mapped in workspace ws_5_268;Build\4e01aebd-d1a6-4a0d-8ee3-e5a901209e9a [https://tfs.company.com/tfs/DefaultCollection].
2017-11-27T16:12:48.0282432Z ##[error]Exit code 100 returned from process: file name 'tf', arguments 'vc workspace /new /location:local /permission:Public ws_5_268 /collection:https://tfs.company.com/tfs/DefaultCollection2/ /loginType:OAuth /login:.,**** /noprompt'.
2017-11-27T16:12:48.0282432Z ##[section]Finishing: Get Sources

The errors are always about a build directory sourced from the other collection, we've seen no instance of conflicts within the same collection.
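To check whether a given failure is one of these cross-collection collisions, the "already mapped" line can be compared against the `/collection:` argument of the failing command. The following is a rough sketch under the assumption that the log lines keep the shape shown above; the names `MappingConflict` and `find_mapping_conflict` are made up for illustration and are not part of the agent or tf.exe.

```python
import re
from typing import NamedTuple, Optional

# Matches: The path <path> is already mapped in workspace <name>;<owner> [<url>].
_MAPPED_RE = re.compile(
    r"The path (?P<path>.+?) is already mapped in workspace "
    r"(?P<workspace>[^;]+);(?P<owner>.+?) \[(?P<collection>[^\]]+)\]"
)
# Matches the collection URL passed to the failing tf command.
_COLLECTION_ARG_RE = re.compile(r"/collection:(?P<url>\S+)")


class MappingConflict(NamedTuple):
    path: str
    workspace: str
    owner: str
    mapped_collection: str
    requested_collection: Optional[str]

    @property
    def cross_collection(self) -> bool:
        """True when the existing mapping belongs to a different collection."""
        if self.requested_collection is None:
            return False
        return (self.mapped_collection.rstrip("/").lower()
                != self.requested_collection.rstrip("/").lower())


def find_mapping_conflict(log_text: str) -> Optional[MappingConflict]:
    """Return the first 'already mapped' conflict in a Get Sources log, if any."""
    mapped = _MAPPED_RE.search(log_text)
    if mapped is None:
        return None
    requested = _COLLECTION_ARG_RE.search(log_text)
    return MappingConflict(
        path=mapped["path"],
        workspace=mapped["workspace"],
        owner=mapped["owner"],
        mapped_collection=mapped["collection"],
        requested_collection=requested["url"] if requested else None,
    )
```

Run against the log above, this would report the path as mapped under DefaultCollection while the command targeted DefaultCollection2, which matches the cross-collection pattern described here.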

Prachiti9 commented 6 years ago

@fiveshotsofespresso looking at your log snippet (line - The path F:\tfs\1_work\5\s is already mapped in workspace), this issue seems different from the one we are tracing with @jmsvl. It might be confusing as the error looks the same on the surface, but can you please open a new issue and share full logs? Thank you!

fiveshotsofespresso commented 6 years ago

@Prachiti9 Sure. #1311 Created.

Allann commented 6 years ago

I just received this error. We run on VSTS but build on a local agent. This was straight after a successful build/deploy. We ran the exact same build again (with clean set to true). And received the error, except it doesn't appear to be a login error:

2017-12-11T23:24:29.6002476Z ##[command]tf vc undo /recursive D:\agent_work\1\s /loginType:OAuth /login:.,**** /noprompt
2017-12-11T23:24:30.4440202Z TF400024: The change on D:\agent_work\1\s\Dev\Jbssa.FreightRates\src\Jbssa.FreightRates.Web\wwwroot\bundle\bootstrap.css cannot be undone because a file already exists at D:\agent_work\1\s\Dev\Jbssa.FreightRates\src\Jbssa.FreightRates.Web\wwwroot\bundle\bootstrap.css. The file must be deleted from disk for the undo to succeed.
2017-12-11T23:24:30.5221498Z ##[error]Exit code 100 returned from process: file name 'tf', arguments 'vc undo /recursive D:\agent_work\1\s /loginType:OAuth /login:.,**** /noprompt'.
2017-12-11T23:24:30.5377742Z ##[debug]Microsoft.VisualStudio.Services.Agent.ProcessExitCodeException: Exit code 100 returned from process: file name 'tf', arguments 'vc undo /recursive D:\agent_work\1\s /loginType:OAuth /login:.,**** /noprompt'.
   at Microsoft.VisualStudio.Services.Agent.ProcessInvoker.d19.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.VisualStudio.Services.Agent.Worker.Build.TfsVCCommandManager.d26.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.VisualStudio.Services.Agent.Worker.Build.TFCommandManager.d25.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.VisualStudio.Services.Agent.Worker.Build.TfsVCSourceProvider.d3.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.VisualStudio.Services.Agent.Worker.Build.BuildJobExtension.d17.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.VisualStudio.Services.Agent.Worker.JobExtensionRunner.d20.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.VisualStudio.Services.Agent.Worker.StepsRunner.d__1.MoveNext()
2017-12-11T23:24:30.5377742Z ##[section]Finishing: Get Sources

Happy to supply more details if needed, but attached the full debug log file if that helps logs_376.zip

Update: manually deleted all the files in the Agent work folder and now it builds fine again. There were no locks on the file. It was a clash between the VSTS and the file system.

TingluoHuang commented 6 years ago

I have read through the thread and I think folks have either already had their problem fixed or opened a separate issue, so I am going to close this extremely long thread.