microsoft / azure-pipelines-tasks

Tasks for Azure Pipelines
https://aka.ms/tfbuild
MIT License

Windows Machine File Copy task fails due to multiple connections #7621

Closed Novak-Peter closed 5 years ago

Novak-Peter commented 6 years ago

We have multiple Windows Machine File Copy tasks on different release agents (and with different target machines). The target directories are specified as folders (like D:\TargetFolder), and the target machine lists contain FQDN machine names. Lately we have randomly started to get the following error:

Failed to Create PSDrive with Destination: '\\TargetMachineFQDN\D$\TargetFolder', ErrorMessage: 'Multiple connections to a server or shared resource by the same user, using more than one user name, are not allowed. Disconnect all previous connections to the server or shared resource and try again'
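The error text states the underlying rule: within one logon session, all connections to a given server must use the same user name. Below is a toy Python model of that rule. The class and method names are hypothetical, not a Windows API, and the case-insensitive comparisons are an assumption. It is meant only for reasoning about why a leftover connection under one identity blocks a new connection under another.

```python
# Toy model (NOT the real Windows redirector) of the rule in the error text:
# inside one logon session, every connection to a given server must use the
# same user name.


class SessionConnections:
    """Tracks SMB-style connections for a single logon session."""

    def __init__(self):
        # server name -> (user name, set of connected share names)
        self._by_server = {}

    def connect(self, server, share, user):
        key = server.lower()  # assumption: server names compare case-insensitively
        existing = self._by_server.get(key)
        if existing and existing[0].lower() != user.lower():
            # The condition behind "System error 1219"
            raise PermissionError(
                "Multiple connections to a server or shared resource by the "
                "same user, using more than one user name, are not allowed."
            )
        if existing is None:
            existing = (user, set())
            self._by_server[key] = existing
        existing[1].add(share.lower())

    def disconnect(self, server, share):
        key = server.lower()
        existing = self._by_server.get(key)
        if existing is None:
            return
        existing[1].discard(share.lower())
        if not existing[1]:
            # Last connection gone: the server is free for another user name
            del self._by_server[key]


# Example (hypothetical names): the agent identity already holds IPC$, then
# the task tries explicit deploy credentials against the same server.
session = SessionConnections()
session.connect("target.example.com", "IPC$", r"CONTOSO\agent")
try:
    session.connect("target.example.com", "D$", r"CONTOSO\deploy")
except PermissionError as err:
    print("blocked:", err)
```

In this model the second connection only succeeds after every connection held under the first user name has been disconnected. That matches the workarounds later in the thread, which delete the old mappings first.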

One thing that looks suspicious to me: https://github.com/Microsoft/vsts-tasks/blob/ea589633e6eb5c3aec562076a240dd4f2f781b0b/Tasks/WindowsMachineFileCopyV2/RoboCopyJob.ps1#L219-L225

There isn't a matching Remove-PSDrive command for this. Could the issue be that the release agents have started to run out of free network shares?
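The concern is that a New-PSDrive call without a matching Remove-PSDrive leaves the connection behind, especially when the copy fails partway. The usual remedy is to pair creation and removal with try/finally. Here is a minimal Python sketch of that pattern. `create_drive` and `remove_drive` are hypothetical stand-ins that only record state; they are not wrappers around the real cmdlets. In the PowerShell script itself, the equivalent would be a try/finally block around the copy.

```python
from contextlib import contextmanager

# Hypothetical stand-ins for New-PSDrive / Remove-PSDrive: they only record
# which mappings exist so the pairing logic can be exercised anywhere.
mapped = set()


def create_drive(name, root):
    mapped.add((name, root))


def remove_drive(name, root):
    mapped.discard((name, root))


@contextmanager
def temporary_drive(name, root):
    """Create a drive mapping and guarantee it is removed, even on failure."""
    create_drive(name, root)
    try:
        yield name
    finally:
        remove_drive(name, root)  # runs whether the body succeeded or raised


# Example: the copy step fails, but the mapping is still cleaned up
try:
    with temporary_drive("WFCPSDrive", r"\\target.example.com\D$\TargetFolder"):
        raise RuntimeError("copy failed")
except RuntimeError:
    pass
print(mapped)  # the set is empty again
```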

Novak-Peter commented 6 years ago

Another suspicion: we use the tasks in parallel copy mode. Is it possible that the Remove-PSDrive -Name WFCPSDrive call in the folder check in one thread somehow conflicts with another thread in the task, since all drives are attached with the same name (WFCPSDrive)?

Novak-Peter commented 6 years ago

Update: it also fails with the same error with Copy Parallel: false and multiple target servers, on the very first server...

SumiranAgg commented 6 years ago

@NPeete Thanks for reporting the issue. We are looking into it and will update you soon.

JamesMBristow2 commented 6 years ago

+1, I am experiencing the same problem. There is no apparent reason for the failure, but I had the same suspicion as NPeete. I can confirm that turning parallel copy off does not fix it. After a redeploy or two, the release succeeds.

djosemartine commented 6 years ago

I am also having the same issue when using the file copy task.

giznaj commented 6 years ago

I am also seeing the same error:

Copy started for - IP.IP.IP.IP Failed to Create PSDrive with Destination: '\\IP.IP.IP.IP\C$\UAT_Drop\api', ErrorMessage: 'The network path was not found' The network path was not found

SumiranAgg commented 6 years ago

@NPeete What is the PowerShell version on the agent box?

Novak-Peter commented 6 years ago

According to the Release Agents / Capabilities tab, the PowerShell version is 5.1.14409.1012.

maknud commented 6 years ago

My team has the same issue. I tried adding a PowerShell step before the file copy to list the output of 'Get-PSDrive', but it doesn't show anything unusual:

2018-07-17T23:13:08.7037788Z Name      Used (GB)  Free (GB)  Provider     Root                CurrentLocation
2018-07-17T23:13:08.7045006Z ----      ---------  ---------  --------     ----                ---------------
2018-07-17T23:13:08.7117423Z A                               FileSystem   A:\
2018-07-17T23:13:08.7134095Z Alias                           Alias
2018-07-17T23:13:08.7531918Z C         592.85     430.15     FileSystem   C:\                 BuildAgent\_work\r103\a
2018-07-17T23:13:08.7541380Z Cert                            Certificate  \
2018-07-17T23:13:08.7831237Z D         1.31       14.69      FileSystem   D:\
2018-07-17T23:13:08.7841311Z Env                             Environment
2018-07-17T23:13:08.7851526Z Function                        Function
2018-07-17T23:13:08.7861713Z HKCU                            Registry     HKEY_CURRENT_USER
2018-07-17T23:13:08.7871375Z HKLM                            Registry     HKEY_LOCAL_MACHINE
2018-07-17T23:13:08.7881077Z Variable                        Variable
2018-07-17T23:13:08.7890868Z WSMan                           WSMan

And then the File Copy step fails with: Failed to Create PSDrive with Destination: '\\[MachineName]\f$\deploy\10.0.18194.007\Deployment\Tools', ErrorMessage: 'Multiple connections to a server or shared resource by the same user, using more than one user name, are not allowed. Disconnect all previous connections to the server or shared resource and try again'

SumiranAgg commented 6 years ago

We have identified the issue and will be deploying a fix soon.

arjgupta commented 6 years ago

Hi @NPeete, can you try running 'net use' on the agent machine and check whether any shares are connected to your target machines? If so, can you try removing the share and then running the task? Also, can you share the $PSVersionTable info and the OS info of the agent machine, and tell us whether either of the following situations applies to your case:

  1. You have 2 instances of the task (either as part of the same build/release flow or different build/release ) running on the same agent machine and trying to copy files to the same target but with different credentials.

  2. Did you recently change the username being used for copying?

Novak-Peter commented 6 years ago

Unfortunately we don't have access to the release agents (hence I can't show you the full $PSVersionTable info), but when I originally reported the issue we checked both the net use and Get-PSDrive commands with ops, and neither was showing extra records.

We use the same agents for both QA and PROD deployments, with different credentials; QA and PROD target machines are always different. We always used the same credentials for QA (a hardcoded TFS deploy user) and different ones for PROD (ops users, whose credentials are entered in release variables per release). So only in PROD cases could 2 instances of the task have been running on the same agent with different credentials targeting the same machines, BUT: 1) we double-checked whether anything else was running on that agent, and nothing was; 2) some agents are used only for QA deployments (always with the same credentials), and the issue occurred there as well.

arjgupta commented 6 years ago

@NPeete, are you running the 'net use' command in a session running under the same identity as the agent? Can you try creating a new release, running a PowerShell task on the failing agent, and executing net use there? You can also use the task to extract the $PSVersionTable info and systeminfo. Also, would it be possible for you to reboot the agent machine and then try executing the task? You can share the info at RM_Customer_Queries at microsoft.com

maknud commented 6 years ago

I was able to fix my issue by just adding a 'net use /y /delete *' step before the file copy tasks. Turns out there were a ton of disconnected net use mappings from previous release attempts.

JamesMBristow2 commented 6 years ago

Great find! Shouldn't this be done by the task for the specific connection being created?
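`net use /y /delete *` drops every connection in the agent's session. A narrower option would be to remove only the connections to the server the task is about to use. The following is a hedged Python sketch that parses captured `net use` output and builds per-share delete commands. It assumes the usual English-language table layout (header row, dashed separator, one row per connection) and share names without spaces. `parse_net_use` and `delete_commands_for` are illustrative names, and the sample hosts are hypothetical.

```python
import re

# Server part of a UNC path: the text after the leading \\ up to the next \
UNC_SERVER = re.compile(r"^\\\\([^\\]+)")


def parse_net_use(text):
    """Return (status, remote) pairs from captured `net use` output.

    Assumes the usual English layout: header row, dashed separator, one row
    per connection, then "The command completed successfully.". Share names
    containing spaces are not handled.
    """
    entries = []
    in_table = False
    for line in text.splitlines():
        if line.startswith("---"):
            in_table = True
            continue
        if not in_table or not line.strip():
            continue
        if line.startswith("The command completed"):
            break
        tokens = line.split()
        remote = next((t for t in tokens if t.startswith("\\\\")), None)
        if remote is None:
            continue  # not a connection row (e.g. a wrapped line)
        # An empty Status column means the row starts with whitespace
        status = "" if line[0].isspace() else tokens[0]
        entries.append((status, remote))
    return entries


def delete_commands_for(text, server):
    """Build `net use <remote> /delete` commands for one server only."""
    commands = []
    for _status, remote in parse_net_use(text):
        match = UNC_SERVER.match(remote)
        if match and match.group(1).lower() == server.lower():
            commands.append(f"net use {remote} /delete")
    return commands


# Example input shaped like typical `net use` output (hypothetical hosts)
sample = r"""New connections will be remembered.


Status       Local     Remote                    Network

-------------------------------------------------------------------------------
OK                     \\target.example.com\IPC$ Microsoft Windows Network
Disconnected           \\target.example.com\D$   Microsoft Windows Network
Unavailable  Z:        \\other.example.com\share Microsoft Windows Network
The command completed successfully.
"""

for command in delete_commands_for(sample, "TARGET.example.com"):
    print(command)
```

Keeping the status column lets a caller restrict cleanup to rows such as "Disconnected", the kind of leftover mapping described above.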

FlorenceDaniel commented 6 years ago

@SumiranAgg Could you please let me know when the fix will be available in VSTS? It is a lot of manual work for us to add a net use command task before each file copy task in all release definitions to dispose of disconnected mappings!

SumiranAgg commented 6 years ago

Added PR #7757 for the fix.

FlorenceDaniel commented 6 years ago

Hi @SumiranAgg, thanks! How can we make our existing "Windows Machine File Copy" task in VSTS use this new robocopy script?

rajatagrawal-dev commented 6 years ago

@FlorenceDaniel The next deployment will happen sometime next week, after which all accounts will have the updated task. If you want the task before we deploy to your account, you'll have to use tfs-cli to manually upload the task to your account. If you do plan to manually upload and need guidance with tfs-cli, you can mail us at RM_Customer_Queries [at] microsoft [dot] com.

FlorenceDaniel commented 6 years ago

@rajatagrawal-dev Thanks a lot for your response. Would you be able to update this thread once the deployment happens next week? Also, right now we are using version 1.* of the "Windows Machine File Copy" task. Should we update our release definitions to use version 2.* of the task in order to get the update? Please let me know.

rajatagrawal-dev commented 6 years ago

@FlorenceDaniel I'll update the thread once the task is deployed to all accounts. Yes, the fix is only in version 2 of the task. Please select version 2 to get the task with latest features and bug fixes.

rajatagrawal-dev commented 6 years ago

Please check if you have Windows File Copy v2.1.2. The fix should be deployed.

FlorenceDaniel commented 6 years ago

We are able to select "2.*" as the version. Will changing this on the release definition tasks pull in your latest update that fixes this issue?

rajatagrawal-dev commented 6 years ago

When you queue a release, the latest available patch of the major version you have selected (2.*) will be used. So please select 2.* in your release definition; in the logs, you will find the exact task version that was picked for major version 2.
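The "latest available patch of the selected major version" behaviour can be illustrated with a short Python sketch. This is only an illustration of the rule as worded above, not the service's actual resolution code. `resolve_major` is a hypothetical name, and the version lists are made up. The one detail worth noting is that versions must be compared numerically rather than as strings.

```python
def resolve_major(spec, available):
    """Return the newest version in `available` matching a 'N.*' spec.

    Versions are compared numerically, so 2.1.10 is newer than 2.1.4.
    """
    major = int(spec.split(".")[0])
    candidates = [
        tuple(int(part) for part in version.split("."))
        for version in available
        if int(version.split(".")[0]) == major
    ]
    if not candidates:
        raise LookupError(f"no available version matches {spec}")
    return ".".join(str(part) for part in max(candidates))


# Hypothetical list of published task versions
print(resolve_major("2.*", ["1.0.5", "2.1.2", "2.1.3", "2.1.4", "3.0.0"]))
```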

Mibe8 commented 6 years ago

I can see we are using version 2.1.2 of this task, but we still get this error when using Copy Files in Parallel. I can't see any sessions open from the agent.

FlorenceDaniel commented 6 years ago

Hi @Mibe8, were you able to figure out the issue? Your response might save us a lot of time. We don't want to update the version for all copy file tasks in our release definitions and then end up with the same issue again. Please let me know.

Hi @rajatagrawal-dev , Any response to Mibe8's comment above will be appreciated.

Thanks! Florence

rajatagrawal-dev commented 6 years ago

@Mibe8 Can you run the command suggested by @maknud (net use /y /delete *) once through a PowerShell task on your agent box? This should remove any stale mappings that might be interfering. Then try the file copy task, which is now patched to remove the PSDrive mappings after each execution.

@FlorenceDaniel Can you also try to do the same? You can try with just one of your release definitions or create a new one to mimic your scenario.

Mibe8 commented 6 years ago

@FlorenceDaniel, no I didn't figure it out yet.

@rajatagrawal-dev, I tried the net use command. It returns: There are no entries in the list.

I'm able to reproduce it quite easily by creating a build and a new release definition. All I do is copy a zip file to a machine.

The first time it works. At the end of the log I see:

[debug]Attempting to remove PSDrive 'WFCPSDrive'

If I start a new release, I get:

[error]Failed to Create PSDrive with Destination: '\\(Machine).(domain)\D$\VSTS-Deploy\CopyFileError', ErrorMessage: 'Multiple connections to a server or shared resource by the same user, using more than one user name, are not allowed. Disconnect all previous connections to the server or shared resource and try again'

Some extra info: Script stack trace:

[debug]at (ScriptBlock), D:\vsts-agent-win-x64-2.133.3\_work\_tasks\WindowsMachineFileCopy_731004d4-1d66-4f70-8c05-638018b22210\2.1.3\RoboCopyJob.ps1: line 222

[debug]at (ScriptBlock), D:\vsts-agent-win-x64-2.133.3\_work\_tasks\WindowsMachineFileCopy_731004d4-1d66-4f70-8c05-638018b22210\2.1.3\WindowsMachineFileCopy.ps1: line 55

[debug]at (ScriptBlock), (No file): line 1

[debug]at (ScriptBlock), (No file): line 22

[debug]at (ScriptBlock), (No file): line 18

[debug]at (ScriptBlock), (No file): line 1

rajatagrawal-dev commented 6 years ago

@Mibe8 That's unexpected. I tried the scenario you described, but it worked for me. Can you try restarting the agent machine? Also, please share your account name, project name, and release definition name so I can check a few things on our end as well. Please send them to raagra@microsoft.com if you don't want to post these details here.

@FlorenceDaniel Please let me know if you are facing similar issue still.

FlorenceDaniel commented 6 years ago

Hi,

We are not seeing the issue after upgrading to version 2.* of Windows Machine File Copy Task.

Thanks for the help!

Regards, Florence

anuragacharya commented 6 years ago

Hi @rajatagrawal-dev, I am facing the same issue and am using version 2.* of Windows Machine File Copy. I can share the account name, project name, and release definition if that helps. Please let me know if you were able to resolve this.


Thanks, Anurag

schnitty commented 6 years ago

We are having the same problem on our account.

schnitty commented 6 years ago

I've been troubleshooting the issue and have some information that might help. In our release we have three WindowsMachineFileCopy tasks in a row. The first two succeed, and the third fails nine times out of ten. We are running our own build server with the latest version of the agent (2.136.1) and Windows Machine File Copy 2.3.1.

When I log onto our build server and run a command prompt as NETWORK SERVICE (the account our agent uses), I can see the share get created when I type net use after the file copy task starts. When the file copy task stops and I type net use again, I can see that it has been removed. When the next file copy task starts, net use shows it again. So it looks like the task is adding and removing the share properly.

For us at least it is looking like a timing issue. Maybe the next copy task tries to create the share before the previous one has fully removed it?

My workaround for the moment is to move the third copy task a little later in the pipeline. This seems to have resolved the issue for me.
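The hypothesis here is that the next task connects before the previous connection is fully gone. A later comment in this thread attributes a lingering connection to an antivirus scan. If the conflicting connection really is transient, a bounded retry with growing delays is one generic way to absorb it. It would not help with stale mappings, which have to be deleted. The Python sketch below uses hypothetical names (`retry_on`, `flaky_connect`), and its `sleep` parameter can be injected so the delays can be checked without actually waiting.

```python
import time


def retry_on(exc_type, operation, attempts=5, delay=2.0, backoff=2.0,
             sleep=time.sleep):
    """Call operation(); if it raises exc_type, wait and try again.

    The wait starts at `delay` seconds and is multiplied by `backoff` after
    each failure. The last failure is re-raised unchanged.
    """
    wait = delay
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except exc_type:
            if attempt == attempts:
                raise  # out of attempts: surface the original error
            sleep(wait)
            wait *= backoff


# Example: a hypothetical connect step that fails twice, then succeeds
calls = {"count": 0}


def flaky_connect():
    calls["count"] += 1
    if calls["count"] < 3:
        raise PermissionError("System error 1219")
    return "connected"


waits = []
print(retry_on(PermissionError, flaky_connect, delay=0.5, sleep=waits.append))
print(waits)
```

The retry is bounded on purpose: an unbounded loop would hide a persistent 1219 caused by a stale mapping instead of failing the release.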

Hope this info helps.

schnitty commented 6 years ago

Further update: I put some sleeps into my process, with mixed success. It seems to be an intermittent issue, for me at least. Sometimes it works, sometimes it doesn't, but the failure rate is fairly high.

FlorenceDaniel commented 6 years ago

Hi @rajatagrawal-dev ,

We faced the issue again last week even after upgrading to version 2.* of Windows Machine File Copy Task.


Please let us know why this issue is observed. Thanks! Florence

cwjbowler commented 6 years ago

Hi,

We are still having this issue intermittently, even after upgrading to version 2.* of the Windows Machine File Copy task. We had the issue previously with 1.* and noticed that changing to 2.* allowed us to complete the releases (without rebooting the agents). At the time, it was picking up version 2.1.3 of the task.

However, it's now picking up 2.1.4, and we are having the issue again. Has something regressed in the code? Is there another way we can work around this issue in our release definitions? I've seen posts above where people have added a preceding task running 'net use /y /delete *'; is this a safe thing to do on the release agents as part of a release?

Thanks

Craig

ahaleiii commented 6 years ago

We were also running into this issue. We have a build server that also functions as a host server for our lower environments, so for a few of our pipelines the release runs on that server and deploys to that same server. We were getting the above failure (Multiple connections ...) when the Windows Machine File Copy task was not using the same user that the VSTS agent was running as. After updating the task to use the same user as the agent, the step succeeded.

arjgupta commented 6 years ago

Hi guys, thanks for reporting this. We are looking into this issue. We added Remove-PSDrive at the end of the task in order to delete the connection that we make, but I see that it does not work in some situations. We are checking whether, in addition to Remove-PSDrive, we can delete shared paths using net use as part of the task itself.
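Adding a net use deletion alongside Remove-PSDrive raises an ordering question: if the first cleanup step throws, the second should still run. Below is a minimal Python sketch of best-effort cleanup. The step functions are hypothetical stand-ins for the two commands, not calls into PowerShell.

```python
def run_cleanup(steps):
    """Run every (name, callable) cleanup step, even if earlier ones fail.

    Returns a list of (name, exception) for the steps that raised.
    """
    failures = []
    for name, step in steps:
        try:
            step()
        except Exception as exc:  # record and continue to the next step
            failures.append((name, exc))
    return failures


# Hypothetical stand-ins for the two cleanup commands
def remove_psdrive():
    raise OSError("drive 'WFCPSDrive' not found")


def delete_net_use_connection():
    print("net use connection deleted")


problems = run_cleanup([
    ("Remove-PSDrive", remove_psdrive),
    ("net use /delete", delete_net_use_connection),
])
print(problems)
```

Collecting the failures instead of raising on the first one means the task can still log what went wrong after it has attempted every cleanup step.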

Panzerbjrn commented 5 years ago

Hi, are there any updates to this? We have this same problem and it is super annoying...

matthew-d-collins commented 5 years ago

I resolved my issue by inserting a "net use" command line script before my copy task, to enumerate any connections in the log. I found that there was a connection to \\IPC$. Then I changed the command line script to "net use /delete \\IPC$". That allowed the copy to execute successfully. Not a fix, but a successful workaround after spending way too much time researching.

arjgupta commented 5 years ago

@mcollins2002 IPC$ is a built-in administrative share that cannot be modified. When you execute the command, does it actually delete this share, or does it just reset the path associated with this location? Also, can you tell us the value of the connection to this share? Was its value the target path that you provide as task input? And can you check whether you are able to run the task successfully after executing this command only once? In other words, I want to know whether in your case it is necessary to run this command every time in order to run the task successfully, or whether running it once suffices.

arjgupta commented 5 years ago

Hi guys, I have raised a PR that uses net use /delete to delete the shared path created by the New-PSDrive command, but I am unable to repro this issue. Can someone who is facing this issue pick up these changes and see if they resolve the problem? Here is the PR: https://github.com/Microsoft/vsts-tasks/pull/8575 (you only need to take the changes in the .ps1 file).

matthew-d-collins commented 5 years ago

@arjgupta The cmd doesn't delete the share, it just deletes the agent's mapping to the share. It appears that after the cmd is run once I can remove that cmd from the release and it executes as it should. Perhaps something didn't get cleaned up properly during a failed release attempt.

ricardocovo commented 5 years ago

For anyone with this issue: If you have multiple agents running, restart them (the service I mean).

We had an issue where one of the services was not letting go; once we restarted the build agents, the builds started working again.

VimalanathanT commented 5 years ago

This happens because you may be using different agents with different service accounts. Try using the IP address; this should resolve it. Another way to resolve it is to use the same service account for all the agents, though this may not be possible in all scenarios. Using the IP address to access the share will surely help resolve the error below (an old Microsoft KB also talks about this: https://support.microsoft.com/en-in/help/938120/error-message-when-you-use-user-credentials-to-connect-to-a-network-sh):
"Multiple connections to a server or shared resource by the same user, using more than one user name, are not allowed. Disconnect all previous connections to the server or shared resource and try again"
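The IP-address workaround suggests that the check compares the server name as written in the path, not the machine it resolves to. Under that reading, `\\host.fqdn\...` and `\\<ip>\...` count as two different servers. The small Python helper below (`server_key`, hypothetical) extracts that name so the distinction is concrete. The host name and IP are made up.

```python
def server_key(unc_path):
    """Return the server component of a UNC path, lower-cased."""
    if not unc_path.startswith("\\\\"):
        raise ValueError("not a UNC path: " + unc_path)
    # Everything between the leading \\ and the next backslash
    return unc_path[2:].split("\\", 1)[0].lower()


fqdn = server_key(r"\\Target.example.com\D$\TargetFolder")
ip = server_key(r"\\10.0.0.5\D$\TargetFolder")
# Possibly the same machine, but two different keys as far as a name check goes
print(fqdn, ip, fqdn == ip)
```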

KocamanFaruk commented 4 years ago

Restarting the Windows service that the agent runs as solved the issue for us.

vt1995 commented 4 years ago

Great it worked, yes this will also clear up all the connections.

marncosta commented 4 years ago

Restarting the Windows services was enough to solve my problem.

johnwhoffman commented 2 years ago

This issue just started on a single test VM: System error 1219 when copying PowerShell scripts. I added a "run" task just prior to the file copy task using the aforementioned net use /delete, and that solved the issue. We have this in a task group, so all other VMs will benefit. Thank you to whoever suggested adding net use /y /delete.

octaviancretu commented 2 years ago

I am facing the same error, but I am not sure whether other people face the same problem as I do.

The file copy task performs the following important steps:

  1. Creates a PSDrive with New-PSDrive (actually it creates it several times and then closes it)
  2. Copies the files with robocopy, not via the PSDrive but via the UNC path
  3. Removes the PSDrive with Remove-PSDrive

In my case I am copying multiple files in the first file transfer task (quite a lot of DLLs).

The interesting fact is that the Windows Defender / Antimalware Service ("C:\Program Files\Windows Defender\MsMpEng.exe") kicks in, starts scanning the files copied remotely, and even continues after step 3 has finished. So even though the PSDrive gets disconnected, the scanning continues over the UNC path. It would be interesting to know whether this scanning is done with the credentials used for mapping the drive or with the NT system account.

The error occurs when the next file copy task starts before the scanning from the previous one has finished. To see whether your issue is the same as mine, I recommend using procmon (Process Monitor) and filtering on the hostname. If you see the MsMpEng.exe process accessing files on the UNC path at the time the error occurs, the following solution will work for you.

Working solution for my scenario:

  1. Exclude the UNC path to the drive from scanning, for example \\server.domain.com\c$. Windows Defender also has other exclusion options (exclude file shares or mapped drives from scanning); however, I would not recommend disabling network share scanning. Excluding the mapped drive will most probably not work, as robocopy uses the UNC path to copy the files.

Other possible workarounds that I see currently (besides the ones shared before):

  1. Copy only big files, preferably not EXEs, ZIPs, DLLs, or other files that Windows Defender scans intensively.
  2. Do all the copying in one task (however, multiple pipeline executions connecting to the same server will still hit this issue).

I hope this helps also for others.