microsoft / azure-pipelines-tasks

Tasks for Azure Pipelines
https://aka.ms/tfbuild
MIT License

[REGRESSION]: Azure file copy fails on one Agent in Pool #20041

Closed lucavoit closed 1 month ago

lucavoit commented 4 months ago

New issue checklist

Task name

Azure file copy

Breaking task version

6.*

Last working task version

5.*

Regression Description

We recently updated our Azure service connection to use workload identity federation, which broke our pipelines that used v5.*, so we switched to v6. This fixed all of our pipelines except one, which runs on a specific self-hosted agent. On this agent the upload task fails with: Failed to perform Auto-login: PSContextCredentialGet-AzAccessToken: -REDACTED-;1mObject reference not set to an instance of an object. If I run the same task in a mini repro pipeline on any other agent in our pool, it works fine.

While trying to fix this issue, we updated the agent and all the PowerShell modules to the latest versions, which made no difference. Any help would be greatly appreciated. Thanks.

Environment type (Please select at least one environment where you face this issue)

Azure DevOps Server type

dev.azure.com (formerly visualstudio.com)

Azure DevOps Server Version (if applicable)

No response

Operation system

Windows 11

Relevant log output

2024-06-19T11:15:14.1507191Z INFO: Scanning...
2024-06-19T11:15:15.5513209Z 
2024-06-19T11:15:15.5514219Z Failed to perform Auto-login: PSContextCredentialGet-AzAccessToken: -REDACTED-;1mObject reference not set to an instance of an object.
2024-06-19T11:15:15.5514447Z .
2024-06-19T11:15:15.5639767Z ##[debug]ExceptionMessage: AzCopy.exe exited with non-zero exit code while uploading files to blob storage.
2024-06-19T11:15:15.5816208Z ##[debug]Processed: ##vso[task.logissue type=error;code={"Task_Internal_Error":"BlobUploadFailed"};]

Full task logs with system.debug enabled

No response

Repro steps

No response

v-schhabra commented 4 months ago

Hi @lucavoit, thanks for reporting the issue. Could you please share the complete pipeline logs after setting the variable system.debug to "true"?

v-bsanthanak commented 4 months ago

Duplicate of #20003

lucavoit commented 4 months ago

@v-bsanthanak I'm sorry, but closing this issue as a duplicate of another issue (which has an entirely different error message) is hardly a satisfactory solution. Is there any information on how we can fix this on our side, or on whether you will fix it with an update?

lucavoit commented 4 months ago

@v-bsanthanak @v-schhabra Any more info on this issue? We really need a solution or fix for this.

v-schhabra commented 4 months ago

We have created a PR for this issue. Once it is merged and deployed, we will post an update here.

lucavoit commented 4 months ago

@v-schhabra Could you link the relevant PR or give some more updates?

v-schhabra commented 4 months ago

Hi @lucavoit, thanks for following up. The PR has been created, but it is waiting for the deployments to complete. https://github.com/microsoft/azure-pipelines-tasks/pull/20117

lucavoit commented 3 months ago

@v-schhabra This is still not working for us, even with the new task version 6.242.10. Is there anything else we need to do or update?

v-schhabra commented 3 months ago

@lucavoit Could you please share the complete logs with system.debug set to true?

lucavoit commented 3 months ago

@v-schhabra I could share the complete logs, but I would have to go through them and redact some lines. Is there anything in particular you're looking for (e.g. versions) that I could send you? Otherwise the log looks the same to me, including the auto-login error message.

v-schhabra commented 3 months ago

If the logs are the same as before, that is fine. I just wanted to compare them with the logs of another customer who is having the same issue.

lucavoit commented 3 months ago

@v-schhabra Yeah, I see no relevant difference. Do you need any more information from our side to investigate this further?

v-schhabra commented 3 months ago

As of now we don't need any more info. I will post here if I need anything.

lucavoit commented 3 months ago

An interesting find: the upload task only fails in one specific release pipeline. A build pipeline that runs on the same machine and agent also includes a blob upload, and that one works.

lucavoit commented 3 months ago

In our case the agent runs on a physical machine.

v-schhabra commented 2 months ago

@lucavoit We observed that the issue does not originate in the task; the logs show that the error is thrown by AzCopy.exe. Could you please create a new release pipeline and check whether this task works there?

lucavoit commented 2 months ago

Does it need to be a new pipeline for it to work again?

v-schhabra commented 2 months ago

@lucavoit Yes, could you try that once?

lucavoit commented 2 months ago

@v-schhabra I have a small update: we manually upgraded azcopy from 10.25.1 to 10.26.0, since we found a log message about this in the latest run. This did not change the outcome. We have not yet tested a new pipeline as you suggested, because we are a bit apprehensive about the effort it would take given our current workload.

v-schhabra commented 2 months ago

Please send the complete debug logs of the build and release pipelines to v-schhabra@microsoft.com.

v-schhabra commented 2 months ago

@lucavoit Please share the logs for further investigation.

lucavoit commented 2 months ago

@v-schhabra Yes, we will provide the logs later today.

lucavoit commented 2 months ago

@v-schhabra Any updates?

lucavoit commented 1 month ago

Well @v-schhabra, I did some more digging and I have an update for you:

I made a minimal pipeline setup for debugging purposes, using an Azure CLI task to execute my own PowerShell script. With the following script I got it to work:

$env:AZCOPY_AUTO_LOGIN_TYPE="PSCRED"
$env:AZCOPY_TENANT_ID="<tenant-id>"
Get-AzContext
azcopy.exe copy ......

The Get-AzContext call is vital for this to work properly: without it, I ran into the exact same error message I got when using your task. So maybe this could be a solution here too?
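For anyone who wants to try the same workaround, here is a slightly more defensive sketch of the script above. It is not what the Azure file copy task does internally. The source and destination values are hypothetical placeholders (the original azcopy arguments were elided), and it assumes the session is already signed in to Azure through the Az PowerShell module.

```powershell
# Sketch only: assumes an Az PowerShell session that is already signed in,
# and that azcopy.exe is available on PATH.
$ErrorActionPreference = 'Stop'

# Ask AzCopy to obtain its token from the current Az PowerShell context.
$env:AZCOPY_AUTO_LOGIN_TYPE = 'PSCRED'
$env:AZCOPY_TENANT_ID = '<tenant-id>'

# Reported above as the vital step. Fail early if no context is available.
$context = Get-AzContext
if (-not $context) {
    throw 'No Az PowerShell context found; sign in before calling AzCopy.'
}

# Hypothetical placeholders: replace with your own source path and blob URL.
$source = '<local-source-path>'
$destination = '<blob-container-url>'

azcopy.exe copy $source $destination --recursive

# AzCopy signals failure through its exit code, so surface it to the pipeline.
if ($LASTEXITCODE -ne 0) {
    throw "AzCopy exited with code $LASTEXITCODE."
}
```

Checking the result of Get-AzContext before the copy makes a missing context fail with a readable message instead of the auto-login error.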

v-schhabra commented 1 month ago

Hi @lucavoit Thanks for the update.