microsoft / azure-pipelines-tasks

Tasks for Azure Pipelines
https://aka.ms/tfbuild

[BUG]: Exit 57005 StackOverflowException using AzurePowershell@5 #20156

Open trevors20 opened 3 months ago

trevors20 commented 3 months ago


Task name

AzurePowerShell

Task version

5.242.1

Issue Description

We are seeing a lot of Exit 57005 build breaks when using AzurePowerShell@5. It looks like from the logs there is a StackOverflowException exception that is happening. Was wondering if you might be able to help debug and fix this and/or help diagnose why this might be happening and provide a workaround or anything like that. Much appreciated.

Environment type (Please select at least one environment where you face this issue)

Azure DevOps Server type

dev.azure.com (formerly visualstudio.com)

Azure DevOps Server Version (if applicable)

No response

Operating system

Windows 2022

Relevant log output

2024-07-16T14:22:16.2568594Z VERBOSE: Testing whether type can be loaded: 'Microsoft.VisualStudio.Services.WebApi.VssClientHttpRequestSettings'
2024-07-16T14:22:16.2573359Z VERBOSE: The type was loaded successfully.
2024-07-16T14:22:16.2578674Z VERBOSE: Leaving Get-OMType.
2024-07-16T14:22:16.2793622Z 
2024-07-16T14:22:16.2793956Z Process is terminated due to StackOverflowException.
2024-07-16T14:22:16.2852511Z VERBOSE: AGENT_SKIPCERTVALIDATION (empty)
2024-07-16T14:22:16.2857885Z VERBOSE:  Converted to bool: False
2024-07-16T14:22:16.2863891Z VERBOSE: Constructing HTTP client.
2024-07-16T14:22:16.3287111Z VERBOSE: Leaving Get-VstsVssHttpClient.
2024-07-16T14:22:16.3296507Z VERBOSE: Removing assemlby resolver.
2024-07-16T14:22:16.3347027Z VERBOSE: SYSTEM_PLANID: '[REDACTED]'
2024-07-16T14:22:16.3377086Z VERBOSE: SYSTEM_JOBID: '[REDACTED]'
2024-07-16T14:22:16.3406912Z VERBOSE: SYSTEM_HOSTTYPE: 'build'
2024-07-16T14:22:16.3436924Z VERBOSE: SYSTEM_TEAMPROJECTID: '[REDACTED]'
2024-07-16T14:22:16.7096512Z ##[command]Clear-AzContext -Scope CurrentUser -Force -ErrorAction SilentlyContinue
2024-07-16T14:22:16.8158020Z ##[command]Clear-AzContext -Scope Process
2024-07-16T14:22:16.8497381Z ##[command]Connect-AzAccount 
2024-07-16T14:22:16.8497676Z Name                           Value                                                                                   
2024-07-16T14:22:16.8497908Z ----                           -----                                                                                   
2024-07-16T14:22:16.8498114Z Tenant                         [REDACTED]                                                   
2024-07-16T14:22:16.8498310Z Scope                          Process                                                                                 
2024-07-16T14:22:16.8498491Z Environment                    AzureCloud                                                                              
2024-07-16T14:22:16.8498754Z ApplicationId                  ***                                                    
2024-07-16T14:22:16.8498944Z ServicePrincipal               True                                                                                    
2024-07-16T14:22:16.8499030Z 
2024-07-16T14:22:16.8499074Z 
2024-07-16T14:22:16.8499122Z 
2024-07-16T14:22:16.8635574Z WARNING: The feature related to parameter name 'FederatedToken' is under preview.
2024-07-16T14:22:18.4771754Z ##[debug]Agent environment resources - Disk: D:\ Available 297357.35 MB out of 307198.00 MB, Memory: Used 5979.00 MB out of 32767.00 MB, CPU: Usage 7.00%
2024-07-16T14:22:23.7474754Z ##[debug]Agent environment resources - Disk: D:\ Available 297357.35 MB out of 307198.00 MB, Memory: Used 6086.00 MB out of 32767.00 MB, CPU: Usage 4.00%
2024-07-16T14:22:29.0309961Z ##[debug]Agent environment resources - Disk: D:\ Available 297357.35 MB out of 307198.00 MB, Memory: Used 6055.00 MB out of 32767.00 MB, CPU: Usage 9.00%
2024-07-16T14:22:34.2999428Z ##[debug]Agent environment resources - Disk: D:\ Available 297357.35 MB out of 307198.00 MB, Memory: Used 6075.00 MB out of 32767.00 MB, CPU: Usage 5.00%
2024-07-16T14:22:39.5848325Z ##[debug]Agent environment resources - Disk: D:\ Available 297357.35 MB out of 307198.00 MB, Memory: Used 6110.00 MB out of 32767.00 MB, CPU: Usage 8.00%
2024-07-16T14:22:44.8577747Z ##[debug]Agent environment resources - Disk: D:\ Available 297357.35 MB out of 307198.00 MB, Memory: Used 6151.00 MB out of 32767.00 MB, CPU: Usage 7.00%
2024-07-16T14:22:47.1743146Z ##[debug]Exit code: 57005

Full task logs with system.debug enabled

The log output exceeds 65,536 characters, so I will attach it to this thread as a file.

Repro steps

- task: AzurePowerShell@5
  displayName: DisplayName
  inputs:
    azureSubscription: '[REDACTED]'
    workingDirectory: '$(Build.SourcesDirectory)'
    ScriptPath: 'src\Scripts\script.ps1'
    ScriptArguments: '-AccessToken $(System.AccessToken) -Param2 $(param2) -Param3 $(param3)'
    azurePowerShellVersion: LatestVersion
  condition: succeeded()
trevors20 commented 3 months ago

I'm attaching the azurepowershell_error.txt logs as a file to this comment.

trevors20 commented 3 months ago

This is happening about 80% of the time; the other 20% of the time it does work. I'm uploading an example of the logging from a run where the task does actually work to this comment. azurepowershell_working.txt

Please help. Thanks muchly!

trevors20 commented 3 months ago

We have decided to pin to an old version of AzurePowerShell@5 until this is fixed or figured out. That works great.

v-schhabra commented 3 months ago

Hi @trevors20, thanks for reporting this issue. We are investigating it.

huizhonglong commented 3 months ago

This had heavily impacted our VS builds across pipelines before we pinned to the old version as a workaround to unblock folks.

If it helps, you can get all the failed build IDs from the query below by commenting out the last line (| summarize):

Go to Log Analytics and run the query:

let _start = datetime('2024-07-12T00:00:00Z');
let _end = datetime('2024-07-17T00:00:00Z');
customEvents
| extend BuildDefinitionId = tostring(customDimensions.BuildDefinitionId)
| extend BuildDefinitionName = tostring(customDimensions.BuildDefinitionName)
| extend BuildId = tostring(customDimensions.BuildId)
| where timestamp between(_start .. _end)
| where customDimensions.RecordName == "Layout BuildArtifacts" and customDimensions.Message contains "PowerShell exited with code '57005'"
| distinct BuildDefinitionId, BuildDefinitionName, BuildId
| summarize FailedCount = count() by BuildDefinitionId, BuildDefinitionName


BuildDefinitionId | BuildDefinitionName   | FailedCount
----------------- | --------------------- | -----------
10310             | DD-CB-PR              | 41
10289             | DD-CB-TestSignVS      | 13
11051             | DD-CB-EUCU-TestSignVS | 1

fernando-lopezblanco-lrn commented 3 months ago

Hi, same here. We started experiencing these stack overflows today with the AzurePowershell@5 task. Testing version 4 now...

trevors20 commented 3 months ago

Thanks @fernando-lopezblanco-lrn . If it helps, we pinned to version 5.241.5 and that has stabilized our builds.

- task: AzurePowerShell@5.241.5
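
For illustration, a fuller sketch of what that pin might look like as a pipeline step (the inputs below are placeholders, not our actual values):

# Hedged sketch: pin the task to the last known-good full version (5.241.5)
# instead of floating on the latest 5.x release. Inputs are illustrative placeholders.
- task: AzurePowerShell@5.241.5
  displayName: Run Azure PowerShell script (pinned task version)
  inputs:
    azureSubscription: 'my-service-connection'   # placeholder service connection name
    ScriptPath: 'src\Scripts\script.ps1'         # placeholder script path
    azurePowerShellVersion: LatestVersion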

fernando-lopezblanco-lrn commented 3 months ago

Thanks @trevors20 for the very quick response. Will try that version later on.

Same issue with AzurePowershell@4 (shown as Version: 4.242.0 in the pipeline logs). Testing @5.241.5 now.

fernando-lopezblanco-lrn commented 3 months ago

@5.241.5 is looking good, but other pipelines with @4 are starting to fail as well

trevors20 commented 3 months ago

@fernando-lopezblanco-lrn, I'm not sure which version to pin to for AzurePowershell@4, but you should be able to find it by looking at your older builds that were still working before you started getting the 4.242.0 version of the task.

At the top of the logs for the AzurePowershell@4 step in the build, you should see a header, something like this:

Task         : Azure PowerShell
Description  : Run a PowerShell script within an Azure environment
Version      : 5.241.5
Author       : Microsoft Corporation
Help         : https://aka.ms/azurepowershelltroubleshooting

If you work backwards and find an older build that used a version earlier than 4.242.0, try pinning the task to that version. To give a rough timeframe, I know that the AzurePowershell@5 task was updated a couple of weeks ago, which started the instability here, so you may need to look back a couple of weeks or so. :(

trevors20 commented 3 months ago

@fernando-lopezblanco-lrn , actually, we should be able to see the versions in the source code history. The version is specified in the task.json for the task. In this case for AzurePowershell@4, that would be here: https://github.com/microsoft/azure-pipelines-tasks/commits/master/Tasks/AzurePowerShellV4/task.json

Looking at the history, there is a change that was checked in a couple of weeks ago; if you look at the change just before it, it looks like you may want to pin to 4.241.2. Try it and see?

fernando-lopezblanco-lrn commented 3 months ago

@trevors20 yeah, the version 4.242.0 I gave you above was taken from the log header, as you later mentioned. Testing 4.241.0 now, since 4.241.2 does not appear to exist: it's prompting me to install it from the Marketplace.

admachina commented 2 months ago

Curious if anyone has seen this issue recently. We took the pinned version workaround, and I'm looking to unwind it if it's fixed.

admachina commented 2 months ago

I was able to get this to reproduce on 5.243.7.

admachina commented 2 months ago

This bug also seems to be mitigated by switching to pwsh mode instead of legacy Windows PowerShell, by adding pwsh: true to the task inputs in the pipeline.

We're having good success with this if you want to unpin the version.
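
For reference, a minimal sketch of that change (all inputs other than pwsh are placeholders):

# Hedged sketch: run the script under PowerShell Core (pwsh) instead of legacy
# Windows PowerShell, which is the mitigation described above. Inputs are placeholders.
- task: AzurePowerShell@5
  displayName: Run Azure PowerShell script under pwsh
  inputs:
    azureSubscription: 'my-service-connection'   # placeholder service connection name
    ScriptPath: 'src\Scripts\script.ps1'         # placeholder script path
    azurePowerShellVersion: LatestVersion
    pwsh: true                                   # use PowerShell Core instead of Windows PowerShell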

daxianji007 commented 2 months ago

We also have this issue and are waiting for advice.

onetocny commented 1 month ago

Hi @trevors20,

In order to proceed with the investigation, could you please share the following details?

  1. What kind of ARM service endpoint are you using in AzurePowerShell tasks?
  2. Where is the build agent running?
  3. What is the content of script.ps1?
  4. What does the whole pipeline definition look like?
v-schhabra commented 1 month ago

Hi @trevors20, could you please share the above details if the issue is still occurring?

rcpd-alt commented 1 month ago

We're seeing this too; the earliest affected build we've seen was on 5.245.1, and we're still seeing it in 5.245.3.

trevors20 commented 1 month ago

@onetocny, @v-schhabra, my apologies. I was on vacation when the questions were sent and have only just now seen them while checking on the status.

  1. This is an ARM service connection using workload identity federation with OpenID Connect.
  2. Not quite sure what you are looking for here. Do you mean the azure subscription? Or, the agent pool?
  3. I'll send directly
  4. I'll send directly
onetocny commented 3 days ago

We have identified the root cause in the Az.Accounts module. You can find more details in the related issue https://github.com/Azure/azure-powershell/issues/26623. In parallel, we were able to mitigate the issue on our side; new fixed versions (X.248.Y) of all PowerShell-based tasks will be deployed next week.

If you need to use the latest AzurePowerShell tasks immediately, there is a confirmed workaround: prepend the following task before the failing AzurePowerShell tasks:

- task: PowerShell@2
  inputs:
    targetType: 'inline'
    script: |
      Install-Module Az.Accounts -Force -AllowClobber -ErrorAction Stop -SkipPublisherCheck
      Import-Module Az.Accounts -Global -PassThru -Force
      Update-AzConfig -CheckForUpgrade $false
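
For example, the workaround step might sit directly above the previously failing step; a minimal sketch (the AzurePowerShell inputs are placeholders, not taken from this thread):

# Hedged sketch: the confirmed workaround step placed immediately before the
# AzurePowerShell step that was hitting the StackOverflowException.
- task: PowerShell@2
  displayName: Workaround for Az.Accounts StackOverflowException
  inputs:
    targetType: 'inline'
    script: |
      Install-Module Az.Accounts -Force -AllowClobber -ErrorAction Stop -SkipPublisherCheck
      Import-Module Az.Accounts -Global -PassThru -Force
      Update-AzConfig -CheckForUpgrade $false

- task: AzurePowerShell@5
  inputs:
    azureSubscription: 'my-service-connection'   # placeholder service connection name
    ScriptPath: 'src\Scripts\script.ps1'         # placeholder script path
    azurePowerShellVersion: LatestVersion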