microsoft / azure-pipelines-tasks

Tasks for Azure Pipelines
https://aka.ms/tfbuild
MIT License
[BUG]: Azure Functions Deploy task failing repeatedly #19807

Open karun-verghese opened 6 months ago

karun-verghese commented 6 months ago

Task name

Azure Functions Deploy

Task version

2.238.1

Issue Description

Our CD pipeline, which deploys our services and function apps, is failing in one environment only. That environment is in the Azure CN cloud, but other deployments to the Azure CN cloud work fine; only this one environment fails repeatedly. The error message reads:

[error]Error: Failed to get resource ID for resource type 'Microsoft.Web/Sites' and resource name '<my function app>'. Error: Could not fetch access token for Azure. Status code: endpoints_resolution_error, status message: Error: could not resolve endpoints. Please check network and try again. Detail: ClientAuthError: openid_config_error: Could not retrieve endpoints. Check your authority and verify the .well-known/openid-configuration endpoint returns the required endpoints. Attempted to retrieve endpoints from: https://login.partner.microsoftonline.cn/<tenantid>/v2.0/.well-known/openid-configuration

When I call the well-known endpoint in my browser, it works just fine.
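Since the browser check runs from a different machine than the build agent, it may help to repeat it from the agent itself. Below is a minimal Python sketch, not the task's actual code: it builds the same URL shape as in the error message, fetches it, and reports which endpoints are missing. The function names are hypothetical, and the list of required fields is an assumption about what the client library expects.

```python
import json
import urllib.request

# Assumption: these are the fields the client library needs from the metadata document.
REQUIRED_FIELDS = ("authorization_endpoint", "token_endpoint", "issuer", "jwks_uri")


def openid_config_url(authority_host, tenant_id):
    """Build the v2.0 OpenID configuration URL in the shape shown in the error message."""
    return f"https://{authority_host}/{tenant_id}/v2.0/.well-known/openid-configuration"


def missing_fields(metadata):
    """Return the assumed-required fields that are absent or empty, in a fixed order."""
    return [name for name in REQUIRED_FIELDS if not metadata.get(name)]


def fetch_metadata(url, timeout=10.0):
    """Fetch and parse the metadata document; raises on network or JSON errors."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Run from a script step on the failing agent, `missing_fields(fetch_metadata(openid_config_url("login.partner.microsoftonline.cn", tenant_id)))` should return an empty list if the agent can reach the endpoint and the document is complete. An exception from `fetch_metadata` would instead point at the network path from the agent.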

Additionally, all my other deployments work fine. We are not configuring this environment any differently, other than choosing the right service principal. The service principal itself was checked and is valid.

Environment type (Please select at least one environment where you face this issue)

Azure DevOps Server type

dev.azure.com (formerly visualstudio.com)

Azure DevOps Server Version (if applicable)

No response

Operating system

Ubuntu latest

Relevant log output

2024-04-23T06:15:22.5394220Z ##[section]Starting: Deploy Catalog Cache Function App
2024-04-23T06:15:22.5402680Z ==============================================================================
2024-04-23T06:15:22.5402913Z Task         : Azure Functions Deploy
2024-04-23T06:15:22.5403083Z Description  : Update a function app with .NET, Python, JavaScript, PowerShell, Java based web applications
2024-04-23T06:15:22.5403334Z Version      : 2.238.1
2024-04-23T06:15:22.5403491Z Author       : Microsoft Corporation
2024-04-23T06:15:22.5403656Z Help         : https://aka.ms/azurefunctiontroubleshooting
2024-04-23T06:15:22.5403827Z ==============================================================================
2024-04-23T06:15:25.0280162Z Got service connection details for Azure App Service:'biz-common-ecn-sand-catalogcache-func'
2024-04-23T06:15:35.2136506Z ##[error]Error: Failed to get resource ID for resource type 'Microsoft.Web/Sites' and resource name '<my function app>'. Error: Could not fetch access token for Azure. Status code: endpoints_resolution_error, status message: Error: could not resolve endpoints. Please check network and try again. Detail: ClientAuthError: openid_config_error: Could not retrieve endpoints. Check your authority and verify the .well-known/openid-configuration endpoint returns the required endpoints. Attempted to retrieve endpoints from: https://login.partner.microsoftonline.cn/<tenantid>/v2.0/.well-known/openid-configuration
2024-04-23T06:15:35.2205561Z ##[section]Finishing: Deploy Catalog Cache Function App

Repro steps

No response

karun-verghese commented 6 months ago

Is there any documentation about which permissions the service connection needs? At the moment my service connection has the Contributor role at the resource group level, on the same resource group that contains the function app. This is the same setup as the service connections for my other deployments.

abagonhishead commented 6 months ago

Also having this issue, but rather than Azure Functions, we're trying to deploy an app service to Azure China. Specifically it's the 'Azure App Service deploy' task v4 that we're using, although I also tried v3 and had the same problem. We are getting this on both Azure-hosted agents and self-hosted agents.

I am fairly certain this isn't a permissions issue. I tested this with a manually configured ARM service principal service connection, and also set up a new ARM identity federation service connection according to this guide. Both service connections exhibit exactly the same issue with app service deployments, but work fine with everything else. We have multiple Azure PowerShell release tasks using the same service connections, some of them doing very privileged things like deploying container apps, and they are all working fine. Our two Azure App Service deploy tasks fail completely, however, with the following:

2024-04-23T13:25:40.3283519Z ##[section]Starting: Deploy: set app service to deployed image
2024-04-23T13:25:40.3293014Z ==============================================================================
2024-04-23T13:25:40.3293173Z Task         : Azure App Service deploy
2024-04-23T13:25:40.3293283Z Description  : Deploy to Azure App Service a web, mobile, or API app using Docker, Java, .NET, .NET Core, Node.js, PHP, Python, or Ruby
2024-04-23T13:25:40.3293471Z Version      : 4.238.1
2024-04-23T13:25:40.3293562Z Author       : Microsoft Corporation
2024-04-23T13:25:40.3293656Z Help         : https://aka.ms/azureappservicetroubleshooting
2024-04-23T13:25:40.3293786Z ==============================================================================
2024-04-23T13:25:44.0584652Z Got service connection details for Azure App Service:'***'
2024-04-23T13:25:53.2883895Z ##[error]Error: Failed to get resource ID for resource type 'Microsoft.Web/Sites' and resource name '***'. Error: Could not fetch access token for Azure. Status code: endpoints_resolution_error, status message: Error: could not resolve endpoints. Please check network and try again. Detail: ClientAuthError: openid_config_error: Could not retrieve endpoints. Check your authority and verify the .well-known/openid-configuration endpoint returns the required endpoints. Attempted to retrieve endpoints from: https://login.partner.microsoftonline.cn/***/v2.0/.well-known/openid-configuration
2024-04-23T13:25:53.2919412Z ##[section]Finishing: Deploy: set app service to deployed image

The important bit is: Status code: endpoints_resolution_error, status message: Error: could not resolve endpoints. Please check network and try again. Detail: ClientAuthError: openid_config_error: Could not retrieve endpoints. Check your authority and verify the .well-known/openid-configuration endpoint returns the required endpoints. Attempted to retrieve endpoints from: https://login.partner.microsoftonline.cn/***/v2.0/.well-known/openid-configuration

Based on the error message, and the fact that routing the request through a proxy server first resolves the issue (see my workaround below), I think it might be related to this issue in azure/msal-node. Maybe there's a transparent proxy server somewhere along the route to China that the library doesn't like?
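One way to test the routing theory is to request the same metadata URL twice from the affected agent, once directly and once through an explicit proxy, and compare the results. The following is a rough Python sketch under stated assumptions: `make_opener` and `probe` are hypothetical helper names, and the proxy address is whatever proxy you control (none is given in this thread). It only probes reachability and does not reproduce what msal-node does internally.

```python
import urllib.request


def make_opener(proxy_url=None):
    """Return an opener that connects directly (proxy_url=None) or via the given proxy."""
    # An empty mapping disables proxies, including any picked up from environment variables.
    proxies = {"https": proxy_url} if proxy_url else {}
    return urllib.request.build_opener(urllib.request.ProxyHandler(proxies))


def probe(url, proxy_url=None, timeout=10.0):
    """Attempt one GET and return 'ok <status>' or 'failed: <error type>'."""
    try:
        with make_opener(proxy_url).open(url, timeout=timeout) as response:
            return f"ok {response.status}"
    except OSError as exc:  # URLError, HTTPError, TLS errors and timeouts all derive from OSError
        return f"failed: {type(exc).__name__}"
```

If `probe(url)` fails on the agent while `probe(url, proxy_url)` succeeds, that is consistent with something on the direct route interfering with the request, though it does not prove the library issue is the cause.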

This is really frustrating and has taken me almost all day to work around. Not only do we now have to maintain a self-hosted agent purely for app service deployments into China, we're also unable to make use of our parallel jobs on our China deployment pipelines.

Any ideas on a fix, please?

EDIT: Not sure if this is important, but we're based in the UK, which may affect the location of Azure-hosted agents that are assigned to our organisation (and therefore the routing to Azure China.)

Workaround

If you're able to set up your own self-hosted agent and use that, then there is a workaround. It worked for us, at least!

karun-verghese commented 6 months ago

@abagonhishead Thanks for the information on the workaround, I'll take that back to our delivery infrastructure team and see if they can help me with that.

But yes, still hoping for a fix here :(

karun-verghese commented 6 months ago

It looks like more and more people are facing this issue, as seen at the link below. Still waiting for a response here. https://developercommunity.visualstudio.com/t/Deploying-to-Azure-China:-Could-not-fetc/10652428?viewtype=all

karun-verghese commented 6 months ago

@abagonhishead I think I've found another workaround. I switched from a Linux agent to a Windows agent, and the deployment worked fine there. Did you already try this? It looks like this issue only affects Linux agents. Still, it's only a workaround.

nakah commented 5 months ago

I've also been having the same issue recently. I managed to deploy the application by switching from the AzureRmWebAppDeployment@4 task (used for both the Web App and the Functions) to AzureFunctionApp@1 and AzureWebApp@1. However, it's now failing on AzureAppServiceManage@0 when starting the App Service. It used to work perfectly, so I suspect a regression in these tasks in the latest releases.

karun-verghese commented 3 months ago

@FinVamp1 is there any update on this issue? It has been several weeks, so I'm just checking in.

qindj commented 2 months ago

Any updates? Same issue here.