microsoft / azure-pipelines-tasks

Tasks for Azure Pipelines
https://aka.ms/tfbuild
MIT License

[Question]: Inconsistent duration swap slots using AzureAppServiceManage@0 for Web Apps on dedicated Linux App Service plan #19273

Open Ruud2000 opened 7 months ago

Ruud2000 commented 7 months ago

Task name

AzureAppServiceManage@0

Task version

0.228.1

Environment type (Please select at least one environment where you face this issue)

Azure DevOps Server type

dev.azure.com (formerly visualstudio.com)

Azure DevOps Server Version (if applicable)

No response

Operating system

Windows 2019 datacenter-core-g2

Question

Recently we introduced a staging deployment slot for our Web Apps. So each Web App now has a staging and production slot. All Web Apps run on a dedicated Linux App Service plan (P1v3). Average CPU percentage is between 10 and 20, and average memory percentage around 80.

We now deploy our software to the staging slot and use the Azure DevOps task AzureAppServiceManage@0 to swap the staging and production slots. The duration of a swap is not consistent across deployments. Most of the time it is between 1m 30s and 2m 30s, but we also have occurrences where a swap takes more than 12 minutes. Especially when multiple Web Apps have a slow swap, the pipeline takes a very long time and risks hitting the 60-minute timeout.

The diagnostics show the swap starts by invoking:

[POST]https://management.azure.com/subscriptions/[redacted]/resourceGroups/Workload.WestEurope/providers/Microsoft.Web/sites/[redacted]/slots/staging/slotsswap?api-version=2016-08-01

Then we see the following call being invoked every 15 seconds, each time returning an HTTP 202 response:

[GET]https://management.azure.com/subscriptions/[redacted]/resourceGroups/Workload.WestEurope/providers/Microsoft.Web/sites/[redacted]/slots/staging/operationresults/08021b2d-33b1-4e10-bddb-8ac2b7ebd2cd?api-version=2016-08-01

Eventually, after slightly more than 12 minutes, this same call returns an HTTP 200 response and the swap is complete.
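The POST-then-poll pattern described above can be sketched in Python as follows. This is a minimal illustration of the loop as observed in the diagnostics, not the task's actual source code. The function name `wait_for_swap`, its parameters, and the injected `get_status` callable (which stands in for the GET on the operationresults URL) are all hypothetical.

```python
import time
from typing import Callable


def wait_for_swap(get_status: Callable[[], int],
                  interval_s: float = 15.0,
                  timeout_s: float = 20 * 60,
                  sleep: Callable[[float], None] = time.sleep,
                  clock: Callable[[], float] = time.monotonic) -> float:
    """Poll until get_status() returns 200 and return the elapsed seconds.

    get_status is assumed to perform the GET on the operation-results URL
    and return the HTTP status code: 202 = still running, 200 = done.
    """
    start = clock()
    while True:
        status = get_status()
        elapsed = clock() - start
        if status == 200:
            return elapsed
        if status != 202:
            # Anything other than 200/202 is treated as a failure here.
            raise RuntimeError(f"unexpected status {status} after {elapsed:.0f}s")
        if elapsed + interval_s > timeout_s:
            # The next poll would land past the deadline, so give up now.
            raise TimeoutError(f"swap still pending after {elapsed:.0f}s")
        sleep(interval_s)
```

Logging the elapsed time per app this way would at least show which swaps are slow, even though it says nothing about why the service keeps answering 202.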

When we execute a swap in the Azure Portal, we never seem to hit a slow swap. The browser developer tools show that the portal uses a more recent version of the swap API: the portal calls slotsswap?api-version=2018-11-01, while AzureAppServiceManage@0 uses api-version=2016-08-01. Could this perhaps explain the inconsistent durations?

I found this question from November 2021, which is almost identical to our situation: https://learn.microsoft.com/en-us/answers/questions/612601/optimizing-cd-pipeline-when-swapping-multiple-web

Unfortunately, I have not yet been able to verify whether swap times are more consistent when using the AzureCLI@2 task, as suggested in the answer to that question, because our self-hosted build agent currently has no Azure CLI installed.

ivanBereznev commented 5 months ago

Experiencing the same issue. It takes 4-5 minutes to swap slots for a single web app. Tried both AzureAppServiceManage@0 and AzureCLI@2, and although the latter seems to be slightly faster, all the results are still in the same ballpark.

211211 commented 3 months ago

Still facing the same issue in March 2024 with AzureAppServiceManage@0. Switched to AzureCLI@2 and it works fine.

My command: az webapp deployment slot swap -g {{your_rs_group}} -n {{app_name}} --slot {{source_slot}} --target-slot {{target_slot}}
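To compare durations between runs, the same command could be wrapped so that each swap is timed and cut off after a fixed limit. The Python sketch below assumes the Azure CLI is installed and already logged in on the agent. The helper names `build_swap_command` and `timed_swap` and the 15-minute default are hypothetical; only the `az` arguments come from the command above.

```python
import shutil
import subprocess
import time
from typing import List


def build_swap_command(resource_group: str, app_name: str,
                       source_slot: str, target_slot: str,
                       az_executable: str = "az") -> List[str]:
    """Build the argument list for the swap command quoted above."""
    return [
        az_executable, "webapp", "deployment", "slot", "swap",
        "-g", resource_group,
        "-n", app_name,
        "--slot", source_slot,
        "--target-slot", target_slot,
    ]


def timed_swap(resource_group: str, app_name: str,
               source_slot: str, target_slot: str,
               timeout_s: float = 15 * 60) -> float:
    """Run the swap and return its duration in seconds.

    Raises subprocess.TimeoutExpired if the CLI has not returned within
    timeout_s, and subprocess.CalledProcessError on a non-zero exit code.
    """
    # shutil.which also resolves az.cmd on Windows agents.
    az_path = shutil.which("az")
    if az_path is None:
        raise FileNotFoundError("Azure CLI ('az') not found on PATH")
    cmd = build_swap_command(resource_group, app_name, source_slot,
                             target_slot, az_executable=az_path)
    start = time.monotonic()
    subprocess.run(cmd, check=True, timeout=timeout_s)
    return time.monotonic() - start
```

A timeout on the client side only stops the pipeline from waiting; it is an assumption, not something verified here, that the swap operation itself keeps running or gets cancelled on the service side.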

devdeer-alex commented 2 months ago

I think this is related to Azure slots themselves. The task just waits until the slot is swapped, and this sometimes takes a ridiculously long time. It's all over the usual discussions, like on SO.

I currently randomly get the following output after 20+ minutes:

Starting: Swap Slot api-dd-alerting
==============================================================================
Task         : Azure App Service manage
Description  : Start, stop, restart, slot swap, slot delete, install site extensions or enable continuous monitoring for an Azure App Service
Version      : 0.238.1
Author       : Microsoft Corporation
Help         : https://docs.microsoft.com/azure/devops/pipelines/tasks/deploy/azure-app-service-manage
==============================================================================
Warming-up slots
Swapping App Service '***' slots - 'deploy' and 'production'
Successfully updated deployment History at https://***-deploy.scm.azurewebsites.net/api/deployments/35831713538731739
Successfully updated deployment History at https://***.scm.azurewebsites.net/api/deployments/35831713538731739
##[error]Error: Failed to swap App Service '***' slots - 'deploy' and 'production'. Error: ExpectationFailed - Cannot swap site slots for site '***' because the 'deploy' slot did not respond to http ping. (CODE: 417)
Finishing: Swap Slot api-dd-alerting

P-DHrestak commented 2 months ago

> Still facing the same issue in March 2024 with AzureAppServiceManage@0. Switched to AzureCLI@2 and it works fine.
>
> My command: az webapp deployment slot swap -g {{your_rs_group}} -n {{app_name}} --slot {{source_slot}} --target-slot {{target_slot}}

Tried this solution, but the AzureCLI@2 task takes just as long (20+ minutes) as the AzureAppServiceManage one. The activity log has no useful data in it: [image]

tobias-johansson-nltg commented 2 months ago

> > Still facing the same issue in March 2024 with AzureAppServiceManage@0. Switched to AzureCLI@2 and it works fine. My command: az webapp deployment slot swap -g {{your_rs_group}} -n {{app_name}} --slot {{source_slot}} --target-slot {{target_slot}}
>
> Tried this solution, but the AzureCLI@2 task takes just as long (20+ minutes) as the AzureAppServiceManage one. The activity log has no useful data in it: [image]

We also tried this, with the same result as using AzureAppServiceManage@0. Is there no way of getting more information about what it is actually doing? Our deploy pipeline contains quite a few steps, including creating and deleting a database copy, and the two swaps are the steps that take by far the most time :)

omer-glazer commented 2 months ago

Same here. Our deployment swap takes up to 11 minutes, with no visible reason (activity logs/output logs).

chrisflem commented 2 months ago

Same here. I have noticed that if I access the slot in a browser, the swap completes shortly after. Is there a bug in the code calling the slot, since it works when I do it manually?

goodmanmd commented 1 month ago

We ran into this during a deploy last night. Both slots were accessible via browser and yet the task timed out after 23 minutes (!) with this error:

Error: Failed to swap App Service 'xxx' slots - 'staging' and 'production'. Error: ExpectationFailed - Cannot swap site slots for site 'xxx' because the 'staging' slot did not respond to http ping. (CODE: 417)

Is the task actually looking at HTTP rather than HTTPS? If so, that could explain what's going on. Our site redirects HTTP => HTTPS and therefore would not be returning a 2xx response code for any HTTP request if that's what the script is looking for to determine success. Even if it's not using HTTP, in our app, all requests to / would redirect to an auth screen so the same issue could still apply.
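For context, the App Service documentation on staging slots describes two app settings for customizing the warm-up request made during a swap: WEBSITE_SWAP_WARMUP_PING_PATH (the path to request) and WEBSITE_SWAP_WARMUP_PING_STATUSES (a comma-separated list of accepted status codes, for example 200,202). Whether either setting has any bearing on the 417 error above has not been verified in this thread. The Python sketch below only illustrates how such an allow-list would treat a redirect. The function names are hypothetical, the "unset means accept anything" behavior is an assumption taken from my reading of the docs, and none of this is the platform's actual implementation.

```python
from typing import FrozenSet, Optional


def parse_allowed_statuses(setting: Optional[str]) -> Optional[FrozenSet[int]]:
    """Parse a value such as "200,202" into a set of status codes.

    Returns None when the setting is missing or blank, which this sketch
    interprets as "every status code is accepted" (an assumption).
    """
    if setting is None or not setting.strip():
        return None
    return frozenset(int(part) for part in setting.split(",") if part.strip())


def ping_accepted(status: int, setting: Optional[str]) -> bool:
    """Return True if a warm-up response with this status would count."""
    allowed = parse_allowed_statuses(setting)
    return allowed is None or status in allowed
```

Under these assumptions, a 302 redirect to a login page is rejected by an allow-list of "200,202" but accepted when no list is configured, so a redirect alone would not obviously explain the failed ping.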

FWIW we fell back to manually swapping slots for our apps via the portal and those processed successfully within 30-60 seconds.

DennisJensen95 commented 1 month ago

> We ran into this during a deploy last night. Both slots were accessible via browser and yet the task timed out after 23 minutes (!) with this error:
>
> Error: Failed to swap App Service 'xxx' slots - 'staging' and 'production'. Error: ExpectationFailed - Cannot swap site slots for site 'xxx' because the 'staging' slot did not respond to http ping. (CODE: 417)
>
> Is the task actually looking at HTTP rather than HTTPS? If so, that could explain what's going on. Our site redirects HTTP => HTTPS and therefore would not be returning a 2xx response code for any HTTP request if that's what the script is looking for to determine success. Even if it's not using HTTP, in our app, all requests to / would redirect to an auth screen so the same issue could still apply.
>
> FWIW we fell back to manually swapping slots for our apps via the portal and those processed successfully within 30-60 seconds.

We also experience the varying deployment times. Like you, @goodmanmd, we also see the same stochastic timeout; if we then rerun the pipeline, it succeeds. There are no indications of why this happens. We are using AzureCLI@2 for the swap operation. How are you doing the swap, @goodmanmd?

goodmanmd commented 1 month ago

@DennisJensen95 for this particular app our deploys are infrequent - perhaps once or twice a year. In this case we fell back to swapping the slots via the Azure Portal as it was only 3 applications with 2 swaps each (staging, production, last-known-good).

Edit: Re-reading the question and I think you may be asking what method we're using for the automated swap in our pipeline -- we are currently using AzureAppServiceManage@0

ash-skelton commented 1 month ago

Would be great if Microsoft acknowledged this. We are seeing the exact same thing (using AzureAppServiceManage@0). It's happening sporadically across a few of our apps, but it has definitely been getting worse.

pumacln commented 4 weeks ago

@DennisJensen95 @goodmanmd

I am having the same issue. We use Azure PowerShell via Octopus Deploy to Start / Stop / Swap slots.

# Start the Staging Slot
Start-AzWebAppSlot -ResourceGroupName "#{ResourceGroup}" -Name "#{Website}" -Slot "Staging"

# Swap the staging slot into production
Switch-AzWebAppSlot -ResourceGroupName "#{ResourceGroup}" -Name "#{Website}" -SourceSlotName "Staging" -DestinationSlotName "Production"

# Stop the Staging Slot
Stop-AzWebAppSlot -ResourceGroupName "#{ResourceGroup}" -Name "#{Website}" -Slot "Staging"

The behavior is the same: sometimes the swap operation will just time out. Re-running works 99% of the time.
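Since re-running usually succeeds, one stopgap is to wrap the swap in a bounded retry. The Python sketch below shows that idea only. `run_with_retry` and its parameters are hypothetical names, and the `operation` callable stands in for whatever actually performs the swap (Azure PowerShell, the Azure CLI, or a pipeline task). It works around the symptom and does not address the underlying cause.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retry(operation: Callable[[], T],
                   attempts: int = 3,
                   delay_s: float = 60.0,
                   sleep: Callable[[float], None] = time.sleep) -> T:
    """Call operation() until it succeeds or attempts are exhausted.

    Any exception counts as a failed attempt. After the last failed
    attempt a RuntimeError is raised, chained to the original error.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception as exc:
            if attempt == attempts:
                raise RuntimeError(
                    f"operation failed after {attempts} attempts") from exc
            # Wait before the next attempt so the slot has time to settle.
            sleep(delay_s)
    raise AssertionError("unreachable")
```

Retrying a swap is only reasonable if a failed attempt leaves both slots unchanged, which matches what is reported here but is worth confirming per app.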

Where is @Microsoft or @Azure support?

Saturate commented 4 weeks ago

We also see random swaps taking 20+ minutes for a Node.js application. Sometimes they time out; rerunning often works.