pulumi / pulumi-azure-native

Azure Native Provider
Apache License 2.0

Confusing validation errors for WebApp #3411

Open j-fulbright opened 3 months ago

j-fulbright commented 3 months ago

What happened?

I believe I've seen this issue reported elsewhere, but there it was related to the handling of the AnotherOperationInProgress error codes.

Moving our function app to a Consumption plan caused a lot of issues, all with the error below.

"Message":"Cannot modify this site because another operation is in progress. Details: Id: 5ab8c799-ca7d-4aa6-ad95-854713f8ff41, OperationName: Create,

After a full day of logging and trying new things, I believe I have determined that it is due to the alwaysOn value in siteConfig:

siteConfig: {
    alwaysOn: args.kind === 'functionapp' ? false : true, // Always on not needed for function apps in consumption plan and will break it
},

Adding the above handling in my code, so it is turned off for my function app only, resolved the issue.

https://github.com/Azure/Azure-Functions/wiki/Enable-Always-On-when-running-on-dedicated-App-Service-Plan

Example

Output of pulumi about

na

Additional context

It seems like this should be handled at the Pulumi level, if possible: when we know the plan is a Consumption plan, either raise an error or ignore the setting.

sku: {
    name: 'Y1',
    tier: 'Dynamic',
},
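To make the suggestion above concrete, here is a minimal sketch of the kind of guard that could live in user code today. Everything in it is hypothetical: `PlanSku`, `isConsumptionPlan`, and `resolveAlwaysOn` are illustrative names, not part of `@pulumi/azure-native`, and it assumes that a plan with tier `Dynamic` (such as SKU `Y1`) is a Consumption plan.

```typescript
// Hypothetical helper types and functions; not part of @pulumi/azure-native.
type PlanSku = { name: string; tier: string };

// Assumption: a plan whose tier is 'Dynamic' (e.g. SKU 'Y1') is a Consumption plan.
function isConsumptionPlan(sku: PlanSku): boolean {
    return sku.tier.toLowerCase() === 'dynamic';
}

// Returns the alwaysOn value that is safe to send for the given plan.
// 'error' mode throws before anything is deployed;
// 'ignore' mode silently forces the setting to false.
function resolveAlwaysOn(
    sku: PlanSku,
    requested: boolean,
    mode: 'error' | 'ignore' = 'error',
): boolean {
    if (!requested || !isConsumptionPlan(sku)) {
        return requested;
    }
    if (mode === 'error') {
        throw new Error(`alwaysOn cannot be enabled on Consumption plan SKU '${sku.name}'`);
    }
    return false;
}
```

The result could then be passed as `siteConfig.alwaysOn`, for example `resolveAlwaysOn({ name: 'Y1', tier: 'Dynamic' }, true, 'ignore')`, which fails fast or falls back locally instead of relying on the service's response.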


danielrbradley commented 3 months ago

Hi @j-fulbright sorry to hear this was a tricky one to diagnose.

This provider doesn't do any validation of input combination itself - it only checks the shape of the schema. The validation of input combinations is done by the Azure service itself as these can vary by service version, of which there are many thousands in total. This provider's responsibility is to relay any validation failures back to the user.
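As a rough illustration of the difference between checking shape and checking combinations, consider the sketch below. It is a deliberately simplified stand-in and not the provider's actual code: `SiteInput` and `hasValidShape` are hypothetical names, and the plan tier and the `alwaysOn` flag are collapsed into one object purely for brevity.

```typescript
// Hypothetical, simplified stand-in for a schema shape check.
// This is NOT the provider's real validation code.
interface SiteInput {
    skuTier: unknown;
    alwaysOn: unknown;
}

// A shape check only verifies that each field has the expected type.
function hasValidShape(input: SiteInput): boolean {
    return typeof input.skuTier === 'string' && typeof input.alwaysOn === 'boolean';
}

// Well-formed, but a combination the service rejects:
// shape checking alone cannot catch this.
const unsupportedCombination: SiteInput = { skuTier: 'Dynamic', alwaysOn: true };

// Malformed: a shape check does catch this.
const wrongType: SiteInput = { skuTier: 'Dynamic', alwaysOn: 'yes' };

const combinationPassesShapeCheck = hasValidShape(unsupportedCombination);
const wrongTypePassesShapeCheck = hasValidShape(wrongType);
```

Here the unsupported combination passes the shape check, so the first place it can be rejected is the Azure service itself.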

It sounds like there is some odd service behaviour happening when enabling the "always on" option. The best way we can help is if we've got a simple set of steps to reproduce the issue.

It sounds like this was only failing when trying to transition a FunctionApp with the setting alwaysOn: true from a dedicated "App Service Plan" to the "Consumption" model, at which point the resource operation reported that another operation was in progress. Does that summarise it correctly?

There might be an issue to raise with Microsoft for them to improve the error messages for this scenario. We could also add a specific note about this migration type to the documentation to help others hitting this issue. However it would be good to be able to prove the root cause of the issue first.

Please could you share a simple code snippet summarising the options used for the initial and modified configuration of the FunctionApp and its service plan? Thanks!

j-fulbright commented 3 months ago

Thanks @danielrbradley

Your summary is pretty spot on, although this was a new function app being spun up on a Consumption plan, where we had previously been using an App Service Plan. It was deployed from a fully torn-down stack, so we weren't trying to migrate it from one plan to another. (Just wanted to make sure that was clear.)

Regular plan:

const appServicePlan = new azureNative.web.AppServicePlan(`asp-${suffix}`,
    {
        resourceGroupName: resourceGroup.name,
        location: resourceGroup.location,
        kind: 'app',
        reserved: true,
        sku: {
            name: stackName === 'prod' ? 'P3V3' : 'B3',
            tier: stackName === 'prod' ? 'PremiumV3' : 'Basic',
            capacity: stackName === 'prod' ? 2 : 1,
        },
    },
    {
        dependsOn: [resourceGroup]
    }
);

Consumption plan:

const functionAppServicePlan = new azureNative.web.AppServicePlan(`asp-func-${suffix}`,
    {
        resourceGroupName: functionResourceGroup.name,
        location: functionResourceGroup.location,
        kind: 'functionapp',
        reserved: true,
        sku: {
            name: 'Y1',
            tier: 'Dynamic',
        },
    },
    {
        dependsOn: [functionResourceGroup]
    }
);

We have a custom resource built to handle miscellaneous things like file blobs and settings per app, since we need to spin up 7 apps in our stack, so I'll paste the relevant settings for the function app:

        // Build the app settings array
        let appSettingsArray = [
            {
                name: 'DD_ENV',
                value: stackName === 'prod' ? 'production' : stackName,
            },
            {
                name: 'DD_SERVICE',
                value: 'abcdef',
            },
            {
                name: 'WEBSITE_RUN_FROM_PACKAGE',
                value: fileBlob.url,
            },
            {
                name: 'WEBSITE_START_SCM_ON_SITE_CREATION',
                value: pulumi.output('1'),
            },
            {
                name: 'WEBSITE_TIME_ZONE', // NOT supported for Function App on Consumption Plan
                value: pulumi.output('America/Chicago'),
            },
        ];

        if (args.kind === 'functionapp') {
            appSettingsArray = appSettingsArray.concat([
                {
                    name: 'FUNCTIONS_EXTENSION_VERSION',
                    value: pulumi.output('~4'),
                },
                {
                    name: 'FUNCTIONS_WORKER_RUNTIME',
                    value: pulumi.output('dotnet-isolated'),
                },
                {
                    name: 'AzureWebJobsStorage',
                    value: storageAccountConnectionString,
                }
            ]);
        }
        const app = new azureNative.web.WebApp(
            `as-${name}-${suffix}`,
            {
                resourceGroupName: args.altResourceGroupName || args.resourceGroupName,
                location: args.location,
                serverFarmId: args.serverFarmId,
                name: `${name}-${suffix}`,
                siteConfig: {
                    linuxFxVersion: args.kind === 'functionapp' ? 'DOTNET-ISOLATED|8.0' : 'DOTNETCORE|8.0',
                    appSettings: appSettingsArray,
                    alwaysOn: args.kind === 'functionapp' ? false : true, //alwaysOn set to true breaks deploy of function apps in consumption plan, as it is not supported
                    http20Enabled: true,
                },
                identity: {
                    type: azureNative.web.ManagedServiceIdentityType.SystemAssigned,
                },
                kind: args.kind,
                httpsOnly: true,
                clientAffinityEnabled: false,
            },
            { parent: this, dependsOn: [appInsights] },
        );

Mainly, it is just the alwaysOn value: when set to true, it causes the "operation is in progress" errors. As soon as it was set to false, the app deployed successfully, and it has every time since.

j-fulbright commented 3 months ago

I was able to reproduce this with a super simple app.

const functionResourceGroup = new azureNative.resources.ResourceGroup(`rgf-${suffix}`, {
    resourceGroupName: `rgf-${suffix}`,
    location: 'centralus',
});
const functionAppServicePlan = new azureNative.web.AppServicePlan(`asp-func-${suffix}`,
    {
        resourceGroupName: functionResourceGroup.name,
        location: functionResourceGroup.location,
        kind: 'functionapp',
        reserved: true,
        sku: {
            name: 'Y1',
            tier: 'Dynamic',
        },
    },
    {
        dependsOn: [functionResourceGroup]
    }
);
// Create empty initial webapps
const functionsApp = new azureNative.web.WebApp(
    `as-functions-${suffix}`, {
    resourceGroupName: functionResourceGroup.name,
    location: functionResourceGroup.location,
    serverFarmId: functionAppServicePlan.id,
    name: `functions-${suffix}`,
    kind: 'functionapp,linux',
    siteConfig: {
        linuxFxVersion: 'DOTNET-ISOLATED|8.0',
        alwaysOn: false,
    },
});

This one works, but as soon as kind or alwaysOn was removed, it would cause the same issue. So it seems like the "another operation is in progress" message is just hiding the actual error that Azure is returning, which makes an Azure API/CLI issue more than likely.

danielrbradley commented 2 months ago

Thanks for the feedback @j-fulbright

Deploying your example above and updating alwaysOn: true resulted in the error: error: autorest/azure: Service returned an error. Status=<nil> <nil>. There was a conflict. AlwaysOn cannot be set for this site as the plan does not allow it. For more information on pricing and features, please see: https://aka.ms/appservicepricingdetails

Deploying your example and removing the alwaysOn: false deploys without error.

Deploying your example without the WebApp kind property fails with the same error you were seeing: error: autorest/azure: Service returned an error. Status=<nil> <nil>. Cannot modify this site because another operation is in progress. Details: Id: c59de21f-8b95-4717-872a-ff1bdf0ec924, OperationName: Create, CreatedTime: 7/11/2024 8:51:44 AM, RequestId: 032bb8fe-959c-4786-b8e3-1bddb35d9b09, EntityType: 3. Removing the kind property from an existing WebApp will cause a replacement meaning it's the same as the above error as it's just doing a brand new create.

When running with verbose logging (pulumi up --yes --skip-preview --logtostderr --logflow -v=10 2> out.txt) we see that we're actually making two separate PUT requests when attempting the Create operation for the resource. The response to the first PUT request contains the more helpful message: "Consumption pricing tier cannot be used for regular web apps."

Attaching a debugger allowed me to locate the source of the double request. This is triggered by the go-autorest library's retry mechanism. This code specifically looks to retry errors which have "409 Conflict" status. When it then retries again in quick succession, subsequent requests fail with "409 Conflict" status again, but this time with the message "Cannot modify this site because another operation is in progress. Details: Id: 7671ff14-2e7b-4ebd-98d7-6eee53ae2b27, OperationName: Create, CreatedTime: 7/11/2024 10:56:27 AM, RequestId: abdbabfc-49f2-4f6c-bdad-1fcb9c9527b8, EntityType: 3". I think this is because attempting to create a Web App temporarily modifies the associated AppServicePlan internally. The last error message is then returned to the user, therefore hiding the original, more helpful, error message.
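The masking effect described above can be reproduced in a small self-contained simulation. This sketch does not call Azure or go-autorest; `ApiResponse`, `sendWithRetry`, and `fakeCreateWebApp` are hypothetical names, and the retry loop is only an assumption about the general shape of a "retry on 409" policy, not the library's real implementation.

```typescript
// Hypothetical simulation; does not call Azure or go-autorest.
interface ApiResponse {
    status: number;
    message: string;
}

// Retries while the response status is 409 Conflict, up to maxAttempts,
// and reports both the first and the last response seen.
function sendWithRetry(
    send: (attempt: number) => ApiResponse,
    maxAttempts: number,
): { first: ApiResponse; last: ApiResponse; attempts: number } {
    const first = send(1);
    let last = first;
    let attempts = 1;
    while (last.status === 409 && attempts < maxAttempts) {
        attempts++;
        last = send(attempts);
    }
    return { first, last, attempts };
}

// Simulated service: the first attempt returns the validation failure as a 409;
// later attempts report that the earlier Create is still in progress.
function fakeCreateWebApp(attempt: number): ApiResponse {
    return attempt === 1
        ? { status: 409, message: 'Consumption pricing tier cannot be used for regular web apps.' }
        : { status: 409, message: 'Cannot modify this site because another operation is in progress.' };
}

const result = sendWithRetry(fakeCreateWebApp, 2);
const shownToUser = result.last.message; // what a "return the last error" policy surfaces
const rootCause = result.first.message;  // the more helpful message from the first PUT
```

In this simulation the user only ever sees the "another operation is in progress" text, while the root cause sits in the first response. Had the first response carried a status that the loop does not retry, `first` and `last` would be the same response and the helpful message would surface directly.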

From these observations I can confirm that this is an Azure API bug: the validation failure should be returned with a non-retryable 4xx status code (for example, 400 Bad Request) rather than 409 Conflict. It would then not be retried, and the useful error message would be returned.

I would recommend approaching Azure support with this information and requesting that they fix the status code of the original validation failure error.

j-fulbright commented 2 months ago

Just now seeing this @danielrbradley ! Thank you so much! I copied your command line for reference in the future to look into things.

It makes sense that multiple requests are being made and we're only getting the last one back, so it's definitely an issue with the Azure API, unfortunately.