pulumi / pulumi-azure-native

Azure Native Provider
Apache License 2.0

azure-native: Pulumi fails to recover after a server-farm gets auto-deleted by Azure #2697

Open hognevevle opened 1 year ago

hognevevle commented 1 year ago

What happened?

 ++ azure-native:web:WebApp Foo-app-staging **creating failed** [diff: ~siteConfig]; error: Code="NotFound" Message="Cannot find ServerFarm with name FooApp-staging." Details=[{"Message":"Cannot find ServerFarm with name FooApp-staging."},{"Code":"NotFound"},{"ErrorEntity":{"Code":"NotFound","ExtendedCode":"51004","Message":"Cannot find ServerFarm with name FooApp-staging.","MessageTemplate":"Cannot find {0} with name {1}.","Parameters":["ServerFarm","FooApp-staging"]}}]

Hi there, this has become a recurring issue for us. As far as I can tell, it happens after a deployment where our built app fails to start in App Services. After some retries, the App Service and Server Farm get deleted from Azure, which causes issues in Pulumi.

Even after running pulumi refresh, Pulumi still tries to make changes to the non-existent resource and fails to recover.

The only working solution has been to run pulumi state delete "<urn>" to clear out the stale resources.
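
For anyone applying the same workaround to several resources, here is a minimal Python sketch that lists the URNs to delete. It assumes the state was exported with pulumi stack export --file state.json, that the export has a top-level "deployment" object whose "resources" entries each carry "urn" and "type" fields, and that dependencies are listed before their dependents. The type tokens in STALE_TYPES and the helper names are illustrative assumptions, not part of Pulumi; adjust them to your stack.

```python
import shlex

# Assumed type tokens for the resources Azure removed; adjust as needed.
STALE_TYPES = {"azure-native:web:WebApp", "azure-native:web:AppServicePlan"}


def stale_urns(exported, stale_types=STALE_TYPES):
    """Return URNs whose resource type is in stale_types, dependents first."""
    resources = exported.get("deployment", {}).get("resources", [])
    matches = [r["urn"] for r in resources if r.get("type") in stale_types]
    # Assumption: the export lists dependencies before dependents, so the
    # reversed order deletes the web app before the plan it runs on.
    return list(reversed(matches))


def delete_commands(urns):
    """Build one `pulumi state delete` command line per URN."""
    return ["pulumi state delete " + shlex.quote(urn) for urn in urns]


# Hypothetical, trimmed-down export used only to demonstrate the helpers.
sample_export = {
    "deployment": {
        "resources": [
            {"urn": "urn:pulumi:dev::proj::pulumi:pulumi:Stack::proj-dev",
             "type": "pulumi:pulumi:Stack"},
            {"urn": "urn:pulumi:dev::proj::azure-native:web:AppServicePlan::plan",
             "type": "azure-native:web:AppServicePlan"},
            {"urn": "urn:pulumi:dev::proj::azure-native:web:WebApp::app",
             "type": "azure-native:web:WebApp"},
        ]
    }
}

for command in delete_commands(stale_urns(sample_export)):
    print(command)
```

With a real stack, replace sample_export with the result of json.load on the exported file, and review the printed commands before running any of them, since pulumi state delete only edits the state and does not touch Azure.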

Expected Behavior

pulumi refresh should detect that the resources no longer exist, and a subsequent pulumi up should recreate them.

Steps to reproduce

Provision the resources for an Azure App Services instance, and deploy a failing app to it. Azure will eventually tear it down, at which point the Pulumi state is inconsistent and Pulumi fails to recover.

Output of pulumi about

CLI          
Version      3.72.2
Go Version   go1.20.5
Go Compiler  gc

Plugins
NAME          VERSION
azure-native  2.2.0
command       0.5.2
dotnet        unknown
mailgun       3.5.0-alpha.1691733420+a3d2bc7b
random        4.13.2

Host     
OS       darwin
Version  13.4
Arch     arm64

This project is written in dotnet: executable='/usr/local/share/dotnet/dotnet' version='7.0.305'

Current Stack: superagent/Superagent.Backend.Infrastructure/dev-hogne

TYPE                                                URN
[...]
Found no pending operations associated with dev-hogne

Backend        
Name           pulumi.com
URL            https://app.pulumi.com/hognevevle
User           hognevevle
Organizations  hognevevle, superagent

Dependencies:
NAME                VERSION
Pulumi              3.56.0
Pulumi.AzureNative  2.2.0
Pulumi.Command      0.5.2
Pulumi.Mailgun      3.5.0-alpha.1691733420
Pulumi.Random       4.13.2

Additional context

No response

Contributing

Vote on this issue by adding a 👍 reaction. To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already).

hognevevle commented 1 year ago

I have also experienced this with other resources, as described in another issue. Sorry if I opened this issue in the wrong repo.

dixler commented 1 year ago

Hey. Thanks for filing this and the extra context. It seems really tricky.

From my read of it, it sounds like an azure native issue so I'm going to transfer this issue to azure-native.

Side-note: If you're comfortable with terminal text editors, there's an experimental command called pulumi state edit that could potentially make it easier to pulumi state delete multiple resources.

iwahbe commented 1 year ago

Hi @hognevevle. Thanks for reporting this issue. This repo (https://github.com/pulumi/pulumi-azure-native) is the correct place for this issue.