pulumi / pulumi-aws

An Amazon Web Services (AWS) Pulumi resource package, providing multi-language access to AWS
Apache License 2.0
460 stars 155 forks source link

Resource replacement due to parent provider rename deletes actual AWS resource #2009

Closed ssemyonov closed 7 months ago

ssemyonov commented 2 years ago

What happened?

When a resource is based on an explicit provider defined via CustomResourceOptions.Provider and the parent provider is renamed (Pulumi resource name is changed), pulumi up will cause provider recreation and resource replacement operation for any dependant resources. Such resource replacement will result in the original cloud resource to be deleted, while Pulumi state still consider it as existing.

Steps to reproduce

  1. With the following program, run pulumi up while having AWS credentials configured:
    
    using Pulumi;
    using Pulumi.Aws;
    using Pulumi.Aws.Sns;

await Deployment.RunAsync(() => { var provider = new Provider( "original", new ProviderArgs { Region = "eu-west-1" });

var topic = new Topic(
  "a-topic",
  new TopicArgs { Name = "a-topic" },
  new CustomResourceOptions { Provider = provider });

});

2. In the program above change provider name from `"original"` to `"renamed"` (or any other string) and run `pulumi up` again.
3. Check AWS for existance of `a-topic` and/or run `pulumi refresh` to see that actual resource has been deleted and the stack state went out of sync.

### Expected Behavior

The topic still exists in AWS and Pulumi state is correctly adjusted to re-home affected resources without impacting the infrastructure. 
Changing only provider name should not cause any changes to the resources dependant on it and especially silent deletion of the resources shouldn't happen.

### Actual Behavior

The topic is deleted from AWS meanwile Pulumi state considers it as still existing.

### Versions used

CLI Version 3.34.1 Go Version go1.17.11 Go Compiler gc

Plugins NAME VERSION aws 5.8.0 dotnet unknown

Host OS Microsoft Windows 10 Pro Version 10.0.19041 Build 19041 Arch x86_64

This project is written in dotnet: executable='C:\Program Files\dotnet\dotnet.exe' version='6.0.201'

Dependencies: NAME VERSION Pulumi 3.34.1 Pulumi.Aws 5.8.0



### Additional context

_No response_

### Contributing

Vote on this issue by adding a 👍 reaction. 
To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already). 
Frassle commented 2 years ago

So what's happening here is the engine wants to recreate the Topic resource because it's provider has changed. This is by design, if you want to rename a provider without it causing resource replacements use an alias to tell the engine that it's just a rename not a new provider (https://www.pulumi.com/docs/intro/concepts/resources/options/aliases/).

Now the oddity here with the topic being deleted is because by default pulumi tries to create the new version of a resource being replaced before deleting the old resource. As such the order of operations looks like:

 ++ aws:sns:Topic a-topic creating replacement [diff: ~provider]
 ++ aws:sns:Topic a-topic created replacement [diff: ~provider]
 +- aws:sns:Topic a-topic replacing [diff: ~provider]
 +- aws:sns:Topic a-topic replaced [diff: ~provider]
 -- aws:sns:Topic a-topic deleting original [diff: ~provider]
 -- aws:sns:Topic a-topic deleted original [diff: ~provider]

But sns Topics are just keyed by their name, not some other unique ID and so after the Create (which doesn't actually create anything because a-topic already exists) we then delete a-topic. The engine thinks this should of just deleted the original not the replacement that it thinks has just been created.

I think this is an aws provider bug that it allowed a Create operation for an SNS topic that already exists.

ssemyonov commented 2 years ago

thanks @Frassle I'll look into using aliases for the future refactoring to prevent this. It's also worth mentioning that SNS topic in this report is just a simplified example, and I suspect majority of AWS resources will have similar behavior (we saw this with SQS and SNS->SQS subscription as well), e.g. deleting original after replacement will delete the replacement resource since their ARN will match.

stack72 commented 2 years ago

I think this is an aws provider bug that it allowed a Create operation for an SNS topic that already exists.

@Frassle I think we need to talk about this - I don't believe this bug exists so I may need to get on a call with you and see what w can figure out here

stack72 commented 2 years ago

Hi @ssemyonov

So @Frassle and I talked about this and discovered a few different things here that we need to fix up - the major issue is that we need to determine if the provider name change is the only change and support that isn't a diff in the engine - that would stop the entire lifecycle of changes you got here

We will work on getting that taken care of and things will be a normal state for these types of interactions

Paul

1oglop1 commented 2 years ago

@ssemyonov This is a known issue. To replace provider without resource replacement you can run this: pulumi up --target 'urn:pulumi:stekz::iam::pulumi:providers:aws::MyNewProvider' - replace urn with your changed provider urn

Frassle commented 2 years ago

@ssemyonov This is a known issue. To replace provider without resource replacement you can run this: pulumi up --target 'urn:pulumi:stekz::iam::pulumi:providers:aws::MyNewProvider' - replace urn with your changed provider urn

Note this will simply delay the other resource replacements to the next time they are updated.

lionello commented 1 year ago

What seems to have worked for me is to add an aliases: ["urn:…:default"] to the new provider, do an up (nothing should change), then remove the aliases. From that point on, the resources are using the new provider, but did not get replaced.

VenelinMartinov commented 10 months ago

Interestingly the problem seems to only occur when the Name property of the Topic resource is explicitly specified.

If we leave that out and let the provider auto-generate one, the program works as expected, that is the final refresh yields no updates.

Yeah, this looks like an issue with the Topic resource - looks like name serves as an id.

import pulumi
import pulumi_aws as aws

topic1 = aws.sns.Topic("topic1", name="topic")
topic2 = aws.sns.Topic("topic2", name="topic")

Running with the above program we can also reproduce the issue:

  1. pulumi up
  2. comment out topic2
  3. pulumi up - this should delete topic2
  4. pulumi refresh
  5. Observe that topic1 was also deleted.
VenelinMartinov commented 10 months ago

Confirmed this is blocked by https://github.com/pulumi/pulumi/issues/918

VenelinMartinov commented 10 months ago

After discussing with @Frassle, he suggested a workaround to the issue. We could patch the create method of the resource to first check if the topic already exists and error if it does.

This should prevent us from creating two resources for the same sns topic and hopefully prevent such odd issues.

VenelinMartinov commented 9 months ago

We tried to fix this in https://github.com/pulumi/pulumi-aws/pull/3235 with follow-up work in https://github.com/pulumi/pulumi-aws/pull/3268 but it didn't work because DeleteBeforeReplace in the schema only works when the Diff method of the resource is called.

In this case, the provider is replaced so all of its child resources are replaced as well. To fix this issue, we'd need to fix https://github.com/pulumi/pulumi-terraform-bridge/issues/244

For now the only workaround is to specify DeleteBeforeReplace on the resource in the pulumi program.

VenelinMartinov commented 9 months ago

I reproduced the same issue for iam:PolicyAttachment. Here is the repro program:

import pulumi
import pulumi_aws as aws

# Create an IAM policy to attach
policy = aws.iam.Policy(
    "myPolicy",
    name="myPolicy123",
    description="A test policy",
    path="/",
    policy="""{
        "Version": "2012-10-17",
        "Statement": [{
            "Action": [
                "ec2:Describe*"
            ],
            "Effect": "Allow",
            "Resource": "*"
        }]
    }""",
)

# Create an IAM role
role = aws.iam.Role(
    "myRole",
    name="myRole123",
    assume_role_policy="""{
        "Version": "2012-10-17",
        "Statement": [{
            "Action": "sts:AssumeRole",
            "Effect": "Allow",
            "Principal": {
                "Service": "ec2.amazonaws.com"
            }
        }]
    }""",
)

provider_name = pulumi.config.Config().get("provider_name", "prov")
prov = aws.Provider(provider_name)

# Attach the policy to the role
attachment = aws.iam.PolicyAttachment(
    "myPolicyAttachment",
    roles=[role.name],
    policy_arn=policy.arn,
    opts=pulumi.ResourceOptions(provider=prov),
)
  1. pulumi up
  2. pulumi config set provider_name "prov1"
  3. pulumi up
  4. pulumi refresh
  5. Notive that PolicyAttachment was silently deleted.

The issue is that because of https://github.com/pulumi/pulumi-terraform-bridge/issues/244 the provider gets replaced when renaming, which causes all of its child resources to be replaced and DeleteBeforeReplace in their schema does not actually work.

VenelinMartinov commented 9 months ago

I strongly suspect all resources marked DeleteBeforeReplace in the schema suffer from the same issue. Looks like we have that in:

AWS - 11 resources:

GCP - 1 resource

Github - 3 resources

Postgres - 1 resource:

Docker - 1 resource:

Some of these probably don't fail silently but some, like sns topic and iam policy attachment do. Note that there are likely others, like SNS topic, which have the same idempotent creation behaviour but we have not yet marked as DeleteBeforeReplace.