pulumi / pulumi-aws

An Amazon Web Services (AWS) Pulumi resource package, providing multi-language access to AWS
Apache License 2.0

Don't persist the provider's assumeRole attribute within the state #3149

Open oboukili opened 10 months ago

oboukili commented 10 months ago

This explicitly breaks refreshes where the cached assume role attribute value is being used instead of the actual currently configured value.

A typical broken example, in my case, would be a two-step preview (PR) / release (push) CI pipeline where the PR workflows assume a read-only IAM role, while the release workflows assume a read-write IAM role.

I'm new to Pulumi, but more generally, I can't see why any of the provider attributes' values would actually be persisted in the state, or favored over the currently configured values during refreshes, so it may be a (not easily modifiable) Pulumi-wide design issue rather than one scoped only to this provider.

Disabling refreshes bypasses the issue, but does not solve it.

Thanks for any insights you could provide.

t0yv0 commented 10 months ago

Hi @oboukili thank you for reporting this and sorry that Pulumi is not doing what you need here.

It sounds like you are running Pulumi against the same stack in two different contexts with different assumed IAM roles, and at some point Pulumi ignores the IAM role you have provided and instead picks up the IAM role from the statefile for the stack, which breaks your intent. There are some scenarios in Pulumi that benefit from saving provider config in the state, such as managing deletion of existing resources by the version of the provider that provisioned them, and there may be more, but sounds like this is surprising in the context of assumed IAM roles.

It would help my team a lot if we had a solid repro here to narrow your use case to a concrete sequence of steps. We can then try to find solutions - whether there is something that can be fixed locally in the provider or taken to a broader conversation.

Could you help us out with - (1) a minimal Pulumi program that uses the provider with the assume role, in particular whether you use explicit providers or pulumi config. (2) exact sequence of pulumi invocations leading up to the issue; (3) expected/actual results.

I would also find it very helpful if you could elaborate "Disabling refreshes bypasses the issue, but does not solve it.", are you asking pulumi to do refreshes explicitly, and how do you disable it?

Thanks for your patience!

oboukili commented 10 months ago

Hi @t0yv0, thanks for your reply.

> There are some scenarios in Pulumi that benefit from saving provider config in the state, such as managing deletion of existing resources by the version of the provider that provisioned them, and there may be more, but sounds like this is surprising in the context of assumed IAM roles.

Thanks for clarifying, persisting the provider version would indeed be a valid use case for resource deletion, however I would rather treat that data as informational, to be used only should an issue arise (similar to what Kubernetes "last-applied" annotation is), but I digress.

> Could you help us out with - (1) a minimal Pulumi program that uses the provider with the assume role, in particular whether you use explicit providers or pulumi config. (2) exact sequence of pulumi invocations leading up to the issue; (3) expected/actual results.

(1) I'm afraid I can't share the program I am using, but here's a minimal example (apologies for the automatic tab indent). Note that I explicitly disable all default providers within Pulumi.yaml through pulumi:disable-default-providers: ["*"].

package main

import (
    servicecatalogtypes "github.com/aws/aws-sdk-go-v2/service/servicecatalog/types"
    "github.com/pulumi/pulumi-aws/sdk/v6/go/aws"
    "github.com/pulumi/pulumi-aws/sdk/v6/go/aws/servicecatalog"
    "github.com/pulumi/pulumi/sdk/v3/go/pulumi"
    "github.com/pulumi/pulumi/sdk/v3/go/pulumi/config"
)

const configAssumeRole = "assumeRole"

func main() {
    pulumi.Run(func(ctx *pulumi.Context) error {
        cfg := config.New(ctx, "myconfig")
        // Require panics if myconfig:assumeRole is not set, so keep its return value
        // instead of reading the key a second time with Get.
        roleArn := cfg.Require(configAssumeRole)
        p, err := aws.NewProvider(ctx, "explicitProvider", &aws.ProviderArgs{
            AssumeRole: &aws.ProviderAssumeRoleArgs{
                Duration:    pulumi.StringPtr("900s"),
                RoleArn:     pulumi.StringPtr(roleArn),
                SessionName: pulumi.StringPtr("minimal-test"),
            },
            Region: pulumi.StringPtr("eu-west-3"),
        })
        if err != nil {
            return err
        }
        _, err = servicecatalog.NewProduct(ctx, "test",
            &servicecatalog.ProductArgs{
                Distributor: pulumi.StringPtr("test"),
                Name:        pulumi.StringPtr("test"),
                Owner:       pulumi.String("test"),
                SupportUrl:  pulumi.StringPtr("test"),
                Type:        pulumi.String(servicecatalogtypes.ProductTypeCloudFormationTemplate),
                ProvisioningArtifactParameters: &servicecatalog.ProductProvisioningArtifactParametersArgs{
                    Name:        pulumi.StringPtr("v0"),
                    TemplateUrl: pulumi.StringPtr("https://s3-us-gov-west-1.amazonaws.com/cloudformation-templates-us-gov-west-1/IAM_Users_Groups_and_Policies.template"),
                    Type:        pulumi.StringPtr(string(servicecatalogtypes.ProductTypeCloudFormationTemplate)),
                },
            },
            pulumi.Provider(p),
        )
        return err
    })
}

Pulumi.stackname.yaml

config:
  myconfig:assumeRole: "arn:aws:iam::1234567890:role/pr"

(2) Given two to-be-assumed IAM roles arn:aws:iam::1234567890:role/pr and arn:aws:iam::1234567890:role/release, and assuming the following:

(3) Expected result: the explicit pulumi-aws provider assumes the role set in the configuration, arn:aws:iam::1234567890:role/pr, and proceeds successfully.

Actual result: the explicit pulumi-aws provider tries to assume the role set in the state, arn:aws:iam::1234567890:role/release, and fails because the current context's credentials don't allow it to.

> I would also find it very helpful if you could elaborate "Disabling refreshes bypasses the issue, but does not solve it.", are you asking pulumi to do refreshes explicitly, and how do you disable it?

I was a bit too concise here. I meant not systematically refreshing upon every pulumi action (update or preview), which is controlled by the following flag in Pulumi.yaml:

options:
  refresh: always

Digging further, I now realize there has already been quite a long design debate over the importance of the state, and thus the non-anecdotal impact of refreshes: https://github.com/pulumi/pulumi/issues/2247. It would seem Pulumi differs heavily from, say, Terraform here, as the state is not treated merely as tracking data and a cache for managed resources. That debate is unrelated to the current issue.

iwahbe commented 9 months ago

Hi @oboukili. I think this is effectively a special case of https://github.com/pulumi/pulumi/issues/13860. I'm not sure what a workaround would be for this scenario, beyond state surgery to change the IAM role.

fitz-vivodyne commented 6 months ago

I'm assuming no one has found a workaround for this yet? We've got a central account we run pulumi in that assumes roles in other child accounts via explicitly configured providers.

I was planning on having separate roles for preview/up phases, but that plan is currently blocked due to it always trying to use the up role from the state.

ryanpodonnell1 commented 5 months ago

this is biting me right now as I was told we need to have separate roles for plan vs apply. Basically will have to run refresh only for the apply step to ensure state is up to snuff:

func handleDeployment(ctx context.Context, stack auto.Stack, action string) error {
    switch action {
    case "plan":
        // Preview does not refresh, so the read-only role can be swapped in here.
        _, err := stack.Preview(ctx, optpreview.ProgressStreams(os.Stdout), optpreview.Diff(), colorAlwaysPreview{})
        if err != nil {
            return err
        }

    case "apply":
        // Refresh only on apply due to https://github.com/pulumi/pulumi-aws/issues/3149
        _, err := stack.Up(ctx, optup.ProgressStreams(os.Stdout), colorAlwaysUp{}, optup.ErrorProgressStreams(os.Stderr), optup.Diff(), optup.Refresh())
        if err != nil {
            return err
        }

    case "destroy":
        _, err := stack.Destroy(ctx, optdestroy.ProgressStreams(os.Stdout), colorAlwaysDestroy{})
        if err != nil {
            // Return the error like the other branches instead of exiting the process.
            return err
        }

    default:
        return fmt.Errorf("unknown action")
    }

    return nil
}

Using the plan I'm able to swap out the role, but it obviously doesn't really do anything because a refresh isn't happening. I'm just hoping that nothing has changed in the environment compared to state.

fitz-vivodyne commented 5 months ago

FYI, I was able to come up with a super janky workaround for this using transitive session tags.

If you set a transitive session tag (say, pulumi-up=<true>) outside of Pulumi it propagates and isn't stored in the state.

With that capability, we created a single IAM role for both preview and up and gated all mutating permissions behind a condition to check that the session tag was set.
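For anyone wanting to picture the gated policy, here is a minimal sketch in Go that only builds the policy JSON. It is not the policy we actually run: the Service Catalog action names and the Sid values are made up for illustration, and it assumes the session tag is checked through the aws:PrincipalTag/pulumi-up condition key, with the tag itself set (and marked transitive) by whatever assumes the first role outside of Pulumi.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// statement models only the IAM policy statement fields this sketch needs.
type statement struct {
	Sid       string                       `json:"Sid"`
	Effect    string                       `json:"Effect"`
	Action    []string                     `json:"Action"`
	Resource  string                       `json:"Resource"`
	Condition map[string]map[string]string `json:"Condition,omitempty"`
}

type policy struct {
	Version   string      `json:"Version"`
	Statement []statement `json:"Statement"`
}

// gatedPolicy allows readActions unconditionally and writeActions only when
// the calling session carries the tag tagKey=tagValue.
func gatedPolicy(tagKey, tagValue string, readActions, writeActions []string) policy {
	return policy{
		Version: "2012-10-17",
		Statement: []statement{
			{Sid: "ReadAlways", Effect: "Allow", Action: readActions, Resource: "*"},
			{
				Sid:      "WriteOnlyWhenTagged",
				Effect:   "Allow",
				Action:   writeActions,
				Resource: "*",
				Condition: map[string]map[string]string{
					"StringEquals": {"aws:PrincipalTag/" + tagKey: tagValue},
				},
			},
		},
	}
}

func main() {
	// Hypothetical action lists; replace with whatever your stacks need.
	p := gatedPolicy("pulumi-up", "true",
		[]string{"servicecatalog:Describe*", "servicecatalog:List*"},
		[]string{"servicecatalog:CreateProduct", "servicecatalog:UpdateProduct", "servicecatalog:DeleteProduct"},
	)
	out, err := json.MarshalIndent(p, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Because the role ARN stays the same for preview and up, the value cached in the state no longer matters; only the session tag differs between the two phases.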

oboukili commented 5 months ago

Very smart @fitz-vivodyne thanks ! ❤️

corymhall commented 4 months ago

related to https://github.com/pulumi/pulumi/issues/4981

gunzy83 commented 2 months ago

We manage preview and up deployments using OIDC in Github Actions, with a profile file for each action and one for deployments from engineering workstations (AWS IAM Identity Centre), so I think this could be adapted to assumed roles as well.

We use AWS_CONFIG_FILE to point the SDK to the correct file (in the context of the run) containing profiles for each of our accounts configured for a Pulumi project.

For previews in Github Actions on our PRs we point to a ./.aws/github-preview file in the project which has read only preview roles that are assumed via web identity (OIDC). These roles can be run without a Github Repo Environment.

For up operations in Github Actions we point to a ./.aws/github-deploy file in the project which has the real roles, also assumed via OIDC. These roles require a Github Repo Environment and therefore can be subject to approval.

Engineer workstations use Taskfile.dev, where a .env file sets AWS_CONFIG_FILE to ./.aws/profiles, whose profiles are AWS IAM Identity Centre (SSO) permission sets.

Our stacks specify the account/profile name and region which allows us to deploy anywhere the role is valid for the principal running the operation. When a provider is created the profile name and region are stored. By always setting AWS_CONFIG_FILE at the runner (Github Actions shared workflows, Taskfile.dev for engineer workstations), we can do preview, up, refresh and delete operations with no issues on any valid runner. Hope that helps.
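A rough sketch of the selection logic in Go, in case it helps. The three file paths are the ones described above; the context labels ("preview", "deploy", "workstation") and the RUN_CONTEXT variable are hypothetical names for this example only, since in practice we set AWS_CONFIG_FILE directly in the workflow or the .env file.

```go
package main

import (
	"fmt"
	"os"
)

// configFiles maps a run context to the AWS config file used in that context.
// The keys are illustrative labels, not anything Pulumi or AWS defines.
var configFiles = map[string]string{
	"preview":     "./.aws/github-preview", // read-only roles via OIDC
	"deploy":      "./.aws/github-deploy",  // real roles via OIDC, gated by an environment
	"workstation": "./.aws/profiles",       // IAM Identity Centre (SSO) permission sets
}

// configFileFor returns the config file for a run context, or an error if the
// context is unknown, so a typo never silently falls back to another file.
func configFileFor(runContext string) (string, error) {
	path, ok := configFiles[runContext]
	if !ok {
		return "", fmt.Errorf("unknown run context %q", runContext)
	}
	return path, nil
}

func main() {
	// RUN_CONTEXT is a hypothetical variable set by the runner.
	runContext := os.Getenv("RUN_CONTEXT")
	if runContext == "" {
		runContext = "workstation"
	}
	path, err := configFileFor(runContext)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// This only affects the current process and its children, so pulumi
	// would have to be launched from here for the setting to apply.
	if err := os.Setenv("AWS_CONFIG_FILE", path); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("AWS_CONFIG_FILE=" + os.Getenv("AWS_CONFIG_FILE"))
}
```

The point of the design is that the state only ever records a profile name and region, while the role behind that profile is resolved from whichever file the runner points at.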