Open oboukili opened 11 months ago
Hi @oboukili thank you for reporting this and sorry that Pulumi is not doing what you need here.
It sounds like you are running Pulumi against the same stack in two different contexts with different assumed IAM roles, and at some point Pulumi ignores the IAM role you have provided and instead picks up the IAM role from the statefile for the stack, which breaks your intent. There are some scenarios in Pulumi that benefit from saving provider config in the state, such as managing deletion of existing resources by the version of the provider that provisioned them, and there may be more, but sounds like this is surprising in the context of assumed IAM roles.
It would help my team a lot if we had a solid repro here to narrow your use case to a concrete sequence of steps. We can then try to find solutions - whether there is something that can be fixed locally in the provider or taken to a broader conversation.
Could you help us out with: (1) a minimal Pulumi program that uses the provider with the assume role, in particular whether you use explicit providers or pulumi config; (2) the exact sequence of pulumi invocations leading up to the issue; (3) expected/actual results.
I would also find it very helpful if you could elaborate "Disabling refreshes bypasses the issue, but does not solve it.", are you asking pulumi to do refreshes explicitly, and how do you disable it?
Thanks for your patience!
Hi @t0yv0, thanks for your reply.
There are some scenarios in Pulumi that benefit from saving provider config in the state, such as managing deletion of existing resources by the version of the provider that provisioned them, and there may be more, but sounds like this is surprising in the context of assumed IAM roles.
Thanks for clarifying, persisting the provider version would indeed be a valid use case for resource deletion, however I would rather treat that data as informational, to be used only should an issue arise (similar to what Kubernetes "last-applied" annotation is), but I digress.
Could you help us out with - (1) a minimal Pulumi program that uses the provider with the assume role, in particular whether you use explicit providers or pulumi config. (2) exact sequence of pulumi invocations leading up to the issue; (3) expected/actual results.
(1) I'm afraid I can't share the program I am using, but here's a minimal example (apologies for the automatic tab indent). Note that I explicitly disable all default providers within Pulumi.yaml through pulumi:disable-default-providers: ["*"].
package main

import (
	servicecatalogtypes "github.com/aws/aws-sdk-go-v2/service/servicecatalog/types"
	"github.com/pulumi/pulumi-aws/sdk/v6/go/aws"
	"github.com/pulumi/pulumi-aws/sdk/v6/go/aws/servicecatalog"
	"github.com/pulumi/pulumi/sdk/v3/go/pulumi"
	"github.com/pulumi/pulumi/sdk/v3/go/pulumi/config"
)

const configAssumeRole = "assumeRole"

func main() {
	pulumi.Run(func(ctx *pulumi.Context) error {
		cfg := config.New(ctx, "myconfig")
		// Require fails the run early if myconfig:assumeRole is not set.
		roleArn := cfg.Require(configAssumeRole)

		// Explicit provider that assumes the role taken from the stack configuration.
		p, err := aws.NewProvider(ctx, "explicitProvider", &aws.ProviderArgs{
			AssumeRole: &aws.ProviderAssumeRoleArgs{
				Duration:    pulumi.StringPtr("900s"),
				RoleArn:     pulumi.StringPtr(roleArn),
				SessionName: pulumi.StringPtr("minimal-test"),
			},
			Region: pulumi.StringPtr("eu-west-3"),
		})
		if err != nil {
			return err
		}

		// Any resource will do; it only needs to be created through the explicit provider.
		_, err = servicecatalog.NewProduct(ctx, "test",
			&servicecatalog.ProductArgs{
				Distributor: pulumi.StringPtr("test"),
				Name:        pulumi.StringPtr("test"),
				Owner:       pulumi.String("test"),
				SupportUrl:  pulumi.StringPtr("test"),
				Type:        pulumi.String(servicecatalogtypes.ProductTypeCloudFormationTemplate),
				ProvisioningArtifactParameters: &servicecatalog.ProductProvisioningArtifactParametersArgs{
					Name:        pulumi.StringPtr("v0"),
					TemplateUrl: pulumi.StringPtr("https://s3-us-gov-west-1.amazonaws.com/cloudformation-templates-us-gov-west-1/IAM_Users_Groups_and_Policies.template"),
					Type:        pulumi.StringPtr(string(servicecatalogtypes.ProductTypeCloudFormationTemplate)),
				},
			},
			pulumi.Provider(p),
		)
		return err
	})
}
Pulumi.stackname.yaml:
config:
  myconfig:assumeRole: "arn:aws:iam::1234567890:role/pr"
(2) Given two to-be-assumed IAM roles, arn:aws:iam::1234567890:role/pr and arn:aws:iam::1234567890:role/release, and assuming the following:
(a) pulumi up has been run with the configuration set to myconfig:assumeRole: "arn:aws:iam::1234567890:role/release"; the provider state is therefore persisted in the stack state.
(b) The configuration is then set to myconfig:assumeRole: "arn:aws:iam::1234567890:role/pr".
(c) The current context credentials can only assume arn:aws:iam::1234567890:role/pr.
(d) pulumi refresh is run.
(3)
Expected result: the explicit pulumi-aws provider assumes the role set in the configuration, arn:aws:iam::1234567890:role/pr, and proceeds successfully.
Actual result: the explicit pulumi-aws provider tries assuming the role set in the state, arn:aws:iam::1234567890:role/release, and fails as the current context credentials don't allow it to.
I would also find it very helpful if you could elaborate "Disabling refreshes bypasses the issue, but does not solve it.", are you asking pulumi to do refreshes explicitly, and how do you disable it?
I was a bit too concise here. I meant not systematically refreshing upon every pulumi action (update or preview), which is what the following flag in Pulumi.yaml turns on:
options:
  refresh: always
Digging further, I now realize there's already been quite a long design debate over the importance of the state (it would seem Pulumi differs heavily from, say, Terraform here as the state is not just considered as a managed resource tracking data and resource cache) and thus the non-anecdotal impact of refreshes https://github.com/pulumi/pulumi/issues/2247, which is unrelated to the current issue.
Hi @oboukili. I think this is effectively a special case of https://github.com/pulumi/pulumi/issues/13860. I'm not sure what a workaround would be for this scenario, beyond state surgery to change the IAM role.
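For readers wondering what the state surgery mentioned above might look like, here is a rough sketch, not an official procedure. It assumes the role ARN appears as plain text in a file produced by pulumi stack export, which may not hold if the value is stored as a secret or encoded differently. The function name replaceRoleARN and the sample JSON fragment are hypothetical, made up for illustration only.

```go
package main

import (
	"bytes"
	"fmt"
)

// replaceRoleARN swaps every plain-text occurrence of oldARN for newARN in an
// exported state document and reports how many occurrences it found.
// Assumption: the ARN is stored unencrypted and unescaped in the export.
func replaceRoleARN(state []byte, oldARN, newARN string) ([]byte, int) {
	n := bytes.Count(state, []byte(oldARN))
	return bytes.ReplaceAll(state, []byte(oldARN), []byte(newARN)), n
}

func main() {
	// Hypothetical fragment; a real export has a much larger structure.
	exported := []byte(`{"roleArn":"arn:aws:iam::1234567890:role/release"}`)

	patched, n := replaceRoleARN(exported,
		"arn:aws:iam::1234567890:role/release",
		"arn:aws:iam::1234567890:role/pr")

	fmt.Printf("%d replacement(s): %s\n", n, patched)
}
```

In practice this would sit between pulumi stack export --file state.json and pulumi stack import --file state.json, and it would have to be repeated every time the role changes, which is why it is a stopgap at best. Keep a copy of the untouched export before importing anything.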
I'm assuming no one has found a workaround for this yet? We've got a central account we run pulumi in that assumes roles in other child accounts via explicitly configured providers.
I was planning on having separate roles for preview/up phases, but that plan is currently blocked due to it always trying to use the up role from the state.
This is biting me right now, as I was told we need to have separate roles for plan vs apply. Basically I will have to run refresh only for the apply step to ensure the state is up to snuff:
// handleDeployment runs one stack operation through the Automation API.
// colorAlwaysPreview, colorAlwaysUp and colorAlwaysDestroy are presumably
// option types defined elsewhere in the program (not shown here).
func handleDeployment(ctx context.Context, stack auto.Stack, action string) error {
	switch action {
	case "plan":
		// No refresh here, so the preview can run under a different role.
		_, err := stack.Preview(ctx, optpreview.ProgressStreams(os.Stdout), colorAlwaysPreview{}, optpreview.Diff())
		if err != nil {
			return err
		}
	case "apply":
		// Refresh only on apply due to https://github.com/pulumi/pulumi-aws/issues/3149
		_, err := stack.Up(ctx, optup.ProgressStreams(os.Stdout), colorAlwaysUp{}, optup.ErrorProgressStreams(os.Stderr), optup.Diff(), optup.Refresh())
		if err != nil {
			return err
		}
	case "destroy":
		_, err := stack.Destroy(ctx, optdestroy.ProgressStreams(os.Stdout), colorAlwaysDestroy{})
		if err != nil {
			// Return the error like the other branches instead of exiting the process.
			return err
		}
	default:
		return fmt.Errorf("unknown action %q", action)
	}
	return nil
}
With plan I'm able to swap out the role, but obviously that doesn't buy much because no refresh is happening. I'm just hoping that nothing has changed in the environment compared to the state.
FYI, I was able to come up with a super janky workaround for this using transitive session tags.
If you set a transitive session tag (say, pulumi-up=<true>) outside of Pulumi, it propagates and isn't stored in the state.
With that capability, we created a single IAM role for both preview and up, and gated all mutating permissions behind a condition to check that the session tag was set.
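To make the idea above concrete, here is a minimal sketch of such a gated policy, generated from Go. It is not the commenter's actual policy: the helper gatedPolicy, the types, and the Service Catalog action lists are illustrative assumptions. The sketch relies on session tags being exposed to policy conditions through the aws:PrincipalTag/<key> condition key.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Statement and Policy cover only the subset of the IAM policy grammar used here.
type Statement struct {
	Effect    string                       `json:"Effect"`
	Action    []string                     `json:"Action"`
	Resource  string                       `json:"Resource"`
	Condition map[string]map[string]string `json:"Condition,omitempty"`
}

type Policy struct {
	Version   string      `json:"Version"`
	Statement []Statement `json:"Statement"`
}

// gatedPolicy allows the readOnly actions unconditionally and the mutating
// actions only when the session carries the tag tagKey=tagValue.
func gatedPolicy(tagKey, tagValue string, readOnly, mutating []string) Policy {
	return Policy{
		Version: "2012-10-17",
		Statement: []Statement{
			{Effect: "Allow", Action: readOnly, Resource: "*"},
			{
				Effect:   "Allow",
				Action:   mutating,
				Resource: "*",
				Condition: map[string]map[string]string{
					"StringEquals": {"aws:PrincipalTag/" + tagKey: tagValue},
				},
			},
		},
	}
}

func main() {
	// The action lists are examples only; a real role needs whatever the stack manages.
	p := gatedPolicy("pulumi-up", "true",
		[]string{"servicecatalog:Describe*", "servicecatalog:List*"},
		[]string{"servicecatalog:CreateProduct", "servicecatalog:UpdateProduct", "servicecatalog:DeleteProduct"},
	)
	out, err := json.MarshalIndent(p, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Because the tag is attached to the session outside of Pulumi, the role ARN recorded in the state stays the same for preview and up. The role's trust policy would also need to allow sts:TagSession for the tag to be accepted.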
Very smart @fitz-vivodyne thanks ! ❤️
related to https://github.com/pulumi/pulumi/issues/4981
We manage preview and up deployments using OIDC in Github Actions, with a profile file for each action and one for deployments from engineering workstations (AWS IAM Identity Centre), so I think this could be adapted to assumed roles as well.
We use AWS_CONFIG_FILE to point the SDK to the correct file (in the context of the run) containing profiles for each of our accounts configured for a Pulumi project.
For previews in Github Actions on our PRs we point to a ./.aws/github-preview file in the project which has read only preview roles that are assumed via web identity (OIDC). These roles can be run without a Github Repo Environment.
For up operations in Github Actions we point to a ./.aws/github-deploy file in the project which has the real roles, also assumed via OIDC. These roles require a Github Repo Environment and therefore can be subject to approval.
Engineer workstations use Taskfile.dev where we use a .env file to set AWS_CONFIG_FILE to ./.aws/profiles which are AWS IAM Identity Centre (SSO) permission sets.
Our stacks specify the account/profile name and region which allows us to deploy anywhere the role is valid for the principal running the operation. When a provider is created the profile name and region are stored. By always setting AWS_CONFIG_FILE at the runner (Github Actions shared workflows, Taskfile.dev for engineer workstations), we can do preview, up, refresh and delete operations with no issues on any valid runner. Hope that helps.
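The selection logic described above could be sketched roughly as follows. This is an illustration, not the commenter's actual tooling: the function awsConfigFileFor and the runner names "github" and "workstation" are made up, and only the three file paths come from the comment. Operations other than preview and up on Github Actions are not covered by the description, so the sketch rejects them.

```go
package main

import (
	"fmt"
	"os"
)

// awsConfigFileFor returns the profile file a given runner should use for an
// operation. The profile names inside each file stay identical, so the
// provider config stored in the stack state never has to change.
func awsConfigFileFor(runner, operation string) (string, error) {
	switch runner {
	case "github":
		switch operation {
		case "preview":
			return "./.aws/github-preview", nil // read-only roles via OIDC
		case "up":
			return "./.aws/github-deploy", nil // real roles via OIDC
		}
		return "", fmt.Errorf("unsupported operation %q on runner %q", operation, runner)
	case "workstation":
		return "./.aws/profiles", nil // IAM Identity Centre permission sets
	}
	return "", fmt.Errorf("unknown runner %q", runner)
}

func main() {
	path, err := awsConfigFileFor("github", "preview")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Export the variable before any pulumi command is started from this process.
	if err := os.Setenv("AWS_CONFIG_FILE", path); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("AWS_CONFIG_FILE =", os.Getenv("AWS_CONFIG_FILE"))
}
```

The point of the design is that the state only records a profile name and region, while the meaning of that profile is decided by whichever file the runner points at.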
This explicitly breaks refreshes where the cached assume role attribute value is being used instead of the actual currently configured value.
A typical broken example, in my case, would be a two-step preview (PR) / release (push) CI pipeline where the PR workflows assume a read-only IAM role, while the release workflows assume a read-write IAM role.
I'm new to Pulumi, but more generally, I can't see why any of the provider attributes' values would actually be persisted in the state, nor favored over the current values during refreshes, so it may be a (not easily modifiable) Pulumi-wide design issue rather than one scoped only to this provider.
Disabling refreshes bypasses the issue, but does not solve it.
Thanks for any insights you could provide.