pulumi / esc

Pulumi ESC (Environments, Secrets, and Configuration) for cloud applications and infrastructure.
https://www.pulumi.com/product/esc/

Pulumi refresh uses cached auth info #199

Closed. jtmarmon closed this issue 3 months ago.

jtmarmon commented 9 months ago

What happened?

When trying to run pulumi refresh on a stack using an ESC env, I was getting this error:

    error: Preview failed: 1 error occurred:
        * Retrieving AWS account details: validating provider credentials: retrieving caller identity from STS: operation error STS: GetCallerIdentity, https response error StatusCode: 403, RequestID: 60765ba4-5206-41f2-b209-6b3d381c0f5d, api error ExpiredToken: The security token included in the request is expired

This was confusing because esc run <env> -- aws sts get-caller-identity worked fine. Eventually I tried running pulumi up and the error went away, leading me to believe pulumi up had busted some cached auth token.


A similar thing happened to me previously, not with expired creds but with changing my AWS configuration. I switched from having a hardcoded aws:profile in the config to using an ESC env, and the refresh wouldn't succeed until I ran pulumi up.

Example

I believe this should repro it, but I haven't tried it myself.

Output of pulumi about

CLI
Version      3.95.0
Go Version   go1.21.4
Go Compiler  gc

Plugins
NAME    VERSION
aws     6.13.2
nodejs  unknown

Host
OS       darwin
Version  14.1
Arch     arm64

This project is written in nodejs: executable='/Users/redacted/.nvm/versions/node/v17.0.0/bin/node' version='v17.0.0'

Backend
...
Token type     personal

Dependencies:
NAME            VERSION
@pulumi/aws     6.13.2
@pulumi/pulumi  3.94.2
@types/node     16.18.62
typescript      4.9.5

Additional context

No response


pgavlin commented 9 months ago

A similar thing happened to me previously, not with expired creds but with changing my AWS configuration. I switched from having a hardcoded aws:profile in the config to using an ESC env, and the refresh wouldn't succeed until I ran pulumi up.

Yeah, it sounds like something similar is happening here. Is your program explicitly passing credentials to the AWS provider?

jtmarmon commented 9 months ago

In the expired STS token case, no. In the aws:profile case, also no, but IIRC in that case the error was with a manually constructed Kubernetes provider. I wasn't passing the AWS creds in manually, but I did see AWS_PROFILE being removed from some blob in the k8s provider's state when I ran pulumi up. I think they could be separate issues.

pgavlin commented 9 months ago

In the expired STS token case, no

Interesting. Typically what we've seen in scenarios like this is encrypted credentials ending up in the statefile and then getting reused by pulumi refresh. pulumi up unblocks this because it fetches new credentials, which are then stored in the statefile and picked up by the next pulumi refresh. Can you still repro this? If so, would you be able to share a statefile?

komalali commented 4 months ago

A little more context here: this issue is caused by the fact that pulumi refresh and pulumi destroy don't run the Pulumi program; they use credentials stored in the state file if they exist. The current workaround is to not store your credentials in state. Practically, this means authenticating via environment variables instead of configuration values where possible (sketched below).

There's an open issue to document this for GCP - https://github.com/pulumi/pulumi-gcp/issues/1815

jtmarmon commented 4 months ago

@komalali in our case, this is an AWS ECR that's having this issue, and the env doesn't have exported env vars:

values:
  login:
    fn::open::aws-login:
      oidc:
        roleArn: <role>
        sessionName: pulumi-environments-session
        duration: "1h"
  region: us-east-1
  pulumiConfig:
    aws:region: ${region}
    aws:accessKey: ${login.accessKeyId}
    aws:secretKey: ${login.secretAccessKey}
    aws:token: ${login.sessionToken}

Does your comment suggest there's a bug in the pulumi-aws repo as well (in that it's caching the STS token in state)?

iwahbe commented 4 months ago

Entries in the Pulumi config get stored in state.

   aws:accessKey: ${login.accessKeyId}
   aws:secretKey: ${login.secretAccessKey}
   aws:token: ${login.sessionToken}

These should be environment variables instead; see the environment-variable example at the end of this thread.

masterfuzz commented 3 months ago

Environment variables are not a sufficient solution either, at least not for AWS. They work for a single instance of a provider, but if you need to use more than one account, even if you provide credentials explicitly to the second provider, it will still get confused (trying to assume a role, etc.); see the sketch below. Perhaps this is a bug in the AWS provider, however.

pgavlin commented 3 months ago

This problem has deep roots in the programming model, and I want to offer some context for the behavior.

There are three kinds of Pulumi operations:

  1. pulumi up
  2. pulumi refresh
  3. pulumi destroy

The first operation is distinctly different from the latter two in that it involves running the Pulumi program associated with the stack's project. As it runs, the Pulumi program defines the desired state for resources--including provider resources--using values computed by the program in coordination with the Pulumi engine. When the program creates a provider resource, the inputs for the provider are either sourced from the program itself (i.e. from values provided by the program) or are read out-of-band by the provider plugin.

The exact set of configuration that may be sourced from the environment is particular to each provider--for example, the Kubernetes provider uses the ambient kubeconfig by default, the AWS provider reads various AWS-specific environment variables, etc. Any explicitly-provided inputs are written into the stack's statefile.

For example, consider the following program:

import * as aws from "@pulumi/aws";

// region is an explicit input: it will be recorded in the statefile.
const usEast1 = new aws.Provider("us-east-1", { region: "us-east-1" });

// No explicit inputs: the region is read from the environment at
// runtime and nothing is recorded in the statefile.
const defaultRegion = new aws.Provider("default-region");

The usEast1 provider's region is explicitly specified by the program, but the defaultRegion provider's region will be read from the environment (e.g. from the AWS_REGION environment variable). In the resulting statefile, the usEast1 provider's state will include the region input, but the defaultRegion provider's state will not.

Because pulumi refresh and pulumi destroy do not run the Pulumi program associated with the stack's project, they are unable to recompute configuration values that were explicitly provided by the program, and must use the values stored in the statefile. Unfortunately, this may include credential information, which is what causes the errors described here. The current workaround--which is certainly not sufficient for explicitly-instantiated providers--is to use environment variables to provide credentials out-of-band.

The clearest/most complete solution here is to run the Pulumi program associated with a stack's project as part of pulumi refresh and pulumi destroy. Unfortunately, this is a major behavioral change, and the exact semantics of the run are not clear.

pgavlin commented 3 months ago

Closing this as a duplicate of https://github.com/pulumi/pulumi/issues/4981. We'll use that issue to track further progress on workarounds and solutions for the core problem.

mikhailshilkov commented 3 months ago

For anyone looking, here is an example of an environment based on environment variables:

values:
  login:
    fn::open::aws-login:
      oidc:
        duration: 1h
        roleArn: <role>
        sessionName: pulumi-environments-session
  region: us-west-2
  environmentVariables:
    AWS_ACCESS_KEY_ID: ${login.accessKeyId}
    AWS_SECRET_ACCESS_KEY: ${login.secretAccessKey}
    AWS_SESSION_TOKEN: ${login.sessionToken}
  pulumiConfig:
    aws:region: ${region}