Refresh hangs forever - Provider uses old settings from last deployment, rather than current definition

Navigatron commented 1 year ago

What happened?

Our CI/CD pipeline was giving us issues ( diskControllerType and an out-of-bounds that I have yet to file a bug for), so I decided to try and refresh->preview->up from my local machine.

Our environment variables tell our Provider constructor to use MSI in CI/CD, and to not use MSI locally. However, the refresh operation appears to use the values from the previous deployment. Attempting to authenticate with MSI locally then causes refresh to hang.

Relevant entries from /tmp/.../eventlog.txt show that the refresh is attempting to use MSI. Event 34 was the last event. The process hung for several minutes before I killed it with ctrl+c. This also requires manually deleting a lockfile from our backing storage.

{
  "sequence": 29,
  "timestamp": 1693514635,
  "diagnosticEvent": {
    "prefix": "debug: ",
    "message": "2023/08/31 16:43:55 Using Managed Service Identity for Authentication\n",
    "color": "never",
    "severity": "debug"
  }
}
...
{
  "sequence": 34,
  "timestamp": 1693514635,
  "diagnosticEvent": {
    "prefix": "debug: ",
    "message": "getADALToken with MSI msiEndpoint \"http://169.254.169.254/metadata/identity/oauth2/token\", ClientID \"XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX\" for msiEndpoint \"https://management.azure.com/\"\n",
    "color": "never",
    "severity": "debug"
  }
}

Expected Behavior

I expected Pulumi to use the azure cli to authenticate.

Steps to reproduce

Define a function to return an Azure Provider using the code below:

import { Provider } from '@pulumi/azure-native/provider.js';

/**
 * Create an authenticated Azure connection
 * @param subscriptionId The subscription on which to act
 * @returns An authenticated Azure connection that can build resources
 */
export function provider(subscriptionId: string) {
    const useMsi = process.env['ARM_USE_MSI'];
    if (useMsi === 'true') {

        console.log('Using MSI authentication');

        return new Provider(`provider-demo`, {
            subscriptionId,
            clientId: process.env['ARM_CLIENT_ID']!,
            tenantId: process.env['ARM_TENANT_ID']!,
            useMsi: true,
        });
    }

    console.log('Using Azure CLI authentication.');

    return new Provider(`provider-demo`, {
        subscriptionId,
        tenantId: '// hardcoded tenantid here //',
        useMsi: false,
    });
}

Deploy a stack from an Azure VM, where the MSI endpoint exists, by setting the ARM_USE_MSI, ARM_CLIENT_ID, and ARM_TENANT_ID environment variables
Attempt to refresh the stack from a local VM

Output of `pulumi about`

We are using the pulumi automation API. pulumi about does not produce meaningful output.

Additional context

No response

Contributing

Vote on this issue by adding a 👍 reaction. To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already).

Navigatron commented 1 year ago

I did not consider this at first, but it runs the other way too - the two environments I ran locally now fail to run in CI/CD, as the refresh operation attempts to use az-cli local auth, instead of obeying the MSI environment variables.

iwahbe commented 1 year ago

Hi @Navigatron. I'm sorry you're having issues here. Refresh should not hang in any circumstance. Until I figure out otherwise, I'm assuming that the hang is an error in this provider.

pulumi refresh is designed to update each resource you have in state to it's real value in the cloud. It runs these updates against the provider that the resource was created with. pulumi refresh does not read your program. Since your stacks self-configure to use different auth depending on their environment, you will need to run up->refresh. pulumi up will bake your new configuration into the state file. Then pulumi refresh will use the new state to run the refresh.

Any change to Pulumi's refresh semantics would need to happen in https://github.com/pulumi/pulumi.

Navigatron commented 1 year ago

@iwahbe , Thanks for the context! It sounds like there are two issues here:

The refresh operation hangs when attempting to use MSI authentication that is not available
- I'll leave this issue open to track this
- I have confirmed that refresh does not hang when attempting to use Azure CLI authentication; I get plenty of errors in this case
The refresh operation has a hard dependency on the authentication method of the previous up operation
- This is troubling. Refresh is required to get accurate Preview results, so we can detect changes without blindly overwriting them with up. If it has been a while since the last deployment, we must refresh from wherever the last deployment was from. In a sense, deploying locally locks us out of ci/cd, and deploying in ci/cd locks us out of deploying locally.
- It sounds like this was a design decision / can't be fixed. I'll work around this for now / do not need this issue to track it.

pulumi / pulumi-azure-native