pulumi / pulumi-azure-native

Azure Native Provider
Apache License 2.0

Release 2.47.0 throws fatal errors #3387

Open j-fulbright opened 6 days ago

j-fulbright commented 6 days ago

What happened?

Upgrading my dependencies to the latest versions pulled in Azure Native v2.47.0, and all my builds started failing. I thought this was due to my own changes, but the failures persisted after I removed them.

Downgrading and pinning to 2.46.0 fixed all the issues.


Example

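import * as pulumi from '@pulumi/pulumi';
import * as azureNative from '@pulumi/azure-native';

// How `suffix` is derived is not shown in the report; the stack name stands in for it.
const suffix = pulumi.getStack();

// The start of this call was cut off in the report; the logical name is a stand-in.
const resourceGroup = new azureNative.resources.ResourceGroup(`rg-${suffix}`, {
    resourceGroupName: `rg-${suffix}`,
    location: 'centralus',
});

// TODO figure out how to handle password if we intend to do those
const sqlServer = new azureNative.sql.Server(
    `sql-${pulumi.getStack()}`,
    {
        serverName: `${suffix}`,
        resourceGroupName: resourceGroup.name,
        location: resourceGroup.location,
        administratorLogin: '######', // redacted in the original report
        administratorLoginPassword: '#######', // redacted in the original report
        publicNetworkAccess: azureNative.sql.ServerPublicNetworkAccessFlag.Enabled,
        restrictOutboundNetworkAccess: azureNative.sql.ServerNetworkAccessFlag.Disabled,
        // minimalTlsVersion: '1.2',

        version: '12.0',
    },
    { parent: resourceGroup },
);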

Output of pulumi about

This was running via a GitHub Action, but I can run it again to verify versions if needed.

Additional context

No response

Contributing

Vote on this issue by adding a 👍 reaction. To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already).

j-fulbright commented 6 days ago

Unfortunately, this may not be related to 2.47.0: I'm now seeing it in 2.46.0 as well. I'm trying to confirm the cause and will re-open if I determine otherwise.

yuft commented 4 days ago

We have experienced the same issue recently. v2.45.0 seems to produce the errors less frequently than newer versions.

lukehoban commented 4 days ago

Re-opening given multiple reports. It's not clear whether this is specific to Azure Native: the error looks like a Node runtime issue, which may be more likely tied to the Pulumi Node.js runtime than to the provider.

danielrbradley commented 4 days ago

@j-fulbright a few additional questions to help narrow this down:

  1. What operation does this fail on - e.g. preview or up or both?
  2. Is this only with the SQL Server resource or have you seen any other programs failing with this?
  3. What version of the @pulumi/pulumi npm package are you pinned to?

@yuft, what versions of @pulumi/pulumi and Node.js are you using? Does it appear to be linked to a specific resource?

yuft commented 4 days ago

We see the exception both in repos that reference the @pulumi/pulumi package and in repos that do not. The repo that references the package has:

"@pulumi/pulumi": "^3.121.0",
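A caret range such as `^3.121.0` lets npm resolve any 3.x release at or above 3.121.0 whenever the lockfile is refreshed, which is one way an upgrade can pull in a newer package unnoticed, as in the original report. A minimal sketch, assuming you want CI to flag Pulumi dependencies that are not pinned to an exact version; the `findUnpinned` helper and its exact-version regex are illustrative, not part of any Pulumi or npm tooling.

```typescript
// Hypothetical helper: reports dependencies whose version spec is not an exact
// "major.minor.patch" pin (caret/tilde ranges, "latest", etc. are all flagged).
const EXACT_VERSION = /^\d+\.\d+\.\d+$/;

function findUnpinned(deps: Record<string, string>, names: string[]): string[] {
    return names.filter((name) => {
        const spec = deps[name];
        // Dependencies that are not declared at all are ignored.
        return spec !== undefined && !EXACT_VERSION.test(spec);
    });
}

// Example: the dependency line quoted above, plus a pinned provider.
const dependencies: Record<string, string> = {
    "@pulumi/pulumi": "^3.121.0",
    "@pulumi/azure-native": "2.46.0",
};
console.log(findUnpinned(dependencies, ["@pulumi/pulumi", "@pulumi/azure-native"]));
// prints [ '@pulumi/pulumi' ]
```

The regex deliberately accepts only plain `major.minor.patch` specs, so exact prerelease pins such as `2.47.0-alpha.1` would also be flagged; loosen it if you use those.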

Today I tried pinning the Pulumi CLI version in CI to 3.119.0 and still saw the error.

I also monitored CPU/RAM usage on our container runners and did not see unusually high usage. The error happens with both pulumi up and pulumi preview.

We have pinned the Azure Native provider version to v2.45.0, and failures are now less frequent.

yuft commented 4 days ago

A colleague told me 2.44.1 has been reliable over the last couple of days, with no errors like this one.

yuft commented 3 days ago

Based on today's observations, the cause might be that the Pulumi program uses more CPU depending on the change itself, which leads to Node.js runtime errors. The pod that runs Pulumi does not get killed, though.

We run our Pulumi programs in self-hosted CircleCI container runners on AKS, so this might be a Kubernetes scheduling problem we need to resolve. I will keep monitoring next week.
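If the cause is CPU pressure on the runner, one way to test that hypothesis is to have the program log its own resource usage so it appears in the CI output next to any crash. A minimal sketch using only Node's built-in `os` and `process` APIs; the `snapshotResources` name and the 5-second interval are illustrative choices, and a process-level view like this does not show the pod's cgroup CPU limits or throttling on AKS.

```typescript
import * as os from "os";

// Hypothetical diagnostic: a point-in-time view of this Node.js process and its host.
interface ResourceSnapshot {
    rssMb: number;       // resident set size of this process, in MiB
    heapUsedMb: number;  // V8 heap currently in use, in MiB
    cpuUserMs: number;   // cumulative user CPU time consumed by this process
    cpuSystemMs: number; // cumulative system CPU time consumed by this process
    loadAvg1m: number;   // host-wide 1-minute load average (always 0 on Windows)
}

function snapshotResources(): ResourceSnapshot {
    const mem = process.memoryUsage();
    const cpu = process.cpuUsage(); // microseconds since process start
    const toMb = (bytes: number) => Math.round((bytes / 1024 / 1024) * 10) / 10;
    return {
        rssMb: toMb(mem.rss),
        heapUsedMb: toMb(mem.heapUsed),
        cpuUserMs: Math.round(cpu.user / 1000),
        cpuSystemMs: Math.round(cpu.system / 1000),
        loadAvg1m: os.loadavg()[0],
    };
}

// Log a snapshot every 5 seconds; unref() stops the timer from keeping the process alive.
const timer = setInterval(() => console.log(JSON.stringify(snapshotResources())), 5000);
timer.unref();
```

Logging JSON lines keeps the samples easy to grep out of a CircleCI log and line up against the timestamp of a failure.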

Thanks.

j-fulbright commented 2 days ago

I just saw this has been re-opened. I ended up having to downgrade to Node 18, and all the problems went away.

I haven't had any issues with Node 20 in the past (it's what I was using for local work and GitHub deployments), but after switching back to 18, everything worked correctly again.
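Since switching Node.js major versions changed the outcome here, it may help to record the exact runtime in every CI log so failing and passing runs can be compared. A minimal sketch, assuming the program logs its runtime versions at startup and warns on Node 20; treating 20 as suspect reflects only this one report and is not a confirmed root cause. The `parseMajor` and `describeRuntime` helpers are illustrative.

```typescript
// Hypothetical startup diagnostic: log the runtime details a maintainer would ask for.
interface RuntimeInfo {
    node: string;  // e.g. "20.11.1"
    major: number; // Node.js major version
    v8: string;
    platform: string;
    arch: string;
}

// Accepts "20.11.1" as well as "v20.11.1" (the form returned by process.version).
function parseMajor(version: string): number {
    return Number.parseInt(version.replace(/^v/, "").split(".")[0], 10);
}

function describeRuntime(): RuntimeInfo {
    return {
        node: process.versions.node,
        major: parseMajor(process.versions.node),
        v8: process.versions.v8,
        platform: process.platform,
        arch: process.arch,
    };
}

// Assumption: Node 20 is listed only because one reporter saw failures on 20
// that went away on 18; adjust or empty this set as the issue is diagnosed.
const SUSPECT_MAJORS = new Set<number>([20]);

const info = describeRuntime();
console.log(`runtime: ${JSON.stringify(info)}`);
if (SUSPECT_MAJORS.has(info.major)) {
    console.warn(`Node.js ${info.node} matches a major version reported in this issue.`);
}
```

Emitting a warning rather than failing the run keeps deployments unblocked while the root cause is still unknown.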

What operation does this fail on - e.g. preview or up or both?

In my case this was failing on up. I don't know whether preview was also failing, but I'd suspect so.

Is this only with the SQL Server resource or have you seen any other programs failing with this?

I did not try any other resources, as my particular scripts depend on that component. I can put together an unrelated script and see what happens.

What version of the @pulumi/pulumi npm package are you pinned to?

I did not have it pinned, but I hadn't updated it at the time; I believe the version running before was 3.117.0.

danielrbradley commented 3 hours ago

Would anyone be able to provide a runnable program that exhibits this problem on every run (or even just occasionally, given enough retries)? This would allow us to diagnose the root cause. Thanks!
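For an intermittent failure, a reproduction is most useful when it comes with a failure rate. A minimal sketch, assuming a shell-less Node.js harness: it runs a command a fixed number of times in sequence and counts non-zero exits. `runRepeatedly` is a hypothetical helper, not part of the Pulumi CLI or SDK.

```typescript
import { spawnSync } from "child_process";

interface RunSummary {
    runs: number;
    failures: number;
    exitCodes: (number | null)[]; // null when the child was killed by a signal
}

// Hypothetical harness: run `command` `runs` times sequentially and count failures.
function runRepeatedly(command: string, args: string[], runs: number): RunSummary {
    const exitCodes: (number | null)[] = [];
    for (let i = 0; i < runs; i++) {
        // stdio "inherit" streams the child's output so crash messages stay in the CI log.
        const result = spawnSync(command, args, { stdio: "inherit" });
        if (result.error) {
            throw result.error; // e.g. the command was not found on PATH
        }
        exitCodes.push(result.status);
    }
    // Any non-zero exit, including a signal (null status), counts as a failure.
    const failures = exitCodes.filter((code) => code !== 0).length;
    return { runs, failures, exitCodes };
}
```

For example, `runRepeatedly("pulumi", ["preview", "--non-interactive"], 10)` would run ten previews of the currently selected stack, assuming the Pulumi CLI is installed and logged in. Running sequentially keeps each attempt's output separate in the log; a parallel harness would add its own CPU contention, muddying the very variable under suspicion in this thread.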