pulumi / pulumi-terraform-bridge

A library allowing providers built with the Terraform Plugin SDK to be bridged into Pulumi.
Apache License 2.0
184 stars 42 forks source link

[EPIC] Rollout PlanResourceChange #1785

Open t0yv0 opened 3 months ago

t0yv0 commented 3 months ago

🚧

PlanResourceChange flag enables a rewritten implementation of the lifecycle for SDKv2 based resources that more closely aligns with how TF CLI exercises the providers. Building out a plan to roll it out as the default behavior in the bridge. Currently it is rolled out selectively on a per-resource basis where it is shown to fix significant issues.

Technicalities

What PlanResourceChange flag does: it currently is a partial rewrite of the resource lifecycle methods for bridging the SDKv2-based resources. It ties into as much as possible into TF gRPC implementations to work "exactly-as" TF. Several known areas where this improves things:

Cross-testing is essential to pin down corner cases and get full confidence before broader rollout.

Rollout

We will start the rollout as follows:

Instances of TF behaviour regressing against non-PRC:

VenelinMartinov commented 3 months ago

https://github.com/pulumi/pulumi-terraform-bridge/issues/1822 is likely the GCP Meta handling issue - it does have a repro.

VenelinMartinov commented 2 months ago

Possible issue in GCP bucket: https://github.com/pulumi/pulumi-gcp/issues/1952 - I attempted to enroll it in PlanResourceChange to check if it fixes an issue. Did not fix the original issue but caused a lot of test failures.

At least one was legitimate - the resource started refreshing dirty when no labels are specified.

t0yv0 commented 1 month ago

Some updates. There's still some very basic issues around unknown handling (https://github.com/pulumi/pulumi-terraform-bridge/issues/1943) causing customer P1s in AWS (https://github.com/pulumi/pulumi-aws/commit/8c302b9355882df9d70bfff6eb2d5a09bb386862) for resources where this was speculatively rolled out to fix other P1s or diff issues like in the case of LaunchTemplate.

Given this situation, after https://github.com/pulumi/pulumi-terraform-bridge/issues/1943 it would be great to open a bridge PR that runs all the downstream tests with this feature turned on to see if we can spot any more basic simple failures and proactively fix them.

The issue in question could have been located by https://github.com/pulumi/pulumi-terraform-bridge/issues/1858 but I would bet also regular integration tests would flag it somewhere.

t0yv0 commented 1 month ago

ON a positive note we landed https://github.com/pulumi/pulumi-terraform-bridge/pull/1927 which together with this flag is shown to close 4 resource cycling issues in AWS, and likely actually fixes a lot more.

VenelinMartinov commented 1 month ago

We could add an env var which default enables this perhaps?

Then run downstream tests with the env var enabled. Iterate until all downstream tests succeed?

EDIT: This was done in https://github.com/pulumi/pulumi-terraform-bridge/commit/fb50112f8d7a17df03ad538c529aef446c5e62ed

PULUMI_ENABLE_PLAN_RESOURCE_CHANGE=1 enables PlanResourceChange for all resources in all bridged providers.

t0yv0 commented 1 month ago

Exactly. The issues from actual integration tests likely cover the most important basics.

VenelinMartinov commented 1 month ago

Should we finish https://github.com/pulumi/pulumi-terraform-bridge/issues/1858 and maybe https://github.com/pulumi/pulumi-terraform-bridge/issues/1864 before rolling this out?

t0yv0 commented 1 month ago

As discussed today:

For failing downstream checks do not over-index on upgrade tests flagging Updates, this is not unexpected and we can weaken the tests to just weed out replacements.

VenelinMartinov commented 3 weeks ago

Discovered a new issue with GCP and PRC: https://github.com/pulumi/pulumi-gcp/issues/2078

VenelinMartinov commented 3 weeks ago

https://github.com/pulumi/pulumi-terraform-bridge/issues/2039 reproes on non-PRC, removing it from the list.

VenelinMartinov commented 3 weeks ago

https://github.com/pulumi/pulumi-gcp/issues/2078 is actually an upstream bug/behaviour uncovered by PRC

VenelinMartinov commented 3 weeks ago

1967 and 2047 are addressed by https://github.com/pulumi/pulumi-terraform-bridge/pull/2065 so we only need to run down all the failures in https://github.com/pulumi/pulumi-terraform-bridge/issues/1962

VenelinMartinov commented 1 week ago

Uncovered a GCP issue with effectiveLabels and terraformLabels: https://github.com/pulumi/pulumi-terraform-bridge/issues/1962#issuecomment-2173706089 after the new refresh CLI.

This is the last bit blocking the rollout.

VenelinMartinov commented 1 week ago

We should be good to go here - will enable PRC in pulumi-random with the new bridge release.

VenelinMartinov commented 1 week ago

pulumi-random is entirely PF based, so we can't start the rollout there.

Opened https://github.com/pulumi/pulumi-cloudflare/pull/828 for pulumi-cloudflare

pulumi-cloudflare was released with PRC on by default in 5.32 on 25 Jun. We will continue the rollout on 9 July with the rest of tier 2.