pulumi / pulumi-terraform-bridge

A library allowing Terraform providers to be bridged into Pulumi.
Apache License 2.0
192 stars 45 forks source link

makeDetailedDiff distorts the plan incorrectly deciding whether there are changes or not #1501

Open t0yv0 opened 10 months ago

t0yv0 commented 10 months ago

What happened?

It appears that there are significant corner cases where makeDetailedDiff incorrectly decides pulumirpc.DiffResponse_DiffChanges, where incorrectly means that it does not agree with the underlying upstream provider, on whether there are changes (DiffResponse_DIFF_NONE) or there are no changes (DiffResponse_DIFF_SOME).

Example

Output of pulumi about

N/A

Additional context

We have been trying to reconstruct why this is happening and the current best guess is this block of code:

    // Check both the old state and the new config for diffs and report them as necessary.
    //
    // There is a minor complication here: Terraform has no concept of an "add" diff. Instead, adds are recorded as
    // updates with an old property value of the empty string. In order to detect adds--and to ensure that all diffs
    // in the InstanceDiff are reflected in the resulting Pulumi property diff--we first call this function with
    // each property in a resource's state, then with each property in its config. Any diffs that only appear in the
    // config are treated as adds; diffs that appear in both the state and config are treated as updates.

    forceDiff := new(bool)
    diff := map[string]*pulumirpc.PropertyDiff{}
    for k, v := range olds {
        en, etf, eps := getInfoFromPulumiName(k, tfs, ps, false)
        makePropertyDiff(ctx, en, string(k), v, tfDiff, diff, forceDiff, etf, eps, false, useRawNames(etf))
    }
    for k, v := range news {
        en, etf, eps := getInfoFromPulumiName(k, tfs, ps, false)
        makePropertyDiff(ctx, en, string(k), v, tfDiff, diff, forceDiff, etf, eps, false, useRawNames(etf))
    }
    for k, v := range olds {
        en, etf, eps := getInfoFromPulumiName(k, tfs, ps, false)
        makePropertyDiff(ctx, en, string(k), v, tfDiff, diff, forceDiff, etf, eps, true, useRawNames(etf))
    }

The intention here is to walk Pulumi olds and news values, and translate Pulumi paths to TF diff paths in the process, checking if there is a diff or not. This has worked historically for diffs that originate from the user changing the inputs in a program. HOWEVER, bridged providers allow a degree of flexibility to provider authors to inject diff customizer functions that edit the results of a diff and can suppress or introduce diffs at will. When these changes are happening over values that are not present in Pulumi, the algorithm fails to take them into account.

  sequenceDiagram
      participant Program
      participant Bridge
      participant TFProvider
      Program->>Bridge: Diff(olds={tags={}},news={tags={}})
      Bridge->>TFProvider: SimpleDiff(tr(olds), tr(news))
      TFProvider->>Bridge: terraform.ResourceAttrDiff with "tags.n" added
      Bridge->>Bridge:     makeDetailedDiff traverses olds.tags, news.tags, did not see "tags.n"
      Bridge->>Program:    NO CHANGES (incorrect)

Contributing

Vote on this issue by adding a 👍 reaction. To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already).

t0yv0 commented 10 months ago

What if we introduced a change, under a flag, to always respect TF's diff result in what we emit to Pulumi, moving makeDetailedDiff function to be purely concerned with matters of presentation of the diff's details (which are part of UX but no e2e correctness). And as follow up we'd refactor makeDetailedDiff to be accurate in all circumstances (under heavy test).

t0yv0 commented 10 months ago

I believe https://github.com/pulumi/pulumi-terraform-bridge/issues/1491 conclusively fixed it but remains under flag. Once we roll out to AWS and GCP and wait a cycle we can probably remove the flag and adopt the new behavior and close this bug. Open to other rollout plans here.

t0yv0 commented 4 months ago

I'm reopening per recent PR discussion - we had some situations around this and got called out by customers therefore I think more is needed here (possibly https://github.com/pulumi/pulumi-terraform-bridge/issues/1895 ). The initial "fix" which is really a workaround introduced situations where Pulumi is making a change but the detailed diff does not explain it, which was not acceptable. This led to #1696 to provide an answer in https://github.com/pulumi/pulumi-aws/issues/3439 - this works in the happy case but is not a comprehensive fix.

t0yv0 commented 1 month ago

Found a reference on how OpenTOFU decides what kind of plan to undertake based on the PlanResourceChange response from a provider:

https://github.com/opentofu/opentofu/blob/main/internal/tofu/node_resource_abstract_instance.go#L1019

This can likely inform our canonical implementation, with some caveats for pulumi-level features such as precise secrets.

VenelinMartinov commented 2 weeks ago

Hit this in pulumi-azure rolling out PRC: https://github.com/pulumi/pulumi-azure/issues/2322#issuecomment-2297073944