pulumi / pulumi-terraform-bridge

A library allowing Terraform providers to be bridged into Pulumi.
Apache License 2.0
192 stars 45 forks source link

State is updated even if the API call fails with an HTTP 400 (gcp.cloudscheduler.Job) #496

Open cakoose opened 2 years ago

cakoose commented 2 years ago

Hello!

Issue details

  1. In my Pulumi config, I set a Cloud Scheduler Job's attemptDeadline: '1m'.
  2. When I ran Pulumi, the step to update GCP failed with an HTTP 400 because '1m' is the wrong syntax. The correct syntax is '60s'.
  3. But it appears that "1m" still got saved in the Pulumi state (based on the diffs produced by later attempts at pulumi up).

I'd expect that if a request fails with an HTTP 400, that Pulumi would know that the remote resource was not updated.

Steps to reproduce

  1. I had an existing gcp.cloudscheduler.Job resource. I did not set attemptDeadline.
  2. pulumi up
  3. I added attemptDealine: '1m'
  4. pulumi up.
    • The diff shows ~ attemptDeadline: "180s" => "1m"
    • The operation fails with an HTTP 400 (see below) because "1m" is the wrong syntax.
  5. I remove the attemptDeadline field.
  6. pulumi up (expecting no changes)
    • The diff shows ~ attemptDeadline: "1m" => "180s"

I checked the setting in the GCP console and it's set to "3m".

Expected: In step 6, there should be no changes.

Actual: Pulumi state thinks that attemptDeadline needs to change from "1m" to "180s".

HTTP 400 failure on pulumi up (step 4)

  gcp:cloudscheduler:Job (send-outgoing-emails):
    error: 1 error occurred:
        * updating urn:pulumi:staging::anrok::gcp:cloudscheduler/job:Job::send-outgoing-emails: 1 error occurred:
        * Error updating Job "projects/anrok-main-staging/locations/us-central1/jobs/send-outgoing-emails-dfab7bb": googleapi: Error 400: Invalid value at 'job.attempt_deadline' (type.googleapis.com/google.protobuf.Duration), Field 'attemptDeadline', Illegal duration format; duration must end with 's'
    Details:
    [
      {
        "@type": "type.googleapis.com/google.rpc.BadRequest",
        "fieldViolations": [
          {
            "description": "Invalid value at 'job.attempt_deadline' (type.googleapis.com/google.protobuf.Duration), Field 'attemptDeadline', Illegal duration format; duration must end with 's'",
            "field": "job.attempt_deadline"
          }
        ]
      }
    ]

Output of pulumi stack export

            {
                "urn": "urn:pulumi:staging::anrok::gcp:cloudscheduler/job:Job::send-outgoing-emails",
                ...
                "inputs": {
                    "__defaults": [
                        "attemptDeadline",
                        "name"
                    ],
                    "attemptDeadline": "180s",
                    "httpTarget": { ... },
                    "name": "send-outgoing-emails-dfab7bb",
                    "schedule": "* * * * *",
                    "timeZone": "America/Los_Angeles"
                },
                "outputs": {
                    "__meta": "{\"e2bfb730-ecaa-11e6-8f88-34363bc7c4c0\":{\"create\":240000000000,\"delete\":240000000000,\"update\":240000000000}}",
                    "appEngineHttpTarget": null,
                    "attemptDeadline": "1m",
                    "description": "",
                    "httpTarget": { ... },
                    "id": "projects/anrok-main-staging/locations/us-central1/jobs/send-outgoing-emails-dfab7bb",
                    "name": "send-outgoing-emails-dfab7bb",
                    "project": "anrok-main-staging",
                    "pubsubTarget": null,
                    "region": "us-central1",
                    "retryConfig": null,
                    "schedule": "* * * * *",
                    "timeZone": "America/Los_Angeles"
                },
                ...
                "initErrors": [
                    "updating urn:pulumi:staging::anrok::gcp:cloudscheduler/job:Job::send-outgoing-emails: 1 error occurred:\n\t* Error updating Job \"projects/anrok-main-staging/locations/us-central1/jobs/send-outgoing-emails-dfab7bb\": googleapi: Error 400: Invalid value at 'job.attempt_deadline' (type.googleapis.com/google.protobuf.Duration), Field 'attemptDeadline', Illegal duration format; duration must end with 's'\nDetails:\n[\n  {\n    \"@type\": \"type.googleapis.com/google.rpc.BadRequest\",\n    \"fieldViolations\": [\n      {\n        \"description\": \"Invalid value at 'job.attempt_deadline' (type.googleapis.com/google.protobuf.Duration), Field 'attemptDeadline', Illegal duration format; duration must end with 's'\",\n        \"field\": \"job.attempt_deadline\"\n      }\n    ]\n  }\n]\n\n"
                ],
                "provider": "urn:pulumi:staging::anrok::pulumi:providers:gcp::default_5_11_0::414be576-6fe8-4639-9a93-58891138010a",
                "propertyDependencies": {
                    "attemptDeadline": null,
                    "httpTarget": [
                        "urn:pulumi:staging::anrok::gcp:serviceAccount/account:Account::cloud-scheduler"
                    ],
                    "schedule": null,
                    "timeZone": null
                }
            },
Frassle commented 2 years ago

I think the relevant code here is that we unconditionally set the value of "Outputs" in the state to whatever the provider Update function returns even if that Update also returns errors. (https://github.com/pulumi/pulumi/blob/master/pkg/resource/deploy/step.go#L485)

So one fix here would be to not save the result of Update if it errors. Another fix is that the provider shouldn't return changed outputs if it has returned an error and not actually done the update and instead return the old outputs.

I suspect we can't do the first fix as a provider could do like a partial update (say your changing two fields and it applies the change to one field then errors on the second).

Pinging @pgavlin to confirm. If this is the case then we should move this to GCP provider to fix up the output return from Update.

mikhailshilkov commented 2 years ago

If this is the case then we should move this to GCP provider to fix up the output return from Update.

Meaning to TF bridge, right?

Frassle commented 2 years ago

Meaning to TF bridge, right?

Yes.

pgavlin commented 2 years ago

So one fix here would be to not save the result of Update if it errors. Another fix is that the provider shouldn't return changed outputs if it has returned an error and not actually done the update and instead return the old outputs.

I suspect we can't do the first fix as a provider could do like a partial update (say your changing two fields and it applies the change to one field then errors on the second).

We do in fact do the first: https://github.com/pulumi/pulumi/blob/master/pkg/resource/deploy/step.go#L468-L474

In that code, if the update fails with a non-partial failure, we do not write the new outputs back. Instead, we fail the step entirely.

Interestingly, though, it looks like the TF bridge always returns partial failures during Create and Update: https://github.com/pulumi/pulumi-terraform-bridge/blob/master/pkg/tfbridge/provider.go#L902-L904, https://github.com/pulumi/pulumi-terraform-bridge/blob/master/pkg/tfbridge/provider.go#L1088-L1090

This behavior was introduced in https://github.com/pulumi/pulumi-terraform/pull/237. I'm somewhat surprised that it has not been more problematic given that it has been around for nearly four years. We should revisit this and see if we can be more precise about when we should return a fatal error rather than a partial error.