pulumi / pulumi-google-native

Apache License 2.0

Google Native Provider does not support Long Running Operations (in API Gateway, Cloud Build, and Cloud Run .. so far) #403

Open antstanley opened 2 years ago

antstanley commented 2 years ago

Hello!

Issue details

I am attempting, and failing, to use the @pulumi/google-native provider to deploy a Google Cloud API Gateway. An API Gateway requires three resources to be deployed, namely an API, an API Config, and a Gateway.

These resources need to be deployed sequentially as each resource is a child of the previous resource, and requires it to exist before the next resource can be deployed. These resources can take some time to deploy, often over a minute. Google classifies the API calls to deploy these resources as Long-running operations, where the initial call to the create* endpoint returns an operation. The operation has a unique ID that can be used to query the operation endpoint for that resource to determine the deployment status.
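The polling pattern described above can be sketched roughly as follows. This is a minimal illustration, not the provider's or the Google SDK's implementation: `waitForOperation` and its `getOperation` callback are hypothetical names, and the `Operation` interface only mirrors the `name`, `done`, `error`, and `response` fields of `google.longrunning.Operation`.

```typescript
// Reduced shape of a google.longrunning.Operation (only the fields used here).
interface Operation<T> {
  name: string;
  done?: boolean;
  error?: { code: number; message: string };
  response?: T;
}

// Hypothetical helper: polls until the operation reports done, then returns
// its response. `getOperation` stands in for whatever call fetches the
// operation by name (an assumption, not a real provider API).
async function waitForOperation<T>(
  name: string,
  getOperation: (name: string) => Promise<Operation<T>>,
  intervalMs = 1000,
  maxAttempts = 120,
): Promise<T> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const op = await getOperation(name);
    if (op.done) {
      if (op.error) {
        throw new Error(`operation ${name} failed: ${op.error.message}`);
      }
      return op.response as T;
    }
    // Not finished yet: wait before polling again.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`operation ${name} did not complete after ${maxAttempts} polls`);
}
```

The point of the sketch is that a dependent resource should only be created from the value `waitForOperation` resolves to, never from the operation returned by the initial create call.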

The Google SDKs themselves support long-running operations, but it appears Pulumi does not. When I try to deploy a stack with the three API Gateway resources, the initial call to the createApi endpoint returns an operation, and Pulumi then stores that operation's ID as the ID of the API. The subsequent call Pulumi makes to the createApiConfig endpoint starts as soon as the first createApi call returns, not when the operation completes, and uses the operation ID as the ApiId, so it fails.

If you wait a few minutes for the createApi operation to complete and re-run pulumi up, it gets past the createApi step because the API exists, but it still fails on the createApiConfig step because Pulumi has stored the operation ID as the ApiId. I am explicitly using the dependsOn property to define dependencies.

Long-running operations: https://github.com/googleapis/gax-nodejs/blob/main/client-libraries.md#long-running-operations

Language: TypeScript
Pulumi Version: 3.26.1
Pulumi Google Native Version: 0.17.1
Node Version: 16.14.2

Steps to reproduce

Example here of a Pulumi config that fails, and an install script that uses the Google Cloud SDK to call the APIs directly.

API Gateway Example

Expected: All resources to be deployed
Actual: First resource gets created, subsequent dependent resource creation fails

antstanley commented 2 years ago

FYI I can confirm this works with the @pulumi/gcp Google Classic Provider

viveklak commented 2 years ago

@antstanley thanks for the detailed bug report! We do handle the common operation semantics for long running processes but we may need to special case something for the resources. Will investigate and update the issue.

antstanley commented 2 years ago

ahh... so potentially something specific to API Gateway. Thanks for picking this up!

antstanley commented 2 years ago

So this isn't specific to API Gateway. I've found the same problems with trying to run Cloud Build Build jobs and Cloud Run.

Because the Google Classic Provider doesn't support the Google Cloud Build Build operation, I don't have a fallback. At this point I'm going to have to recommend moving away from Pulumi for my org's IaC needs.

Example Repo: https://github.com/antstanley/cloudbuild-example

Run Example

This is the error I get from Pulumi when trying to create a build job to create a docker container from a zip in Google Cloud Storage. The Build actually triggers, and successfully completes, but Pulumi reports an error.

Do you want to perform this update? yes
Updating (dev)

View Live: https://app.pulumi.com/antstanley/cloudbuild-demo/dev/updates/4

     Type                                      Name                            Status                  Info
 +   pulumi:pulumi:Stack                       cloudbuild-demo-dev             **creating failed**     1 error
 +   ├─ google-native:storage/v1:Bucket        testapp-pulumi-bucket-365ea95a  created
 +   ├─ google-native:storage/v1:BucketObject  testapp-pulumi                  created
 +   └─ google-native:cloudbuild/v1:Build      testapp-pulumi-build            **creating failed**     1 error

Diagnostics:
  google-native:cloudbuild/v1:Build (testapp-pulumi-build):
    error: waiting for completion: polling operation status: googleapi: Error 404: Requested entity was not found.

  pulumi:pulumi:Stack (cloudbuild-demo-dev):
    error: update failed

Resources:
    + 3 created

Duration: 8s

wvanderdeijl commented 2 years ago

The previous comment about Cloud Build might also be related to #203.

rkeene commented 1 year ago

This error occurs because the "buildsId" parameter contains a base64-encoded copy of the UUID string:

propValue = operations/build/production/N2QwYTU2ZjItYzlhOC00MGRiLTgzYzQtZDJjZWE2YzE4N2Yz, name = buildsId, alias = name

Decoding the base64 value gives the UUID string:

> echo 'N2QwYTU2ZjItYzlhOC00MGRiLTgzYzQtZDJjZWE2YzE4N2Yz' | base64 -d
7d0a56f2-c9a8-40db-83c4-d2cea6c187f3
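The same decoding can be sketched in TypeScript. This is only an illustration under the assumption that the encoded ID is the last path segment of the operation name; `decodeOperationId` is a hypothetical helper, not part of the provider.

```typescript
// Hypothetical helper: takes the last path segment of an operation name and
// base64-decodes it. Assumes the segment is plain ASCII once decoded.
function decodeOperationId(operationName: string): string {
  const segments = operationName.split("/");
  const encoded = segments[segments.length - 1];
  return atob(encoded);
}

const operationName =
  "operations/build/production/N2QwYTU2ZjItYzlhOC00MGRiLTgzYzQtZDJjZWE2YzE4N2Yz";
console.log(decodeOperationId(operationName));
// prints "7d0a56f2-c9a8-40db-83c4-d2cea6c187f3"
```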