hashicorp / terraform-provider-google

Terraform Provider for Google Cloud Platform
https://registry.terraform.io/providers/hashicorp/google/latest/docs
Mozilla Public License 2.0

google_datastore_index has problems when indexes take a while to create #11316

Open ggprod opened 2 years ago

ggprod commented 2 years ago

Community Note

Terraform Version

Terraform v1.1.3
on darwin_amd64
+ provider registry.terraform.io/hashicorp/google v3.52.0
+ provider registry.terraform.io/hashicorp/google-beta v3.52.0

Affected Resource(s)

google_datastore_index

Terraform Configuration Files

# Copy-paste your Terraform configurations here.
#
# For large Terraform configs, please use a service like Dropbox and share a link to the ZIP file.
# For security, you can also encrypt the files using our GPG public key:
#    https://www.hashicorp.com/security
#
# If reproducing the bug involves modifying the config file (e.g., apply a config,
# change a value, apply the config again, see the bug), then please include both:
# * the version of the config before the change, and
# * the version of the config after the change.

Debug Output

While creating indexes, if they take a while to complete, there are errors like:

│ Error: Error waiting to create Index: Error waiting for Creating Index: couldn't find resource (21 retries)
│ 
│   with google_datastore_index.card,
│   on main.tf line 539, in resource "google_datastore_index" "card":
│  539: resource "google_datastore_index" "card" {

Then if you rerun terraform apply you get errors like:

│ Error: Error creating Index: googleapi: Error 409: index already exists
│ 
│   with google_datastore_index.card,
│   on main.tf line 539, in resource "google_datastore_index" "card":
│  539: resource "google_datastore_index" "card" {

Panic Output

Expected Behavior

If indexes take a while to create, this should be captured in the Terraform state so that re-running terraform apply doesn't attempt to recreate them and throw an error because they already exist.

Actual Behavior

If indexes take a while to create, terraform apply fails with an error, and attempting terraform apply again fails because it tries to recreate those indexes (which are present and in the process of being created).

Ultimately, the only way to resolve this is to import the indexes into the state file using the terraform import command.
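
On Terraform 1.5 or later (newer than the v1.1.3 reported above), the same import can be written declaratively rather than run by hand with terraform import. This is a minimal sketch, not the reporter's configuration: the project ID and index ID are placeholders, and the ID format shown is assumed from the provider documentation for google_datastore_index, so verify it against the docs for your provider version.

```hcl
# Placeholders: replace my-project and INDEX_ID with the real project ID and
# the ID of the index that is already CREATING or READY.
import {
  to = google_datastore_index.card
  id = "projects/my-project/indexes/INDEX_ID"
}
```

After a successful apply the import block can be removed; the index then stays tracked in state like any other resource.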

Steps to Reproduce

  1. terraform apply

Important Factoids

References

b/303808201

slevenick commented 2 years ago

Can you provide some more information on why an index would take a while to create or how I would reproduce this error?

ggprod commented 2 years ago

> Can you provide some more information on why an index would take a while to create or how I would reproduce this error?

I find they always take a long time to create for me. I see this problem every time. Are you able to create indexes of any sort that don't exhibit this behavior?

slevenick commented 2 years ago

Hmmmmm, no this seems to happen on all indexes. Something must have changed recently to cause this, I'll take a look

slevenick commented 2 years ago

Well, our tests regularly pass in CI at least, so it is possible to create them. Maybe we need to bump the timeout

ggprod commented 2 years ago

> Well, our tests regularly pass in CI at least, so it is possible to create them. Maybe we need to bump the timeout

Perhaps it is related to the number of indexes being created simultaneously. Lately I've tended to create a lot at once.

tlkamp commented 2 years ago

Hello, I'm having this issue as well, and it happens very frequently. It seems that the creation time of datastore indexes has increased a bit, possibly when other indexes are being created at the same time, and the provider code does not respect the 20-minute resource creation timeout.

I am using terraform-provider-google version 4.1.0.

I have found that even though the resource create timeout is set to 20 minutes, the retries stop happening after 21 retries (much less than 20 minutes), always. I think that may be due to this piece of code: https://github.com/hashicorp/terraform-provider-google/blob/v4.1.0/google/common_operation.go#L154. That call flow is started here https://github.com/hashicorp/terraform-provider-google/blob/v4.1.0/google/resource_datastore_index.go#L158-L165, and when the function returns after 21 retries the ID for the resource is wiped out, even if the index is still being created in the background. This causes conflicts on retries as Terraform does not persist the resource to state even partially.

For what it's worth, in one instance while I was trying to create a google_datastore_index resource, the last line I saw was Still creating... [3m40s elapsed] before the failure on the resource. The entire execution lasted 9 minutes 23 seconds.
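
For reference, the create timeout mentioned above is the one a resource exposes through its timeouts block. The sketch below shows where it would be set; the kind and property names are hypothetical and not taken from the reporter's configuration. Going by the observation above, raising this value would not help on its own, since the wait ends after a fixed number of retries rather than when the timeout elapses.

```hcl
# Hypothetical index definition; kind and property names are placeholders.
resource "google_datastore_index" "card" {
  kind = "card"

  properties {
    name      = "owner"
    direction = "ASCENDING"
  }

  properties {
    name      = "created_at"
    direction = "DESCENDING"
  }

  # Explicit create timeout, matching the 20 minutes discussed in this thread.
  timeouts {
    create = "20m"
  }
}
```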

slevenick commented 2 years ago

Looks like we need to set the NotFoundChecks in that common_operation for this resource.

Reference: https://pkg.go.dev/github.com/hashicorp/terraform-plugin-sdk/helper/resource#StateChangeConf

NotFoundChecks defaults to 20, and that's what's timing out

ggprod commented 2 years ago

@slevenick another nice thing would be if, at the start of the apply phase, indexes that a previous apply attempted to create but that timed out (and are still in the CREATING state) were automatically imported into state. A big pain is having some indexes time out, then having a reapply fail because the index already exists, and then having to manually import all those indexes (I'm doing that now, and it's quite tedious and annoying).

ggprod commented 2 years ago

@slevenick apologies, disregard my previous comment. I was mistaken; it looks like that doesn't happen anymore with recent versions of the provider.

ggprod commented 2 years ago

Mistaken again. It looks like it can still happen with the latest version, 4.17.0.

muzammil360 commented 2 years ago

The same problem is happening for me. I am using v4.33.0. Initially I thought my Terraform service account did not have permission to read the index list, but even after granting it the roles/datastore.indexAdmin role, the problem persists.

Is there any workaround?

image

ggprod commented 2 years ago

@muzammil360 it looks like the error you got is different (it is an authorization problem). I haven't had that problem myself. Perhaps the service account credentials aren't being correctly picked up by terraform?

ggprod commented 2 years ago

I recently did some index building with v4.33.0 of the provider, and the problem still occurs (some indexes take too long to complete and cause a timeout error). However, the behaviour is much better now: after the timeout, if you rerun terraform apply, it checks for indexes in the CREATING state and does not try to rebuild them. So you can converge to the desired state of having all indexes built correctly just by rerunning terraform apply a few times consecutively (for me it recently took 3 runs to get all the indexes built correctly).

Previously it wasn't properly checking whether indexes were in the CREATING state, so it tried to rebuild them and then gave an error that indexes with the same name already existed (so you had to terraform import the CREATING ones manually).

As far as I'm concerned, the behavior is no longer problematic.

muzammil360 commented 2 years ago

@ggprod thanks for the reply. The interesting thing is that the index does actually get built and does show up in the Firestore index UI. Initially I also thought that "Error 403: The caller does not have permission" was a permission issue, so I added the roles/datastore.indexAdmin role to the corresponding service account. But it doesn't help.

Also, if I run Terraform on my computer (with my user account), I get the same error. Note that I can build the index from the UI and can also see it in the Firestore UI.

Also, if I run Terraform again, it throws the error that the index already exists. The behavior is the same with both my user account and the service account with the indexAdmin role. (picture attached)

image

image

muzammil360 commented 2 years ago

I thought that if I ran it after the index had finished building, it would work. But it threw the same "index already exists" error. I will try with elevated permissions and see if that helps.

muzammil360 commented 2 years ago

It turns out I was missing the datastore.operations.get permission. Without it you cannot fetch the operation status, and thus Terraform fails. I just added roles/datastore.importExportAdmin and it works now.
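
The role grant described here could be managed in Terraform as well. A minimal sketch, assuming a hypothetical project ID and service account email (neither appears in this thread); the role name is the one reported above as resolving the missing datastore.operations.get permission.

```hcl
# Hypothetical project ID and service account email; replace with your own.
resource "google_project_iam_member" "terraform_datastore_operations" {
  project = "my-project"
  role    = "roles/datastore.importExportAdmin"
  member  = "serviceAccount:terraform@my-project.iam.gserviceaccount.com"
}
```

Note that this role carries more than the single permission that was missing, so it may be broader than strictly necessary.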

@ggprod thanks a lot for the help.

ggprod commented 2 years ago

@muzammil360 ah, I see, good find! It is strange that the roles/datastore.indexAdmin doesn't have that datastore.operations.get permission, but I'm glad you figured it out and were able to resolve your issue.

One reference that you may find useful (if you aren't already aware of it) is https://cloud.google.com/iam/docs/permissions-reference, which you can use to find which roles include any given permission.

johnBgood commented 1 year ago

> It turns out I was missing datastore.operations.get permission. Without it you can not fetch the status and thus terraform fails. I just added roles/datastore.importExportAdmin and it works now.
>
> @ggprod thanks a lot for the help.

@muzammil360 how did you investigate this, and where did you find the missing permission? Just out of curiosity :)

muzammil360 commented 1 year ago

@Jonathan, I don't remember exactly how I debugged it. It's been quite some time.

I think somewhere deep in the documentation it said something about this specific granular permission, and when I looked into the role, the permission was missing.

Based on my experience, you need to study the GCP documentation closely if you want to follow the "principle of least privilege". :-)
