onuruluag opened 3 years ago
From an (admittedly naive) provider's perspective, this is working as intended. Since the job has disappeared from the API, it should be recreated, and we can't distinguish between a nonexistent job and one that used to exist.
The only potential fix we could consider is to not check if the job is present in the API, but that would actually break import for the resource: we don't know in the Read function how it was called, so if the resource was just imported, Terraform would report success. Admittedly, this isn't a resource where there is much reason to import it.
I think storing a hash of the source as state for the load jobs would be appropriate, since the load job should only run when the source file changes. We tried to achieve this behavior by adding the `google_storage_bucket_object`'s `md5hash` value as part of `job_id`, but that failed once the job history disappeared.
As a workaround, we could use `null_resource` with `triggers`, but that means not using `google_bigquery_job` at all.
The first `terraform apply` for a BigQuery job 183 days after it was created caused the same issue for me, but instead of re-creating the job, I get a `404 Not Found` and the apply exits with a failure.
Referencing the response by @rileykarson:

> The only potential fix we could consider is to not check if the job is present in the API

I think this is appropriate given how the provider works natively. A BigQuery Job is created once and cannot be modified. If there is a change to a BigQuery Job resource, this will result in a new Job being created in BigQuery.

> but that would actually break import for the resource: we don't know in the Read function how it was called, so if the resource was just imported, Terraform would report success

I think this is appropriate too. Importing a job, whether it still exists in BigQuery or has been purged after the 180-day period, should indicate to Terraform not to create the resource, because it already exists/existed. Isn't this in line with other uses of `import`?
I'm inclined to agree with @SunilPaiEmpire's points above.
It is frustrating, because `google_bigquery_job` could be a really nice way of managing DML changes, but I don't see how it is currently usable on its own.
We are considering a script we can run as part of our workflow that checks for and removes jobs over a certain age, but it complicates what would otherwise be a very tidy process.
Revisiting this as the same issue was recently brought up again by another user.
@rileykarson Could you forward it to our internal issue tracker too?
Since jobs are not persistent past 6 months on the server side, we'll need to think of something at the provider level. Could you elaborate on what "not check if the job is present in the API" means? Is it that we don't read the live state at all, since we would never attempt to re-create any BigQuery job in Terraform?
Yeah: instead of clearing the resource id on a 404 error (this code), we'd preserve the old state of the resource by immediately exiting Read without an error.
We'd have to figure out what updates to a purged job definition mean as part of that, I think. We could track purged jobs with a boolean clientside, and then use a CustomizeDiff to upgrade update plans to deletions, or add a guard to Update so that updates are no-ops.
I think for the import case I was thinking about, we could only allow importing valid jobs by tracking a flag clientside: we'd start it false and set it to true after a successful read. Since Import is ultimately a Read call, we'd return an error on a 404 where that flag is still false. Or just take the "import of nonexisting jobs always succeeds" approach from https://github.com/hashicorp/terraform-provider-google/issues/9768#issuecomment-1377615592.
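To make the shape of that concrete, here is a minimal sketch (not the provider's actual code) of a Read function that exits early on a 404 and records a hypothetical clientside `purged` flag; `fetchJob` and `flattenJob` are stand-ins for the provider's jobs.get call and state-setting logic:

```go
package bigquery

import (
	"log"

	"github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema"
	"google.golang.org/api/googleapi"
)

// Sketch only: on a 404 we keep the stored state instead of calling
// d.SetId(""), which is what currently makes Terraform plan a re-create.
func resourceBigQueryJobRead(d *schema.ResourceData, meta interface{}) error {
	job, err := fetchJob(d, meta) // hypothetical wrapper around the jobs.get API call
	if gerr, ok := err.(*googleapi.Error); ok && gerr.Code == 404 {
		log.Printf("[WARN] BigQuery job %q not found; it may have been purged after ~180 days. Preserving existing state.", d.Id())
		// Hypothetical clientside tracking field: a CustomizeDiff (or a
		// guard in Update) could then turn changes to a purged job into
		// deletions or no-ops, and Import could reject jobs that were
		// never successfully read.
		return d.Set("purged", true)
	}
	if err != nil {
		return err
	}
	return flattenJob(d, job) // hypothetical helper writing API fields into state
}
```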
Thanks again for the details. I'm trying to see if we can simplify the handling by just preserving any existing state without having to rely on an additional client-side field. Can you confirm that if we return `nil` on a `404` from the Get API response inside `JobRead()`, the job resource in the state file will remain unchanged, and that it will be used for comparison with the user's config file, i.e. diff detection? The plan would be:

- On a `404` from the Get API, print a message indicating the job might have been purged.
- The existing state is preserved on a `404` from Get, so there should be no diff or any attempt to recreate, which should solve this issue here.

@rileykarson Could you correct any misunderstanding above? What do you think of the overall plan?
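For reference, that simplified variant, with no extra clientside field, might look like the following sketch (the `JobRead` shape and the `getJob`/`flattenJob` helpers are assumptions based on this thread, not the provider's exact code):

```go
func JobRead(d *schema.ResourceData, meta interface{}) error {
	job, err := getJob(d, meta) // hypothetical jobs.get wrapper
	if gerr, ok := err.(*googleapi.Error); ok && gerr.Code == 404 {
		// Leave the state untouched: returning nil without d.SetId("")
		// means Terraform diffs the config against the last-known state,
		// so an unchanged config produces an empty plan.
		log.Printf("[WARN] BigQuery job %q returned 404; it may have been purged.", d.Id())
		return nil
	}
	if err != nil {
		return err
	}
	return flattenJob(d, job) // hypothetical state-setting helper
}
```

Whether config drift against a purged job should then plan as an update or a re-create would still need the Update/CustomizeDiff handling discussed above.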
Community Note
If an issue is assigned to the `modular-magician` user, it is either in the process of being autogenerated, or is planned to be autogenerated soon. If an issue is assigned to a user, that user is claiming responsibility for the issue. If an issue is assigned to `hashibot`, a community member has claimed the issue already.
Terraform Version
Terraform v1.0.1 on linux_amd64
Your version of Terraform is out of date! The latest version is 1.0.4. You can update by downloading from https://www.terraform.io/downloads.html
Affected Resource(s)
google_bigquery_job
Terraform Configuration Files
Expected Behavior
Since the job already ran and there is no change in the JSON file, the BQ load job should not be re-created.
Actual Behavior
After 6 months, GCP deletes the BQ job history, and load jobs are re-created even though there is no change in JSON files.
Steps to Reproduce
1. `terraform apply`
2. `terraform apply`
3. `terraform apply`
The issue is caused by the design of Terraform state management for `google_bigquery_job` load jobs and the GCP job history lifetime (6 months).
b/371632037