hashicorp / terraform


remote state is not incrementally updated #14487

Open manuwela opened 7 years ago

manuwela commented 7 years ago

Terraform Version

Terraform v0.9.1

Affected Resource(s)

s3 remote state

This is a remote backend issue affecting the tfstate file.

Terraform Configuration Files

The configuration has multiple modules with many .tf files, and the remote backend is configured as S3.

Expected Behavior

With the remote backend storing tfstate on S3, as my resources get created one by one, the tfstate on S3 should be updated immediately with the details of those resources, in real time.

With local tfstate files, the tfstate is updated immediately when a resource reaches "creation completed", and I can see the resource in the tfstate right away. That is not the case with the remote S3 backend.

Actual Behavior

I saw the remote tfstate file in the S3 UI only after terraform apply completed, by which point all resources had already been created. I did not see the tfstate file being updated as resources were created one by one.

Steps to Reproduce

Run terraform apply with the S3 remote backend configured.

manuwela commented 7 years ago

@mitchellh: Could you please comment on whether this is Terraform's design/implementation or S3 behavior? Thanks.

glasser commented 7 years ago

This is a big problem for us with our recent upgrade to 0.9. We run end-to-end tests creating a full TF setup from scratch, and the initial apply can take quite a while and create hundreds of resources. If the initial apply fails (eg, if it exceeds the timeout set by our end-to-end test suite), the tfstate is essentially empty and we've leaked a huge number of resources.

glasser commented 7 years ago

BTW, I would probably describe this as "apply only persists remote state once at the end". I don't think it's S3-specific.

Looking at the code, there's a sense in which this is not a regression from pre-0.9: the older code also only called PersistState once at the end. But it does feel different to me. I think that's because the pre-0.9 code still updated the local file periodically during a long apply, which could be used from a follow-up destroy on the same machine at least? The 0.9 code does at least drop a terraform.tfstate.backup while using remote state, but it appears to not get updated as resources are created.

glasser commented 7 years ago

I'd be really interested in fixing this if I can get some advice about how to go about it. Based on the past few issues I've filed, it seems like @jbardin is the person to talk to?

If I'm understanding the code correctly, backend_apply.go uses a StateHook to continuously update its opState object during the operation. This StateHook (which is only used in this one place*) calls WriteState on the State.

I would argue that StateHook should also call PersistState. Looking at the implementations of State, most of them have no-ops for PersistState, so this wouldn't make much of a difference.

The only two with non-trivial PersistState are remote.State and BackupState.

remote.State is the one I care about, and it doesn't seem to me that there's much of a point in calling its WriteState a bunch of times during the operation if all it does is update a pointer. Persisting on each update would also be somewhat analogous to how local state works, where WriteState actually writes to a file.

And since BackupState only does its backup operation once, calling PersistState would be a no-op too.
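To make the shape of the change concrete, here is a minimal sketch. The State interface, Snapshot type, and HookAction constants below are simplified stand-ins invented for illustration, not the actual definitions in the Terraform codebase; only the WriteState/PersistState/HookActionHalt names come from the discussion above.

```go
package sketch

// Simplified stand-ins for illustration only; the real interfaces in the
// Terraform codebase look different.
type Snapshot struct{ Resources []string }

type State interface {
	WriteState(s *Snapshot) error // update the in-memory/working copy
	PersistState() error          // flush to durable storage (e.g. S3)
}

type HookAction int

const (
	HookActionContinue HookAction = iota
	HookActionHalt
)

// stateHook mirrors the role of the StateHook used by backend_apply.go:
// it runs after every state update during an operation.
type stateHook struct{ state State }

func (h *stateHook) PostStateUpdate(s *Snapshot) (HookAction, error) {
	if err := h.state.WriteState(s); err != nil {
		return HookActionHalt, err
	}
	// Proposed addition: persist as well, so an interrupted apply still
	// leaves the resources created so far recorded in the backend.
	if err := h.state.PersistState(); err != nil {
		return HookActionHalt, err
	}
	return HookActionContinue, nil
}
```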

My one concern would be error handling: should a failure to PersistState mid-operation really cause a HookActionHalt? This seems a little relevant to @apparentlymart's just-merged #14423.

Anyway, let me know if this is a good change and I'll make it.

* note that there's a second, apparently dead, copy of StateHook in terraform/command

glasser commented 7 years ago

Actually, I might make an even stronger argument: we should just get rid of the WriteState/PersistState separation and move remote persistence into WriteState.

Looking at the codebase, every single call to WriteState is immediately followed by PersistState except the one I want to change and this one:

https://github.com/hashicorp/terraform/blob/5c85c9cf6c0fc0b9ddded79455c2b036027b6a42/command/meta_backend.go#L1207

Is it possible that this is a bug and this other case should PersistState too? I don't really understand the Hash stuff. If I understand properly, a PersistState here might be redundant, but it's also a relatively edge-case piece of code that shouldn't run every time you run a TF command, so it might be better to simplify the API even if it causes one extra network round trip in this rare case.

jbardin commented 7 years ago

Hi @glasser

Thanks for the feedback here. I think we want to keep the separation between Write and Persist, even though we don't make good use of it at the moment. PersistState can be a very expensive operation for some backends, and calling it on every modification may be overkill.

The PR you referenced definitely helps with the "lost state" situation, but I intend to review the codepaths handling the state here to see what can be improved. Now that remote state backends operate directly on the remote storage, we should probably implement a write cache similar to how the old remote state worked. I've been thinking about leveraging the BackupState for this purpose, so that it not only saves a complete copy of the previous state, but also records modifications during the run.
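Purely as an illustration of that write-cache idea (this is not the real BackupState code; it reuses the simplified State and Snapshot stand-ins from the sketch above), such a wrapper might look roughly like this:

```go
package sketch

import (
	"encoding/json"
	"os"
)

// cachingState illustrates the "write cache" idea: every write is mirrored
// to a local file immediately, while PersistState still pushes to the real
// remote backend. This is a sketch, not Terraform's BackupState.
type cachingState struct {
	remote State  // the real backend, e.g. remote S3 state
	path   string // local file recording modifications during the run
}

func (c *cachingState) WriteState(s *Snapshot) error {
	raw, err := json.Marshal(s)
	if err != nil {
		return err
	}
	// Record the modification locally so an interrupted run still leaves a
	// usable copy on disk, even if the remote copy was never updated.
	if err := os.WriteFile(c.path, raw, 0o644); err != nil {
		return err
	}
	return c.remote.WriteState(s)
}

func (c *cachingState) PersistState() error {
	return c.remote.PersistState()
}
```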

Is it possible that this is a bug and this other case should PersistState too? I don't really understand the Hash stuff.

This is purely for local configuration, which just happens to use the same state data structures for storage, but it is separate from the actual state that users are concerned with, and it is stored in separate files.

glasser commented 7 years ago

That makes sense.

So maybe the hook should persist the state, but throttled to do so at most, say, once every 15 seconds?

Is this something you're going to work on soon, or would it be helpful for me to give it a stab?
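A minimal sketch of that throttling, again against the stand-in State interface from the earlier sketch (the 15-second interval is just the number floated above, and a real implementation would also need locking because hooks can be called concurrently):

```go
package sketch

import "time"

// throttledState forwards PersistState to the underlying backend at most
// once per interval; WriteState is left untouched. Sketch only.
type throttledState struct {
	State                     // embedded stand-in backend
	interval    time.Duration // e.g. 15 * time.Second
	lastPersist time.Time
}

func (t *throttledState) PersistState() error {
	if time.Since(t.lastPersist) < t.interval {
		return nil // persisted recently enough; skip this round trip
	}
	if err := t.State.PersistState(); err != nil {
		return err
	}
	t.lastPersist = time.Now()
	return nil
}
```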

jbardin commented 7 years ago

@glasser, you mentioned that your tests can time out and leak resources? If you're not signaling for Terraform to shut down gracefully, or are killing it before it completes, you're pretty much guaranteed to lose state. The only difference now is that you may lose more of the state, but there was no guarantee previously that it was remotely complete either.

Another issue here is that updates to S3 are only eventually consistent, so multiple updates during a Terraform execution wouldn't be guaranteed to appear immediately, or even in order from the view of other clients.

Now, losing "more" state than before is still less than desirable, since it can require more manual intervention to clean up. I have some changes in progress to help make saving remote state more robust, but we need to understand that the previous behavior isn't something that can be relied upon.

glasser commented 7 years ago

99% of the time when an apply is timing out, it's because an individual resource inside it is hanging and will never finish (eg, it's trying to bring up an AWS ASG with buggy code where no machine will ever come up successfully). I've found in those cases that the Terraform graceful shutdown support is basically useless, since I don't think it interrupts in-process resource updates. (The resources in question have very long timeouts set on them because sometimes they really do legitimately take a long time to roll out.) Maybe something about graceful shutdowns has changed recently, but in the particular use cases where I want to cancel an apply, I have never had an experience where graceful shutdown actually shut down at the end.

That said, throwing a single call to opState.PersistState into backend_apply.go when the graceful shutdown starts might help a bit. (Though literally just doing that might have a race condition.)
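Very roughly, and using the same stand-in State interface rather than Terraform's actual backend_apply.go code, that idea might look like this (placeholder names throughout):

```go
package sketch

import (
	"context"
	"log"
)

// applyWithFlushOnCancel runs a (placeholder) apply function and, if the
// context is cancelled first, makes a best-effort PersistState call so the
// resources created so far are at least recorded. As noted above, doing
// only this still races with writes from the in-flight apply.
func applyWithFlushOnCancel(ctx context.Context, opState State, runApply func() error) error {
	done := make(chan error, 1)
	go func() { done <- runApply() }()

	select {
	case err := <-done:
		return err
	case <-ctx.Done():
		if err := opState.PersistState(); err != nil {
			log.Printf("error persisting state on shutdown: %s", err)
		}
		return ctx.Err()
	}
}
```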

I do understand your point that it's hard to have any guarantees with a non-graceful shutdown. But this really is a significant regression from non-backend Terraform. If you're doing a giant create-from-scratch apply before 0.9 and it dies, you are likely to be left with a local file listing the resources it created. With 0.9 backends, you are 100% guaranteed that no Terraform-created resource will have any reference to it written to any file or remote resource unless terraform.Context.Apply finishes.

I'm not really worried about race conditions here — I accept that they may exist in the world. I'm talking about creating brand new resources and not writing the name down anywhere even if the process doesn't get killed for another half an hour.

Would you be likely to accept a PR that adds a throttled PersistState call to StateHook? I think that would solve this problem and not add too much overhead, if it's only calling PersistState every 30 seconds or so.

jbardin commented 7 years ago

@glasser,

I totally agree here. The changes I mentioned actually include both a call to PersistState when the operation context is cancelled and having StateHook periodically call PersistState.

glasser commented 7 years ago

Thanks for prioritizing this! I hope my earlier comments weren't too aggressive - just wanted to be clear that I would be happy to help out if you weren't going to get to it soon.

jbardin commented 7 years ago

No problem @glasser! I already had these changes in mind for the next release, we just got them merged a little faster ;)

glasser commented 7 years ago

Note that this fix was reverted in #15176 because it caused a race condition.

manuwela commented 7 years ago

@jbardin: Can this issue be re-opened? Thanks.

jbardin commented 7 years ago

Hi,

Once the 0.10 betas are underway we can revisit this to determine why the partial states adversely affected Terraform Enterprise.

manuwela commented 7 years ago

Thanks @jbardin!

ultrasound commented 6 years ago

I have the same problem with Terraform 0.11. After terraform apply I can see the state file, but when I modify some resources in main.tf and execute terraform plan, Terraform shows me that it would add new resources on AWS instead of changing the current resources.

adschutz commented 4 years ago

This might seem like ancient history, but this is biting me hard right now. Could you please, if nothing else, add a parameter for the backend config for this?

mikeroll commented 3 years ago

This is still perfectly relevant for us with Terraform 1.0.3 using the S3 state backend. Our CI server may under certain conditions abort terraform apply with a SIGKILL so it doesn't have the chance to do a final PersistState (or whatever that is these days) and we end up with a lot of untracked resources. Given that otherwise Terraform handles partial apply pretty well, I would really expect the state to be persisted much more often - ideally after each resource change.

What's the current stance on this and is there anything planned to address the issue?

rajpa0 commented 3 years ago

I also recently ran into this issue when using a long-running Terraform job from a Jenkins pipeline where my deployment container was terminated abruptly. I was using S3 as a remote backend, and my lock was also stuck because Terraform did not get a chance to exit cleanly. The force-unlock was fine, but since the remote S3 backend was not continuously updated with changes from this long-running job, I was left with no state and 45 orphaned resources.

Neither terraform refresh nor terraform plan could recollect anything that was done (understandably, because the remote state was not updated at all). The only option was to analyze the original output from the failed job and generate terraform import commands to get the orphaned resources cleaned up. This works well for some resources that are imported via UUID, but doesn't work for several resource types that require configuration-specific information as part of the import command.

In this sort of scenario and others that have been mentioned above, I think having remote state updated continuously makes a lot of sense. At the very least, the scope of orphaned resources would be vastly reduced, making for a far better recovery experience.

david-b8-ncsc commented 2 years ago

Same issue - specifically on GitHub Actions, where the job timeout seems to just SIGKILL the container and we're left with a swathe of orphaned resources and an empty state file in S3.

GroovyCarrot commented 1 year ago

Can I reboot this as well? We've just had a nightmare tracking down orphaned resources, and now I'm finding it hard to justify why we would use this over CloudFormation. As mentioned above, GitHub Actions can send SIGKILL to a running CI/CD container, and we are using Azure DevOps and have experienced the same thing - it didn't even time out 🤦

If we're using remote state, then surely this is the expected behaviour? Even if a resource is still in the process of being created, the service will already have given you the ID of the resource in its pending state - there just doesn't seem to be any reason why the remote state cannot always be completely up to date.

crw commented 1 year ago

Thanks for the continued feedback on this issue! At this point, we understand the use cases, and know that we would like to address this issue in the future, pending prioritization. For this reason I am going to lock comments, please continue to express your interest in this issue using the upvote mechanic (👍 ) on the original description. Thanks very much!