hashicorp / terraform

Terraform enables you to safely and predictably create, change, and improve infrastructure. It is a source-available tool that codifies APIs into declarative configuration files that can be shared amongst team members, treated as code, edited, reviewed, and versioned.
https://www.terraform.io/
Other
41.79k stars 9.43k forks source link

Proposal: Versioned resources #8765

Open radeksimko opened 7 years ago

radeksimko commented 7 years ago

New commands

$ terraform plan
$ terraform apply
# The new ECS TD doesn't work  (╯°□°)╯︵ ┻━┻, let's roll back

$ terraform list-versions aws_ecs_task_definition.myapp
myapp-td:4 # CURRENT
myapp-td:3
myapp-td:2
myapp-td:1
$ terraform set-version aws_ecs_task_definition.myapp myapp-td:3

$ terraform plan
# here we see plan w/ diff in dependent resources (e.g. aws_ecs_service)

$ terraform apply

# Rollback DONE

$ terraform list-versions aws_ecs_task_definition.myapp
myapp-td:4
myapp-td:3 # CURRENT
myapp-td:2
myapp-td:1

Schema changes

type Resource interface {
    ListVersions func(d *schema.ResourceData, meta interface{}) ([]string, error)
    SetVersion   func(d *schema.ResourceData, meta interface{}) error

I'd expect set-version to have a very similar behaviour to taint - i.e. no API calls, just ID change. The difference would then appear during refresh/plan/apply as these would be working with the new version of the resource.

Resources that would benefit from this

The way we think about versioned resources today (in majority of cases) is that a given resource manages the latest version and refresh pulls down the latest version if it doesn't match with local config. Some resources destroy old versions (ECS TD), some don't (S3 object, Docker Image, Lambda), some don't create new versions at all (API Gateway Deployment).

This was treated as a nice way to detect changes outside of Terraform which we'd now lost.

Impact

destroy

We would only destroy a given version of the resource rather than all versions which could potentially cause dependency issues as we might not be able to destroy some parent resources - e.g. aws_api_gateway_rest_api.

We could accept this as a feature/default behaviour and make people explicitly do something like

$ terraform destroy -all-versions

which would under the hood result in something like this:

for version in $(terraform list-versions RESOURCE_TYPE.NAME); do
  terraform set-version RESOURCE_TYPE.NAME $version
  terraform destroy -target=RESOURCE_TYPE.NAME
done

refresh

We would refresh data about a given version of that resource, rather than pull the latest version, but here's how the user could emulate the latter behaviour:

$ terraform set-version aws_ecs_task_definition.myapp $(terraform list-versions | head -1)
$ terraform refresh

apply

Each apply would (potentially) generate a new version (new resource) which in turn means it would leave some orphans behind.

Some resources may need a new field as the cause of version generation comes from outside of the resource. We already have this in docker_image and we'd probably need to add similar one to aws_api_gateway_deployment too.

taint

This would behave as apply with changed trigger.

apparentlymart commented 7 years ago

Thanks for starting this discussion, @radeksimko!

I have a question about your initial example, which I'll quote some parts of and then comment inline:

$ terraform plan
$ terraform apply
# The new ECS TD doesn't work  (╯°□°)╯︵ ┻━┻, let's roll back

$ terraform list-versions aws_ecs_task_definition.myapp
myapp-td:4 # CURRENT
myapp-td:3
myapp-td:2
myapp-td:1
$ terraform set-version aws_ecs_task_definition.myapp myapp-td:3

If I understood the rest of the proposal correctly, I think your intent is that at this point we've recorded in the state that version 3 is "current", but we've not actually changed anything in the real system.

$ terraform plan
# here we see plan w/ diff in dependent resources (e.g. aws_ecs_service)

In which case, what exactly is plan doing here?

Normally we expect plan to generate a diff from the current state to what's in the configuration, but if we did that here then we'd presumably flip back to being on version 4 during refresh (because it wants to make the state match the "real world") and then get no diff at all because the config will (presumably) be describing the world as of version 4.

It feels to me like set-version must actually take action on the real system because otherwise whatever it does to the state would be undone by a subsequent refresh.


One way we could conceptualize "rolling back" is as a funny sort of plan. Normally a plan produces a diff from the state to the config, and "rolling back" could be thought of as a diff from the current state to some "synthetic configuration" we obtain by reading the old version from the real system, in situations where the backend itself has a versioning concept.

So building on the idea of terraform plan -destroy (which acts as a modifier for what we're diffing), perhaps we'd say:

$ terraform plan -set-version="aws_ecs_task_definition.myapp=myapp-td:3" -out=tfplan

aws_ecs_task_definition.myapp: Refreshing state... (ID: abc123)
aws_ecs_task_definition.myapp: Reading version myapp-td:3... (ID: abc123)

...

~ aws_ecs_task_definition.myapp
    version_id:            "myapp-td:4" -> "myapp-td:3"
    container_definitions: "abc" -> "123"

...

Here's what I assumed plan did here:

Now when we later apply this plan, the Update implementation for this resource can see that version_id changed from "myapp-td:4" to "myapp-td:3" and so implement this update by taking whatever API action maps to rolling back to this earlier version, rather than by creating a new version as it presumably would've done in the normal case. The other attribute changes shown in the diff are purely for user information and would not actually be used in this action, because the provider could assume that the rollback would implicitly make those changes.

I think we need to think carefully about what happens to dependent resources in this situation. If we had another resource consuming the revision attribute of that resource, I think we'd expect the terraform plan -set-version="..." command to produce a plan that also includes a diff for the downstream resource that is using the revision attribute, so the rollback can ripple through any downstream resources that might need to be repointed at the earlier version in some way.

radeksimko commented 7 years ago

we've not actually changed anything in the real system.

This obviously depends on the implementation of Resource.SetVersion, but I thought we could have a Passthrough helper, similar to the one we have for imports - basically all it would do is d.SetId(givenId) where givenId is the [version-id] from terraform set-version resource [version-id].

Whether we should call Read() straight after resetting the ID is another question. I think we should follow taint here - so I shouldn't call it.

It feels to me like set-version must actually take action on the real system because otherwise whatever it does to the state would be undone by a subsequent refresh.

It would not be undone, in fact if ID is taken is real unique ID (version-unique) - which in most cases is - then refresh would just pull a different version of the resource, if implemented well.

Normally a plan produces a diff from the state to the config, and "rolling back" could be thought of as a diff from the current state to some "synthetic configuration" we obtain by reading the old version from the real system, in situations where the backend itself has a versioning concept.

By the time we get to plan generation, we'll already have the new versioned resource pulled in place (either in memory in case of plan or in tfstate in case of refresh) so we should not need to make that distinction.

I think we need to think carefully about what happens to dependent resources in this situation. If we had another resource consuming the revision attribute of that resource, I think we'd expect the terraform plan -set-version="..." command to produce a plan that also includes a diff for the downstream resource that is using the revision attribute

Yes, at the moment dependent resources are planned for potential update only if Computed fields are marked for "recomputation", which in reality is just taint or change of any ForceNew: true field.

There's another good reason why expand the number of cases where dependent resources should be scheduled for update, specifically https://github.com/hashicorp/terraform/pull/4846

So basically set-version would also mark Computed resources for "re-computation" besides changing the ID of the resource - which should in theory make it all work together.

radeksimko commented 7 years ago

btw. yes - the above mechanism is kinda broken under apply -refresh=false 💭 😞

apparentlymart commented 7 years ago

Ahh, I see where I misunderstood. You imagine the resource id as identifying the specific version, where I was expecting it works be identifying the versioned object itself, regardless of current version.

Seems like we need both ids in any case, since the version-agnostic id will be needed to enumerate the available versions.

I am still a little confused about the interaction between set-version and plan; how does the subsequent plan know that you want to make the world match the old version, as opposed to matching what is in the config as it would do by default?

radeksimko commented 7 years ago

Seems like we need both ids in any case, since the version-agnostic id will be needed to enumerate the available versions.

Agreed.

I am still a little confused about the interaction between set-version and plan; how does the subsequent plan know that you want to make the world match the old version, as opposed to matching what is in the config as it would do by default?

The config (in HCL) is expected to be version agnostic, so there's no conflict there (I think). The only difference would appear to be in 1 or more computed fields - e.g. arn. I'm not sure how the diff generation works for ID in particular - I assume it's not treated as Computed field, so maybe after all we would need to set ID + 1 computed field.


Something I didn't realise in the initial proposal and what you were probably asking about: We would basically suppress the diff of the versioned resource itself. It may sound like a bad idea, but after all we're not changing anything, we're just referencing a different resource which already exists, so I think it's fine.

All you would see in the plan is other resources that reference the versioned resource.

radeksimko commented 7 years ago

The config (in HCL) is expected to be version agnostic, so there's no conflict there (I think).

Actually I think I now understand what do you mean. That ^ is not true. 😞

radeksimko commented 7 years ago

But I still think we can get around that, if we flag versioned resources specifically in the tfstate and then have 2 sets of fields - 1st primary (basically matching HCL), 2nd pinned version then we may be able to compare things accordingly -> we can detect changes if HCL != Primary, but then do the actual comparison with pinned version. After apply we'd basically remove the pinned version.

All references would be 1st compared against pinned version if one exists.

madamedwards commented 7 years ago
dscowan commented 5 years ago

Any plan to address this anytime soon?