upbound / provider-terraform

A Crossplane provider for Terraform
https://marketplace.upbound.io/providers/upbound/provider-terraform/
Apache License 2.0
151 stars 59 forks source link

Support for Policy as Code against the planfile on the Update / Create #223

Open lukegAtPaxos opened 11 months ago

lukegAtPaxos commented 11 months ago

What problem are you facing?

When running a terraform apply, validation of the plan file can be useful to ensure nothing unexpected is going to happen.

How could Official Terraform Provider help solve your problem?

Tools like conftest are used in other opensource tooling (Atlantis) and Hashicorp Sentinel in Terraform Cloud.

Would it be possible to block the Update / Create crossplane actions on a policy as code evaluation of the planfile.

bobh66 commented 11 months ago

You might be able to do something like this once #219 is merged. You could create the Workspace with managementPolicies: [Observe] which would run terraform plan and nothing else. We don't currently expose the output of that but it's probably feasible to do in some form. Then when you're happy with the output you could change the managementPolicies to [Observe, Create] or ['*'] to let it proceed. See https://docs.crossplane.io/latest/concepts/managed-resources/#managementpolicies

lukegAtPaxos commented 11 months ago

Oh damn, I hadn't realised management policies weren't already implemented.

caiofralmeida commented 10 months ago

Hi @lukegAtPaxos, the PR https://github.com/upbound/provider-terraform/pull/219 has been merged.

balu-ce commented 10 months ago

hi @bobh66 , iam making code changes like exposing the plan in gzip format either via config map or variable in status field, may i know which one will work and feasible

bobh66 commented 10 months ago

Hi @balu-ce - that's a good question. There are size limitations to both that might become an issue for really large terraform modules, but it would probably work for the majority of the use cases. Another possibility might be to create a Plan resource that gets created as a child of the Workspace resource. That doesn't solve any of the content size issues, it's just another way of representing the data.

I think I would lean toward using status for simplicity.

balu-ce commented 10 months ago

@bobh66 we can increase the limit of CRD object if needed know ref

balu-ce commented 10 months ago

@bobh66 Added changes, can you please review it

lukegAtPaxos commented 10 months ago

I've been using the provider a bit more recently and had some thoughts.

potentially the introduction of a PolicyConfig resource, much like the provider config it can be referenced by a workspace. In the terraform execution stage.

in https://github.com/upbound/provider-terraform/blob/main/internal/terraform/terraform.go the provided policy could be invoked on the plan, against something like conf test.

This would involve outputting a plan file, before the apply - and executing the policy engine. The apply could then be invoked from the stored plan file.

The result of a policy failure would be to cause the workspace to fail, bubbling up the failure in status.

curious what maintainers think, i can knock out a poc.

negz commented 9 months ago

I'm not convinced that it's a good idea to try to implement a human-approves-plan workflow inside of provider-terraform's reconcile loop. In fact I'm pretty confident it's not.

What's the benefit of having an always-on control plane constantly monitoring your infrastructure if all it's going to do is update a plan that a human needs to review and tell it it's okay to apply?

I understand the desire for a dry run (e.g. https://github.com/crossplane/crossplane/issues/1805) to build confidence before making a change to a configuration. Wouldn't it make more sense to shift that left, though? i.e. Check that the plan isn't going to do anything unexpected before the new configuration hits the control plane. For example by validating the new plan as part of a gitops workflow in CI (or even just by running terraform plan on a developer's laptop).

Edit: I feel similarly WRT automated tests. Why not run them before the configuration (i.e. CR) is ever applied to the control plane?

balu-ce commented 9 months ago

I doubt that it will applicable for other native providers of crossplane but we need to reconsider it for terraform.

negz commented 9 months ago

I doubt that it will applicable for other native providers of crossplane but we need to reconsider it for terraform.

Why?

toastwaffle commented 9 months ago

@negz

Wouldn't it make more sense to shift that left, though? i.e. Check that the plan isn't going to do anything unexpected before the new configuration hits the control plane. For example by validating the new plan as part of a gitops workflow in CI (or even just by running terraform plan on a developer's laptop).

My concern would be that if we're not running terraform plan in exactly the same setup that provider-terraform runs terraform apply, we don't have any guarantees that the plan will reflect the changes that will be made. Especially with ProviderConfigs separating some of the config from the Workspace, it would be easy for the plan to be run with different providers, credentials, or a different terraform binary.

I agree that provider-terraform itself shouldn't be itself handle the approval workflow, but if it supported exposing the plan, you could fairly easily build an approval workflow on top of it using the management policies as Bob suggests above

negz commented 9 months ago

you could fairly easily build an approval workflow on top of it using the management policies

It sounds like you're talking about an automated workflow, right? So some other system (not a human) validates the change? Do you have an example of what kind of validation such a system would run?

toastwaffle commented 9 months ago

A combination of human and automated validations. One example would just be time based checking - e.g. changes should only be made between 10am and 4pm, Monday to Thursday, or only made during pre-approved maintenance windows.

I'd like to live in a world where we trusted ourselves and our systems to not require human approval of changes, but for production-critical environments we may be required to have a human review step as well.

negz commented 9 months ago

I'd like to live in a world where we trusted ourselves and our systems to not require human approval of changes, but for production-critical environments we may be required to have a human review step as well.

Is that view compatible with the controller/reconciler "control plane" model? It almost seems like it's turning the control plane into a change notification system - i.e. something to let you know when drift has happened and ask you what to do about it.

Do you see this as something that would be relevant for other Crossplane providers, or only this Terraform one? Do you use/have plans to use other providers?

toastwaffle commented 9 months ago

The way I see it, the control plane is responsible for actuating/enforcing an intent and correcting for drift; what we want to build on top of that is a set of safeguards between the changes being made by developers in source control, and a new version of the intent being "activated" (not that we intend to explicitly version things).

We have a lot of Terraform modules (occasionally several layers deep 😬 ) that are used in various combinations across many environments, and when a developer makes a change to a module, they're probably not going to be the one landing the change which bumps the module version and rolls that change and a bunch of others out to the fleet.

I don't think this is directly relevant to other Crossplane providers - the individual managed resource level is probably too granular for that sort of approval workflow - but I could see something similar being used for changes to compositions and claim parameters (given that a Terraform workspace can provision multiple cloud resources, I generally think of them as loosely equivalent to composite resources and compositions).

We do currently use provider-gcp, but only a tiny amount (probably 20-odd MRs, if that). Unfortunately we have a lot of history with Terraform, and I don't think there's any way we could justify rewriting all of our Terraform modules.

bobh66 commented 9 months ago

I don't know that I see any more work here for the provider beyond making the plan available for review in #226 . A higher layer can handle scheduling and approval of changes by manipulating claims and managementPolicies to control what changes get applied and when, and if the change gets applied automatically or requires approval.

toastwaffle commented 9 months ago

Yes, absolutely, that PR is all we need for our purposes.

lukegAtPaxos commented 9 months ago

I'm not convinced that it's a good idea to try to implement a human-approves-plan workflow inside of provider-terraform's reconcile loop. In fact I'm pretty confident it's not.

What's the benefit of having an always-on control plane constantly monitoring your infrastructure if all it's going to do is update a plan that a human needs to review and tell it it's okay to apply?

I understand the desire for a dry run (e.g. crossplane/crossplane#1805) to build confidence before making a change to a configuration. Wouldn't it make more sense to shift that left, though? i.e. Check that the plan isn't going to do anything unexpected before the new configuration hits the control plane. For example by validating the new plan as part of a gitops workflow in CI (or even just by running terraform plan on a developer's laptop).

Edit: I feel similarly WRT automated tests. Why not run them before the configuration (i.e. CR) is ever applied to the control plane?

FWIW my original statement wasn't introducing a human approved step, but allowing a hook for policy automation to validate a plan file.

Policy in terraform can be written for many things - a policy could be written to ensure no AWS IAM Keys are being created for example - but at its root its validating the plan file, not offering a human approver the ability to approve.

toastwaffle commented 7 months ago

@negz @bobh66 can we get a decision on this/#226 please?

negz commented 7 months ago

I defer to @bobh66 and @ytsarev. While I created this provider, I haven't touched it in some time.

bobh66 commented 7 months ago

I understand the need for modeling changes and pre-approvals, but I agree with @negz that this sort of processing should be handled at a higher/different layer and that the provider should simply "do what it's told". I think it's better to handle this as part of crossplane/crossplane#5477 rather than do something provider-specific here. I'll update #266 as well.