upbound / provider-terraform

A Crossplane provider for Terraform
https://marketplace.upbound.io/providers/upbound/provider-terraform/
Apache License 2.0
151 stars 59 forks source link

fix(workspace): don't delete directory on every reconciliation #276

Closed arththebird closed 4 months ago

arththebird commented 4 months ago

Description of your changes

Stop removing the Workspace directory on every reconciliation. This is no longer needed as the bug in go-getter was fixed in v1.7.5 and go-getter was upgraded to latest by Renovate in this PR.

Not removing the workspace directory also results in terraform init not being run every time when there's no change because the checksum is the same. This effectively allow using max-reconcile-rate > 0 while using the plugin cache (issues can still happen if a workspace is applying while another one is being init, but it happens much less frequently than before when there's no changes).

Fixes #275 Fixes #230

I have:

How has this code been tested

  1. Deploy the provider locally using make local-deploy
  2. Deploy a workspace using a remote repository as the module source
  3. Check the creation time of the .terraform directory in the workspace directory and notice that it doesn't get deleted on every reconciliation
  4. Check the provider logs and notice that the workspace is no longer being init on every reconciliation
2024-06-28T18:01:04.009Z    DEBUG   provider-terraform  Reconciling {"controller": "managed/workspace.tf.upbound.io", "request": {"name":"standard-metadata"}}
2024-06-28T18:01:08.702Z    DEBUG   provider-terraform  Checksums match - skip running terraform init   {"request": "standard-metadata"}
2024-06-28T18:01:08.908Z    DEBUG   provider-terraform  External resource is up to date {"controller": "managed/workspace.tf.upbound.io", "request": {"name":"standard-metadata"}, "uid": "ade13d39-05e1-40b4-9372-8ca4d30a403b", "version": "7717", "external-name": "standard-metadata", "requeue-after": "2024-06-28T18:03:00.169Z"}
2024-06-28T18:03:00.180Z    DEBUG   provider-terraform  Reconciling {"controller": "managed/workspace.tf.upbound.io", "request": {"name":"standard-metadata"}}
2024-06-28T18:03:04.900Z    DEBUG   provider-terraform  Checksums match - skip running terraform init   {"request": "standard-metadata"}
2024-06-28T18:03:05.157Z    DEBUG   provider-terraform  External resource is up to date {"controller": "managed/workspace.tf.upbound.io", "request": {"name":"standard-metadata"}, "uid": "ade13d39-05e1-40b4-9372-8ca4d30a403b", "version": "7717", "external-name": "standard-metadata", "requeue-after": "2024-06-28T18:03:46.505Z"}
2024-06-28T18:03:46.519Z    DEBUG   provider-terraform  Reconciling {"controller": "managed/workspace.tf.upbound.io", "request": {"name":"standard-metadata"}}
2024-06-28T18:03:51.150Z    DEBUG   provider-terraform  Checksums match - skip running terraform init   {"request": "standard-metadata"}
2024-06-28T18:03:51.337Z    DEBUG   provider-terraform  External resource is up to date {"controller": "managed/workspace.tf.upbound.io", "request": {"name":"standard-metadata"}, "uid": "ade13d39-05e1-40b4-9372-8ca4d30a403b", "version": "7717", "external-name": "standard-metadata", "requeue-after": "2024-06-28T18:03:53.990Z"}
2024-06-28T18:03:53.996Z    DEBUG   provider-terraform  Reconciling {"controller": "managed/workspace.tf.upbound.io", "request": {"name":"standard-metadata"}}
2024-06-28T18:03:59.063Z    DEBUG   provider-terraform  Checksums match - skip running terraform init   {"request": "standard-metadata"}
2024-06-28T18:03:59.263Z    DEBUG   provider-terraform  External resource is up to date {"controller": "managed/workspace.tf.upbound.io", "request": {"name":"standard-metadata"}, "uid": "ade13d39-05e1-40b4-9372-8ca4d30a403b", "version": "7717", "external-name": "standard-metadata", "requeue-after": "2024-06-28T18:04:59.361Z"}
2024-06-28T18:04:59.376Z    DEBUG   provider-terraform  Reconciling {"controller": "managed/workspace.tf.upbound.io", "request": {"name":"standard-metadata"}}
2024-06-28T18:05:04.305Z    DEBUG   provider-terraform  Checksums match - skip running terraform init   {"request": "standard-metadata"}
2024-06-28T18:05:04.568Z    DEBUG   provider-terraform  External resource is up to date {"controller": "managed/workspace.tf.upbound.io", "request": {"name":"standard-metadata"}, "uid": "ade13d39-05e1-40b4-9372-8ca4d30a403b", "version": "7717", "external-name": "standard-metadata", "requeue-after": "2024-06-28T18:07:00.865Z"}
2024-06-28T18:07:00.786Z    DEBUG   provider-terraform  Reconciling {"controller": "managed/workspace.tf.upbound.io", "request": {"name":"standard-metadata"}}
2024-06-28T18:07:05.777Z    DEBUG   provider-terraform  Checksums match - skip running terraform init   {"request": "standard-metadata"}
2024-06-28T18:07:05.993Z    DEBUG   provider-terraform  External resource is up to date {"controller": "managed/workspace.tf.upbound.io", "request": {"name":"standard-metadata"}, "uid": "ade13d39-05e1-40b4-9372-8ca4d30a403b", "version": "7717", "external-name": "standard-metadata", "requeue-after": "2024-06-28T18:08:51.454Z"}
2024-06-28T18:08:51.464Z    DEBUG   provider-terraform  Reconciling {"controller": "managed/workspace.tf.upbound.io", "request": {"name":"standard-metadata"}}
2024-06-28T18:08:56.196Z    DEBUG   provider-terraform  Checksums match - skip running terraform init   {"request": "standard-metadata"}
2024-06-28T18:08:56.409Z    DEBUG   provider-terraform  External resource is up to date {"controller": "managed/workspace.tf.upbound.io", "request": {"name":"standard-metadata"}, "uid": "ade13d39-05e1-40b4-9372-8ca4d30a403b", "version": "7717", "external-name": "standard-metadata", "requeue-after": "2024-06-28T18:10:35.639Z"}
2024-06-28T18:10:35.647Z    DEBUG   provider-terraform  Reconciling {"controller": "managed/workspace.tf.upbound.io", "request": {"name":"standard-metadata"}}
2024-06-28T18:10:40.364Z    DEBUG   provider-terraform  Checksums match - skip running terraform init   {"request": "standard-metadata"}
2024-06-28T18:10:40.554Z    DEBUG   provider-terraform  External resource is up to date {"controller": "managed/workspace.tf.upbound.io", "request": {"name":"standard-metadata"}, "uid": "ade13d39-05e1-40b4-9372-8ca4d30a403b", "version": "7717", "external-name": "standard-metadata", "requeue-after": "2024-06-28T18:11:09.485Z"}
2024-06-28T18:11:09.491Z    DEBUG   provider-terraform  Reconciling {"controller": "managed/workspace.tf.upbound.io", "request": {"name":"standard-metadata"}}
2024-06-28T18:11:14.273Z    DEBUG   provider-terraform  Checksums match - skip running terraform init   {"request": "standard-metadata"}
2024-06-28T18:11:14.547Z    DEBUG   provider-terraform  External resource is up to date {"controller": "managed/workspace.tf.upbound.io", "request": {"name":"standard-metadata"}, "uid": "ade13d39-05e1-40b4-9372-8ca4d30a403b", "version": "7717", "external-name": "standard-metadata", "requeue-after": "2024-06-28T18:12:55.419Z"}
2024-06-28T18:12:55.427Z    DEBUG   provider-terraform  Reconciling {"controller": "managed/workspace.tf.upbound.io", "request": {"name":"standard-metadata"}}
2024-06-28T18:13:00.382Z    DEBUG   provider-terraform  Checksums match - skip running terraform init   {"request": "standard-metadata"}
2024-06-28T18:13:00.609Z    DEBUG   provider-terraform  External resource is up to date {"controller": "managed/workspace.tf.upbound.io", "request": {"name":"standard-metadata"}, "uid": "ade13d39-05e1-40b4-9372-8ca4d30a403b", "version": "7717", "external-name": "standard-metadata", "requeue-after": "2024-06-28T18:14:51.656Z"}
2024-06-28T18:14:51.661Z    DEBUG   provider-terraform  Reconciling {"controller": "managed/workspace.tf.upbound.io", "request": {"name":"standard-metadata"}}
2024-06-28T18:14:57.491Z    DEBUG   provider-terraform  Checksums match - skip running terraform init   {"request": "standard-metadata"}
2024-06-28T18:14:57.656Z    DEBUG   provider-terraform  External resource is up to date {"controller": "managed/workspace.tf.upbound.io", "request": {"name":"standard-metadata"}, "uid": "ade13d39-05e1-40b4-9372-8ca4d30a403b", "version": "7717", "external-name": "standard-metadata", "requeue-after": "2024-06-28T18:16:48.223Z"}
Upbound-CLA commented 4 months ago

CLA assistant check
All committers have signed the CLA.

bobh66 commented 4 months ago

I'm wondering why the CI pipeline didn't run on this PR? I'm reasonably sure the fix is fine but I'd feel a lot better if the CI jobs had run.

ytsarev commented 4 months ago

Hi @bobh66, we have an issue with the multiple runner label selection from yesterday in reusable workflows. It should be fixed after the merge of https://github.com/upbound/official-providers-ci/pull/209

@arththebird thanks a ton for this contribution, just amazing! 🥇

arththebird commented 4 months ago

@arththebird thanks a ton for this contribution, just amazing! 🥇

My pleasure, thank you for the fast review and merge 🙌

Not sure how the release cycle for this provider is setup, I'm wondering when can we expect this to be released officially?

ytsarev commented 4 months ago

@arththebird I had an internal chat with @turkenf, we target the release for this Thursday 👍

project-administrator commented 4 months ago

Installed the latest release. provider-terraform is still deleting the workspace and re-initing it on every reconciliation. I don't have any directories under the "/tf" inside the provider pod with the modification date older than 1 hour.

bobh66 commented 4 months ago

@project-administrator do the provider logs indicate that terraform init is being run every reconciliation? Is it possible that go-getter is updating the timestamp of the directory on every reconciliation loop?

project-administrator commented 4 months ago

@bobh66 checked that and verified that checksums never match:

2024-07-11T18:11:14.815Z    DEBUG   provider-terraform  Checksums don't match so run terraform init:    {"request": "demo-dev", "old": "ccbd114f2f61a213c57f3fc023222b3d", "new": "4812c08f3f95a371c9c2e5a6cdbaf9e1"}

So now it's the checksum calculation taking smth into account that is not there. I wonder if it might be either the crossplane-provider-config.tf which is generated by the Crossplane's providerconfigs.tf.upbound.io resource, or few files which are downloaded by the terraform run?..

I can see that GenerateChecksum uses the following command:

/usr/bin/find . -path ./.git -prune -o -path ./.terraform -prune -o -type f -exec /usr/bin/md5sum {} +

With our config it always picks up more files than needed:

19e9bd65283e39a75bd15d532eeb4a3a  ./.git-credentials
c115cb3d433d8fc4304272d851cc4694  ./crossplane-provider-config.tf
d1b0e16cfb7f8fa5004dc6494c0bbc73  ./.terraform/terraform.tfstate
60715e2ecfcf0d2885edd52f0a506cf5  ./.terraform/environment
9a11bb6c5701d7300a3c6ed8c33dc6dd  ./.terraform.lock.hcl
0507597169bd025f95803b0d1713f943  ./rds-combined-ca-bundle.pem
84a4bf3e807ebcd712ed17a7a9fd576b  ./rds-combined-ca-bundle.crt

None of these are present in the remote git repo...

arththebird commented 4 months ago

I don't have any directories under the "/tf" inside the provider pod with the modification date older than 1 hour.

I see the same thing, but I think @bobh66 theory about go-getter updating the timestamp makes sense because if I check the .terraform directory timestamp (which is not in the remote git repo), it doesn't get updated. My understanding is that deleting this folder is what was causing the checksum diff on every loop in the previous version.

project-administrator commented 4 months ago

These files should be ignored when checksum is calculated. But they are being taken into consideration and this breaks the checksum calculation:

19e9bd65283e39a75bd15d532eeb4a3a  ./.git-credentials
c115cb3d433d8fc4304272d851cc4694  ./crossplane-provider-config.tf
d1b0e16cfb7f8fa5004dc6494c0bbc73  ./.terraform/terraform.tfstate
60715e2ecfcf0d2885edd52f0a506cf5  ./.terraform/environment
9a11bb6c5701d7300a3c6ed8c33dc6dd  ./.terraform.lock.hcl
project-administrator commented 4 months ago

Also, does it make sense to calculate the checksum of the URL address only for the remote repository module source? Why do we need to fetch files each time to calculate the checksum? I mean, if the URL has not changed, then it must be the same TF workspace source, therefore there is no need to run terraform init if it has been run once before.

bobh66 commented 4 months ago

Also, does it make sense to calculate the checksum of the URL address only for the remote repository module source? Why do we need to fetch files each time to calculate the checksum? I mean, if the URL has not changed, then it must be the same TF workspace source, therefore there is no need to run terraform init if it has been run once before.

We need to pull the content of the repo (using git fetch/pull) to determine if the content of the repo has changed. We can't assume that the repo is static.

project-administrator commented 4 months ago

We need to pull the content of the repo (using git fetch/pull) to determine if the content of the repo has changed. We can't assume that the repo is static.

Do you think it's a good idea to introduce an optional parameter to control how the checksum is calculated? (repo URL vs repo contents). For example, we're always using the remote repo git URL with a tag. So there can't be any change unless the tag is updated, and that would be a different URL (as the tag is a part of the URL)

bobh66 commented 4 months ago

@project-administrator it's an interesting point - with the existing implementation if the remote URL changes it will now git pull the new repo into the existing repo directory, having removed only the .git directory. We would need to store the current remote URL somewhere (it is in .git/config) and compare it with the current spec.forProvider.module. If they are different then the entire directory should be removed so a clean git pull can occur.

Since we don't require the use of a ref it's entirely possible for people to use a repo that has changes being pushed into it. We have no way of knowing that happens so we need to run the equivalent of git fetch and git pull on every reconciliation.

I wonder if a source of RemoteOnce or something similar would be appropriate - it says to pull the remote content one time and never pull it again (assuming that it exists in the container). The existing Remote value is the equivalent of RemoteAlways - it's basically a pullPolicy for the remote repo.

Though that still doesn't help with the remote URL changing - that we would still need to check for explicitly.

@ytsarev - any thoughts?

ytsarev commented 4 months ago

@bobh66 I like the idea. We can create something like remoteSourcePullPolicy symmetric to the well-known https://kubernetes.io/docs/concepts/containers/images/#image-pull-policy

bobh66 commented 4 months ago

Thanks @ytsarev - IfNotPresent and Always seem to make the most sense. I'm not sure that Never applies to our use case. Are there any other possible scenarios? I'm wondering what happens if the module changes when source is Remote but the pull policy is IfNotPresent - I assume we automatically assume the new module content is not present and pull it?

bobh66 commented 4 months ago

Also should we save the current Remote module value in status just to make it easier to compare with the current spec.module so we know if it changed without needing to read .git/config? It seems like that might be a more robust solution.