upbound / provider-terraform

A Crossplane provider for Terraform
https://marketplace.upbound.io/providers/upbound/provider-terraform/
Apache License 2.0
151 stars 59 forks source link

Proposal: Add RemoteModulePullPolicy #280

Open bobh66 opened 3 months ago

bobh66 commented 3 months ago

What problem are you facing?

This is a continuation of the discussion in #276 regarding some remaining holes in the logic used for remote modules.

The current implementation pulls the remote source on every reconciliation because there is no other way to know whether the content has changed. This also implicitly handles changes to the spec.ForProvider.Module field if the remote module source location is changed.

This is inefficient because it retrieves the remote source whether or not it has changed and may trigger a terraform init to run unnecessarily.

How could Official Terraform Provider help solve your problem?

This proposal adds a new RemoteModulePullPolicy attribute to the spec which allows the user to specify whether the remote source should be pulled on every reconciliation (Always) or if it should only be pulled if it doesn't already exist (IfNotPresent).

This proposal also adds a new status field status.AtProvider.RemoteSource to track the value of the remote source that was last retrieved.

If the RemoteModulePullPolicy is set to Always the behavior is the same as it is today. This should probably be the default, though it remains an inefficient implementation.

If the RemoteModulePullPolicy is set to IfNotPresent then the code can check for the presence of a .terraform directory in the local workspace directory, and if it exists skip the remote pull. The .terraform directory is created by the terraform init command which would have run on a previous reconciliation.

If the .terraform directory is not present, we could just pull the remote source, or we could analyze the spec.ForProvider.Module URL to determine the name of the target directory and see if it exists. If the remote URL is a git repository (starts with git::, ends with .git or has a ?ref= component) we could look for a .git directory.

All of that seems like a lot of extra effort for what is likely an edge case (.terraform exists but the remote module does not) if it is even possible at all.

I propose the following:

switch cr.Spec.ForProvider.Source {
    case v1beta1.ModuleSourceRemote:
        switch {
        case cr.Spec.ForProvider.Remote == v1beta1.RemoteModuleIfNotPresent:
            // If .terraform directory exists then terraform init must have been run so the remote source must exist
            if b, err := c.fs.DirExists(filepath.Join(dir, terraformStateDir)); b == true && err == nil {
                break
            }
                fallthrough
            // The remote module could be an http directory or a git repository.  In the http case if the directory
            // exists and is not empty
        case cr.Spec.ForProvider.Module != cr.Status.AtProvider.RemoteSource:
            fallthrough
        case cr.Spec.ForProvider.Remote == v1beta1.RemoteModuleAlways:
            gc := getter.Client{
                Src: cr.Spec.ForProvider.Module,
                Dst: dir,
                Pwd: dir,

                Mode: getter.ClientModeDir,
            }
            err := gc.Get()
            if err != nil {
                return nil, errors.Wrap(err, errRemoteModule)
            }
            cr.Status.AtProvider.RemoteSource = cr.Spec.ForProvider.Module
        }
    case v1beta1.ModuleSourceInline:
        if err := c.fs.WriteFile(filepath.Join(dir, tfMain), []byte(cr.Spec.ForProvider.Module), 0600); err != nil {
            return nil, errors.Wrap(err, errWriteMain)
        }
    }

@ytsarev @negz any thoughts?