hashicorp / terraform

Terraform enables you to safely and predictably create, change, and improve infrastructure. It is a source-available tool that codifies APIs into declarative configuration files that can be shared amongst team members, treated as code, edited, reviewed, and versioned.
https://www.terraform.io/

[Feature] Module Variable and Output Inheritance (or similar) #31485

Open kderck opened 2 years ago

kderck commented 2 years ago

Current Terraform Version

Terraform v1.1.1
on darwin_arm64

Use-cases

It would be nice to be able to reuse the inputs and outputs of child modules without duplicating code. Take a module like base-template: it would be easier to maintain if, when wrapping this module, I didn't need to redefine every input and output I wanted to pass through to base-template.

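# modulea: wraps base-template
module "base_template" {
  source = "../base-template" # hypothetical path; the original omitted source
  name   = var.name
}

variable "name" {
}

output "id" {
  value = module.base_template.id
}

# A configuration that calls modulea
module "TemplatorService" {
  source = "../modulea"
  name   = var.name
}

# Duplicated Code
variable "name" {
}

output "id" {
  value = module.TemplatorService.id
}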

Attempted Solutions

While we can create a module per application, we've found this to be tedious, and a lot of code gets duplicated in each application module.

Proposal

Module inheritance, where a module call inherits the inputs and outputs of another module.

References

N/A

apparentlymart commented 2 years ago

Hi @kderck! Thanks for sharing this use-case.

I understand that you are seeking some way to reuse some definitions between different modules, but I'd like to understand more about how you imagine that working.

We don't typically consider what you shared to be "duplicate code" because each of those blocks is declaring something different: the variable "name" in your first block declares that the root module has a variable called name, and the variable "name" in your second block declares that the child module has a variable called name. There doesn't seem to be any redundancy here because Terraform has no way to guess based on the existence of one that the other one exists.

Terraform modules are most analogous to functions in general-purpose languages, and so module blocks are analogous to calling a function in a general-purpose language. Can you tell us about another programming language you are familiar with which has some way to share parameters and return types between functions in a way that you find useful and intuitive?

To help understand what I mean, here's some JavaScript code which I'd consider to have the same structure as what you shared here:

function main(id, name) {
    let result = baseTemplate(id, name);
    return {
        id: result.id,
    };
}

function baseTemplate(id, name) {
    return {
        id: "example"
    };
}

main("foo", "bar");

Both function main and function baseTemplate declare that they have arguments id and name; would you consider that to be "duplicate code" in the same way you defined it in this issue? If so, is there a different way you might write this to avoid that duplication?

I'm asking this just to try to get to the root of what you are asking about. You mentioned "module inheritance" but I'm not sure exactly what that would mean and so if possible I'd like to discuss it by analogy to something you've seen work successfully in another programming language, so that we can see a concrete design to start from when designing something equivalent in Terraform. If you're not familiar with JavaScript then I'd be happy to switch to another general-purpose language you are more familiar with and use that as the basis instead, as long as it's a language whose specification is public so that we will be able to study it and see how the features we're discussing are defined.

Thanks!

kderck commented 2 years ago

We don't typically consider what you shared to be "duplicate code" because each of those blocks is declaring something different: the variable "name" in your first block declares that the root module has a variable called name, and the variable "name" in your second block declares that the child module has a variable called name

I disagree: var.name would be duplicated, because it's just being passed through to the base module. For example, if I take the AWS EKS module and want to create my own wrapper (or facade) around it, say to provide a Kubernetes cluster module with some of the inputs overridden, I now have two options. I can change the base template (the EKS module), in which case I need to fork it and maintain the fork as upstream updates, or I can copy and paste every input and output from the base module into my own Kubernetes module.

jtackaberry commented 2 years ago

I've seen this feature being requested in one form or another a handful of times now -- I know because I periodically search for it hoping it will one day materialize -- and it's always shot down. I'm not sure if the use case isn't properly understood or if there's just a philosophical disagreement between users and Terraform maintainers in this case.

We create a number of internal modules to codify our deployment best practices. Almost always, all I want to do is take the upstream module, override some defaults based on our internal standards, maybe augment it with a couple of additional resources, and wrap that up as an internally managed module.

If I want to surface all the capabilities of the upstream module to users of the internal module, the amount of boilerplate I need to create is pretty silly. Re-define each variable from upstream module in our own variables.tf, then pass through each one of those variables via name = var.name in the module block. Likewise for the outputs. For feature-rich upstream modules, this is quite a lot of boilerplate that I need to track and maintain as upstream evolves.

Some form of automatic inheritance described by the OP would significantly simplify the creation and ongoing maintenance of these "internal best practice" wrapper modules.
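To make the pass-through boilerplate concrete in a general-purpose language, here is a minimal Python sketch. The function `upstream_eks`, its parameters and its default values are hypothetical stand-ins loosely modeled on the EKS inputs mentioned in this thread. `explicit_wrapper` redeclares and forwards each input and output by hand, which mirrors a Terraform wrapper module today; `forwarding_wrapper` states only the overridden default and forwards everything else with `**kwargs`, which is roughly the behavior being requested here.

```python
from typing import Any, Dict, Optional


def upstream_eks(cluster_name: str, cluster_version: str = "1.22",
                 public_access_cidrs: tuple = ("0.0.0.0/0",),
                 node_groups: Optional[dict] = None) -> Dict[str, Any]:
    # Hypothetical stand-in for an upstream module: inputs in, outputs out.
    return {
        "cluster_name": cluster_name,
        "cluster_version": cluster_version,
        "public_access_cidrs": list(public_access_cidrs),
        "node_groups": node_groups or {},
    }


def explicit_wrapper(cluster_name: str, cluster_version: str = "1.22",
                     node_groups: Optional[dict] = None) -> Dict[str, Any]:
    # Today's pattern: each upstream input is redeclared and passed through
    # by hand; an input left out here (public_access_cidrs) cannot be set
    # by callers at all.
    result = upstream_eks(cluster_name=cluster_name,
                          cluster_version=cluster_version,
                          public_access_cidrs=("10.0.0.0/8",),
                          node_groups=node_groups)
    # Each output is likewise re-exported by hand, like Terraform output blocks.
    return {"cluster_name": result["cluster_name"],
            "cluster_version": result["cluster_version"]}


def forwarding_wrapper(**kwargs: Any) -> Dict[str, Any]:
    # Requested pattern: state only the internal default, forward every other
    # input and return every output unchanged; callers may still override it.
    kwargs.setdefault("public_access_cidrs", ("10.0.0.0/8",))
    return upstream_eks(**kwargs)
```

The tradeoff is visible even in this small sketch: the forwarding version stays in sync with upstream automatically, but it no longer has an explicit signature of its own that a reader or tool can check.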

apparentlymart commented 2 years ago

The question from my earlier comment still stands: can folks share examples from other languages which allow using the signature of one function to define another, or something else that you find comparable to the idea of one module wrapping another?

In order to make progress here we either need to understand what patterns from other languages we're intending to emulate, or to explain why Terraform is different enough from other languages to warrant doing something novel. Terraform's current design for module input variables and output values is modeled after function arguments and return values in other languages, so I'm starting from that basis for this question. I'm open to discussing other models too, but I want to focus on concrete examples of existing language designs that we can study and learn from, rather than hypotheticals.

jtackaberry commented 2 years ago

The best analogy for me is inheritance in OOP: if there's a class that implements 99% of the functionality I want but I just want to augment a portion of its behavior or add some new capability, then I'll create and use a subclass.

This is rather contrived, but hopefully illustrative:

@dataclass
class TerraformAWSEKS:
    cluster_name: str
    cluster_version: str
    cluster_addons: list
    cluster_endpoint_public_access_cidrs: list
    eks_managed_node_groups: dict
    ...

    def __init__(self): ...
    def plan(self): ...
    def apply(self): ...

class MyCompanysTerraformAWSEKS(TerraformAWSEKS):
    def __init__(self):
        self.cluster_endpoint_public_access_cidrs = ['1.1.0.0/22', '2.3.4.0/24', '5.6.8.0/19']

I can then use MyCompanysTerraformAWSEKS and leverage the full capabilities of the upstream module (or class, in this example), where my custom subclass simply injects the necessary custom business logic, without having to redefine/dispatch everything defined in the parent class.
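A runnable variant of the subclass idea, as a minimal sketch assuming the upstream "module" is modeled as a Python dataclass whose fields have defaults (the defaults shown, and the choice of which fields are required, are hypothetical). The subclass restates only the one field whose default it changes; every other field, and its position in the constructor, is inherited unchanged.

```python
from dataclasses import dataclass, field, fields


@dataclass
class TerraformAWSEKS:
    # Hypothetical subset of upstream inputs; only cluster_name is required.
    cluster_name: str
    cluster_version: str = "1.22"
    cluster_addons: list = field(default_factory=list)
    cluster_endpoint_public_access_cidrs: list = field(default_factory=list)
    eks_managed_node_groups: dict = field(default_factory=dict)


@dataclass
class MyCompanysTerraformAWSEKS(TerraformAWSEKS):
    # Only the company-specific default is restated; the field keeps its
    # original position and all other fields are inherited as-is.
    cluster_endpoint_public_access_cidrs: list = field(
        default_factory=lambda: ["1.1.0.0/22", "2.3.4.0/24", "5.6.8.0/19"])


cluster = MyCompanysTerraformAWSEKS(cluster_name="demo", cluster_version="1.23")
```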

apparentlymart commented 2 years ago

Thanks for sharing that, @jtackaberry!

It sounds like you're suggesting that we consider an analogy between a class in a class-based OOP language (Python here, as an example) and a module in Terraform.

Given that, here's my sense of how the different aspects of a Python class might map to a Terraform module, based on what you've shown here.

| Class-based OOP concept | Terraform module concept |
|--|--|
| Public data members | Input variables |
| Private data members | Local values |
| ??? | Output Values |
| Constructor/initializer | ??? |
| Methods | ??? |
| Parent class | ??? |
| ??? | Resources |
| ??? | Nested modules |

As you can see, I have some missing elements in my chart here. Do you have a specific idea of how you would fill those in? Also, if you disagree with some of the connections I made, please let me know that too!

In particular I wasn't really sure what to do with your constructors and your plan and apply methods. When I framed a module as being like a function, I was thinking of it being a function which generates a description of desired state for Terraform to consider rather than a function which actually takes actions in a remote system directly, and so I'm not sure how to incorporate "plan" and "apply" explicitly into this new analogy.

Here's the (ugly) source code for my table in case you want to copy it into a new comment and edit it:

| Class-based OOP concept | Terraform module concept |
|--|--|
| Public data members | Input variables |
| Private data members | Local values |
| ??? | Output Values |
| Methods | ??? |
| Parent class | ??? |
| ??? | Resources |
| ??? | Nested modules |

Bringing the concepts of classes and inheritance into play makes me immediately think of the classic Composition vs. Inheritance tradeoff. Terraform is currently exclusively composition-based (assuming we make an analogy to Terraform modules as they currently exist).

One concept in that realm that Terraform doesn't have is an explicit idea of "interfaces". I could imagine a hypothetical design where one module could take another module as an argument, as long as that module met some sort of predefined criteria for expected input variables and output values, which could then in principle allow e.g. passing a module that describes a network. But I'm wondering then how significantly that would differ from Terraform's current capability to pass an object representing the results of a module into another module, as discussed in module composition; in both cases it would give one module access to the results of another. One way it could differ is to say that if I pass the same module to two different modules and both instantiate it then I would have two copies of the infrastructure that module describes, which Terraform does not support today: today's Terraform would have you instantiate all of the modules up-top and pass just the resulting data between them.
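To illustrate that distinction, here is a minimal Python sketch using `typing.Protocol` as a stand-in for a module "interface"; all names are hypothetical. `consume_results` receives an already-built network, so the network is created once and shared, which corresponds to composition as Terraform supports it today. `instantiate_and_consume` receives something that can build a network and calls it itself, so passing the same builder to two consumers yields two separate networks.

```python
from dataclasses import dataclass
from itertools import count
from typing import Callable, Protocol

_ids = count(1)  # counter standing in for "a new piece of infrastructure"


class Network(Protocol):
    # Structural interface: any object with these attributes qualifies.
    vpc_id: str


@dataclass
class VPC:
    vpc_id: str


def build_network() -> VPC:
    # Each call "creates" a distinct network.
    return VPC(vpc_id=f"vpc-{next(_ids)}")


def consume_results(network: Network) -> str:
    # Composition: receives only the resulting data.
    return network.vpc_id


def instantiate_and_consume(make_network: Callable[[], Network]) -> str:
    # Receives the "module" itself and instantiates it internally.
    return make_network().vpc_id


shared = build_network()
composed = (consume_results(shared), consume_results(shared))
instantiated = (instantiate_and_consume(build_network),
                instantiate_and_consume(build_network))
```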


Thinking about this is making me realize that I don't think I have a strong idea of what real problem is underlying this request. As a separate line of inquiry to this question about design patterns in other languages, I'd also like to dig a bit more into why you both are wrapping modules into other modules, so that we can see if there's a different way to frame that problem that might lead to a clearer answer for how to solve it.

To return to my earlier example, I wrote out some JavaScript-like pseudocode showing one function calling another, which I had intended to be an analogy to one Terraform module calling another.

But what if instead the two modules were called at the root and the output of one was passed into the other?

function main(base) {
    return {
        id: base.id,
    };
}

function baseTemplate(id, name) {
    return {
        id: "example"
    };
}

let base = baseTemplate("foo", "bar");
main(base);

In this example, function main doesn't know anything about function baseTemplate... it just takes some data and acts on it. This is a contrived JavaScript-syntax reframing of what the Module Composition guide is encouraging for Terraform modules.

It might help if we try to move away from contrived examples containing just placeholders and talk about the design of real systems instead. Can one or both of you say a bit more about what real-world problems your pairs of modules are addressing, so that we can discuss the concrete tradeoffs of different design shapes rather than just writing hypothetical configurations/programs that don't really do anything? Thanks :grinning:

kderck commented 2 years ago

@apparentlymart Couldn't you generate templates with Go? I can't see why this wouldn't be possible with Go, aside from the language paradigm, similar to what Lombok does for Java, or to data classes.

kderck commented 2 years ago

@apparentlymart Hi. We want to be able to use the EKS module and wrap it with our own logic; for example, we add the Flux provider. This lets us easily create a Kubernetes cluster with Flux provided, with less boilerplate. However, we still need to define the cluster name, cluster VPC, ..., pass them to the EKS module, and add them as outputs so that other modules can read them.

apparentlymart commented 2 years ago

Hi @kderck,

Under our existing design recommendations in Module Composition, my first instinct for what you described would be to do something like this:

terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
    }
    flux = {
      source = "fluxcd/flux"
    }
    kubernetes = {
      source = "hashicorp/kubernetes"
    }
  }
}

provider "aws" {
  # ...
}

module "eks" {
  source = "terraform-aws-modules/eks/aws"

  cluster_name    = "my-cluster"
  cluster_version = "1.22"
  # ...
}

provider "kubernetes" {
  # ...
}

module "flux-setup" {
  source = "./modules/flux-setup"

  eks_cluster = module.eks
  # ...
}

Notice that the eks_cluster argument in the module "flux-setup" block is the entire object representing the module "eks" results, and so it has all of the output values for that module, and can use them to describe the additional resources needed to complete setting up Flux.

Using composition in this way, instead of nesting one module call inside another, means that the flux-setup module doesn't need to know anything about the inputs to the EKS module, and can instead just worry about whatever subset of its output values it needs to do its work.

With that said, I'm not really familiar with Flux so I can't be certain that there isn't something that needs to happen inside this hypothetical "flux-setup" module that requires using a nested call rather than composition. There's no real harm in using nested calls when the situation requires it, but we've found that the composition approach typically leads to less coupling between the modules and thus easier maintenance in future as the details of these two modules both change independently.

I'm not sure how generating templates with Go is related to what we're discussing here, but I'd be happy to consider that question more fully if you can show me an example of what you have in mind. I'm also not familiar with Lombok, but if you can show me an example of how it could relate to Terraform and give me a link to where I can get more details in order to understand it, then I'd love to dig in some more.

kderck commented 2 years ago

Yeah, and if I want to change cluster_name for EKS, I now have to add an input, pass it to EKS, and add an output, even though it's already defined in the EKS module. I'll have to do that for every input that I'd like to change in the EKS module.

apparentlymart commented 2 years ago

Hi @kderck,

Assuming you're talking about the example I shared in my most recent comment, notice that in my example module.eks and module.flux-setup are sibling modules, rather than module.flux-setup being a child of module.eks.

Inside ./modules/flux-setup you can define eks_cluster as accepting only exactly the subset of attributes your module needs:

variable "eks_cluster" {
  type = object({
    cluster_endpoint = string
  })
}

There is no need to redeclare any of the EKS module's input variables, because the module "eks" block uses those directly, and you only need to mention in this declaration the subset of output values from the EKS module that flux-setup will use.

It isn't clear to me why it would be necessary to redeclare the cluster name under that approach, since as far as I can tell the idea of a "cluster name" is something specific to the EKS API and isn't needed for setting up Flux, which seems to work with the Kubernetes API directly. If I'm missing something please let me know!
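A rough Python analogy for declaring only the needed subset of a sibling module's outputs: the hypothetical helper `conform_object` keeps only the declared attributes and drops the rest, which is our understanding of how a Terraform `object({...})` type constraint treats extra attributes. Unlike Terraform, this sketch only checks types and does not attempt conversions, and the output values in `eks_outputs` are made up for illustration.

```python
from typing import Any, Dict, Mapping


def conform_object(value: Mapping[str, Any],
                   attrs: Mapping[str, type]) -> Dict[str, Any]:
    """Keep only the declared attributes of value (hypothetical helper).

    Missing or wrongly typed declared attributes raise TypeError; extra
    attributes are dropped rather than rejected.
    """
    missing = [name for name in attrs if name not in value]
    if missing:
        raise TypeError(f"missing attributes: {missing}")
    result = {}
    for name, expected in attrs.items():
        if not isinstance(value[name], expected):
            raise TypeError(f"{name}: expected {expected.__name__}")
        result[name] = value[name]
    return result


# Hypothetical stand-in for the full set of module "eks" output values.
eks_outputs = {
    "cluster_endpoint": "https://example.com",
    "cluster_name": "my-cluster",
    "cluster_version": "1.22",
}

# flux-setup declares, and therefore sees, only cluster_endpoint.
eks_cluster = conform_object(eks_outputs, {"cluster_endpoint": str})
```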

kderck commented 2 years ago

Please see: https://github.com/kderck/gh-31485. As you can see there, if I compose two modules, the VPC and EKS modules, and want to publish the result as a module that accepts inputs and returns outputs to the root module that calls it, I have to redefine all of the inputs and outputs that I want to use in the root module or in sibling modules.

apparentlymart commented 2 years ago

Hi @kderck,

What I was asking, I suppose, is whether it's truly necessary to publish that wrapper module as a separate module, rather than just putting the module "vpc" and module "eks" blocks directly in the root module. The module composition guide recommends keeping things "flat" specifically because it avoids all of this extra boilerplate of declaring the union of all variables of the child modules you're wrapping.

I'd typically expect a useful module to encapsulate additional information, rather than to just pass through the full set of inputs and outputs from what it's wrapping verbatim, since otherwise the wrapper isn't really adding much to justify the development and maintenance cost of its existence.

Of course I realize that you're trying to propose a feature that would lower the cost of maintaining it, but I'm also asking you to consider whether this informationless wrapper module has sufficient value to justify a significantly more complex Terraform language. I'm not saying it isn't, because of course I can only see the contrived subset you shared rather than the real thing, but I think justifying the value of this sort of wrapping is the crux of justifying the language complexity cost of supporting it.

kderck commented 2 years ago

Yes. I don't want to have to update a VPC and EKS module in every environment I have; I want to be able to make a change and have it reflected by running terraform apply and having the new module version downloaded (or even version-controlled by tag). I think putting the module "vpc" and module "eks" blocks directly in the root module just encourages copy-and-paste infrastructure. It's worth saying that we follow: https://terragrunt.gruntwork.io/docs/getting-started/quick-start/

kderck commented 2 years ago

We also want to be able to provide these modules to developers, who may not know how to connect them up at the root module level, or how to secure them adequately.

NunoMaga commented 1 year ago

I just found this thread. Cool discussion. I just wanted to add the use case that brought me to it. When planning a cluster setup, for example, I define modules for the components (i.e. AKS + blob container) with this file structure:

infra
├── deployed
│   ├── dev
│   │   ├── backend.tf
│   │   ├── cluster.tf
└── modules
    ├── base
    │   ├── aks
    │   │   ├── aks.tf
    │   │   ├──  ...
    │   │   ├── outputs.tf
    │   │   ├── variables.tf
    └── preset 
        ├── dev
        │   ├── aks
        │   │   ├── aks.tf

with modules/preset/dev/aks/aks.tf:

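variable "name" {
  type = string
}

module "aks" {
  source         = "../../base/aks" #TODO move to artifactory
  name           = var.name # pass-through that currently has to be declared by hand
  location       = "East US"
  worker_vm_size = "Standard_D4s_v3"
  tags = {
    department  = "Engineering"
    environment = "Dev"
  }
}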

and deployed/dev/cluster.tf

module "lego" {
  source = "../../modules/preset/dev/aks"
  name   = "lego"
}

instead of

module "lego" {
  source = "../../modules/base/aks" 

  name     = "lego"
  location = "East US"

  worker_vm_size = "Standard_D4s_v3"

  tags = {
    department  = "Engineering"
    environment = "Dev"
  }
}

for every cluster instance.

This allows me to maintain different cluster presets for different needs (different Kubernetes versions, sizes, and so on) and to update one of those presets without breaking every cluster that didn't override those values, and without having to update every cluster instance manually.
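The preset pattern described here resembles partial application. In this minimal Python sketch, `aks_cluster` is a hypothetical stand-in for modules/base/aks, and the preset values are copied from the example above (the larger VM size in the override is only illustrative). `functools.partial` fixes the dev defaults once, each cluster instance supplies only its name, and a caller can still override a preset value by passing it explicitly.

```python
from functools import partial
from typing import Any, Dict, Optional


def aks_cluster(name: str, location: str, worker_vm_size: str,
                tags: Optional[dict] = None) -> Dict[str, Any]:
    # Hypothetical stand-in for modules/base/aks; returns its "outputs".
    # tags is copied so instances never share the preset's dict.
    return {"name": name, "location": location,
            "worker_vm_size": worker_vm_size, "tags": dict(tags or {})}


# Preset, analogous to modules/preset/dev/aks.
dev_aks = partial(aks_cluster,
                  location="East US",
                  worker_vm_size="Standard_D4s_v3",
                  tags={"department": "Engineering", "environment": "Dev"})

lego = dev_aks(name="lego")
bigger = dev_aks(name="lego-large", worker_vm_size="Standard_D8s_v3")
```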

P.S.: I didn't look into it much, so I might be talking nonsense, but I see modules as very recursive, and since the recursive language I've loosely studied is Haskell, I found this:

https://stackoverflow.com/questions/42784928/what-is-the-haskell-equivalent-of-an-interface Hope it helps.

kderck commented 1 year ago

Would anybody be in favour of copying this to #OpenTF, now #OpenTofu?