Named, reusable type constraints

chrispytoes commented 2 years ago

I think it would be useful to have type aliases in Terraform, similar to TypeScript. I have a module that takes in a large map of objects as an input variable. Currently, I have to copy/paste this entire declaration across each submodule where it's needed, and keep them all up to date if changes are made. It would be nice if I could just define this variable type in one spot so it does not need to be duplicated.

To clarify, I'm not looking for global variables, just the ability to store a type declaration under a defined name, and reuse it for other variables in submodules.

Here's an example of what a type alias and its usage might look like:

```
type "my_custom_type" {
  type = map(object({
    foo = string
    bar = string
  }))
}

variable "config" {
  type = my_custom_type
}
```

As for accessing these custom types from submodules, I'm not 100% certain if this would work, but my thought is that submodules could reference custom types via <unique_module_path>.<type_name>.

apparentlymart commented 2 years ago

Hi @chrispytoes! Thanks for sharing this use-case.

When considering this sort of thing before, there have been a few considerations. I'm not sharing these with the intention of immediately rejecting what you are proposing, but just to add some extra context to weigh as we think about how and whether to address this.

The Terraform language type system is a structural type system rather than a nominal type system, which means that Terraform considers two types to be compatible if they have the same structure, without any consideration of what they are named.

In particular, that means that in the following situation Terraform would consider the two named types to be exactly equivalent to one another, even though declared with separate names:
```
# INVALID: a hypothetical example of a design proposal under discussion

type "example_1" {
  type = object({
    a = string
  })
}

type "example_2" {
  type = object({
    a = string
  })
}
```
This is not disqualifying by any means: it's totally reasonable to allow naming types as a way to reuse them without also using the names as part of the type identity. You referred to TypeScript in your writeup and so I assume you're already aware that TypeScript's type system is a great example of that, as we can see in Type Compatibility.
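
For comparison, here is a minimal sketch of how this structural behavior already shows up in valid Terraform today, with no named types involved. The module path `./child` and the variable names are hypothetical, chosen only for illustration. The two `object` constraints are written separately in two modules, yet a value that satisfies one can be passed straight to the other, because only the structure is compared.

```hcl
# main.tf in the root module (all names here are hypothetical)

variable "settings" {
  type = object({
    a = string
  })
}

module "child" {
  source = "./child"

  # Accepted because the value's structure matches the child's constraint;
  # neither type declaration has a name that could be compared.
  settings = var.settings
}

# child/variables.tf, a separate file in the child module

variable "settings" {
  type = object({
    a = string
  })
}
```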

The question that I think warrants further research here is whether having named types in a structural type system is intuitive enough, and in particular whether it might cause users familiar with arguably-more-common nominal type systems to make incorrect assumptions about how type constraints will behave.

A second consideration is that it's not clear how or whether named types across module boundaries should work. You also astutely touched on this in your writeup, and this has honestly been the main hangup in previous iterations of researching this.

Firstly, there is the question of what the syntax might look like to refer to a type declared in another module, as you noted. The syntax you proposed is indeed the most "obvious" initial choice, but it isn't clear to me that it would be appropriate to overload our current conception of modules to also include the definition of types, because we can see in other programming languages with named types that there are often quite complex relationships between packages/modules/etc (depending on the terminology in each ecosystem) even when considering only the type-reference dependencies. It typically becomes a dependency graph, whereas Terraform modules today strictly form a tree.

Secondly, there is a trickier concern: so far we've intentionally designed the language so that it's possible to interrogate the public interface of a Terraform module with only the source code of that single module. That design goal is then relied upon by other systems, such as the auto-generated Terraform Registry module API documentation. Allowing a module to define a variable's type constraint only indirectly, from one in another module, would require consulting the entire module subtree beneath in order to obtain that information, and would potentially allow the apparent API of a module to change even though that module's source code hasn't changed at all, in the case where the downstream module changed from under it.

From a more wonky/philosophical perspective, I should also acknowledge that our Terraform language design principles tend to take inspiration from the learnings/proverbs of the Go language, which for this case makes me consider "a little copying is better than a little dependency". I think in a lot of cases programmers (myself included) get itchy when writing down the same information in multiple places, but "don't repeat yourself" is not a magic bullet itself and the idea of "reuse" it encourages has various tradeoffs of its own that we must consider.

I described one particular example of that in my second point above, but I wanted to mention the more general philosophy too, as some context for how we convinced ourselves that, broadly speaking, it's not a significant burden to state inline the expected type constraint for a variable, even if it happens to currently exactly match one written in another module elsewhere. One major benefit of doing so is the variable declaration being entirely self-contained, which is helpful for both human readers and for automated systems like Terraform Registry. Another benefit is retaining the relative simplicity of modules together forming a tree, rather than a graph.

A notable downside, though, is that as your system evolves you may find yourself wanting to update many of these all at once. In practice (with now several years of experience with this new type system) we've found that this burden typically isn't as significant as we might've expected, particularly if authors follow our design advice such as Module Composition where each module is tailored to solving one problem well and, in particular, only declares the subset of information it actually needs. (Note that this seems contrary to your underlying problem statement: it sounds like you are instead intending to pass the same big bag of information to all modules, where presumably each module picks out only the subset of data it needs. I don't mean to imply that this approach is objectively bad, but only to call out that there are different design approaches that Terraform's current language design is better optimized for.)
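
As a hedged illustration of the "declare only the subset you need" advice, the sketch below assumes a hypothetical root module holding a larger `services` map and a hypothetical `./modules/dns` child that only cares about hostnames. All names and attributes are invented for the example. Terraform's type conversion accepts objects that carry additional attributes and discards the extras, so the child's narrower constraint does not have to mirror the full declaration.

```hcl
# main.tf in the root module (hypothetical names and attributes)

variable "services" {
  type = map(object({
    hostname = string
    port     = number
    tags     = map(string)
  }))
}

module "dns" {
  source = "./modules/dns"

  # The child receives the same map, but the port and tags attributes are
  # dropped when the value is converted to the child's type constraint.
  services = var.services
}

# modules/dns/variables.tf, a separate file in the child module

variable "services" {
  type = map(object({
    hostname = string
  }))
}
```

With this shape, adding an attribute to the root declaration only requires touching the child modules that actually use it.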

The practical concerns are the more important thing to get to the bottom of than the philosophical ones, but all of the concerns are interconnected in that any language design philosophy will encourage certain patterns and discourage others, and will enable certain technical behaviors and prevent others. An important part of language design is carefully-considered principles so that the language can be cohesive and have all of its features tessellate well rather than conflict with or overlap one another.

With all of that said, I want to reaffirm that I'm not intending this comment as any sort of immediate rejection, but just as context from previous discussions so that we can revisit and consider whether our original assumptions still hold, and thus whether we should revisit the conclusions we drew from those assumptions. (In particular, several members of our team were not on the team yet when we previously visited this topic, and so I hope this will be useful context for them if we decide to revisit it.)

In order to move the discussion forward, it would help if you could share a more "real-world" example of what one or more of your shared types might look like. While I do of course understand what you are proposing in an abstract/generalized sense, it's much easier to make design tradeoffs with real examples than contrived ones.

Thanks again!

chrispytoes commented 2 years ago

@apparentlymart Thanks for the detailed response! I completely understand where you're coming from in that modules should be fully self-contained.

I know this is a far-fetched idea, but since Terraform already has the concepts of Modules and Providers, what if there were a third kind of thing called a Library?

A "Library" could be separate from a Module, and just provide definitions of variables, types, etc., that a Module could explicitly declare as a dependency. Could this open up the ability to add more features that couldn't previously be done with modules?

apparentlymart commented 2 years ago

I have previously pondered a similar thing when thinking about the subject of reusable user-defined functions, which end up causing some similar complications about dependency graph vs. call tree.

It could be interesting to consider that, but as I'm sure you can imagine, adding an entirely new kind of externally-installable artifact is a heavy lift and a lot of new conceptual overhead for users, so I expect we'd do it only if the payoff were very high, and with a considerable design effort to make sure it would tessellate well with Terraform's existing extension points.

chrispytoes commented 2 years ago

@apparentlymart Absolutely. It's not worth implementing if type constraints are the only benefit to it. I was just putting that out there in case there were other use cases for it that you could see.

I'll probably rework my modules eventually so I won't need to do this copy/paste in the first place. It's not a big deal compared to the burdens the alternatives may create.

You mention user-defined functions though. I wonder if they could be implemented with a simple embedded interpreter like tengo. Tengo functions could be defined in a Library only, so as not to create too much mix-up in the dependency graph. Additionally, Libraries would only be allowed to depend on other Libraries, making all the functions from its imported libraries available from Tengo as well.

This would at least add a more substantial use-case for my Library concept, but I understand it's a tall order and I'm getting off topic now. You can close this issue if you'd like.

apparentlymart commented 2 years ago

We're getting a bit into the weeds with general programming language stuff now 😀 but yeah, some sort of language for defining functions was what I was meaning there. There are the usual tradeoffs to be made about what principles/paradigms such a thing would follow in order to fit in well with the rest of Terraform: functional vs. imperative, explicitly-typed vs. inferred, etc. I'm sure we'll explore that more eventually, and if you're interested in following along and/or participating when the time comes I think #21124 is the issue currently representing that.

(The writeup there proposes functions embedded directly in a module in a similar way as you'd proposed named type declarations embedded in a module, which of course has similar tradeoffs as we discussed here and so a final solution to that might end up looking quite different if we were to dig in to the design some more, as might be the case for this one.)

crw commented 2 years ago

I am going to accept your suggestion @chrispytoes and close this issue. :D Thanks for the feature request and the great discussion!

hashicorp / terraform

Named, reusable type constraints #30386