RFD - Move Nebari infrastructure code from HCL to Python using terraformpy

Status	Open for comments 💬
Author(s)	@viniciusdc
Date Created	13-03-2023
Date Last updated	13-03-2023
Decision deadline	--/--/--

Summary

Nebari depends heavily on Terraform for all of its IaC needs. While HCL (the .tf files) is a great language for describing infrastructure, it is not the best language for writing code that spans multiple ecosystems. For example, adding a simple new feature sometimes requires us to write the same piece of code several times in HCL (e.g., the variables that are shared across different modules).

The code that drives the Terraform binaries is already written in Python (a subprocess runs terraform plan and terraform apply), as are almost all of our interactions with the deployed cluster during testing. Given the complexity of our ecosystem, writing a lot of HCL to handle our edge cases is not only time-consuming but also error-prone. In this RFD I propose moving our infrastructure code to Python using terraformpy to make it easier to maintain and extend.

Benefit

There are multiple benefits to this change:

- Easier to maintain and extend the codebase, as we can use the full power of Python to write the code.
- Easier to test the code, as functions can be imported and tested directly.
- Python would give us more flexibility when adding new features, as we could refer to a Terraform resource as an object and call its methods to make the required changes (no need for extra variables and outputs to move data around).
- Parsing our codebase would be easier.
  - As a quick example of how that would benefit us: right now all Helm charts are deployed through the Helm provider, which is wonderful from a deployment perspective. However, linting the files and keeping track of version updates is complex, because it requires inspecting every file in the repo tree and using regexes to identify the charts. If we move to Python, we can import the Helm provider and call its methods to get the list of charts and their versions (or save them in a list to be exported somewhere else). That would make it easier to keep track of chart versions and to update them in other tests (e.g., the upgrade test, to evaluate broken Bitnami charts).
- Easier to onboard new contributors, as they would not need to learn HCL to contribute to the project.
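To make the Helm chart example above more concrete, here is a minimal sketch of what a chart inventory could look like once releases are declared as Python objects. It deliberately uses only the standard library: `HelmRelease`, `RELEASES`, and `chart_versions` are hypothetical names invented for illustration (they are not part of terraformpy or the Helm provider), and the chart names, repository URL, and versions are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HelmRelease:
    """Hypothetical stand-in for a helm_release declared in Python."""

    name: str
    chart: str
    repository: str
    version: str


# Placeholder data for illustration only. In a real migration this list
# would be populated by the code that declares each release.
RELEASES = [
    HelmRelease("example-a", "chart-a", "https://charts.example.com", "1.2.3"),
    HelmRelease("example-b", "chart-b", "https://charts.example.com", "0.4.0"),
]


def chart_versions(releases):
    """Map each chart name to its pinned version, with no regex parsing."""
    return {release.chart: release.version for release in releases}


if __name__ == "__main__":
    # Print one "chart==version" line per release, sorted by chart name.
    for chart, version in sorted(chart_versions(RELEASES).items()):
        print(f"{chart}=={version}")
```

With the versions available as plain data, a linter or an upgrade test could import `chart_versions` directly instead of scanning .tf files.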

Drawbacks

- We would need to rewrite our entire codebase in Python using the terraformpy library.
- It would take some time to get used to the new syntax.
- We would need to rethink how we call each stage of the deployment (though I think this part of the migration would not be too painful).
- The terraformpy library has had few updates in the last two years, but it is still maintained.

Approaches considered (if any)

Right now to write a simple new variable, we need to do something like this:

# in the variables.tf file in the main.tf root directory
variable "my_var" {
  type = string
  default = "my_value"
}
-----------------------
# in the main.tf file in the main.tf root directory
module "my_module" {
  source = "./my_module"
  my_var = var.my_var
}
-----------------------
# in the variables.tf file in the my_module directory
# (a variable can only be declared once per module)
variable "my_var" {
  type = string
}
# in the main.tf file in the my_module directory, the value is then
# referenced as var.my_var wherever it is needed

And we also need to make sure we are passing it over to input_vars.py. This is a lot of code to write for a simple variable that we need to pass to a module (imagine when we need to pass outputs between different stages).

With Python, we would instead have a function that receives the variables as input and passes them to the correct module under the hood. This would make the code much easier to maintain and extend. For example:

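from terraformpy import Module
from .vars import my_var


def pass_vars_to_module(my_var):
    # The first argument is the module's name; the keyword arguments
    # become the module's source and input variables.
    return Module(
        "my_module",
        source="./my_module",
        my_var=my_var,
    )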

That's it. Of course, this example is very simple and does not take into account the full complexity of the codebase, but I think it is a good starting point for seeing how we can simplify it.
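For readers wondering what ends up on disk, Terraform also accepts configuration written in JSON syntax (.tf.json files), which is the kind of output a Python-based generator produces. The sketch below uses only the standard library to show the idea for the module call above; `module_block` and `render` are hypothetical helpers written for this illustration, not terraformpy functions.

```python
import json


def module_block(name, source, **inputs):
    """Build the Terraform JSON structure for a single module call."""
    return {"module": {name: {"source": source, **inputs}}}


def render(config):
    """Serialize a configuration dict as Terraform JSON text."""
    return json.dumps(config, indent=2, sort_keys=True)


if __name__ == "__main__":
    my_var = "my_value"  # defined once, in Python
    # The printed text could be written to a file such as main.tf.json.
    print(render(module_block("my_module", "./my_module", my_var=my_var)))
```

The variable is defined once in Python and handed to the module as an ordinary keyword argument, which is the repetition the HCL version above cannot avoid.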

User impact

Users would not see any changes in the way they usually interact with the project, though this would be a breaking change for the project itself, as we would need to rewrite the whole codebase in Python.
Our CI tests would be more reliable, as we could test the code in a more isolated way.
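As a rough illustration of testing in isolation, the sketch below treats a stage definition as a plain function that returns data, so it can be checked without running the terraform binary or touching a cluster. `stage_modules` and both test functions are hypothetical and written in a pytest-compatible style; they do not reflect Nebari's actual stage layout.

```python
def stage_modules(my_var):
    """Hypothetical stage definition: returns module inputs as plain data."""
    return {"my_module": {"source": "./my_module", "my_var": my_var}}


def test_variable_reaches_module():
    # The value passed to the stage must arrive unchanged at the module.
    modules = stage_modules("my_value")
    assert modules["my_module"]["my_var"] == "my_value"


def test_module_source_is_set():
    # The module call must always point at the module directory.
    assert stage_modules("anything")["my_module"]["source"] == "./my_module"
```

Checks like these run in milliseconds and fail before any infrastructure is planned or applied.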
