Building Blocks with collie-hub and collie-cli

JohannesRudolph commented 6 months ago

This issue collects improvements towards a simple and consistent workflow for assembling tenants from building blocks in a pure IaC workflow with collie. Common examples of building blocks needed in a stage 2 (CFMM) cloud foundation are a tenant building block (think subscription/account/project) that enforces tags and an IAM role model, a budget alert (simple), and a hub+spoke VNet (complex).

The goals for the approach we want to support with collie are:

- plain and simple IaC composition using terraform: no meta-templating or file generation
- isolation: building blocks should have well-defined inputs and outputs that enable them to be deployed and operated by teams outside the cloud foundation (this is common e.g. for DevOps toolchains and advanced network capabilities). This is also good practice for security, since automation will typically need to perform highly privileged operations
- automation-friendly: provide a clear upgrade path for plugging building blocks into a GitOps workflow or meshStack for automation

Concept: Building Blocks

In terragrunt parlance, building blocks are "shared service modules":

A Terraform module that is designed to be standalone and applied directly. These modules are not root modules in that they are still missing the key blocks like backend and provider, but aside from that do not need any additional configuration or composition to deploy
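As a rough illustration of this definition, a building block for the budget alert mentioned above might look like the sketch below. It declares a provider requirement, inputs and an output, but no backend and no provider block. The resource name, the budget amount and start date, and the assumption that subscription_id is passed as a bare GUID are illustrative and not taken from collie-hub.

```hcl
# kit/azure/buildingblocks/budget-alert/buildingblock/main.tf (illustrative sketch)

terraform {
  required_providers {
    azurerm = {
      source = "hashicorp/azurerm"
    }
  }
}

variable "subscription_id" {
  type        = string
  description = "Subscription to watch. Assumed here to be the bare GUID."
}

variable "contact_emails" {
  type        = string
  description = "Comma-separated list of emails to notify."
}

variable "monthly_budget_amount" {
  type        = number
  description = "Hypothetical input: budget amount in the billing currency."
  default     = 100
}

resource "azurerm_consumption_budget_subscription" "this" {
  name            = "budget-alert"
  subscription_id = "/subscriptions/${var.subscription_id}"

  amount     = var.monthly_budget_amount
  time_grain = "Monthly"

  time_period {
    # Placeholder: must be the first day of a month; adjust before use.
    start_date = "2024-01-01T00:00:00Z"
  }

  notification {
    enabled   = true
    operator  = "GreaterThan"
    threshold = 80
    # Turn the comma-separated string into a list of trimmed addresses.
    contact_emails = [for e in split(",", var.contact_emails) : trimspace(e)]
  }
}

output "budget_name" {
  value = azurerm_consumption_budget_subscription.this.name
}
```

Because neither a backend nor a provider block is present, the module cannot be applied on its own; whoever deploys it has to supply both.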

Concept: Backplanes

Backplanes add what's necessary to successfully deploy a building block to a tenant in your landing zones. From the perspective of an application team, a building block provides some sort of capability to their application's cloud environment, e.g. an on-prem connected spoke VNet. This spoke VNet, however, is only the "tip of the iceberg". Application teams cannot (and should not have to) care about all of the mechanics that make this spoke VNet work, like the hub network, WAN connection, IPAM and firewall rules.

We will call everything that makes the application-team visible part of the building block work the "backplane". The backplane always includes:

- terraform backend (state storage)
- terraform provider configuration (authentication and authorization)
- any infrastructure shared between different building blocks (e.g. the hub network)

Packaging Building Blocks in collie-hub

We will settle on the following structure for packaging building blocks in collie-hub:

- kit/$platform/buildingblocks/$block/backplane/{README.md, main.tf, ...}: building block backplanes will be normal kit modules
- kit/$platform/buildingblocks/$block/buildingblock/{README.md, main.tf, ...}: building block modules will be plain terraform modules
- we might want to let building blocks come with their own metadata (e.g. an icon.svg) that can be used for self-service UIs and collie documentation
- by convention, we let building block backplanes emit a sensitive config_tf output. This output should contain the provider and (optionally?) backend configuration blocks. This is the "missing" configuration required to deploy a building block and can be injected e.g. by terragrunt or other terraform automation
[x] collie-cli must learn to ignore **/buildingblock/README.md files when trying to parse configuration objects from the repo or learn to treat buildingblocks as its own concept
[x] some BBs will have a very advanced backplane, e.g. the connectivity block requires the hub deployed from azure/kit/connectivity - should we go to the trouble of always defining a dedicated backplane even if we could reuse the bootstrapped SPN?
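The config_tf convention from the list above could be sketched roughly as follows. This assumes the backplane already knows the credentials of a service principal allowed to deploy the building block; the three variables are hypothetical stand-ins for whatever the backplane actually creates or receives.

```hcl
# kit/azure/buildingblocks/budget-alert/backplane/outputs.tf (illustrative sketch)

# Hypothetical inputs; a real backplane might create these resources itself.
variable "tenant_id" { type = string }
variable "client_id" { type = string }

variable "client_secret" {
  type      = string
  sensitive = true
}

output "config_tf" {
  description = "Provider configuration to inject into the building block."
  sensitive   = true

  # The heredoc renders a complete provider block as a string.
  # Note: azurerm 4.x additionally requires subscription_id, either in
  # this block or via the ARM_SUBSCRIPTION_ID environment variable.
  value = <<-EOT
    provider "azurerm" {
      features {}

      tenant_id     = "${var.tenant_id}"
      client_id     = "${var.client_id}"
      client_secret = "${var.client_secret}"
    }
  EOT
}
```

Marking the output as sensitive keeps the secret out of plan and apply logs, although it is still stored in the backplane's state.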

Deploying Building Blocks

Deploying building blocks to a cloud tenant becomes a simple terraform composition in a main.tf that invokes the building blocks as plain terraform modules, like so:

module "subscription" {
  source = "github.com/likvid-bank/likvid-cloudfoundation/kit/azure/buildingblocks/subscription/buildingblock"
  subscription_name       = "glaskugel"
  parent_management_group = "likvid-corp"
}

module "connectivity" {
  source = "github.com/likvid-bank/likvid-cloudfoundation/kit/azure/buildingblocks/connectivity/buildingblock"

  providers = {
    azurerm.spoke = azurerm
    azurerm.hub   = azurerm.hub
  }

  location = "germanywestcentral"
  hub_rg   = "hub-vnet-rg"
  hub_vnet = "hub-vnet"

  name          = "glaskugel"
  address_space = ["10.1.0.0/24"]
}

[x] as shown here, a tenant may need multiple building blocks that each come with their own provider/backend config. Providers can be aliased to disambiguate, but there must only be one backend per tenant... this is at odds with simple tenant composition via module calls in a main.tf
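For the providers map in the connectivity module call above to resolve, the tenant's root module needs a default and an aliased azurerm provider configuration, and the building block needs to declare the aliases it expects. A minimal sketch, with placeholder subscription IDs:

```hcl
# Tenant root module (illustrative sketch): one provider configuration per subscription.

provider "azurerm" {
  features {}
  subscription_id = "00000000-0000-0000-0000-000000000001" # placeholder: tenant (spoke) subscription
}

provider "azurerm" {
  alias = "hub"
  features {}
  subscription_id = "00000000-0000-0000-0000-000000000002" # placeholder: hub subscription
}
```

```hcl
# Inside the connectivity building block (illustrative sketch):
# declare the provider aliases the caller must pass in.

terraform {
  required_providers {
    azurerm = {
      source                = "hashicorp/azurerm"
      configuration_aliases = [azurerm.spoke, azurerm.hub]
    }
  }
}
```

This disambiguates providers only. The root module still has exactly one backend, which is the conflict described in the item above.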

Testing Building Blocks

Once we have deployed a building block backplane, it's very useful to ensure that this backplane can successfully deploy the building block in isolation. This can be achieved with terraform test and terragrunt, like so:

dependency "buildingblock" {
  config_path = "../budget-alert"
}

dependency "glaskugel" {
  config_path = "../../tenants/glaskugel"
}

generate "config" {
  path      = "config.tf"
  if_exists = "overwrite"
  contents  = dependency.buildingblock.outputs.config_tf
}

terraform {
  source = "${get_repo_root()}//kit/azure/buildingblocks/budget-alert/buildingblock"
}

inputs = {
  subscription_id = dependency.glaskugel.outputs.subscription_id
  contact_emails  = "foo@example.com, bar@example.com"
}

Open issues

[x] it uses config.tf, which works very well here but differs from collie-style tenant composition (see above)
[x] we can't test backplane's backend configurations that way
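The terraform test half of this setup is not shown above. A minimal sketch of a test file placed next to the building block could look like this, assuming Terraform or OpenTofu 1.6 or later and a hypothetical resource address azurerm_consumption_budget_subscription.this inside the module:

```hcl
# tests/budget_alert.tftest.hcl (illustrative sketch)

variables {
  subscription_id = "00000000-0000-0000-0000-000000000000" # placeholder
  contact_emails  = "foo@example.com, bar@example.com"
}

run "plans_monthly_budget" {
  # Only plan; nothing is created in the target subscription.
  command = plan

  assert {
    # Hypothetical resource address; adjust to the building block's actual resources.
    condition     = azurerm_consumption_budget_subscription.this.time_grain == "Monthly"
    error_message = "The budget alert should be evaluated monthly."
  }
}
```

Switching to command = apply would exercise the permissions granted by the backplane as well, at the cost of creating and destroying real resources.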

Note: some of what's discussed in this issue (especially the design principles) should end up in the collie documentation

JohannesRudolph commented 6 months ago

some BBs will have a very advanced backplane, e.g. the connectivity block requires the hub deployed from azure/kit/connectivity - should we go to the trouble of always defining a dedicated backplane even if we could reuse the bootstrapped SPN?

I'd say yes, we should always have dedicated backplanes. See https://github.com/meshcloud/collie-hub/issues/109, which would mean that the kit modules in collie-hub up to CFMM L2 don't come with an automation solution. Building Blocks are certainly an implementation of CFMM L3 Modular Landing Zones. We can therefore reasonably expect a CFT to have SPNs etc. on its learning curve.

JohannesRudolph commented 6 months ago

Summary of an internal design session we've had with @florianow and @felixzieger

Decisions

- We assume shared backends are the norm for platform teams owning a set of building blocks. Backend authentication is orthogonal to provider authentication.
- We will pursue using terraform/opentofu test for "unit testing" building blocks. We may introduce a dedicated collie foundation test command to simplify this workflow.
- Backplanes shared on collie-hub will not include backends and principals. The common interface is an input set of principal IDs to grant permissions to. This allows both human-in-the-loop (e.g. platform engineers) and automation use cases.
- We will MVP this workflow in likvid-cloudfoundation and, if that MVP is successful, transplant it to collie-hub.
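The "input set of principal IDs" interface from the decisions above might be sketched like this. The variable names, the scope input and the choice of role are assumptions for illustration, not the interface that was eventually implemented.

```hcl
# Backplane sketch: grant deploy permissions to externally managed principals.

variable "principal_ids" {
  type        = set(string)
  description = "Object IDs of the users or service principals allowed to deploy the building block."
}

variable "scope" {
  type        = string
  description = "Hypothetical input: resource ID (e.g. a management group) the permissions apply to."
}

resource "azurerm_role_assignment" "buildingblock_deployers" {
  # One role assignment per principal passed in by the caller.
  for_each = var.principal_ids

  scope                = var.scope
  role_definition_name = "Cost Management Contributor" # assumed role for a budget alert block
  principal_id         = each.value
}
```

Since the backplane only receives IDs, the same module works whether the principal is a platform engineer's user account or an SPN created by separate automation.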

JohannesRudolph commented 1 month ago

We have successfully landed a building block design pattern in likvid-cloudfoundation that we are fairly happy with.
