Clarification around variable handling in new Fleet project #2929 (rancher/fleet)

strophy commented 2 weeks ago

I'm running into a lot of confusion while trying to handle variables in a gitops repo with the following structure:

$ tree -L 3 --dirsfirst
.
├── backend
│   ├── templates
│   │   ├── configmap.yaml
│   │   ├── deployment.yaml
│   │   ├── external-secret.yaml
│   │   ├── httproute.yaml
│   │   └── service.yaml
│   ├── Chart.yaml
│   └── fleet.yaml
├── cert-manager
│   ├── templates
│   │   ├── cluster-issuer.yaml
│   │   ├── fleet.yaml
│   │   └── issuer.yaml
│   └── fleet.yaml
├── embedded
│   ├── templates
│   │   ├── configmap.yaml
│   │   └── deployment.yaml
│   ├── Chart.yaml
│   └── fleet.yaml
├── emqx
│   ├── routes
│   │   ├── fleet.yaml
│   │   └── httproute.yaml
│   └── fleet.yaml
├── external-secrets
│   ├── templates
│   │   ├── basic-secret-store.yaml
│   │   ├── fleet.yaml
│   │   └── ghcr-io.yaml
│   └── fleet.yaml
├── frontend
│   ├── templates
│   │   ├── configmap.yaml
│   │   ├── deployment.yaml
│   │   ├── httproute.yaml
│   │   └── service.yaml
│   ├── Chart.yaml
│   ├── fleet.yaml
│   └── values.yaml
├── influxdb
│   ├── routes
│   │   ├── fleet.yaml
│   │   └── httproute.yaml
│   ├── templates
│   │   ├── external-secret.yaml
│   │   └── fleet.yaml
│   └── fleet.yaml
├── redis
│   └── fleet.yaml
├── telegraf
│   ├── fleet.yaml
│   └── telegraf.yaml
├── traefik
│   └── fleet.yaml
├── aws-ssm-secret.yaml
├── config.yaml
├── fleet.yaml
└── repo.txt

The repo is a mixture of external Helm charts for products like EMQX, InfluxDB and Traefik, and our own code deployed as Helm charts (frontend, backend, embedded, etc.). The infrastructure will be deployed to multiple clusters, and I need to be able to define different config for each cluster. Eventually I want to use external-secrets with the aws-ssm-secret.yaml file (which is in .gitignore and not checked in to the repo) to dynamically pull secrets from AWS SSM Parameter Store and configure each target cluster without storing any secrets in git. For this reason I want to keep the repo as DRY as possible. For example, the EMQX config required in the multiple locations below should reference a single location for configuration:

# telegraf/fleet.yaml
namespace: myproject
dependsOn:
  - name: myproject-emqx
  - name: myproject-external-secrets-templates
helm:
  repo: https://helm.influxdata.com
  chart: telegraf
  version: 1.8.54
  releaseName: telegraf
  valuesFiles:
    - telegraf.yaml

# telegraf/telegraf.yaml (excerpt)
tplVersion: 2
config:
  inputs:
    - mqtt_consumer:
        client_id: "gateway_mqtt_v2_control"
        data_format: "value"
        data_type: "float"
        password: "emqx_s3cret" # template
        username: "emqx_user" # template
        servers:
          - "tcp://emqx:1883" # template

# backend/templates/configmap.yaml (excerpt)
apiVersion: v1
kind: ConfigMap
metadata:
  name: backend
data:
  GW_EMQX_CLIENT_PASSWORD: emqx_s3cret # template
  GW_EMQX_CLIENT_USERNAME: emqx_user # template
  GW_EMQX_HOST: emqx
  GW_EMQX_PORT: '1883' # template
  GW_EMQX_PROTOCOL: tcp

I have been struggling to understand how I can define values like 1883, emqx_user and emqx_s3cret in one central location and have the various bundle dirs access them. I've read #671 and #1164, the documentation and the fleet-examples repo exhaustively, but I still cannot understand what the intended approach is given the architecture of Fleet. Should I:

1. Create a configmap at the top level with a single values.yaml key, store the entire config for all apps in it as a block, and read it in each bundle with valuesFrom?
2. Declare helm.values in the top-level fleet.yaml file and read the values from there into sub-level fleet.yaml files with helm.values.emqx.username: ${ .Values.emqx.username }?
3. Try to use targetCustomizations to do the same thing? I would like to avoid spreading targetCustomizations across many files if possible, to reduce the maintenance burden when e.g. adding a new target cluster.
4. Try to use Helm templating in e.g. backend/templates/configmap.yaml, declare helm.values in the relevant fleet.yaml and access values with GW_EMQX_CLIENT_USERNAME: {{ .Values.emqx.username }}?
5. Create the values in spec.templateValues at the cluster level (how??) and access them with ${ get .ClusterValues "emqx.username" }? Or do something similar with labels/annotations as described here?
6. Something else?

I would greatly appreciate more extensive documentation and an example in the fleet-examples repo of how to share values. That repo was very helpful for getting started with Fleet, but it is lacking for a beginner attempting anything slightly more complicated.

More generally, does the structure of the repo above look logical, or have I created more problems for myself with this structure? Is it normal to have so many fleet.yaml files at all levels, or have I misunderstood something about how Bundles are created? Would it be possible/better to have only one fleet.yaml file at the base level and somehow have it configure everything else as Helm subcharts?

Thanks for any help. I tried asking on Rancher Slack first but didn't receive any response there, so I'm trying here.

strophy commented 1 week ago

As a concrete example of something that is simple in my head but apparently difficult to implement, I have two sets of files as follows:

# embedded/fleet.yaml
defaultNamespace: bioapp
helm:
  values:
    tcp_port: 1883
    username: emqx_user

# embedded/templates/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: embedded
data:
  IL_MQTT_PORT: '{{ .Values.tcp_port }}'
  IL_MQTT_USERNAME: {{ .Values.username }}

The above works fine. Then I try to do the same thing in the root of the repository:

# fleet.yaml
defaultNamespace: bioapp
helm:
  values:
    emqx_tcp_port: 1337
    username: top_level_emqx_user

# config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-config
data:
  emqx_tcp_port: '1337'
  emqx_client_user: {{ .Values.username }}

This fails with ErrApplied(2) [Cluster fleet-default/dev: error while running post render on files: yaml: invalid map key: map[interface {}]interface {}{".Values.username":interface {}(nil)}]. This is confusing to me: I don't understand what all the interface {} output is about, since I am not passing Go functions here, I'm just trying to set a key to a value. I suspect it is because Fleet is processing the two sets of files in different ways, the embedded example as a Helm template and the root example as a Fleet/raw template, but it is not clear to me whether the error is coming from Fleet or Helm, or what I should do to fix it. I have tried templating in different places with ${} and {{ }} syntax, Sprig templating commands like ${ get .Values.username }, and endless other iterations.

What would really help me is if Fleet made it clear in the UI or Bundle logs what "mode" is being used to process a particular bundle, assuming there are 5 different ways of assembling the Helm resources as described under https://fleet.rancher.io/gitrepo-content#how-repos-are-scanned

My eventual goal is to keep the entire config in a single, nested configmap, so that each sub-service deployed by Fleet can use valuesFrom to read only the key relevant to it. But I'm not sure whether this is an anti-pattern, since the examples only show reading YAML block scalars explicitly named values.yaml and not actual YAML objects. Even the most basic examples in the fleet-examples repo showing how to pass variables around would be extremely welcome, as well as more logs and more documentation. Thanks for any help!
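
For reference, a minimal sketch of the documented valuesFrom pattern mentioned above, where the configmap holds a block scalar under a values.yaml key. The configmap name and namespace are reused from the earlier examples, the nested emqx keys are hypothetical, and the two parts are separate files shown together only for brevity. It does not answer the open question of whether a key holding a nested YAML object can be selected directly.

```yaml
# File 1: ConfigMap holding the shared values as a YAML block scalar.
# The emqx.* keys are illustrative assumptions, not taken from a real setup.
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-config
  namespace: bioapp
data:
  values.yaml: |
    emqx:
      tcp_port: 1883
      username: emqx_user

# File 2: embedded/fleet.yaml, reading the block above as Helm values.
# defaultNamespace: bioapp
# helm:
#   valuesFrom:
#     - configMapKeyRef:
#         name: cluster-config
#         namespace: bioapp
#         key: values.yaml
```

With values loaded this way, a template such as embedded/templates/configmap.yaml would presumably reference them as {{ .Values.emqx.username }}.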

skanakal commented 1 week ago

Try quoting the values: emqx_client_user: '{{ .Values.username }}'. For more information on templating, see https://fleet.rancher.io/ref-fleet-yaml#templating
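
A minimal sketch of the quoted form, applied to the root-level config.yaml from the example above. The explanation in the comments is an inference from the error text, not something confirmed in this thread: an unquoted {{ ... }} is valid YAML flow-mapping syntax, so a YAML parser that sees the file before any template rendering reads it as a mapping whose key is itself a mapping, which would match the "invalid map key" message.

```yaml
# config.yaml (repo root), with the templated value quoted
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-config
data:
  emqx_tcp_port: '1337'
  # Quoted, so YAML reads a plain string and leaves the braces for the
  # template engine. Unquoted, {{ .Values.username }} parses as a flow
  # mapping nested inside another flow mapping, with .Values.username
  # as a key whose value is null.
  emqx_client_user: '{{ .Values.username }}'
```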

strophy commented 1 week ago

Thanks a ton @skanakal, that resolved the error! But it raises a few more questions:

1. How was it clear to you from the error message that quoting was the issue?
2. Why is it necessary to quote the templated segments in the second example at the repo root, but not in the first example in the embedded directory?
3. The documentation you linked is about the ${ } templating style, not {{ }}. Is the latter handled by Helm or Fleet?
4. Where are the quoting rules documented? Searching the docs site results in:
