Proposal: `module.kubernetes.configmap` to load modules from configmaps

captncraig commented 1 year ago

Background

This is a competing proposal to grafana/alloy#457, which is a similar idea, but with a dedicated CRD for modules. It is different enough that I think it deserves its own issue.

The goal is to allow modular configs in kubernetes environments. My target use case is that an app team may wish to deploy some amount of agent configuration alongside their app. I expect the most common modules will be:

Scrape targets for a given environment (discovery.relabel -> prometheus.scrape)
Exporter for something in the environment (prometheus.exporter.redis -> prometheus.scrape)

If we can make the experience useful for those use cases in particular, I would be very happy.

Proposal

I propose we create a component module.kubernetes.configmaps that will:

Watch for any configmaps with a specified label selector (flow.grafana.com/module=true)
For each discovered config map, load contents as a single flow module
Manage the modules lifecycle

Module Semantics

When a config map is located, we will concatenate any data fields (possibly filtering to *.river?), and parse it as a single module, named after the namespace and name of the containing configmap.

Any exports from the module will be completely ignored, and inaccessible to the rest of the system.

Example configmap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: test-module
  namespace: monitoring
  labels:
   flow.grafana.com/module: "true"
data:
  scrape.river: |
    argument "receiver" { }
    discovery.kubernetes "k8s_pods" {
      role = "pod"
      kubeconfig_file = "k3s.yaml"
    }
    prometheus.scrape "scrape" {
      targets = discovery.kubernetes.k8s_pods.targets
      forward_to = [argument.receiver.value]
   }

Arguments

Arguments are tricky. My proposal would be that the root component has a list of arguments like any other module component:

module.kubernetes.configmaps "dynamic-config" {
  arguments{
    receiver = prometheus.remote_write.grafanacloud.receiver
    pod_targets = discovery.kubernetes.pods.targets
    clustered = true
  }
}

These arguments comprise everything a discovered module may use, and allow the agent's operator to define what capabilities are available to child modules from the base config. I do not think it should be an error if a module does not use all of the supplied arguments. This is a change from the current module semantics.

Changes to flow

This is the first module loader that can load multiple modules. All the others have assumed module per component exactly. This component's primary role is tracking the list of submodules and keeping them in sync. module.kubernetes.configmaps will be healthy if it is successfully watching configmaps. Health of individual submodules is up to them.
As discussed above, I'd like to introduce optional non-strict validation of module arguments. It would only be used for module loaders like this that manage a set of nested submodules. If a submodule has a non-optional argument that is not present on the root module, that is clearly an error. But I do not think it should be an error if a module does not use all arguments supplied. Maybe a given module does not need a metrics receiver, or a list of targets, or whatever.

Implementation

The quick and dirty way to implement this would be for module.kubernetes.configmaps generates a single string for its' own module, that creates a module.string sub-component for each configmap it finds. This would work fine, but I'm worried it could be a little confusing in the ui, but maybe not.

Alternatively, could make a new base type for "mult-module" components that manage a list of child modules, but that feels like it is replicating a bit of the flow controller. Simple may be a place to start.

Polling vs Watching the k8s api? I'm a little sensitive to the number of watches we are putting on the apiservers. I would be fine doing a polling solution to make the code simpler. But I don't have strong opinions here.

Why not `AgentModule` CRD?

grafana/alloy#457 proposes a similar idea with a dedicated AgentModule crd. I don't like that for a few reasons:

Making crds is annoying. We have tooling in place, but defining the type and generating the definitions and clients is a fair bit of work.
Deploying crds is annoying. Does it go in the helm chart, or does it need to be deployed manually? What if it isn't there or there's a conflict between two versions?
ConfigMaps are built-in and easy to use. I'd argue this use case is not incompatible with the semantics of what a configmap is for.
Helm users could have issues trying to put files into content of crds.
It may be awkward for app teams to create an instance of a new, unknown CRD, but anybody can drop a configmap.

petewall commented 1 year ago

I think we need to make sure that if there are multiple instances of an Agent deployed to the cluster, they're able to find the right ones. For example, I could see a multi-tenancy situation where an agent is deployed in different namespaces for different tenants. We'd want to ensure that Agent A will discover ConfigMaps limited by namespace or something. And Agent B can get the config maps meant for it.

captncraig commented 1 year ago

Yes, I left it out, but we can assume this has the same selector blocks as all the other CRD components in flow. I used the example of flow.grafana.com/module as something that would be the default selector, or maybe a required selector that you can add others on top of. I kinda like the idea of having that, so we don't inadvertently try and parse every configmap in your cluster as a flow config.

grafana / alloy