siderolabs / sidero

Sidero Metal is a bare metal provisioning system with support for Kubernetes Cluster API.
https://www.sidero.dev
Mozilla Public License 2.0
405 stars 62 forks source link

Manage order of configPatches #689

Open TimJones opened 2 years ago

TimJones commented 2 years ago

Issue

When using configPatches across TalosControlPlane, TalosConfigTemplate, ServerClass, and Server resources, you can run into cases where a config patch is conceptually coupled with a spec, while its parent config node is logically coupled with another, leaving the patch unable to be applied in the current order.

Specific example:

I want to use the floating VIP feature, which conceptually, should only exist at the TalosControlPlane level as usually you want to use the virtual IP as the Kubernetes endpoint for a poor-mans LoadBalancer.

---
apiVersion: controlplane.cluster.x-k8s.io/v1alpha3
kind: TalosControlPlane
metadata:
  name: talos-cluster-cp
  namespace: default
spec:
  controlPlaneConfig:
    init:
      configPatches:
      - op: add
        path: /machine/network/interfaces/0
        value:
          vip:
            ip: 192.168.10.100
---
apiVersion: metal.sidero.dev/v1alpha1
kind: Server
metadata:
  name: 00000000-0000-0000-0000-d05099d333e0
spec:
  configPatches:
  - op: add
    path: /machine/network
    value:
      interfaces:
      - interface: eth0
        dhcp: true
      - interface: eth1
        dhcp: true
        dhcpOptions:
          routeMetric: 1000  

Except the network config is logically tires to the Server or ServerClass, as they 'know' what network hardware is available and how it should be configured. However, configuring the VIP at the Server/ServerClass level binds those resources to that specific TalosControlPlane, which is not ideal and mostly contrary to metal management with Sidero.

Possible solutions:

Add 'order' field

We could add an order field to the ConfigPatch struct. When the patches are complied, they should be pulled from all applicable resources, and applied in order. If the order field is null, then in original list/resource order.

This is potentially the simplest solution, but also means we're not strictly following the JSON patch RFC which is not great.

Use path to determine order

As above, we could compile all the configPatches from all applicable resources, but then apply them in order of the defined path, i.e. if patch-a has a path /machine/network/interfaces/0 and patch-b has a path /machine/network then we apply patch-b before patch-a.

This shouldn't be complicated in code, but does mean we patch implicitly rather than explicitly, which I think is not great for UX.

Add ConfigPatch custom resource

Adding a TalosConfigPatch custom resource, and a configPatchSelector to the Talos resources decouples the config from the resources directly which while not ideal, brings a lot of flexibility (and the hard-coded configPatches would still be available). Application order could be a part of the spec or an annotation, and it could reduce duplication: most likely your TalosControlPlane and TalosConfigTemplate will both need the same patches to provide the CNI config, and if you have a set of identical metal servers you only have to write appropriate config patches once, and the Server resources can select them.

This is probably the most complex solution, but also gives most flexibility and remains explicit in its use.

smira commented 2 years ago

:100: agree with what you posted, and thanks for summarizing all of the issues and potential solutions.

we fully see the problem with patches being counter-intuitive, not really composable and hard to use.

In addition to the solutions you posted above, we were looking into having label-selector model for patches (when patches apply based on selectors, and servers/clusters/etc. are labeled).

Also another issue is that patching our machine configuration is exceptionally hard, as some fields are nested deep into the tree (like VIP configuration), and having multi-doc YAML configuration might help a lot, and it doesn't require that much patching vs. simply appending another small document to configure e.g. VIP.