The “kubectl cluster-compare” command performs an intelligent diff between a reference configuration and the specific configuration applied to a cluster. The comparison can suppress diffs of content that is expected to vary by user, validate required and optional configuration, and ignore known runtime-variable fields. With these capabilities, cluster administrators, solutions architects, support engineers, and others can validate a cluster’s configuration against a baseline reference configuration.
In addition to the subcommand to perform this comparison, this enhancement defines the structure and method of capturing known user variation, optional components, and required content in the reference configuration.
Many deployed clusters are based on engineered and validated reference configurations. These reference configurations have been designed to ensure a cluster will meet the functional, feature, performance and resource requirements for specific use cases. A customer will take this reference configuration and adapt it for their particular environment adding variations to account for their networking topology, specific servers/hardware in use, optional features, etc. This adapted version of the configuration is then applied to their cluster, or replicated across a large scale deployment of clusters. When this adapted configuration deviates from the reference configuration the impacts may be subtle, transient, or delayed for some period of time. When working with these clusters across their lifetimes it is important to be able to validate the configuration against the known valid reference configuration to identify potential issues before they impact end users, service level agreements, or cluster uptime.
This kubectl cluster-compare command is capable of doing an "intelligent" diff between a reference configuration and a set of CRs representative of a deployed production cluster. These CRs may derive from many potential sources such as being pulled from a live cluster, extracted from a support archive, or shared directly by the customer. The reference configuration is the engineered set of configuration CRs for the use case and has been sufficiently annotated to describe expected user variations versus required content.
┌──────────────────┐                 ┌──────────────────┐
│                  │  Adaptation to  │                  │
│    Published     │    user env     │  Deployed user   │
│    Reference     ├────────────────►│  Configuration   │
│  Configuration   │                 │                  │
│                  │                 │                  │
└────────┬─────────┘                 └─────────┬────────┘
         │                                     │
         │                                     │
         │        ┌──────────────────┐         │
         │        │                  │         │
         └───────►│ Proposed Cluster │◄────────┘
                  │ Validation Tool  │
                  │                  │
                  │                  │
                  └─────────┬────────┘
                            │
                       ┌────▼────┐
                       │Relevant │
                       │Diffs    │
                       │         │
                       │...      │
                       │...      │
                       │...      │
                       └─────────┘
Existing tools meet some of this need but fall short of the goals:
<get cr> | <key sorting> | diff: There are various ways of chaining together existing tools to obtain, correlate, and compare/diff two YAML objects. These methods fall short in similar ways to kubectl diff.
The design and implementation of this subcommand is guided by the following goals:
The validation tool will operate similarly to a standard Linux diff tool, which operates (recursively) across a set of inputs (e.g. two trees of input). The left-hand side of the diff will be a user-selected reference configuration (see below for the structure/contents of the reference) and the right-hand side will be a collection of the user's configuration CRs. The logical flow of the tool will be:
kubectl cluster-compare <referenceConfig> <userConfig>
When the CRs are read from a live cluster, the <userConfig> input is made up of the set of CRs pulled from the cluster based on the reference configuration. Only those CRs included in the reference configuration are pulled from the live cluster. Where the reference configuration indicates user variability in CR name or namespace, multiple CRs may be pulled based on the kind and included in the <userConfig>.
Otherwise, the <userConfig> is a local directory.
For the CRs in the <userConfig>, the tool runs a diff between the rendered reference CR and the input CR. Any unexpected variations and/or missing content are reported.
As described in the logical flow, the tool will report any differences considered outside the expected set of variability as defined by the reference configuration (i.e. the "drift"). The tool will highlight this drift for additional analysis/review by the user. In addition to the CR comparison output, the tool will output a report detailing:
The tool consumes two inputs, the reference configuration and the user configuration, and supports additional options to control the comparison, output, etc.
The reference configuration is a required input. The structure of the reference is described below. The minimum requirement is that the reference can be located on the local filesystem (eg directory).
The user configuration is an optional input. If specified the user configuration will be pulled from the local filesystem. Otherwise the user configuration will be pulled from a live cluster.
kubectl cluster-compare must correlate CRs between the reference and input configurations to perform the comparisons. It correlates CRs by using the apiVersion, kind, namespace and name fields of the CRs to perform a nearest-match correlation. Optionally, the user may provide a manual override of the correlation to identify a specific reference configuration CR to be used for a given user input CR. Manual matches are prioritized over automatic correlation, meaning manual matches override matches by similar values in the specified group of fields.
kubectl cluster-compare takes as input a diff config that contains an option to specify manual matches between cluster resources and resource templates. The matches can be added to the config as pairs of apiVersion_kind_namespace_name: <Template File Name>. For cluster-scoped CRs that don't have a namespace, the matches can be added as pairs of apiVersion_kind_name: <Template File Name>.
When there is no manual match for a CR, the command will try to match a template to the resource by looking at the 4-tuple: apiVersion, kind, namespace and name. The correlation is based on the fields in the templates that are not user-variable. Templates are matched to resources based on all the fields from the 4-tuple that are declared fixed (not user-variable) in the templates. For example, a template with a fixed namespace, kind and name and a templated (user-variable) apiVersion will only be a potential match by the kind-namespace-name criterion.
For each resource, the group correlation will be done by the following logic:
We can phrase this logic in a more general form. Each CR will be correlated to a template with an exact match in the largest number of fields from this group: apiVersion, kind, namespace, name.
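The general form above can be sketched in Go as follows. This is a minimal illustration, not the command's actual code: the `Key` and `Template` types, the `Correlate` function, the convention that an empty string marks a user-variable field, and the tie-breaking rule (first template wins) are all assumptions made for this example.

```go
package main

import "fmt"

// Key holds the four correlation fields of a CR or template.
// Assumption: in a Template, an empty string marks a user-variable field.
type Key struct {
	APIVersion, Kind, Namespace, Name string
}

// Template is a hypothetical stand-in for a reference template.
type Template struct {
	File  string
	Fixed Key
}

// score counts the fixed template fields that exactly match the CR.
// It returns -1 when any fixed field differs, which disqualifies the template.
func score(t Template, cr Key) int {
	pairs := [][2]string{
		{t.Fixed.APIVersion, cr.APIVersion},
		{t.Fixed.Kind, cr.Kind},
		{t.Fixed.Namespace, cr.Namespace},
		{t.Fixed.Name, cr.Name},
	}
	n := 0
	for _, p := range pairs {
		if p[0] == "" {
			continue // user-variable field, ignored for correlation
		}
		if p[0] != p[1] {
			return -1
		}
		n++
	}
	return n
}

// Correlate returns the template with the largest number of exactly matching
// fixed fields. On a tie the earlier template wins (an assumption).
func Correlate(templates []Template, cr Key) (Template, bool) {
	best, bestScore := Template{}, 0
	for _, t := range templates {
		if s := score(t, cr); s > bestScore {
			best, bestScore = t, s
		}
	}
	return best, bestScore > 0
}

func main() {
	templates := []Template{
		{File: "any-service.yaml", Fixed: Key{Kind: "Service"}},
		{File: "frontend-service.yaml", Fixed: Key{APIVersion: "v1", Kind: "Service", Name: "frontend"}},
	}
	cr := Key{APIVersion: "v1", Kind: "Service", Namespace: "guestbook", Name: "frontend"}
	if tmpl, ok := Correlate(templates, cr); ok {
		fmt.Println("matched template:", tmpl.File)
	}
}
```

Here the namespace is left empty in both templates, so it is treated as user-variable and does not take part in the match.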
The tool will generate standard diff output highlighting content as described in "Categorization of differences". Note that in this example the cpusets and hugepage count are not highlighted, as these are expected user variations. The hugepage node is indicated as extra content and the realtime kernel setting is indicated as drift.
@@ -8,7 +8,7 @@
   namespace: MyNamespace
 spec:
   ports:
-  - port: 8000
+  - port: 80
   selector:
     app: guestbook
     tier: frontend
---
<next CR>
…
Summary
Missing 1 required CRs:
guestbook:
  frontend:
  - frontend-deployment.yaml
No CRs are unmatched
Once the validations are complete, we run a diff between the CR from the user's input configuration (now validated) and the resolved template (user-variable input is pulled from the input config into the resolved template). This final step is needed to warn the user of, or raise an error for, remaining drift that the validation steps may not catch.
The primary output of this step is a side-by-side diff as shown in the output section above. To achieve this meaningful diff, the tool must perform two operations:
To compare a known valid reference configuration with a live cluster:
kubectl cluster-compare -r <referenceConfigurationDirectory>
To compare a known valid reference configuration with a local set of CRs:
kubectl cluster-compare -r <referenceConfigurationDirectory> -f <inputConfiguration>
To compare a known valid reference configuration with a live cluster and with a user config:
kubectl cluster-compare -r <referenceConfigurationDirectory> -c <userConfig>
To run a known valid reference configuration against a support archive:
kubectl cluster-compare -r <referenceConfigurationDirectory> -f "must-gather*/*/cluster-scoped-resources","must-gather*/*/namespaces" -R
The metadata.yaml is a mandatory file for each reference config. The command's entry point looks for the metadata.yaml file in the reference directory. The name of the file is fixed and cannot be changed.
The main content of the metadata is the list of reference CRs, grouped by components and parts (as described in previous sections). The parts are specified under the Parts key in the YAML, and each part includes a list of components under the Components key. The full schema can be found in the appendix.
Another parameter that can be set in the metadata.yaml file is templateFunctionFiles. This implementation of the command supports the declaration of nested templates in external files, which can then be used in all resource templates included in the reference. All files containing nested templates should be added to the list of files under the templateFunctionFiles key.
The metadata.yaml also includes an optional field: fieldsToOmit. Under this key the reference author can specify fields that should not appear in the command's output. These fields are omitted from the output for all templates in the reference, so there is no need to specify them in each resource template. The listed fields will not be shown in the output even if they are specified in the resource templates. Omitted fields can be nested, so each field is represented by a list of strings, as can be seen in the example below.
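One way to read "each field is represented by a list of strings" is as a path of keys into the CR. The Go sketch below shows how such a path could be removed from a decoded CR before diffing. It is an illustration under that assumption: `OmitField` is a hypothetical helper, and the exact fieldsToOmit schema is defined in the appendix rather than here.

```go
package main

import "fmt"

// OmitField deletes the nested field addressed by path from obj, if present.
// path is a list of keys, e.g. []string{"metadata", "uid"}.
// Hypothetical helper; it only descends through nested maps, not lists.
func OmitField(obj map[string]any, path []string) {
	if len(path) == 0 {
		return
	}
	cur := obj
	for _, key := range path[:len(path)-1] {
		next, ok := cur[key].(map[string]any)
		if !ok {
			return // path does not exist in this object
		}
		cur = next
	}
	delete(cur, path[len(path)-1])
}

func main() {
	cr := map[string]any{
		"kind": "Service",
		"metadata": map[string]any{
			"name": "frontend",
			"uid":  "1234",
		},
		"status": map[string]any{"loadBalancer": map[string]any{}},
	}
	// Each omitted field is a list of strings describing a nested path.
	for _, path := range [][]string{{"metadata", "uid"}, {"status"}} {
		OmitField(cr, path)
	}
	fmt.Println(cr)
}
```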
Example for metadata.yaml:
Parts:
  - name: guestbook
    Components:
      - name: redis
        type: Required
        requiredTemplates:
          - path: redis-master-deployment.yaml
          - path: redis-master-service.yaml
        optionalTemplates:
          - path: redis-replica-deployment.yaml
          - path: redis-replica-service.yaml
      - name: frontend
        type: Required
        requiredTemplates:
          - path: frontend-deployment.yaml
          - path: frontend-service.yaml
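To show how the parts/components grouping could feed the "Missing required CRs" summary, here is a small Go sketch. The type names, the `MissingRequired` function, and the `matched` set are assumptions made for illustration. The sketch also ignores the component `type` field and treats every entry under requiredTemplates as required, which may not match the command's exact semantics.

```go
package main

import "fmt"

// Hypothetical in-memory form of the metadata.yaml structure shown above.
type Template struct{ Path string }

type Component struct {
	Name              string
	RequiredTemplates []Template
	OptionalTemplates []Template
}

type Part struct {
	Name       string
	Components []Component
}

// MissingRequired returns, per part and component, the required template
// paths that were not matched to any input CR. matched is the set of
// template paths that correlation found a CR for.
func MissingRequired(parts []Part, matched map[string]bool) map[string]map[string][]string {
	missing := map[string]map[string][]string{}
	for _, p := range parts {
		for _, c := range p.Components {
			for _, tmpl := range c.RequiredTemplates {
				if matched[tmpl.Path] {
					continue
				}
				if missing[p.Name] == nil {
					missing[p.Name] = map[string][]string{}
				}
				missing[p.Name][c.Name] = append(missing[p.Name][c.Name], tmpl.Path)
			}
		}
	}
	return missing
}

func main() {
	parts := []Part{{
		Name: "guestbook",
		Components: []Component{
			{Name: "redis", RequiredTemplates: []Template{{"redis-master-deployment.yaml"}, {"redis-master-service.yaml"}}},
			{Name: "frontend", RequiredTemplates: []Template{{"frontend-deployment.yaml"}, {"frontend-service.yaml"}}},
		},
	}}
	matched := map[string]bool{
		"redis-master-deployment.yaml": true,
		"redis-master-service.yaml":    true,
		"frontend-service.yaml":        true,
	}
	fmt.Println(MissingRequired(parts, matched))
}
```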
The user has the option to pass a file called the diff config. The diff config includes user preferences and content that is specific to the user's cluster (unlike the metadata.yaml, which includes only settings that are valid for the specific reference).
In this version, the diff config includes an option to specify manual matches between cluster resources and resource templates. The matches can be added to the config as pairs of apiVersion_kind_namespace_name: <Template File Name>. For resources that don't have a namespace, the matches can be added as pairs of apiVersion_kind_name: <Template File Name>.
The pairs are listed in the config under correlationSettings.manualCorrelation.correlationPairs, as can be seen in the example below.
correlationSettings:
  manualCorrelation:
    correlationPairs:
      v1_Service_guestbook_frontendService: "frontend-service.yaml"
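The key format used in correlationPairs can be illustrated with a short Go sketch. `CorrelationKey` and `LookupManualMatch` are hypothetical helper names chosen for this example, and the sketch assumes that an empty namespace means the resource is cluster-scoped.

```go
package main

import (
	"fmt"
	"strings"
)

// CorrelationKey builds the manual-match key for a resource:
// apiVersion_kind_namespace_name, or apiVersion_kind_name when the
// resource has no namespace (assumed here to mean an empty string).
func CorrelationKey(apiVersion, kind, namespace, name string) string {
	parts := []string{apiVersion, kind}
	if namespace != "" {
		parts = append(parts, namespace)
	}
	parts = append(parts, name)
	return strings.Join(parts, "_")
}

// LookupManualMatch returns the template file configured for the resource,
// if the correlationPairs map contains an entry for it.
func LookupManualMatch(pairs map[string]string, apiVersion, kind, namespace, name string) (string, bool) {
	tmpl, ok := pairs[CorrelationKey(apiVersion, kind, namespace, name)]
	return tmpl, ok
}

func main() {
	pairs := map[string]string{
		"v1_Service_guestbook_frontendService": "frontend-service.yaml",
	}
	tmpl, ok := LookupManualMatch(pairs, "v1", "Service", "guestbook", "frontendService")
	fmt.Println(tmpl, ok)
}
```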
The kubectl cluster-compare implementation reuses parts of the code from the K8s built-in diff command, which combines patching and an external diff tool via KUBECTL_EXTERNAL_DIFF.
The command implementation includes parsing of the reference and other user passed arguments, correlation logic,
template injecting, calling the diff code and summary creation.
The command calls the diff code by using the exported Differ struct, defined as:
type Differ struct {
	From *DiffVersion
	To   *DiffVersion
}
func (d *Differ) Diff(obj Object, printer Printer, showManagedFields bool) error
func (d *Differ) Run(diff *DiffProgram) error
The compare command calls the differ.Diff function for each resource, adding the injected resource and the cluster resource to the files that should be included in the diff. As seen above, the differ.Diff function takes as an argument an object that satisfies the Object interface:
type Object interface {
	Live() runtime.Object
	Merged() (runtime.Object, error)
	Name() string
}
The compare command includes a custom implementation of this interface, in which the Live function returns the cluster resource and the Merged function returns the injected version of the CR. After the differ.Diff function has been called for all CRs, differ.Run() is called and the diff is printed to stdout.
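The shape of such a custom implementation might look like the sketch below. To keep it compilable without external modules, `RuntimeObject` is a local stand-in for `runtime.Object`, and the `Object` interface is restated with that substitution. The `comparedObject` type and its field names are hypothetical and are not taken from the actual implementation.

```go
package main

import "fmt"

// RuntimeObject is a local stand-in for runtime.Object from
// k8s.io/apimachinery, used only so this sketch builds on its own.
type RuntimeObject interface{}

// Object mirrors the interface quoted above, with RuntimeObject substituted.
type Object interface {
	Live() RuntimeObject
	Merged() (RuntimeObject, error)
	Name() string
}

// comparedObject is a hypothetical implementation: Live returns the CR as
// found in the cluster (or input directory) and Merged returns the reference
// template after user-variable content has been injected.
type comparedObject struct {
	name             string
	clusterCR        RuntimeObject
	injectedTemplate RuntimeObject
}

func (o comparedObject) Live() RuntimeObject { return o.clusterCR }

func (o comparedObject) Merged() (RuntimeObject, error) {
	if o.injectedTemplate == nil {
		return nil, fmt.Errorf("no injected template for %s", o.name)
	}
	return o.injectedTemplate, nil
}

func (o comparedObject) Name() string { return o.name }

// Compile-time check that comparedObject satisfies Object.
var _ Object = comparedObject{}

func main() {
	obj := comparedObject{
		name:             "v1_Service_guestbook_frontend",
		clusterCR:        "port: 80",
		injectedTemplate: "port: 8000",
	}
	merged, err := obj.Merged()
	fmt.Println(obj.Name(), obj.Live(), merged, err)
}
```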
Existing tools can perform a diff of two CRs – This tool extends that functionality to allow for expected variations, optional content, and detection of missing/unmatched content.
kubectl cluster-compare uses different correlators to correlate cluster resources with their matching reference templates. When designing the structure of the correlators, we aimed for a design that makes it easy to add new correlation logic and allows different correlators to be chained. The correlators are divided into 2 types: base correlators, which implement a specific correlation logic, and decorator correlators, which wrap other correlators and add additional behaviour.
The current version includes 2 decorator correlators, MultiCorrelator and MetricsCorrelatorDecorator, and 2 base correlators, ExactMatchCorrelator and GroupCorrelator (detailed information about all of them can be found below). To allow easy chaining, all the correlators satisfy the correlator interface (including errors):
In this version the correlators are created and initialized in the following chain:
                                                                ┌──────────────────────┐
                                                     <<use>>    │                      │
                                                   ┌──────────► │                      │
                                                   │            │ ExactMatchCorrelator │
┌──────────────────────┐           ┌───────────────┴──────┐     │                      │
│                      │           │                      │     │                      │
│                      │           │                      │     └──────────────────────┘
│  MetricsCorrelator-  ├──────────►│   MultiCorrelator    │
│      Decorator       │  <<use>>  │                      │     ┌──────────────────────┐
│                      │           │                      │     │                      │
└──────────────────────┘           └────────────────┬─────┘     │                      │
                                                    │           │   GroupCorrelator    │
                                                    └─────────► │                      │
                                                      <<use>>   │                      │
                                                                └──────────────────────┘
The MultiCorrelator aggregates multiple correlators while implementing the correlator interface. It stores a list of correlators and matches resources to templates by iterating over that list, asking each sub-correlator in turn to find a match for the requested resource. As soon as one of the correlators finds a match, that match is returned without any error. If no match is found, a joined error including the errors of all sub-correlators is returned.
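The behaviour described above can be sketched as follows. The `Correlator` interface shown here is an assumption made for the example (the real interface is not reproduced in this section), resources and templates are reduced to strings, and `mapCorrelator` is a toy base correlator. The standard library's `errors.Join` (Go 1.20+) is used to build the joined error.

```go
package main

import (
	"errors"
	"fmt"
)

// Correlator is a hypothetical, simplified form of the correlator interface:
// it maps a resource identifier to a template name or returns an error.
type Correlator interface {
	Match(resource string) (string, error)
}

// MultiCorrelator tries each sub-correlator in order and returns the first match.
type MultiCorrelator struct {
	correlators []Correlator
}

func (m MultiCorrelator) Match(resource string) (string, error) {
	var errs []error
	for _, c := range m.correlators {
		tmpl, err := c.Match(resource)
		if err == nil {
			return tmpl, nil // first successful match wins
		}
		errs = append(errs, err)
	}
	if len(errs) == 0 {
		return "", errors.New("no correlators configured")
	}
	return "", errors.Join(errs...) // joined error from all sub-correlators
}

// mapCorrelator is a toy base correlator backed by a lookup table.
type mapCorrelator map[string]string

func (m mapCorrelator) Match(resource string) (string, error) {
	if tmpl, ok := m[resource]; ok {
		return tmpl, nil
	}
	return "", fmt.Errorf("no match for %s", resource)
}

func main() {
	multi := MultiCorrelator{correlators: []Correlator{
		mapCorrelator{"v1_Service_guestbook_frontend": "manual.yaml"},
		mapCorrelator{"v1_Service_guestbook_backend": "group.yaml"},
	}}
	fmt.Println(multi.Match("v1_Service_guestbook_backend"))
	fmt.Println(multi.Match("v1_Service_guestbook_unknown"))
}
```

Placing the manual-match correlator first in the list is what gives manual matches priority over automatic correlation in this sketch.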
The MetricsCorrelatorDecorator wraps a single correlator and collects metrics about the correlation. The metrics can be retrieved later and used to create a summary output. It records which resource templates have been matched and which cluster CRs were not matched.
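A decorator of this kind could be sketched as below. As before, the `Correlator` interface, the string-based resources and templates, the exported metric fields, and the `mapCorrelator` toy type are assumptions for illustration only.

```go
package main

import "fmt"

// Correlator is a hypothetical, simplified form of the correlator interface.
type Correlator interface {
	Match(resource string) (string, error)
}

// MetricsCorrelatorDecorator wraps another correlator and records which
// templates were matched and which CRs could not be matched.
type MetricsCorrelatorDecorator struct {
	inner            Correlator
	MatchedTemplates map[string]bool
	UnmatchedCRs     []string
}

func NewMetricsCorrelatorDecorator(inner Correlator) *MetricsCorrelatorDecorator {
	return &MetricsCorrelatorDecorator{inner: inner, MatchedTemplates: map[string]bool{}}
}

// Match delegates to the wrapped correlator and updates the metrics.
func (d *MetricsCorrelatorDecorator) Match(resource string) (string, error) {
	tmpl, err := d.inner.Match(resource)
	if err != nil {
		d.UnmatchedCRs = append(d.UnmatchedCRs, resource)
		return "", err
	}
	d.MatchedTemplates[tmpl] = true
	return tmpl, nil
}

// mapCorrelator is a toy base correlator backed by a lookup table.
type mapCorrelator map[string]string

func (m mapCorrelator) Match(resource string) (string, error) {
	if tmpl, ok := m[resource]; ok {
		return tmpl, nil
	}
	return "", fmt.Errorf("no match for %s", resource)
}

func main() {
	metrics := NewMetricsCorrelatorDecorator(mapCorrelator{"frontend": "frontend-service.yaml"})
	metrics.Match("frontend")
	metrics.Match("unknown")
	fmt.Println("matched templates:", metrics.MatchedTemplates)
	fmt.Println("unmatched CRs:", metrics.UnmatchedCRs)
}
```

The recorded template set is the kind of input a summary step would need in order to list required templates that no CR matched.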
The ExactMatchCorrelator matches templates by exact match against a predefined config that contains pairs of resource names and their equivalent templates. The exact behavior of this correlator is described in the Correlation by manual matches section.
The group correlator implements the correlation behavior explained in Correlation by group of fields (apiVersion, kind, namespace and name). The correlation behavior in this version is: “Each CR will be correlated to a template with an exact match in the largest number of fields from this group: apiVersion, kind, namespace, name.” The group correlator itself is more generic: on creation it receives the list of fields that will be used for matching templates. In this version the group of fields is fixed (apiVersion, kind, namespace, name), but this can be changed in the future to allow more flexibility in group correlation.
The existing kubectl diff works well for validation of a CR (or set of CRs) on a cluster against a known valid configuration. This tool does a good job of suppressing diffs in known managed fields (eg metadata, status, etc), however it is lacking in several critical features for the use cases in this enhancement:
Another option is the builtin diff command:
diff -t -y -w <(yq 'sort_keys(..)' /path/to/reference/config/cr) <(yq 'sort_keys(..)' /path/to/input/cr )
The command works well for comparing two offline files, but it doesn't handle one-to-many matches and does not suppress known managed fields and expected user variations.