siderolabs / talos

Talos Linux is a modern Linux distribution built for Kubernetes.
https://www.talos.dev
Mozilla Public License 2.0
6.39k stars 514 forks source link

talosctl apply-configdiff exposes raw secrets #8268

Open WinterNis opened 7 months ago

WinterNis commented 7 months ago

Bug Report

Description

When using talosctl apply-config, the diff output can contain secret values (ex: MachineToken, certificates key, etc). This is not suitable for an usage where output logs are captured. For example in CI, secrets would be leaked in CI logs, which is not great.

As far as we understand talos api is using go-cmp directly without any specific measure for handling secrets values.

In our specific case, we are not blocked by this "issue" as we are instrumenting the api directly in golang. This allow us to call go-cmp on our own with a redacted version of configs.

Here is the idea of our "quick fix" implementation:

func sha265sum(in string) (string, error) {
    h := sha256.New()
    _, err := h.Write([]byte(in))
    if err != nil {
        return "", nil
    }
    return hex.EncodeToString(h.Sum(nil)), nil
}

func redactConfig(conf *v1alpha1.Config) (*v1alpha1.Config, error) {
    prefix := "REDACTED_SHA256_ONLY"
    if conf.MachineConfig.MachineToken != "" {
        redacted, err := sha265sum(conf.MachineConfig.MachineToken)
        if err != nil {
            return nil, err
        }
        conf.MachineConfig.MachineToken = fmt.Sprintf("%s_%s", prefix, redacted)
    }
    if conf.MachineConfig.MachineCA.Key != nil {
        redacted, err := sha265sum(string(conf.MachineConfig.MachineCA.Key))
        if err != nil {
            return nil, err
        }
        conf.MachineConfig.MachineCA.Key = []byte(fmt.Sprintf("%s_%s", prefix, redacted))
    }
    if conf.ClusterConfig.ClusterSecret != "" {
        redacted, err := sha265sum(conf.ClusterConfig.ClusterSecret)
        if err != nil {
            return nil, err
        }
        conf.ClusterConfig.ClusterSecret = fmt.Sprintf("%s_%s", prefix, redacted)
    }
    if conf.ClusterConfig.BootstrapToken != "" {
        redacted, err := sha265sum(conf.ClusterConfig.BootstrapToken)
        if err != nil {
            return nil, err
        }
        conf.ClusterConfig.BootstrapToken = fmt.Sprintf("%s_%s", prefix, redacted)

    }
    if conf.ClusterConfig.ClusterAESCBCEncryptionSecret != "" {
        redacted, err := sha265sum(conf.ClusterConfig.ClusterAESCBCEncryptionSecret)
        if err != nil {
            return nil, err
        }
        conf.ClusterConfig.ClusterAESCBCEncryptionSecret = fmt.Sprintf("%s_%s", prefix, redacted)
    }
//..........and so on for all other secrets
}
redactedCurrentConfig, err := redactConfig(&currentConfig)
            if err != nil {
                return err
            }
            redactedNextConfig, err := redactConfig(nextConfig)
            if err != nil {
                return err
            }
            diff := cmp.Diff(redactedCurrentConfig, redactedNextConfig)
            if err != nil {
                return err
            }

This is not pretty but still allow a "diff" for all values while masking secrets. And even if the value is not displayed, we are still aware of a diff in a secret value thanks to the sha256 sum.

Another, (probably prettier) way of handling this would be to implement a custom go-cmp reporter. However go-cmp does a lot of magic in its default reporter, and all this magic is internal to the package. A custom reporter would mean re-implementing a good chunk of this magic.

There is probably a prettier and smarter way to fix this, hence why we are raising the issue.

Logs

This is an exemple of a apply-config modifiying cert SANs. The machine token value gets leaked:

Dry run summary:
Applied configuration without a reboot (skipped in dry-run).
Config diff:
  &v1alpha1.Config{
        ConfigVersion: "v1alpha1",
        ConfigDebug:   &false,
        ConfigPersist: &true,
        MachineConfig: &v1alpha1.MachineConfig{
                MachineType:  "controlplane",
                MachineToken: "THISVALUEGETSLEAKED",
                MachineCA:    &{Crt: "-----BEGIN CERTIFICATE-----\nMIIBPjCB8aADAgECAhBVEsN2OCNpiNRnqUH2"..., Key: "-----BEGIN ED25519 PRIVATE KEY-----\nMC4CAQAwBQYDK2VwBCIEIOwx5gFw"...},
                MachineCertSANs: []string{
                        "5.6.7.8",
+                       "1.2.3.4",
                },
 ................

Note: Since certificate and key are long value, it gets truncated by go-cmp, which kind of protect the key. But if we modify the key manually, the full value will be displayed.

Here is the output diff with our custom "fix":

&v1alpha1.Config{
        ConfigVersion: "v1alpha1",
        ConfigDebug:   &false,
        ConfigPersist: &true,
        MachineConfig: &v1alpha1.MachineConfig{
                MachineType:  "controlplane",
                MachineToken: "REDACTED_SHA256_ONLY_597c51a9b971d177c792c5cf7364a4063ce13f86359"...,
                MachineCA:    &{Crt: "-----BEGIN CERTIFICATE-----\nMIIBPjCB8aADAgECAhBVEsN2OCNpiNRnqUH2"..., Key: "REDACTED_SHA256_ONLY_bea3bb22387b55713ec91564b38f5d153a025b82bea"...},
                MachineCertSANs: []string{
                        "5.6.7.8",
+                       "1.2.3.4",
                },
 ................          

Environment

Note: in our code we are using github.com/siderolabs/talos v1.6.3 Talos version does not matter in this case.

smira commented 7 months ago

I don't think we have a really great way at this point to filter out all secrets, as the machine config might contain user-supplied data which is not known whether it contains secret data.

I think we should just suppress the diff as an option, or that processing should happen outside of talosctl.

WinterNis commented 7 months ago

as the machine config might contain user-supplied data which is not known whether it contains secret data.

Yes. This is not an issue in our setup since we control the input, but this is definitely an issue for general usage and the cli.

The diff option can actually be quite useful IMHO. Suppressing it entirely would make the cli less helpful. But people should be aware that it absolutely should not be used in CI environment since secrets are not protected.

As for processing outside of the CLI, I suppose one can do the same trick we did with some jq/yq and the diff command 👍

I am a bit surprised that this issue was not raised sooner. Do people not automatise their deployments or look at config diff outside of local apply ? On our side we are doing a terraform-like plan/apply approach where we want to see the diff before actually applying machine configs.