kubernetes / kubeadm

kubeadm upgrade fails when hostname != node name and when kubeadm config is used #1801

Closed: zerkms closed this issue 5 years ago

zerkms commented 5 years ago

This is a follow-up to https://github.com/kubernetes/kubeadm/issues/1757

What keywords did you search in kubeadm issues before filing this one?

upgrade kubeadm hostname

Is this a BUG REPORT or FEATURE REQUEST?

BUG REPORT

Versions

kubeadm version (use kubeadm version): kubeadm version: &version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.0", GitCommit:"2bd9643cee5b3b3a5ecbd3af49d09018f0773c77", GitTreeState:"clean", BuildDate:"2019-09-18T14:34:01Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}

Environment:

What happened?

Running kubeadm upgrade apply v1.16.0 --config=/path/to/config.yaml --dry-run ended up in an infinite loop of

[dryrun] Resource name: "hq-srv11"
[dryrun] The GET request didn't yield any result, the API Server returned a NotFound error.
[dryrun] Would perform action GET on resource "nodes" in API group "core/v1"
[dryrun] Resource name: "hq-srv11"
[dryrun] The GET request didn't yield any result, the API Server returned a NotFound error.
... (the same three lines repeat indefinitely)

and the same with more verbose output:

I0924 20:42:22.238048   16521 patchnode.go:30] [patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "hq-srv11" as an annotation
[dryrun] Would perform action GET on resource "nodes" in API group "core/v1"
[dryrun] Resource name: "hq-srv11"
I0924 20:42:22.738465   16521 round_trippers.go:423] curl -k -v -XGET  -H "Accept: application/json" -H "User-Agent: kubeadm/v1.16.0 (linux/amd64) kubernetes/2bd9643" 'https://10.50.8.1:6443/api/v1/nodes/hq-srv11'
I0924 20:42:22.742984   16521 round_trippers.go:443] GET https://10.50.8.1:6443/api/v1/nodes/hq-srv11 404 Not Found in 4 milliseconds
I0924 20:42:22.743053   16521 round_trippers.go:449] Response Headers:
I0924 20:42:22.743110   16521 round_trippers.go:452]     Content-Type: application/json
I0924 20:42:22.743126   16521 round_trippers.go:452]     Content-Length: 186
I0924 20:42:22.743142   16521 round_trippers.go:452]     Date: Tue, 24 Sep 2019 20:42:22 GMT
I0924 20:42:22.743183   16521 request.go:968] Response Body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"nodes \"hq-srv11\" not found","reason":"NotFound","details":{"name":"hq-srv11","kind":"nodes"},"code":404}
[dryrun] The GET request didn't yield any result, the API Server returned a NotFound error.
[dryrun] Would perform action GET on resource "nodes" in API group "core/v1"
[dryrun] Resource name: "hq-srv11"
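
For context, the looping output is consistent with a retry pattern along the lines of the hedged sketch below: the upgrade keeps issuing a GET for the node object under the defaulted name and retries on NotFound, so a name that matches no existing Node never breaks the loop. The function name, polling interval, and error handling are illustrative assumptions, not kubeadm's actual code.

package sketch

import (
    "context"
    "time"

    apierrors "k8s.io/apimachinery/pkg/api/errors"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
)

// waitForNode is a hedged sketch of the retry behaviour suggested by the
// logs; it is illustrative, not kubeadm's actual implementation.
func waitForNode(client kubernetes.Interface, nodeName string) error {
    for {
        _, err := client.CoreV1().Nodes().Get(context.TODO(), nodeName, metav1.GetOptions{})
        if err == nil {
            return nil // node found; the caller can now patch it
        }
        if !apierrors.IsNotFound(err) {
            return err // a real error: give up
        }
        // With a defaulted name ("hq-srv11") that matches no Node object
        // ("hq-srv11.<fqdn>"), this branch repeats forever.
        time.Sleep(500 * time.Millisecond)
    }
}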

I traced back to see where that value comes from and found the source of the problem:

func SetJoinDynamicDefaults(cfg *kubeadmapi.JoinConfiguration) error {
    addControlPlaneTaint := false
    if cfg.ControlPlane != nil {
        addControlPlaneTaint = true
    }
    if err := SetNodeRegistrationDynamicDefaults(&cfg.NodeRegistration, addControlPlaneTaint); err != nil {
        return err
    }

    return SetJoinControlPlaneDefaults(cfg.ControlPlane)
}
// SetNodeRegistrationDynamicDefaults checks and sets configuration values for the NodeRegistration object
func SetNodeRegistrationDynamicDefaults(cfg *kubeadmapi.NodeRegistrationOptions, ControlPlaneTaint bool) error {
    var err error
    cfg.Name, err = kubeadmutil.GetHostname(cfg.Name)
    if err != nil {
        return err
    }
    // ...

// GetHostname returns OS's hostname if 'hostnameOverride' is empty; otherwise, return 'hostnameOverride'
// NOTE: This function copied from pkg/util/node package to avoid external kubeadm dependency
func GetHostname(hostnameOverride string) (string, error) {
    hostName := hostnameOverride
    if len(hostName) == 0 {
        nodeName, err := os.Hostname()
        if err != nil {
            return "", errors.Wrap(err, "couldn't determine hostname")
        }
        hostName = nodeName
    }
    // ...

As you can see, unless you specify it explicitly, os.Hostname() is used, and the hostname of the machine is hq-srv11:

# hostname
hq-srv11
# hostname -f
hq-srv11.<redacted-org-domain-name>

while the node in the cluster has an explicitly set FQDN:

# kubectl get nodes
NAME                                      STATUS   ROLES    AGE   VERSION
hq-srv11.<redacted-org-domain-name>          Ready    master   63d   v1.15.3
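
To make the mismatch concrete, here is a minimal, self-contained sketch of the fallback quoted above; getHostname is a simplified stand-in for kubeadm's GetHostname, not the real function:

package main

import (
    "fmt"
    "os"
)

// getHostname mirrors the fallback quoted above: use the override when
// given, otherwise fall back to the OS hostname (the short name).
func getHostname(hostnameOverride string) (string, error) {
    if hostnameOverride != "" {
        return hostnameOverride, nil
    }
    return os.Hostname()
}

func main() {
    // No --node-name and no nodeRegistration.name in the config:
    name, _ := getHostname("")
    // On the machine above this prints "hq-srv11", while the Node object
    // is named "hq-srv11.<redacted-org-domain-name>", so lookups by this
    // name return 404.
    fmt.Println(name)
}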

What you expected to happen?

I believe the name of the node should be obtained from the API, or at least correlated with what's in the API, since the hostname does not necessarily match the node name.

How to reproduce it (as minimally and precisely as possible)?

Initialise a cluster at an older version with a non-default node name and a kubeadm config, using kubeadm init --node-name=foo, then upgrade using the kubeadm config again.

Anything else we need to know?

neolit123 commented 5 years ago

@zerkms i remember this problem, but i could not reproduce it: https://github.com/kubernetes/kubeadm/issues/1757#issuecomment-526370377

we should have re-opened the old issue instead of opening this new one.

As you can see, unless you specify it explicitly, os.Hostname() is used, and the hostname of the machine is hq-srv11:

the join configuration is not used during kubeadm upgrade. where are you seeing a call to SetJoinDynamicDefaults from kubeadm upgrade apply?

zerkms commented 5 years ago

but i could not reproduce it:

I provided more details in this report: it happens when you provide a kubeadm config.

where are you seeing a call to SetJoinDynamicDefaults from kubeadm upgrade apply?

It's really hard to find just by reading the code, but from the logs I found that, at least in

func PerformPostUpgradeTasks(client clientset.Interface, cfg *kubeadmapi.InitConfiguration, newK8sVer *version.Version, dryRun bool) error {
    errs := []error{}

    // Upload currently used configuration to the cluster
    // Note: This is done right in the beginning of cluster initialization; as we might want to make other phases
    // depend on centralized information from this source in the future
    if err := uploadconfig.UploadConfiguration(cfg, client); err != nil {
        errs = append(errs, err)
    }
    // ...

the UploadConfiguration call already uses the wrong value:

[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[dryrun] Would perform action CREATE on resource "configmaps" in API group "core/v1"
[dryrun] Attached object:
        apiVersion: v1
        data:
          ClusterConfiguration: |
            apiServer:
              extraArgs:
                authorization-mode: Node,RBAC
              timeoutForControlPlane: 4m0s
            apiVersion: kubeadm.k8s.io/v1beta2
            certificatesDir: /etc/kubernetes/pki
            clusterName: kubernetes
            controlPlaneEndpoint: 10.50.8.1:6443
            controllerManager: {}
            dns:
              type: CoreDNS
            etcd:
              local:
                dataDir: /var/lib/etcd
            imageRepository: k8s.gcr.io
            kind: ClusterConfiguration
            kubernetesVersion: v1.16.0
            networking:
              dnsDomain: <redacted>
              podSubnet: 10.51.0.0/16
              serviceSubnet: 10.52.0.0/16
            scheduler: {}
          ClusterStatus: |
            apiEndpoints:
              hq-srv11:
                advertiseAddress: 10.50.4.11
                bindPort: 6443
            apiVersion: kubeadm.k8s.io/v1beta2
            kind: ClusterStatus
        kind: ConfigMap
        metadata:
          creationTimestamp: null
          name: kubeadm-config
          namespace: kube-system

neolit123 commented 5 years ago

the UploadConfiguration call already uses the wrong value:

what value is wrong in this config for you?

zerkms commented 5 years ago

            apiEndpoints:
              hq-srv11:

There is no Kubernetes node hq-srv11; there is only hq-srv11.<redacted-org-domain-name>.

But I cannot reliably trace back (without a debugger) where exactly cfg.NodeRegistration.Name is filled in.

neolit123 commented 5 years ago

i will try to reproduce this again tomorrow.

SataQiu commented 5 years ago

/cc

SataQiu commented 5 years ago

Thanks @zerkms I have reproduced the problem through the following steps:

# kubeadm init --node-name=foo
# kubeadm config view > config.yaml
# kubeadm upgrade apply v1.16.0  --dry-run --config=config.yaml

I'm going to dig into how we can solve this problem.

SataQiu commented 5 years ago

IMO, in the short term, we can solve this problem by manually passing a --node-name flag (just like what we do in the init and join phases) (kubernetes/kubernetes#83180). But in the long term, we need a way to automatically associate the node with the node name configured within the cluster.

zerkms commented 5 years ago

we need a way to automatically associate the node with the node name configured within the cluster.

isn't it available in the kubelet certificate?

SataQiu commented 5 years ago

we need a way to automatically associate the node with the node name configured within the cluster.

isn't it available in the kubelet certificate?

Yeah, that might be one way. We can extract the host name from the certificate. But I'm not quite sure. @zerkms @neolit123
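
As a rough illustration of that idea (the helper, the certificate path, and the reliance on the system:node:<node-name> Common Name convention are assumptions, not code from kubeadm):

package main

import (
    "crypto/x509"
    "encoding/pem"
    "fmt"
    "os"
    "strings"
)

// nodeNameFromKubeletCert is a hypothetical helper: it derives the node name
// from the CN of the kubelet client certificate ("system:node:<name>").
func nodeNameFromKubeletCert(certPath string) (string, error) {
    data, err := os.ReadFile(certPath)
    if err != nil {
        return "", err
    }
    block, _ := pem.Decode(data)
    if block == nil {
        return "", fmt.Errorf("no PEM data in %s", certPath)
    }
    cert, err := x509.ParseCertificate(block.Bytes)
    if err != nil {
        return "", err
    }
    const prefix = "system:node:"
    if !strings.HasPrefix(cert.Subject.CommonName, prefix) {
        return "", fmt.Errorf("unexpected CN %q", cert.Subject.CommonName)
    }
    return strings.TrimPrefix(cert.Subject.CommonName, prefix), nil
}

func main() {
    // Path used by kubeadm-provisioned kubelets; an assumption here.
    name, err := nodeNameFromKubeletCert("/var/lib/kubelet/pki/kubelet-client-current.pem")
    if err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
    fmt.Println(name)
}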

SataQiu commented 5 years ago

/kind bug

SataQiu commented 5 years ago

/remove-priority awaiting-more-evidence

neolit123 commented 5 years ago

Yeah, that might be one way. We can extract the host name from the certificate.

it can work, but we shouldn't rely on certs for node names.

IMO, in the short term, we can solve this problem by manually passing a --node-name flag (just like what we do in the init and join phases)

the problem with that is that if the user messes up the name during upgrade, i think this will result in a new entry in the ClusterStatus.

neolit123 commented 5 years ago

# kubeadm init --node-name=foo
# kubeadm config view > config.yaml
# kubeadm upgrade apply v1.16.0  --dry-run --config=config.yaml

i can try testing this later today.

if i pass --node-name=foo.bar.zzz instead of foo in the above example, would the config.yaml still contain only foo in the ClusterStatus section?

never mind, this just uses the ClusterConfiguration object.

neolit123 commented 5 years ago

ok, so i did some investigation here.

@zerkms your suggestion to use certificates to fetch the node name is actually already implemented, but only when the user is not providing a config file and the configuration is fetched from the cluster. see https://github.com/kubernetes/kubernetes/blob/2e6b073a3f800654ec217e763fcb97412308a9db/cmd/kubeadm/app/util/config/cluster.go#L113

this is like so because the dynamic defaulting of the node name from certificates happens only for nodes that already have the kubelet config and certificates present, and when the configuration is fetched from the cluster. if you pass a configuration file, kubeadm will default the node name to your hostname. this is by design.

dynamically defaulting your node name to a value from the kubelet and certificates when already passing --config to apply is an option, but i don't think we should do this.

the explicit flag that @SataQiu added is a workaround for your use case. there is a similar flag for the CRI socket. but i'm personally not in favor of adding more flags.

your existing workaround is to use a config like this:

apiVersion: kubeadm.k8s.io/v1beta2
kind: InitConfiguration
nodeRegistration:
  name: the.fqdn.here
---
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
....

my question for you @zerkms is: why are you passing --config to apply? this acts like reconfiguration, and while kubeadm supports it, it should not be done in the first place. if your config is missing important information, it will be defaulted with dynamic values, such as the host name of the node.

neolit123 commented 5 years ago

cc @rosti @fabriziopandini

zerkms commented 5 years ago

@neolit123

my question for you @zerkms is why are you passing --config to apply?

Nowhere in the upgrade documentation does it state whether it should be specified or not. And given that I used it initially to create the cluster, I assumed by default that now I must specify it every time during an upgrade as well.

If that's not the case, I think it may need a bit of clarification in the documentation, right?

Thanks nevertheless, now I see it's actually my mistake.

neolit123 commented 5 years ago

Nowhere in the upgrade documentation does it state whether it should be specified or not

exactly, that is because we don't want users to use it.

the --config flag was added to upgrade to allow reconfiguration of the existing cluster, which is now supported using the kubeadm kustomize feature (see the changelog for 1.16). yet, reconfiguring the cluster using this flag is not recommended.

If it's not the case I think it may need a bit of clarification in the documentation, right?

i agree. this needs a line or two in this document: https://github.com/kubernetes/website/blob/master/content/en/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade.md

/kind documentation
/assign

SataQiu commented 5 years ago

Thanks @neolit123! I learned a lot from this ticket. So we can already set the node name through the config file:

apiVersion: kubeadm.k8s.io/v1beta2
kind: InitConfiguration
nodeRegistration:
  name: the.fqdn.here
---
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
....

Emm... Adding the --node-name flag just saves you the trouble of changing the configuration file. Since init and join both support this flag, why not upgrade? I'm curious about the bad effects of adding this flag. And WDYT @zerkms?

zerkms commented 5 years ago

@SataQiu given that @neolit123 mentioned a kubeadm config should not be specified at all (and everything would work), I think this report should be closed as working as designed.

rosti commented 5 years ago

@SataQiu ideally no config-backed flags should be added to upgrade. The existence of --config is a workaround for a specific case. However, I hope that in the future it will be removed and a proper reconfiguration CLI will be introduced in kubeadm.

rosti commented 5 years ago

Adding kind/UX, as this is also a UX issue (in the long term) as well as a documentation issue in the short term.

/kind UX

k8s-ci-robot commented 5 years ago

@rosti: The label(s) kind/ux cannot be applied. These labels are supported: api-review, community/discussion, community/maintenance, community/question, cuj/build-train-deploy, cuj/multi-user, platform/aws, platform/azure, platform/gcp, platform/minikube, platform/other

rosti commented 5 years ago

/area UX

neolit123 commented 5 years ago

i've sent a couple of PRs: