hetznercloud / hcloud-cloud-controller-manager

Kubernetes cloud-controller-manager for Hetzner Cloud
Apache License 2.0

Bug: hcloud-cloud-controller-manager crashes with SIGSEGV #60

Closed by MatthiasLohr 4 years ago

MatthiasLohr commented 4 years ago

Hi,

I'm trying to set up a Kubernetes Cluster on the Hetzner Cloud with the network feature.

I'm using the unmodified https://github.com/hetznercloud/hcloud-cloud-controller-manager/blob/master/deploy/v1.6.1-networks.yaml file.

After I removed the taints so that calico and coredns could start, the hcloud-cloud-controller-manager began to crash:

Flag --allow-untagged-cloud has been deprecated, This flag is deprecated and will be removed in a future release. A cluster-id will be required on cloud instances.
I0701 05:47:43.542132       1 serving.go:313] Generated self-signed cert in-memory
W0701 05:47:43.968638       1 client_config.go:552] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0701 05:47:43.978618       1 controllermanager.go:120] Version: v0.0.0-master+$Format:%h$
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x16faad3]

goroutine 1 [running]:
github.com/hetznercloud/hcloud-cloud-controller-manager/hcloud.newCloud(0x0, 0x0, 0xc000222758, 0xc0003acd10, 0xc000222750, 0xa4)
        /maschine-controller/src/hcloud/cloud.go:83 +0x983
github.com/hetznercloud/hcloud-cloud-controller-manager/hcloud.init.0.func1(0x0, 0x0, 0x7ffcf9723504, 0x6, 0xc0002227d8, 0xc00003a101)
        /maschine-controller/src/hcloud/cloud.go:155 +0x35
k8s.io/cloud-provider.GetCloudProvider(0x7ffcf9723504, 0x6, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
        /go/pkg/mod/k8s.io/cloud-provider@v0.18.3/plugins.go:86 +0xcf
k8s.io/cloud-provider.InitCloudProvider(0x7ffcf9723504, 0x6, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
        /go/pkg/mod/k8s.io/cloud-provider@v0.18.3/plugins.go:134 +0x504
k8s.io/kubernetes/cmd/cloud-controller-manager/app.Run(0xc00000e878, 0xc0000820c0, 0xc0004d3a98, 0x4)
        /go/pkg/mod/k8s.io/kubernetes@v1.18.3/cmd/cloud-controller-manager/app/controllermanager.go:122 +0x127
k8s.io/kubernetes/cmd/cloud-controller-manager/app.NewCloudControllerManagerCommand.func1(0xc000198a00, 0xc000080dc0, 0x0, 0x5)
        /go/pkg/mod/k8s.io/kubernetes@v1.18.3/cmd/cloud-controller-manager/app/controllermanager.go:78 +0x204
github.com/spf13/cobra.(*Command).execute(0xc000198a00, 0xc00003a1f0, 0x5, 0x5, 0xc000198a00, 0xc00003a1f0)
        /go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:830 +0x29d
github.com/spf13/cobra.(*Command).ExecuteC(0xc000198a00, 0x161d8acea0e98b5b, 0x2bbee40, 0xb)
        /go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:914 +0x2fb
github.com/spf13/cobra.(*Command).Execute(...)
        /go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:864
main.main()
        /maschine-controller/src/main.go:44 +0xf3

Any idea?

Best regards Matthias

LKaemmerling commented 4 years ago

Hey @MatthiasLohr,

thank you for the report. I found the bug. Did you add a network entry to the hcloud secret as described here: https://github.com/hetznercloud/hcloud-cloud-controller-manager/blob/master/docs/deploy_with_networks.md?

It looks like the CCM cannot find the network.

MatthiasLohr commented 4 years ago

No, I followed the instructions from https://community.hetzner.com/tutorials/install-kubernetes-cluster to create an Ansible role for setting up everything. The Ansible task for creating the secret looks like this:

- name: Hetzner Cloud Controller Secret
  community.kubernetes.k8s:
    definition:
      apiVersion: v1
      kind: Secret
      metadata:
        name: hcloud
        namespace: kube-system
      stringData:
        token: "{{ hetzner.api_token }}"
        network: "{{ hetzner.network_id }}"

The instructions at https://github.com/hetznercloud/hcloud-cloud-controller-manager/blob/master/docs/deploy_with_networks.md list two separate "create secret" commands, which does not work because the second one fails:

# kubectl -n kube-system create secret generic hcloud --from-literal=token=nothingtoseehere
secret/hcloud created
# kubectl -n kube-system create secret generic hcloud --from-literal=network=42
Error from server (AlreadyExists): secrets "hcloud" already exists

(To test this, I first manually deleted the secret generated by Ansible.)

LKaemmerling commented 4 years ago

Are you sure that hetzner.network_id contains the correct value for your network?

MatthiasLohr commented 4 years ago

Oops. I recreated the network to adjust the subnet to match the controller's default subnet and forgot to update the ID in my config as well. Sorry!

Anyway, I would suggest adding a check for n == nil in https://github.com/hetznercloud/hcloud-cloud-controller-manager/blob/master/hcloud/cloud.go#L83 and returning a proper error message, something like "are you as stupid as Matthias and put the wrong ID here?"
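
For illustration, here is a minimal sketch of what such a guard could look like. It is not the actual code from cloud.go or the change in any MR. The types `Network`, `networkGetter`, `fakeClient` and the function `resolveNetworkID` are hypothetical stand-ins, and the sketch assumes the lookup returns a nil network with no error when nothing matches, which is what the nil pointer dereference in the trace suggests.

```go
package main

import (
	"errors"
	"fmt"
)

// Network is a hypothetical stand-in for the API client's network type.
type Network struct {
	ID   int
	Name string
}

// networkGetter is a hypothetical lookup interface. Assumption: Get returns
// (nil, nil) when no network matches the given ID or name.
type networkGetter interface {
	Get(idOrName string) (*Network, error)
}

// errNetworkNotFound lets callers detect the "wrong ID in the secret" case.
var errNetworkNotFound = errors.New("network not found")

// resolveNetworkID looks up the configured network and returns its ID.
// It checks for a nil result before dereferencing it, so a wrong ID yields
// a readable error instead of a SIGSEGV.
func resolveNetworkID(g networkGetter, idOrName string) (int, error) {
	n, err := g.Get(idOrName)
	if err != nil {
		return 0, fmt.Errorf("looking up network %q: %w", idOrName, err)
	}
	if n == nil {
		return 0, fmt.Errorf("%w: %q (check the network value in the hcloud secret)",
			errNetworkNotFound, idOrName)
	}
	return n.ID, nil
}

// fakeClient is an in-memory stub used only to exercise the sketch.
type fakeClient map[string]*Network

// Get returns the stored network, or a nil *Network and no error for an unknown key.
func (f fakeClient) Get(idOrName string) (*Network, error) {
	return f[idOrName], nil
}

func main() {
	client := fakeClient{"42": {ID: 42, Name: "kubernetes"}}

	// "1337" stands for a stale ID left over from a deleted network.
	if _, err := resolveNetworkID(client, "1337"); err != nil {
		fmt.Println("startup aborted:", err)
	}
}
```

Wrapping a sentinel error with `%w` keeps the message human-readable while still letting callers test for the case with `errors.Is`.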

Thanks for your help!

LKaemmerling commented 4 years ago

@MatthiasLohr the MR for this was just opened in #61 :) Thank you!