DerPate closed this issue 11 months ago.
@DerPate please paste the full log.
There is not much more to it really, but I will paste it tomorrow. Worked around it by using version 1.3.
INFO[05:15:02 UTC] Determine hostname...
INFO[05:15:04 UTC] Determine operating system...
These two lines are missing from the log. @kron4eg
@DerPate Is your API server working at all? Can you do anything with kubectl, like kubectl get nodes?
No, not really. The cluster wouldn't start or anything; it crashes hard when using 1.4. It works when using 1.3.
@DerPate how does "crashes hard" manifest itself? Some logs from crashing pods would be good to see.
@kron4eg The Go subprocess kubeone spawns to configure the cluster exits with signal 11 (SIGSEGV). The crash log for this "hard crash", as I called it, is in the first post. Kubeone is not able to spin up the cluster because it crashes before even starting to build the first node, so I'm not able to provide any pod or node logs as there is no active cluster. I hope this makes it clearer to you.
But can you SSH to the control plane instance and take a look directly at the container logs?
I have encountered the same panic issue on an already built cluster. I can access the cluster via kubectl, but there is nothing out of the ordinary that would stand out. All pods are running; some had a restart, however, I cannot confirm that the restarts happened during the panic. The nodes are also healthy, in the Ready state.
Kubeone version 1.4.3
Deployed via kubeone apply without Terraform
@MiroslavRepka a way to reproduce would help to debug this case.
I deployed a cluster on VMs I created in Hetzner Cloud. The cluster was composed of 3 master / 2 worker nodes.
Here is the kubeone.yaml
apiVersion: kubeone.k8c.io/v1beta2
kind: KubeOneCluster
name: cluster
versions:
  kubernetes: 'v1.21.0'
clusterNetwork:
  cni:
    external: {}
cloudProvider:
  none: {}
  external: false
addons:
  enable: true
  path: "../addons"
apiEndpoint:
  host: 'my.fancy.url'
  port: 6443
controlPlane:
  hosts:
  - publicAddress: 'x.x.x.x'
    privateAddress: 'x.x.x.x'
    sshUsername: root
    sshPrivateKeyFile: './private.pem'
    taints:
    - key: "node-role.kubernetes.io/master"
      effect: "NoSchedule"
  - publicAddress: 'x.x.x.x'
    privateAddress: 'x.x.x.x'
    sshUsername: root
    sshPrivateKeyFile: './private.pem'
    taints:
    - key: "node-role.kubernetes.io/master"
      effect: "NoSchedule"
  - publicAddress: 'x.x.x.x'
    privateAddress: 'x.x.x.x'
    sshUsername: root
    sshPrivateKeyFile: './private.pem'
    taints:
    - key: "node-role.kubernetes.io/master"
      effect: "NoSchedule"
staticWorkers:
  hosts:
  - publicAddress: 'x.x.x.x'
    privateAddress: 'x.x.x.x'
    sshUsername: root
    sshPrivateKeyFile: './private.pem'
  - publicAddress: 'x.x.x.x'
    privateAddress: 'x.x.x.x'
    sshUsername: root
    sshPrivateKeyFile: './private.pem'
machineController:
  deploy: false
The addons define the Calico CNI.
The cluster was created via kubeone apply -m kubeone.yaml with no errors. After some small changes to the external load balancer, I ran kubeone apply again, which threw a panic.
The cluster was still accessible with kubectl, as I mentioned before.
The panic kubeone threw was identical to the one reported by the OP.
Issues go stale after 90d of inactivity.
After a further 30 days, they will turn rotten.
Mark the issue as fresh with /remove-lifecycle stale.
If this issue is safe to close now please do so with /close.
/lifecycle stale
/remove-lifecycle stale
Issues go stale after 90d of inactivity.
After a further 30 days, they will turn rotten.
Mark the issue as fresh with /remove-lifecycle stale.
If this issue is safe to close now please do so with /close.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
/lifecycle rotten
@MiroslavRepka can I ask why cloudProvider.none is used?
@DerPate We've upgraded the Kubernetes dependencies a few times since this issue was originally opened, including controller-runtime. Please try to reproduce again.
can I ask why cloudProvider.none is used?
If I remember correctly, the machines used in the cluster were from multiple cloud providers.
machines used in the cluster were from multiple cloud providers.
This is definitely not recommended nor supported, and is a perfect recipe to break the cluster.
This is definitely not recommended nor supported, and is a perfect recipe to break the cluster.
I agree that a multi-cloud cluster that depends solely on KubeOne would probably fail. However, a few adjustments, like installing a VPN before running KubeOne and installing and configuring some block storage solution like Longhorn, should do the trick.
Issues go stale after 90d of inactivity.
After a further 30 days, they will turn rotten.
Mark the issue as fresh with /remove-lifecycle stale.
If this issue is safe to close now please do so with /close.
/lifecycle stale
/remove-lifecycle stale
Issues go stale after 90d of inactivity.
After a further 30 days, they will turn rotten.
Mark the issue as fresh with /remove-lifecycle stale.
If this issue is safe to close now please do so with /close.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
/lifecycle rotten
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
/close
@kubermatic-bot: Closing this issue.
What happened?
After starting kubeone -v apply -m kubeone/kubeone.yaml, the program runs until this happens:
INFO[14:14:29 UTC] Running cluster probes...
E0519 14:14:29.961918 10934 runtime.go:78] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 1 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x195ec80?, 0x2ce98e0})
        k8s.io/apimachinery@v0.22.2/pkg/util/runtime/runtime.go:74 +0x99
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc000002340?})
        k8s.io/apimachinery@v0.22.2/pkg/util/runtime/runtime.go:48 +0x75
panic({0x195ec80, 0x2ce98e0})
        runtime/panic.go:838 +0x207
sigs.k8s.io/controller-runtime/pkg/client.(*client).List(0x2?, {0x1f75580?, 0xc00003c058?}, {0x1f7af50?, 0xc0002f24d0?}, {0xc0004a2340?, 0xc00051de80?, 0x9?})
        sigs.k8s.io/controller-runtime@v0.10.2/pkg/client/client.go:287 +0x3c9
k8c.io/kubeone/pkg/tasks.investigateCluster(0xc0003260f0)
        k8c.io/kubeone/pkg/tasks/probes.go:344 +0x7ee
k8c.io/kubeone/pkg/tasks.runProbes(0xc0003260f0)
        k8c.io/kubeone/pkg/tasks/probes.go:159 +0x494
k8c.io/kubeone/pkg/tasks.(*Task).Run.func1()
        k8c.io/kubeone/pkg/tasks/task.go:60 +0x94
k8s.io/apimachinery/pkg/util/wait.ConditionFunc.WithContext.func1({0x19cf480, 0x436d01})
        k8s.io/apimachinery@v0.22.2/pkg/util/wait/wait.go:217 +0x1b
k8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtectionWithContext({0x1f75580?, 0xc00003c060?}, 0xc000118230?)
        k8s.io/apimachinery@v0.22.2/pkg/util/wait/wait.go:230 +0x57
k8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtection(0x12a05f200?)
        k8s.io/apimachinery@v0.22.2/pkg/util/wait/wait.go:223 +0x39
k8s.io/apimachinery/pkg/util/wait.ExponentialBackoff({0x2540be400, 0x4000000000000000, 0x0, 0x9, 0x0}, 0x2d18c40?)
        k8s.io/apimachinery@v0.22.2/pkg/util/wait/wait.go:418 +0x5f
k8c.io/kubeone/pkg/tasks.(*Task).Run(0xc000420980, 0xc0003260f0)
        k8c.io/kubeone/pkg/tasks/task.go:55 +0x11c
k8c.io/kubeone/pkg/tasks.Tasks.Run({0xc0001a2460, 0x4, 0x0?}, 0xc0003ff970?)
        k8c.io/kubeone/pkg/tasks/tasks.go:41 +0xfd
k8c.io/kubeone/pkg/cmd.runApply(0xc00007ea80)
        k8c.io/kubeone/pkg/cmd/apply.go:183 +0x207
k8c.io/kubeone/pkg/cmd.applyCmd.func1(0xc000654a00?, {0x1bbe2e7?, 0x3?, 0x3?})
        k8c.io/kubeone/pkg/cmd/apply.go:111 +0x7e
github.com/spf13/cobra.(*Command).execute(0xc000654a00, {0xc0002ed350, 0x3, 0x3})
        github.com/spf13/cobra@v1.1.3/command.go:852 +0x67c
github.com/spf13/cobra.(*Command).ExecuteC(0xc000654280)
        github.com/spf13/cobra@v1.1.3/command.go:960 +0x39c
github.com/spf13/cobra.(*Command).Execute(...)
        github.com/spf13/cobra@v1.1.3/command.go:897
k8c.io/kubeone/pkg/cmd.Execute()
        k8c.io/kubeone/pkg/cmd/root.go:52 +0x89
main.main()
        k8c.io/kubeone/main.go:24 +0x17
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
        panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x11d8c09]

goroutine 1 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc000002340?})
        k8s.io/apimachinery@v0.22.2/pkg/util/runtime/runtime.go:55 +0xd8
panic({0x195ec80, 0x2ce98e0})
        runtime/panic.go:838 +0x207
sigs.k8s.io/controller-runtime/pkg/client.(*client).List(0x2?, {0x1f75580?, 0xc00003c058?}, {0x1f7af50?, 0xc0002f24d0?}, {0xc0004a2340?, 0xc00051de80?, 0x9?})
        sigs.k8s.io/controller-runtime@v0.10.2/pkg/client/client.go:287 +0x3c9
k8c.io/kubeone/pkg/tasks.investigateCluster(0xc0003260f0)
        k8c.io/kubeone/pkg/tasks/probes.go:344 +0x7ee
k8c.io/kubeone/pkg/tasks.runProbes(0xc0003260f0)
        k8c.io/kubeone/pkg/tasks/probes.go:159 +0x494
k8c.io/kubeone/pkg/tasks.(*Task).Run.func1()
        k8c.io/kubeone/pkg/tasks/task.go:60 +0x94
k8s.io/apimachinery/pkg/util/wait.ConditionFunc.WithContext.func1({0x19cf480, 0x436d01})
        k8s.io/apimachinery@v0.22.2/pkg/util/wait/wait.go:217 +0x1b
k8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtectionWithContext({0x1f75580?, 0xc00003c060?}, 0xc000118230?)
        k8s.io/apimachinery@v0.22.2/pkg/util/wait/wait.go:230 +0x57
k8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtection(0x12a05f200?)
        k8s.io/apimachinery@v0.22.2/pkg/util/wait/wait.go:223 +0x39
k8s.io/apimachinery/pkg/util/wait.ExponentialBackoff({0x2540be400, 0x4000000000000000, 0x0, 0x9, 0x0}, 0x2d18c40?)
        k8s.io/apimachinery@v0.22.2/pkg/util/wait/wait.go:418 +0x5f
k8c.io/kubeone/pkg/tasks.(*Task).Run(0xc000420980, 0xc0003260f0)
        k8c.io/kubeone/pkg/tasks/task.go:55 +0x11c
k8c.io/kubeone/pkg/tasks.Tasks.Run({0xc0001a2460, 0x4, 0x0?}, 0xc0003ff970?)
        k8c.io/kubeone/pkg/tasks/tasks.go:41 +0xfd
k8c.io/kubeone/pkg/cmd.runApply(0xc00007ea80)
        k8c.io/kubeone/pkg/cmd/apply.go:183 +0x207
k8c.io/kubeone/pkg/cmd.applyCmd.func1(0xc000654a00?, {0x1bbe2e7?, 0x3?, 0x3?})
        k8c.io/kubeone/pkg/cmd/apply.go:111 +0x7e
github.com/spf13/cobra.(*Command).execute(0xc000654a00, {0xc0002ed350, 0x3, 0x3})
        github.com/spf13/cobra@v1.1.3/command.go:852 +0x67c
github.com/spf13/cobra.(*Command).ExecuteC(0xc000654280)
        github.com/spf13/cobra@v1.1.3/command.go:960 +0x39c
github.com/spf13/cobra.(*Command).Execute(...)
        github.com/spf13/cobra@v1.1.3/command.go:897
k8c.io/kubeone/pkg/cmd.Execute()
        k8c.io/kubeone/pkg/cmd/root.go:52 +0x89
main.main()
        k8c.io/kubeone/main.go:24 +0x17
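For context on why the panic surfaces inside controller-runtime's client.go rather than in kubeone's own probe code: in Go, a typed-nil pointer stored behind an interface (or passed directly) still dispatches its methods, and the nil receiver is only dereferenced inside the method body. The sketch below is a minimal, self-contained illustration of that pattern, assuming the crash stems from an uninitialised API client; fakeClient, investigate, and the nil guard are hypothetical stand-ins for illustration, not kubeone's actual types or fix.

package main

import (
	"context"
	"fmt"
)

// fakeClient stands in for a concrete API client; the real controller-runtime
// client keeps internal state (scheme, REST mapper, cache) that List dereferences.
type fakeClient struct {
	cache map[string][]string
}

// List reads from the receiver's internal state. With a nil receiver, the
// dereference of c.cache is where a SIGSEGV would fire, which is why a trace
// like the one above ends inside the client's List rather than at the caller.
func (c *fakeClient) List(ctx context.Context, kind string) ([]string, error) {
	return c.cache[kind], nil
}

// investigate loosely mirrors a cluster probe that lists nodes. The nil guard
// is a hypothetical mitigation for this sketch, not kubeone's actual code.
func investigate(c *fakeClient) error {
	if c == nil {
		return fmt.Errorf("kube client was never initialised (API server unreachable?)")
	}
	nodes, err := c.List(context.Background(), "nodes")
	if err != nil {
		return err
	}
	fmt.Println("nodes:", nodes)
	return nil
}

func main() {
	var c *fakeClient // never constructed, e.g. because building the client failed earlier
	if err := investigate(c); err != nil {
		fmt.Println("probe error:", err)
	}
}

Without the guard, calling investigate with a nil client would crash with the same "invalid memory address or nil pointer dereference" message; with it, the probe fails with a readable error instead of a SIGSEGV.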
Expected behavior
The program should not run into SIGSEGV and should terminate with a clean cluster install.
How to reproduce the issue?
Install kubeone via the link in Getting Started on Ubuntu 20, and try to install a cluster.
What KubeOne version are you using?
Provide your KubeOneCluster manifest here (if applicable)
What cloud provider are you running on?
Running on bare metal without Terraform
What operating system are you running in your cluster?
Ubuntu 20.04