weaveworks / weave

Simple, resilient multi-host container networking and more.
https://www.weave.works
Apache License 2.0
6.62k stars 670 forks

Kubernetes dashboard - stops working after some time #3260

Closed Adiqq closed 6 years ago

Adiqq commented 6 years ago

What you expected to happen?

The Kubernetes dashboard works correctly.

What happened?

After some time, the Kubernetes dashboard stops working and displays the error: Gateway Timeout (504) the server was unable to return a response in the time allotted, but may still be processing the request (get configmaps)

It worked correctly with Calico 2.6.x, but something is wrong with Weave Net. It might be specific to Kubespray, but the same problems occurred on Weave 2.1.x when I tested it some time ago.

2018/03/15 19:16:05 Cannot restore settings config map: the server was unable to return a response in the time allotted, but may still be processing the request (post configmaps)
2018/03/15 19:16:05 [2018-03-15T19:16:05Z] Outcoming response to 10.233.64.1:37374 with 200 status code
2018/03/15 19:16:13 Cannot restore settings config map: the server was unable to return a response in the time allotted, but may still be processing the request (post configmaps)
2018/03/15 19:16:13 [2018-03-15T19:16:13Z] Outcoming response to 10.233.64.1:37374 with 200 status code
2018/03/15 19:16:14 the server was unable to return a response in the time allotted, but may still be processing the request (get configmaps)
2018/03/15 19:16:14 [2018-03-15T19:16:14Z] Outcoming response to 10.233.64.1:37374 with 504 status code
2018/03/15 19:17:28 Getting application global configuration
2018/03/15 19:17:28 Application configuration {"serverTime":1521141448766}
2018/03/15 19:17:28 [2018-03-15T19:17:28Z] Incoming HTTP/2.0 GET /api/v1/settings/global request from 10.233.64.1:37374: {}
2018/03/15 19:17:29 [2018-03-15T19:17:29Z] Incoming HTTP/2.0 GET /api/v1/login/status request from 10.233.64.1:37374: {}
2018/03/15 19:17:29 [2018-03-15T19:17:29Z] Outcoming response to 10.233.64.1:37374 with 200 status code
2018/03/15 19:17:29 [2018-03-15T19:17:29Z] Incoming HTTP/2.0 GET /api/v1/systembanner request from 10.233.64.1:37374: {}
2018/03/15 19:17:29 [2018-03-15T19:17:29Z] Outcoming response to 10.233.64.1:37374 with 200 status code
2018/03/15 19:17:29 [2018-03-15T19:17:29Z] Incoming HTTP/2.0 GET /api/v1/login/status request from 10.233.64.1:37374: {}
2018/03/15 19:17:29 [2018-03-15T19:17:29Z] Outcoming response to 10.233.64.1:37374 with 200 status code
2018/03/15 19:17:29 [2018-03-15T19:17:29Z] Incoming HTTP/2.0 GET /api/v1/rbac/status request from 10.233.64.1:37374: {}
2018/03/15 19:17:29 [2018-03-15T19:17:29Z] Incoming HTTP/2.0 GET /api/v1/overview/default?filterBy=&itemsPerPage=10&name=&page=1&sortBy=d,creationTimestamp request from 10.233.64.1:37374: {}
2018/03/15 19:17:29 Getting config category
2018/03/15 19:18:28 Cannot find settings config map: the server was unable to return a response in the time allotted, but may still be processing the request (get configmaps kubernetes-dashboard-settings)
2018/03/15 19:18:29 Couldn't get available api versions from server: the server was unable to return a response in the time allotted, but may still be processing the request
2018/03/15 19:18:29 [2018-03-15T19:18:29Z] Outcoming response to 10.233.64.1:37374 with 500 status code
2018/03/15 19:18:29 the server was unable to return a response in the time allotted, but may still be processing the request (get configmaps)
2018/03/15 19:18:29 [2018-03-15T19:18:29Z] Outcoming response to 10.233.64.1:37374 with 504 status code
2018/03/15 19:19:29 Cannot restore settings config map: the server was unable to return a response in the time allotted, but may still be processing the request (post configmaps)
2018/03/15 19:19:29 [2018-03-15T19:19:29Z] Outcoming response to 10.233.64.1:37374 with 200 status code

How to reproduce it?

Deploy a cluster with Kubespray, selecting the Weave network plugin.
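For reference, a Kubespray deployment with Weave selected looks roughly like this (a sketch only; the repository location, inventory paths, and file names are illustrative and depend on your Kubespray version — the `kube_network_plugin` variable is Kubespray's documented switch):

```shell
# Clone Kubespray and copy the sample inventory (paths are illustrative)
git clone https://github.com/kubernetes-sigs/kubespray.git
cd kubespray
cp -r inventory/sample inventory/mycluster

# Select Weave as the network plugin in the cluster group variables
# (kube_network_plugin defaults to calico; set it to weave)
echo 'kube_network_plugin: weave' >> inventory/mycluster/group_vars/k8s-cluster/k8s-cluster.yml

# Run the cluster playbook against the inventory
ansible-playbook -i inventory/mycluster/hosts.ini -b cluster.yml
```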

Anything else we need to know?

Kubespray, bare metal, 2 masters, 3 etcd, 2 workers, all on separate machines

Versions:

Version: 2.2.1 (failed to check latest version - see logs; next check at 2018/03/15 23:14:02)

Service: router
Protocol: weave 1..2
Name: 2e:35:00:fd:81:38(dmse02lx0681c)
Encryption: enabled
PeerDiscovery: enabled
Targets: 4
Connections: 4 (3 established, 1 failed)
Peers: 4 (with 12 established connections)
TrustedSubnets: none

Service: ipam
Status: ready
Range: 10.233.64.0/18
DefaultSubnet: 10.233.64.0/18
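The router and IPAM status above can be reproduced with the weave script on a node, or from inside the weave-net pod (a sketch; the pod name is a placeholder, and `weave status connections` is worth running here given the one failed connection reported above):

```shell
# Using the weave script installed on a node
weave status
weave status connections   # shows why individual connections failed

# Or from the weave-net pod, bypassing the docker proxy with --local
kubectl exec -n kube-system <weave-net-pod> -c weave -- /home/weave/weave --local status

# The same data is served on the router's local status endpoint
curl http://127.0.0.1:6784/status
```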
$ docker version
Client:
 Version:      17.03.2-ce
 API version:  1.27
 Go version:   go1.7.5
 Git commit:   f5ec1e2
 Built:        Tue Jun 27 02:21:36 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.03.2-ce
 API version:  1.27 (minimum version 1.12)
 Go version:   go1.7.5
 Git commit:   f5ec1e2
 Built:        Tue Jun 27 02:21:36 2017
 OS/Arch:      linux/amd64
 Experimental: false
$ uname -a
Linux dmse02lx0681c 4.14.7-1.el7.elrepo.x86_64 #1 SMP Sun Dec 17 19:36:59 EST 2017 x86_64 x86_64 x86_64 GNU/Linux
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.3+coreos.0", GitCommit:"f588569ed1bd4a6c986205dd0d7b04da4ab1a3b6", GitTreeState:"clean", BuildDate:"2018-02-10T01:42:55Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.4+coreos.0", GitCommit:"5e73e0769e5c4ac497235e2817868b1a37032fba", GitTreeState:"clean", BuildDate:"2018-03-12T20:05:58Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}

Logs:

Nothing special; the log contains only recurring entries like the ones below:

$ kubectl logs -n kube-system weave
INFO: 2018/03/15 19:21:44.246540 Discovered remote MAC 9a:a6:6c:19:dd:24 at 2e:35:00:fd:81:38(dmse02lx0681c)
INFO: 2018/03/15 19:26:06.389819 Discovered remote MAC fe:a3:83:ae:92:66 at 2e:35:00:fd:81:38(dmse02lx0681c)
INFO: 2018/03/15 19:26:06.390422 Discovered remote MAC de:73:22:c9:c9:7a at 2e:35:00:fd:81:38(dmse02lx0681c)
INFO: 2018/03/15 19:26:06.391007 Discovered remote MAC 0a:51:08:8f:be:c2 at 2e:35:00:fd:81:38(dmse02lx0681c)
INFO: 2018/03/15 19:26:06.391265 Discovered remote MAC 5e:f6:93:7b:9f:6e at 2e:35:00:fd:81:38(dmse02lx0681c)


$ kubectl get events
No resources found.
brb commented 6 years ago

Thanks for the report.

Have you checked the kube-apiserver logs to see whether the timing-out requests fail there?
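For anyone following up, the apiserver side of a timing-out request can be checked roughly like this (a sketch; the label selector and unit name depend on how the control plane was deployed — Kubespray at the time ran the apiserver as a static pod):

```shell
# If the apiserver runs as a static pod with the usual component label:
kubectl logs -n kube-system -l component=kube-apiserver --tail=200

# Or on a master node, if it runs as a systemd service:
journalctl -u kube-apiserver --since "10 minutes ago"

# Time a configmap GET through the apiserver (with verbose client
# logging) to confirm where the request stalls:
time kubectl get configmap kubernetes-dashboard-settings -n kube-system -v=6
```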

Also, note that there are plenty of 503/504 bug reports (https://github.com/kubernetes/dashboard/issues?q=503 and https://github.com/kubernetes/dashboard/issues?q=504) on the project's GitHub page.

Adiqq commented 6 years ago

I didn't check the kube-apiserver logs this time, but this issue is specific to Kubespray with Weave (or Weave itself). I use a combination of the Kubernetes dashboard, dex, and oauth2_proxy, and from what I remember I could also see the problem in the dex logs (https://github.com/coreos/dex/issues/992); there were similar errors. I went back to Calico and it works fine again.

brb commented 6 years ago

Closing the issue due to inactivity and missing logs.