rancher / rancher

Complete container management platform
http://rancher.com
Apache License 2.0

Websockets not updating UI for cluster components. #22258

Closed: jhartzell closed this issue 5 years ago

jhartzell commented 5 years ago

What kind of request is this (question/bug/enhancement/feature request): Bug (I think?), reproduced on a number of versions I've tried: 2.2.4, 2.2.5, 2.2.6, and the latest as of writing (2.2.7).

Steps to reproduce (least amount of steps as possible):

  1. Create a Compute Instance
    • Machine Type: n1-standard-1
    • Image: ubuntu-os-cloud/ubuntu-1804-lts
  2. Run the following bootstrap commands (these are part of my Terraform, so I will just list them line by line):
    sudo apt-get update
    sudo apt-get install -yq apt-transport-https ca-certificates curl software-properties-common
    sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
    sudo add-apt-repository 'deb [arch=amd64] https://download.docker.com/linux/ubuntu bionic stable'
    sudo apt-get update
    sudo apt-cache policy docker-ce
    sudo apt-get install docker-ce=5:18.09.3~3-0~ubuntu-bionic
    sudo docker run -d --restart=unless-stopped -p 80:80 -p 443:443 -v /opt/rancher:/var/lib/rancher rancher/rancher --acme-domain <rancher.domain.com>
  3. Create a cluster with your Google credentials JSON; from my tests, the cluster configuration doesn't matter.
  4. Deploy a workload using the nginxdemos/hello image.
  5. Under Workloads, twirl down the deployed workload and increment the scale (worker count) from 1 to 2. Voila! The UI will not update; it will sit there indefinitely until you refresh the page. (A kubectl sketch for checking the result outside the UI follows this list.)
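
For completeness, here is a minimal sketch of checking the same scale operation from the CLI while the UI sits stale. The deployment name hello and the default namespace are assumptions based on the nginxdemos/hello workload in step 4; substitute whatever your workload is actually called.

    # Assumed names: deployment "hello" in namespace "default" (adjust to your workload).
    kubectl -n default scale deployment hello --replicas=2
    # The output here should reflect the new replica count even though the Rancher UI never refreshes:
    kubectl -n default get deployment hello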

Result: Upon creating the cluster, the websocket connection is flawless; you get provisioning status updates back as expected. However, as soon as the cluster is created and you attempt to scale a newly created workload, you will not see any live updates. This also applies to the details view of a workload (clicking on the workload name and viewing the list of deployed pods): if you attempt to delete an instance from the pod list, the list will not update until you refresh.
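
As a sanity check that the cluster itself keeps emitting changes while the page sits idle, a watch along these lines shows the pod churn live from the CLI (namespace is an assumption, same as above):

    # Watch pods in the assumed "default" namespace; deletions and replacements
    # show up here immediately even while the Rancher UI stays stale.
    kubectl -n default get pods -w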

Maybe this screenshot will illustrate better what I'm attempting to articulate. Notice the current state of websockets in DevTools. The ironic part of this is the next screenshot.

Screen Shot 2019-08-17 at 8 36 43 PM

sockId=1 and sockId=2 are the only two connections receiving any messages; the rest of the socket connections are just empty.

Screen Shot 2019-08-17 at 8 55 41 PM

Other details that may be helpful: I have tried this with various configurations (SSL, no SSL, Let's Encrypt, domain, no domain), with the same results either way. The issue seems to me like a websocket bug in the UI. I say this because sockId=1 and sockId=2 are receiving proper data; I can't recall exactly which one is getting the cluster-level data, but one of them is, it's just not bound properly to the UI.

Also, maybe worth noting: socket connections will keep being created indefinitely; I've seen up to 30 active until the timed refresh hits.
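
In case it helps, here is a rough way to poke at the subscribe socket outside the browser; the /v3/subscribe path and the token handling are my assumptions about how the UI opens these connections, so treat it as illustrative rather than authoritative:

    # Placeholders: rancher.domain.com and $RANCHER_TOKEN (a Rancher API bearer token).
    # An HTTP/1.1 101 Switching Protocols response means the websocket upgrade itself
    # gets through; anything else points at something between the browser and Rancher.
    curl -i -N -k \
      -H "Connection: Upgrade" -H "Upgrade: websocket" \
      -H "Sec-WebSocket-Version: 13" -H "Sec-WebSocket-Key: x3JJHMbDL1EzLkh9GBhXDw==" \
      -H "Authorization: Bearer $RANCHER_TOKEN" \
      "https://rancher.domain.com/v3/subscribe"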

OS Details

NAME="Ubuntu" VERSION="18.04.2 LTS (Bionic Beaver)" ID=ubuntu ID_LIKE=debian PRETTY_NAME="Ubuntu 18.04.2 LTS" VERSION_ID="18.04"

Kernel Details

4.15.0-1040-gcp

Docker Version

Client:
 Version:       18.09.3
 API version:   1.39
 Go version:    go1.10.8
 Git commit:    774a1f4
 Built:         Thu Feb 28 06:53:11 2019
 OS/Arch:       linux/amd64
 Experimental:  false

Docker Info Command

https://gist.github.com/jhartzell/e12fdd05eb4298ab8bfea8359094ad23

Further Details

Standalone, Single node - rancher/rancher image

kubectl client: v1.13.7
kubectl server: v1.13.7-gke.19

Can provide further details but I figure this is long enough for now.

jhartzell commented 5 years ago

I also noticed some Rancher container logs while sitting on the cluster page.

Screen Shot 2019-08-17 at 11 48 11 PM
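
If the screenshot is hard to read, the same output can be pulled as text; the grep filter is just an illustrative way to narrow things down:

    # Find the rancher/rancher container, then tail its logs while sitting on the cluster page.
    docker ps --filter ancestor=rancher/rancher --format '{{.ID}}'
    docker logs -f <container-id> 2>&1 | grep -iE 'websocket|subscribe'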

jhartzell commented 5 years ago

I'm closing this. After hour 7 and even further testing, I've found that the issue was not Rancher itself; it was a borked firewall rule. I've confirmed this by spinning up an entirely new GCP account and retrying my steps above; there it worked without fail. Now to go find that rule :)
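
For anyone who lands here with the same symptoms, this is roughly how I'd go hunting for (or replacing) the offending rule; the rule name and target tag below are placeholders, not what my project actually uses:

    # Inspect existing rules for anything blocking tcp:80/443 toward the Rancher VM.
    gcloud compute firewall-rules list
    gcloud compute firewall-rules describe <rule-name>

    # A permissive ingress rule along these lines restores UI/websocket traffic
    # (placeholder name and tag; scope it down for anything beyond a test setup).
    gcloud compute firewall-rules create allow-rancher-ui \
        --network=default --direction=INGRESS --action=ALLOW \
        --rules=tcp:80,tcp:443 --target-tags=rancher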