jupyterhub / mybinder.org-deploy

Deployment config files for mybinder.org
https://mybinder-sre.readthedocs.io/en/latest/index.html
BSD 3-Clause "New" or "Revised" License

Broken Grafana dashboard panels #1511

Open manics opened 4 years ago

manics commented 4 years ago

I just went through all the Grafana dashboards and found the following problems. I have no idea whether these are new, whether they've been broken for a while, or whether "no data" is intentional.

status ➡️ User pods running

https://grafana.mybinder.org/d/fLoQvRHmk/status?orgId=1&refresh=5m [image]

Node activity ➡️ DiskAvailable on Nodes

https://grafana.mybinder.org/d/nDQPwi7mk/node-activity?orgId=1 [image]

Network activity ➡️ Response codes (mybinder.org, hub.mybinder.org, CHP, Redirector)

https://grafana.mybinder.org/d/ygtPwi7ik/network-activity?orgId=1&refresh=1m&var-cluster=default [image]

Kubernetes API Health ➡️ k8s API calls from JupyterHub / BinderHub [2m]

https://grafana.mybinder.org/d/4QOBFHdiz/kubernetes-api-health?var-cluster=default [image]

Unchecked

Kubernetes cluster monitoring (binder-prod): attempting to open all the panels caused my browser to freeze 🙂 https://grafana.mybinder.org/d/QLzEwmnmz/kubernetes-cluster-monitoring-binder-prod

welcome[bot] commented 4 years ago

Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! :hugs:
If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template as it helps other community members to contribute more effectively. You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! :wave:
Welcome to the Jupyter community! :tada:

betatim commented 4 years ago

The first graph is "broken" because Turing is currently not working.

I know some of the others also don't work for some of the clusters (didn't check which cluster they were set to).

For DiskAvailable I know that it used to work on GKE, but it has been a long time since I looked at it.

For all others: we should double check whether there is a simple way to revive them or whether we should ditch them.
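A rough sketch of how that double check could be scripted, assuming the panels are backed by Prometheus queries and that each panel's query has already been run against the standard `/api/v1/query` endpoint. The helper names `has_data` and `triage` and the sample responses are hypothetical, not taken from this repository; only the response envelope (`status`, `data.resultType`, `data.result`) follows the Prometheus HTTP API.

```python
from typing import Any, Dict


def has_data(response: Dict[str, Any]) -> bool:
    """Return True if a parsed Prometheus query response holds at least one sample."""
    if response.get("status") != "success":
        return False
    data = response.get("data", {})
    result = data.get("result", [])
    if data.get("resultType") in ("scalar", "string"):
        # Scalars and strings are a single [timestamp, value] pair.
        return len(result) == 2
    # Vectors and matrices are lists of series; an empty list means "no data".
    return len(result) > 0


def triage(responses: Dict[str, Dict[str, Any]]) -> Dict[str, str]:
    """Label each panel (hypothetical mapping of panel name to response) as ok or no data."""
    return {
        panel: ("ok" if has_data(resp) else "no data")
        for panel, resp in responses.items()
    }


if __name__ == "__main__":
    # Made-up responses for illustration only; real ones would come from Prometheus.
    sample = {
        "User pods running": {
            "status": "success",
            "data": {"resultType": "vector", "result": []},
        },
        "Some working panel": {
            "status": "success",
            "data": {
                "resultType": "vector",
                "result": [{"metric": {}, "value": [0, "1"]}],
            },
        },
    }
    for panel, state in triage(sample).items():
        print(f"{panel}: {state}")
```

Panels that come back as "no data" on every cluster would be candidates for ditching; ones that are empty on only some clusters probably just need their query or cluster variable fixed.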

Thanks for helping out with housekeeping :D