I will also update the CI and fix that merge conflict.
Monitoring deployments are up!
Both have HTTPS, which is cool. However, the GCP site seems to have almost no data. If I open the "Cluster Monitoring for Kubernetes" dashboard (you should see it under Dashboards > Manage), the AWS graphs list many pods, but GCP shows only one. That single "pod" is named "Value" and is using quite a bit of memory, so I suspect it's just the cluster total.
A thought that I had when looking at the GCP binder cluster: I think there's a config error with some ingress bit on staging. I see that `kubectl get svc -n staging binderhub-proxy-nginx-ingress-controller` has an EXTERNAL-IP of `<pending>`. There is also a Service `binderhub-proxy-nginx-ingress-default-backend`. There are similar Services, but with names that start with `staging` or `prod` instead of `binderhub-proxy`, in their respective namespaces. The proper Services are also on AWS.
On both staging deployments, I ran `kubectl get deployments -n staging`. The GCP one appears to have these extra `nginx-ingress` bits that are 41 days old. Maybe they were manually deployed by accident? Do you think I can just delete them @TomAugspurger ?
It's definitely possible that I messed up the ingress stuff. I'm fine with you deleting them, and if something breaks then I can take a look.
Deleted those Services / Deployments and nothing broke. The dashboards still show wildly different levels of detail though.
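For the record, the cleanup was along these lines (resource names assumed to match the `binderhub-proxy` prefix above; double-check with `kubectl get` before deleting):

```sh
# Remove the stray nginx-ingress Services and Deployments from the GCP staging namespace.
kubectl delete svc -n staging \
  binderhub-proxy-nginx-ingress-controller \
  binderhub-proxy-nginx-ingress-default-backend
kubectl delete deployment -n staging \
  binderhub-proxy-nginx-ingress-controller \
  binderhub-proxy-nginx-ingress-default-backend
```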
Thanks for working on this. Feel free to merge when you're ready.
Closes #166. See that issue for background on why the other helm charts are no good.

The old `stable` helm charts for Prometheus and Grafana have been deprecated; `pangeo-binder/requirements.yaml` and `k8s-aws/readme.md` have been updated to point at the new helm repos, and both charts are pinned to their most recent versions. Configuration changes were mostly un-indenting, since there is no longer an umbrella chart for these two. However, I changed Grafana's ingress configuration from the example in the Grafana readme, and suddenly it works with HTTPS. I also had to manually fill in the location of the Prometheus data source; the only way I could get that connected was by using the `ClusterIP` I find with `kubectl get svc -n staging staging-prometheus-server`. I moved the datasource configuration to the secrets file, since it now depends on the cluster and should not be a publicly-known address.

For now, since we only need the one deployment, I will leave monitoring on `staging` and remove the AWS `prod` config for it. @TomAugspurger let me know if you'd like me to set up the equivalent config for GCP. I should be able to log in, test manual deployment on staging, and get the `ClusterIP` myself.
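As a rough sketch of what the repo change and the datasource lookup involve (the repo URLs are the standard post-deprecation homes for these charts; the `jsonpath` query is just one way to pull the `ClusterIP`):

```sh
# Register the chart repositories that replace the deprecated `stable` repo,
# then refresh the local index before running `helm dep up` / `helm upgrade`.
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

# Grab the ClusterIP of the Prometheus server Service, to paste into the
# Grafana datasource URL kept in the per-cluster secrets file.
kubectl get svc -n staging staging-prometheus-server \
  -o jsonpath='{.spec.clusterIP}'
```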