Binderhub scalability (grafana)

ltetrel commented 5 years ago

Grafana to see platform usage

agahkarakuzu commented 5 years ago

Prometheus (k8s deployment with manual configs) + Grafana (installed with Helm). I drafted an ugly dashboard to visualize some of the metrics from BinderHub; you can see mybinder's grafana here :) Grafana is running on localhost:3000 (right now on node3), and I ssh-forwarded it to my laptop to see what's going on.

Prometheus and Grafana are under the namespace monitoring. You can see the services and pods:

# Pods responsible for monitoring 
kubectl get pods -n monitoring

# Services responsible for monitoring 
kubectl get svc -n monitoring

# Describe prometheus 
kubectl describe pods prometheus-deployment-bc795b5f4-vz56s -n monitoring

agahkarakuzu commented 5 years ago

I wondered whether the in-progress info was accurate, so I started a build. It looks like it is working well:

ltetrel commented 5 years ago

Nice! So how do we access this now? Maybe we can create a subdomain (binder-grafana.conp.cloud) to view it from the outside?

agahkarakuzu commented 5 years ago

I think we don't have to bother exposing it on an external IP for now. Plus, Grafana also requires an SMTP server to be set up so that we can add users, set alerts and such. For a while, until we are sure it works properly, we can use port forwarding. You can do the following after you ssh into the server:

Get grafana pod name:

export POD_NAME=$(kubectl get pods --namespace monitoring -l "app=grafana,release=grafana" -o jsonpath="{.items[0].metadata.name}")

Forward grafana to port 3000:

kubectl --namespace monitoring port-forward $POD_NAME 3000

Open a new terminal on your own machine and tunnel port 3000 of the remote host to your local port 4000 over ssh:

ssh -L 4000:127.0.0.1:3000 ubuntu@binder.conp.cloud

On your local machine, just visit http://localhost:4000. If it asks for a password, ping me on slack :)
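The pod lookup and the port-forward above could be wrapped into a small shell helper, roughly as sketched below. This is only a sketch: the function names `grafana_pod_name` and `forward_grafana` are made up for illustration, and it assumes kubectl is already configured on the host you ssh into and that the Grafana pod carries the same labels as in the command above.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of the deployment.

grafana_pod_name() {
  # Print the name of the first pod matching the Grafana labels.
  kubectl get pods --namespace monitoring \
    -l "app=grafana,release=grafana" \
    -o jsonpath="{.items[0].metadata.name}"
}

forward_grafana() {
  # Forward the pod's port 3000 to localhost:3000 (blocks until Ctrl-C).
  local pod
  pod="$(grafana_pod_name)" || return 1
  if [ -z "$pod" ]; then
    echo "no Grafana pod found in namespace monitoring" >&2
    return 1
  fi
  kubectl --namespace monitoring port-forward "$pod" 3000
}
```

After sourcing the file on the remote host, running `forward_grafana` replaces the two manual steps; the ssh tunnel from your laptop stays the same.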

PS: Prometheus is accessible as a service on port 30000. You can forward it to another localhost port on your machine (e.g. 4001 🤷‍♂) and see them side by side.
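To view Grafana and Prometheus side by side, both tunnels can go through a single ssh session. The helper below is a sketch under a few assumptions: the function name `open_monitoring_tunnels` is hypothetical, the Grafana port-forward is already running on the remote host, and port 30000 is reachable on 127.0.0.1 of that host (which may depend on how the cluster networking is set up).

```shell
#!/usr/bin/env bash
# Sketch only: open both tunnels in one ssh session.
# Local 4000 -> remote 3000 (Grafana), local 4001 -> remote 30000 (Prometheus).

open_monitoring_tunnels() {
  # Optional first argument overrides the ssh target.
  local host="${1:-ubuntu@binder.conp.cloud}"
  # -N: do not run a remote command, only keep the tunnels open.
  ssh -N \
    -L 4000:127.0.0.1:3000 \
    -L 4001:127.0.0.1:30000 \
    "$host"
}
```

With the tunnels open, Grafana should answer on http://localhost:4000 and Prometheus on http://localhost:4001.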

A simple binder to test the in-progress launches on the dashboard:

https://binder.conp.cloud/v2/gh/agahkarakuzu/gbm2330/master

ltetrel commented 5 years ago

Is there a way to automate this step? Also, you mentioned it is running on node 3. Will this Grafana only work for that node, or will it monitor all the other nodes as well?

agahkarakuzu commented 5 years ago

Prometheus scrapes metrics from the endpoints of all the nodes. You can see them on the Prometheus dashboard under status (port 30000). Grafana has access to all of them through its Prometheus data source.

The installations were done from the k8s master node, so it should be up to the load balancer to select a worker for the service. To point Grafana at Prometheus, I had to use the IP of the Prometheus k8s service (the address that shows up when you do kubectl describe svc -n monitoring). If this IP does not depend on the node on which the pod is spawned, it should keep working through load balancing.
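For reference, the ClusterIP of a Kubernetes Service is a virtual address that belongs to the Service rather than to any node, and the cluster DNS also gives the Service a stable name of the form `<service>.<namespace>.svc.cluster.local`. A sketch of building such a URL for the Grafana data source follows. The function name `prometheus_service_url` and the default service name `prometheus-service` are placeholders (check the real name with kubectl get svc -n monitoring), and the cluster is assumed to use the default `cluster.local` domain.

```shell
#!/usr/bin/env bash
# Sketch only: print an in-cluster URL for the Prometheus service.

prometheus_service_url() {
  # "prometheus-service" is a hypothetical default; pass the real name.
  local svc="${1:-prometheus-service}"
  local port
  # Read the first port declared on the Service.
  port="$(kubectl --namespace monitoring get svc "$svc" \
    -o jsonpath='{.spec.ports[0].port}')" || return 1
  [ -n "$port" ] || return 1
  echo "http://${svc}.monitoring.svc.cluster.local:${port}"
}
```

A URL of this form should stay valid when the Prometheus pod is rescheduled to another node.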

If by automating you mean making it permanently available at a given address, sure.

ltetrel commented 5 years ago

By automating, I just mean reducing the burden of the current method (get the Grafana pod name, forward Grafana to port 3000) :)

agahkarakuzu commented 5 years ago

Sure, we can expose it as a k8s service and access it through a DNS name or an IP right away, to free you from the burden of the current method :)
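One possible way to do this is to switch the Grafana service to type NodePort, so it becomes reachable on a fixed port of every node. The sketch below assumes the Helm release created a service named `grafana` in the monitoring namespace (an assumption, verify with kubectl get svc -n monitoring); the function name `expose_grafana_nodeport` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: switch the Grafana service to NodePort and print the port.

expose_grafana_nodeport() {
  # "grafana" is an assumed service name; pass the real one if it differs.
  local svc="${1:-grafana}"
  # Change the service type; kubectl's status message goes to stderr.
  kubectl --namespace monitoring patch svc "$svc" \
    -p '{"spec": {"type": "NodePort"}}' >&2 || return 1
  # Print the node port that Kubernetes allocated for the first service port.
  kubectl --namespace monitoring get svc "$svc" \
    -o jsonpath='{.spec.ports[0].nodePort}'
}
```

Grafana would then be reachable at the printed port on any node's IP, without the pod lookup or the port-forward.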

ltetrel commented 5 years ago

Hey @agahkarakuzu, do you have some documentation on how to set up Grafana (in case we want to do it on other servers)? We are working with @mathieuboudreau on a reproducible environment for creating new BinderHub instances on our allocation. Also, do you have docs that explain how to use your Grafana?

Thanks :)

agahkarakuzu commented 5 years ago

Hey @ltetrel, I don't have thorough documentation, I just used Helm and some custom k8s configs. The resources I used are on the master. I had some handwritten notes, which are 9000km away from me atm :/ I can write it up later, but if it is super urgent, I am afraid I won't be available for a while.

To interact with Grafana, you can refer to https://github.com/neurolibre/neurolibre-binderhub/issues/5#issuecomment-539133153 . As for usage, it is super intuitive. You can create and modify dashboards effortlessly. Once you log in, the rest will make itself clear to you, I believe.

ltetrel commented 5 years ago

ok thanks, we will work on the documentation later then

ltetrel commented 5 years ago

Since I ran your commands, we can no longer access https://binder.conp.cloud. Do you have any ideas? Maybe the port forwarding is not safe with BinderHub? Did you test it on the master node or just on your side?

ltetrel commented 5 years ago

I am not sure whether it is because of the Arbutus outage over the last few days or because of these commands :/ The cc people told me everything is fine on their side...

agahkarakuzu commented 5 years ago

kubectl --namespace monitoring port-forward $POD_NAME 3000

There is nothing dangerous about port forwarding; all the services in the cluster are doing it all the time. I have used this many times without any problem. Besides, it has nothing to do with BinderHub itself, other than that they are both running on the same node.

neurolibre / neurolibre-binderhub

Binderhub scalability (grafana) #5