mle-infrastructure / mle-toolbox

Lightweight Tool to Manage Distributed ML Experiments 🛠
https://mle-infrastructure.github.io/mle_toolbox/toolbox/
MIT License

monitor-cluster with rich dashboard #25

Closed RobertTLange closed 3 years ago

RobertTLange commented 3 years ago

Currently, we use a hacky combination of terminaltables and colorclass to print the cluster's current resource usage. Let's move to rich: it looks beautiful and should not be too hard to adapt to our current setup. I would also like to replace the tabulate-based protocol table in load_local_protocol_db. That way a single package replaces terminaltables, colorclass and tabulate, which reduces our install dependencies.

Check out this mini-tutorial for dashboards: https://www.willmcgugan.com/blog/tech/post/building-rich-terminal-dashboards/
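To give a rough idea of what the switch could look like, here is a minimal sketch of a resource table built with rich. The function names, column layout and example values are made up for illustration and are not taken from the toolbox code.

```python
import io

from rich.console import Console
from rich.table import Table


def build_usage_table(hosts):
    """Build a rich Table from a list of per-host dicts (hypothetical layout)."""
    table = Table(title="Cluster Resource Usage")
    table.add_column("Host", style="bold")
    table.add_column("CPUs", justify="right")
    table.add_column("Mem used (GB)", justify="right")
    table.add_column("Mem total (GB)", justify="right")
    for host in hosts:
        table.add_row(
            host["host"],
            str(host["ncpu"]),
            f"{host['mem_used']:.1f}",
            f"{host['mem_total']:.1f}",
        )
    return table


def render_to_text(table, width=80):
    """Render the table into a plain string, e.g. for logging or testing."""
    buffer = io.StringIO()
    Console(file=buffer, width=width).print(table)
    return buffer.getvalue()


if __name__ == "__main__":
    # Illustrative values only, not real cluster measurements.
    example_hosts = [
        {"host": "node01", "ncpu": 32, "mem_used": 10.2, "mem_total": 125.8},
        {"host": "node02", "ncpu": 16, "mem_used": 3.4, "mem_total": 62.9},
    ]
    Console().print(build_usage_table(example_hosts))
```

One table object covers both colouring and layout, which is what terminaltables plus colorclass currently do together.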

RobertTLange commented 3 years ago

Here is a little progress report of what I got so far:

[Screenshot: rich-dashboard-progress-4]

Open things to do:

CPU utilisation can be obtained via qhost -h <host_name>: the LOAD column gives the utilisation in percent and can be multiplied by NCPU. Memory, on the other hand, can be read from the MEMTOT and MEMUSE columns.
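The qhost idea above could be sketched roughly as follows. This is a minimal sketch under several assumptions: the helper names are hypothetical, the sample text is hand-written for illustration rather than copied from a real cluster, and the exact column set and the scale of LOAD may differ between Grid Engine versions, so the LOAD × NCPU step should be checked against your installation.

```python
import subprocess

# Hand-written sample in the general shape of qhost output (illustrative only).
SAMPLE_QHOST_OUTPUT = """\
HOSTNAME   ARCH      NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
---------------------------------------------------------------
global     -            -     -       -       -       -       -
node01     lx-amd64    32  0.50  125.8G   10.2G    8.0G   512.0M
"""


def query_host(host_name):
    """Run `qhost -h <host_name>` and return its raw text output."""
    result = subprocess.run(
        ["qhost", "-h", host_name], capture_output=True, text=True, check=True
    )
    return result.stdout


def parse_qhost(output):
    """Map each host row to a dict keyed by the header names."""
    lines = [line for line in output.splitlines() if line.strip()]
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        if set(line.strip()) == {"-"}:  # separator line made of dashes
            continue
        fields = line.split()
        if len(fields) != len(header):  # skip anything we cannot align
            continue
        row = dict(zip(header, fields))
        if row["HOSTNAME"] == "global":  # placeholder row without values
            continue
        rows.append(row)
    return rows


def to_gigabytes(value):
    """Convert strings such as '125.8G' or '512.0M' to gigabytes."""
    factors = {"K": 1 / 1024**2, "M": 1 / 1024, "G": 1.0, "T": 1024.0}
    if value[-1] in factors:
        return float(value[:-1]) * factors[value[-1]]
    return float(value) / 1024**3  # assumption: a bare number means bytes


def summarise(row):
    """Derive CPU and memory usage for a single parsed host row."""
    ncpu = int(row["NCPU"])
    load = float(row["LOAD"])
    return {
        "host": row["HOSTNAME"],
        "ncpu": ncpu,
        # Assumption from the comment above: LOAD scaled by NCPU.
        "cores_used": load * ncpu,
        "mem_total_gb": to_gigabytes(row["MEMTOT"]),
        "mem_used_gb": to_gigabytes(row["MEMUSE"]),
    }


if __name__ == "__main__":
    for parsed_row in parse_qhost(SAMPLE_QHOST_OUTPUT):
        print(summarise(parsed_row))
```

Parsing by header name rather than by fixed position keeps the sketch usable if a qhost version adds extra columns. On a real cluster, `parse_qhost(query_host("<host_name>"))` would replace the sample string.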

RobertTLange commented 3 years ago

There seems to be a Python API for monitoring GCP resources: https://googleapis.dev/python/monitoring/latest/query.html. The main problem is the lack of nice examples :)

RobertTLange commented 3 years ago

We now have a basic version that works for all three commonly used resources:

In the future: Look into monitoring GCP as well.