Closed RobertTLange closed 3 years ago
Here is a little progress report of what I got so far:
Open things to do:
qconf -se <host_id> gives an overview of the resources for a host. Also, qstat -j <job_id> gives all info associated with a specific job. Only problem: I am not sure whether this also works for jobs that are not your own.
protocol_summary print at experiment startup with rich version.
The solution for getting CPU utilisation works via qhost -h <host_name>. The CPU utilisation is given in percent LOAD and can be multiplied with NCPU! Memory, on the other hand, can be found in MEMTOT and MEMUSE.
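A minimal sketch of how the qhost columns mentioned above (NCPU, LOAD, MEMTOT, MEMUSE) could be read from Python. The helper names (parse_qhost, to_float, to_gigabytes, query_qhost) are hypothetical, and the SAMPLE text is made up for illustration; real qhost output may be laid out differently on your cluster, so treat the parsing as an assumption to verify.

```python
import subprocess

# Made-up sample for illustration only; not real cluster output.
SAMPLE = """\
HOSTNAME ARCH NCPU LOAD MEMTOT MEMUSE SWAPTO SWAPUS
---------------------------------------------------
global - - - - - - -
node01 lx-amd64 8 0.50 16.0G 4.0G 2.0G 0.0
"""


def parse_qhost(text):
    """Parse whitespace-separated, qhost-style text into a list of dicts."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        if set(line.strip()) == {"-"}:  # skip the dashed separator line
            continue
        fields = line.split()
        if len(fields) != len(header):  # skip lines that do not match the header
            continue
        rows.append(dict(zip(header, fields)))
    return rows


def to_float(value):
    """Convert a field such as '0.50' to float; placeholders like '-' give None."""
    try:
        return float(value)
    except ValueError:
        return None


def to_gigabytes(value):
    """Convert a size such as '16.0G' or '512.0M' to gigabytes (assumed suffixes)."""
    units = {"K": 1 / 1024**2, "M": 1 / 1024, "G": 1.0, "T": 1024.0}
    if value and value[-1] in units:
        number = to_float(value[:-1])
        return None if number is None else number * units[value[-1]]
    return None


def query_qhost(host_name):
    """Run `qhost -h <host_name>` and parse its output (needs an SGE cluster)."""
    result = subprocess.run(
        ["qhost", "-h", host_name], capture_output=True, text=True, check=True
    )
    return parse_qhost(result.stdout)
```

For example, parse_qhost(SAMPLE)[1] is the dict for node01, and to_gigabytes can be applied to its MEMTOT and MEMUSE entries to get comparable numbers.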
There seems to be a Python API for monitoring GCP resources: https://googleapis.dev/python/monitoring/latest/query.html The main problem is the lack of nice examples :)
We now have a basic version that works for all three commonly used resources:
sge-cluster: #36
local: 591dc14fba8f09c2a903e26509e6ee04efab43c6
slurm-cluster: e584482e8c0ffc136f9f853a92b95903fe3c41cf
In the future: Look into monitoring GCP as well.
Currently, we are using a hacky combination of terminaltables and colorclass to print the current resource usage of the cluster. Let's move to rich. It looks beautiful, and adapting our current setup should not be too hard. Furthermore, I would like to replace the tabulate table protocol visualization in load_local_protocol_db. This would reduce our install dependencies and replace terminaltables, colorclass and tabulate with a single package.
Check out this mini-tutorial for dashboards: https://www.willmcgugan.com/blog/tech/post/building-rich-terminal-dashboards/
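A rough sketch of what the rich-based resource table could look like. The host dictionaries, the usage_rows and render_usage_table helpers, and the 80% colour threshold are all hypothetical and only serve as illustration; Table, add_column, add_row and Console.print are the rich calls assumed here.

```python
def usage_rows(hosts, warn_fraction=0.8):
    """Build table rows with rich colour markup from host dicts.

    Each host dict is assumed to have 'name', 'mem_used' and 'mem_total' (GB).
    """
    rows = []
    for host in hosts:
        fraction = host["mem_used"] / host["mem_total"]
        # Hypothetical rule: red above the warning threshold, green otherwise.
        colour = "red" if fraction > warn_fraction else "green"
        rows.append(
            (
                host["name"],
                f"{host['mem_used']:.1f}/{host['mem_total']:.1f} GB",
                f"[{colour}]{fraction:.0%}[/{colour}]",
            )
        )
    return rows


def render_usage_table(hosts):
    """Print the rows as a rich table (requires `pip install rich`)."""
    from rich.console import Console
    from rich.table import Table

    table = Table(title="Cluster resource usage")
    for column in ("Host", "Memory", "Used"):
        table.add_column(column)
    for row in usage_rows(hosts):
        table.add_row(*row)
    Console().print(table)
```

Calling render_usage_table([{"name": "node01", "mem_used": 4.0, "mem_total": 16.0}]) would print a one-row table. Keeping the row construction separate from the rendering means the same rows could be reused for the protocol table as well.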