mle-infrastructure / mle-toolbox

Lightweight Tool to Manage Distributed ML Experiments 🛠
https://mle-infrastructure.github.io/mle_toolbox/toolbox/
MIT License

General GridEngine Monitoring #86

Closed by RobertTLange 2 years ago

RobertTLange commented 2 years ago

Right now the GridEngine monitoring is tailored far too closely to the SprekelerLab setup. It should be made more general and more efficient (it currently relies on too many nested loops).

P.S.: Consider writing a test, at least for local monitoring.
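One way the nested loops could be avoided is to tally jobs in a single pass over the scheduler output. The following is only a sketch, not the toolbox's actual code: it assumes the default tabular `qstat` layout (job ID, priority, name, user, state, ...), job names without spaces, and treats `r` as running and any state ending in `qw` as pending. The function name, the sample users, and the queue names are all hypothetical.

```python
from collections import Counter

# Made-up sample in the default tabular qstat layout.
# Users, job names, and queue names are hypothetical.
SAMPLE_QSTAT = """\
job-ID  prior   name       user         state submit/start at     queue                  slots ja-task-ID
----------------------------------------------------------------------------------------------------------
    101 0.55500 train_a    alice        r     01/01/2022 10:00:00 main.q@node01          1
    102 0.55500 train_b    alice        qw    01/01/2022 10:05:00                        1
    103 0.50000 sweep      bob          r     01/01/2022 10:06:00 main.q@node02          2
"""


def count_jobs_by_user(qstat_output: str) -> dict:
    """Count running and pending jobs per user in one pass over the text."""
    running, pending = Counter(), Counter()
    for line in qstat_output.splitlines():
        fields = line.split()
        # Skip the header, the dashed separator, and blank lines:
        # real job rows start with a numeric job ID.
        if len(fields) < 5 or not fields[0].isdigit():
            continue
        user, state = fields[3], fields[4]
        if state == "r":
            running[user] += 1
        elif state.endswith("qw"):  # assumption: qw, hqw, Eqw all count as pending
            pending[user] += 1
    return {"running": dict(running), "pending": dict(pending)}


if __name__ == "__main__":
    print(count_jobs_by_user(SAMPLE_QSTAT))
```

Because the function takes the raw text rather than calling `qstat` itself, it can be tested on any machine without access to a cluster, which also fits the idea of starting with tests that run locally.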

RobertTLange commented 2 years ago

Think about refactoring the rich dashboard and the experiment protocol into yet another subpackage, mle-monitor. The toolbox core would then only implement the different experiment types and tie everything together. mle-monitor would implement the following core functionality:

  1. Monitoring functions for different clusters/cloud VM infrastructures.
  2. Dashboard visualization via rich.
  3. Protocol functionality using pickleDB.

The toolbox would interface with mle-monitor by providing configuration data and calling the different core parts throughout mle run, etc.
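The proposed split could look roughly like the sketch below. It is not the mle-monitor API: every name here (`MONITORS`, `register_monitor`, `ExperimentProtocol`, `render_dashboard`) is hypothetical, a JSON file stands in for the key-value protocol database, and plain text stands in for the rich dashboard, so that the example runs with the standard library only.

```python
import json
import os
import tempfile
from typing import Callable, Dict

# 1. Monitoring: one function per resource type, looked up by name.
MONITORS: Dict[str, Callable[[], dict]] = {}


def register_monitor(name: str):
    """Decorator that registers a monitoring function under a resource name."""
    def wrap(fn: Callable[[], dict]) -> Callable[[], dict]:
        MONITORS[name] = fn
        return fn
    return wrap


@register_monitor("local")
def monitor_local() -> dict:
    """Minimal local monitor; a cluster monitor would be registered the same way."""
    return {"cpu_count": os.cpu_count() or 1}


# 2. Dashboard: turns monitoring data into something displayable.
def render_dashboard(resource: str) -> str:
    """Plain-text stand-in for the rich dashboard."""
    stats = MONITORS[resource]()
    return "\n".join(f"{key}: {value}" for key, value in sorted(stats.items()))


# 3. Protocol: persistent log of experiments.
class ExperimentProtocol:
    """JSON-file stand-in for the key-value protocol database."""

    def __init__(self, path: str):
        self.path = path
        self.data: Dict[str, dict] = {}
        if os.path.exists(path):
            with open(path) as f:
                self.data = json.load(f)

    def add(self, meta: dict) -> str:
        """Store experiment metadata under the next free ID and save to disk."""
        experiment_id = str(len(self.data) + 1)
        self.data[experiment_id] = meta
        with open(self.path, "w") as f:
            json.dump(self.data, f)
        return experiment_id


if __name__ == "__main__":
    # The toolbox side: pass configuration in, call the core parts.
    with tempfile.TemporaryDirectory() as tmp:
        protocol = ExperimentProtocol(os.path.join(tmp, "protocol.json"))
        protocol.add({"purpose": "demo", "resource": "local"})
        print(render_dashboard("local"))
```

Keeping the three parts behind small interfaces like these would let the toolbox depend only on the function and class signatures, while cluster-specific details stay inside the individual monitoring functions.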

RobertTLange commented 2 years ago

See mle-monitor