pytorch / torchx

TorchX is a universal job launcher for PyTorch applications. TorchX is designed to have fast iteration time for training/research and support for E2E production ML pipelines when you're ready.
https://pytorch.org/torchx

[exploratory] TorchX Dashboard #567

Open d4l3k opened 2 years ago

d4l3k commented 2 years ago

Description

Add a new torchx dashboard command that launches a local HTTP server where users can view all of their jobs along with their statuses and logs, and that integrates with ML-specific extras such as artifacts, TensorBoard, etc.

Motivation/Background

Currently, TorchX can only be used programmatically or via the CLI. It would also be nice to have a UI dashboard for monitoring all of your jobs, which could also support deeper integrations such as experiment tracking and metrics.

Right now, users who want a UI have to use their platform-specific one (e.g. AWS Batch or the Ray dashboard), and many platforms (e.g. Slurm, Volcano) don't have one at all.

Detailed Proposal

This would be a fairly simple interface built on top of something such as Flask (https://flask.palletsprojects.com/en/2.1.x/quickstart/).
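A minimal sketch of what the server side could look like, assuming a read-only job list. It uses Python's standard-library http.server instead of Flask only so that it runs with no dependencies; the JOBS records, their app_handle/state fields, and the /api/jobs route are hypothetical placeholders, not an existing TorchX API. A real dashboard would query the TorchX runner for each configured scheduler instead.

```python
import json
from html import escape
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical, illustrative job records standing in for real scheduler queries.
JOBS = [
    {"app_handle": "local_cwd://torchx/trainer-abc123", "state": "RUNNING"},
    {"app_handle": "slurm://torchx/4242", "state": "SUCCEEDED"},
]


def render_jobs_page(jobs):
    """Render the job list as a minimal HTML table, escaping user-controlled values."""
    rows = "".join(
        f"<tr><td>{escape(j['app_handle'])}</td><td>{escape(j['state'])}</td></tr>"
        for j in jobs
    )
    return (
        "<html><body><h1>TorchX jobs</h1>"
        "<table><tr><th>Job</th><th>State</th></tr>"
        f"{rows}</table></body></html>"
    )


class DashboardHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # "/" serves the HTML page; "/api/jobs" serves the same data as JSON.
        if self.path == "/":
            body, ctype = render_jobs_page(JOBS), "text/html"
        elif self.path == "/api/jobs":
            body, ctype = json.dumps(JOBS), "application/json"
        else:
            self.send_error(404)
            return
        data = body.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", f"{ctype}; charset=utf-8")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port=8080):
    """Block forever, serving the dashboard on localhost only."""
    HTTPServer(("127.0.0.1", port), DashboardHandler).serve_forever()
```

Calling serve() and opening http://127.0.0.1:8080/ would show the table. In Flask, the two branches of do_GET would simply become two @app.route handlers, and the JSON endpoint leaves room for a richer frontend later.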

Pages:

Alternatives

Providing a way to view URLs for external services via the terminal.
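One way this could work, assuming the terminal supports OSC 8 hyperlink escape sequences (as many modern terminal emulators do), is sketched below. The terminal_link helper and the service names and URLs are hypothetical placeholders, not something TorchX currently reports; the ports shown are just the usual TensorBoard and Ray dashboard defaults.

```python
def terminal_link(url: str, text: str) -> str:
    """Wrap text in an OSC 8 sequence so supporting terminals render it as a clickable link."""
    return f"\033]8;;{url}\033\\{text}\033]8;;\033\\"


# Hypothetical external-service URLs associated with a single job.
services = {
    "TensorBoard": "http://localhost:6006",
    "Ray dashboard": "http://localhost:8265",
}

for name, url in services.items():
    # The URL is also used as the link text, so it stays readable
    # in terminals that ignore the escape sequence.
    print(f"{name}: {terminal_link(url, url)}")
```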

Additional context/links

msaroufim commented 2 years ago

As an alternative, I've been seeing people in open source use Rich to create dashboards inside the terminal: https://www.willmcgugan.com/blog/tech/post/building-rich-terminal-dashboards/

So maybe you could reduce the burden of building and maintaining a dashboard by just having torchx list/logs return rich text, which would also be ideal for people doing cloud deployments.
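A rough, dependency-free sketch of the idea: a fixed-width job table with ANSI colors per state. Rich would replace this hand-rolled formatting with rich.table.Table and Console().print(), plus markup such as [green]...[/green]. The job records, field names, and state-to-color mapping here are hypothetical, not the current torchx list output.

```python
# Hypothetical job records, as a list command might return them.
JOBS = [
    {"app_handle": "local_cwd://torchx/trainer-abc123", "state": "RUNNING"},
    {"app_handle": "slurm://torchx/4242", "state": "SUCCEEDED"},
]

# Hypothetical mapping from job state to an ANSI SGR foreground color.
STATE_COLORS = {
    "RUNNING": "\033[34m",    # blue
    "SUCCEEDED": "\033[32m",  # green
    "FAILED": "\033[31m",     # red
}
RESET = "\033[0m"


def format_job_table(jobs, color=True):
    """Return a two-column text table, left-aligning job handles to a common width."""
    headers = ("JOB", "STATE")
    rows = [(j["app_handle"], j["state"]) for j in jobs]
    width = max(len(r[0]) for r in [headers, *rows])
    lines = [f"{headers[0]:<{width}}  {headers[1]}"]
    for handle, state in rows:
        if color and state in STATE_COLORS:
            state = f"{STATE_COLORS[state]}{state}{RESET}"
        lines.append(f"{handle:<{width}}  {state}")
    return "\n".join(lines)


print(format_job_table(JOBS))
```

Passing color=False gives plain output for logs or non-TTY pipes, which is the kind of detail Rich handles automatically.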

I've used it a bunch in side projects, so let me know if you have any questions: https://github.com/Textualize/rich. I'm pretty sure we could hack together a prototype in 1-2 days.