TorchX is a universal job launcher for PyTorch applications. TorchX is designed to have fast iteration time for training/research and support for E2E production ML pipelines when you're ready.
Description
Currently a scheduler is tied to a specific image type. Some schedulers, such as LSF and Slurm, support multiple image types. Ideally, we could allow swapping in different workspace backends so that these schedulers are easier to use.
Detailed Proposal
Add new syntax to the --workspace argument to allow specifying the workspace type, e.g. --workspace=docker:. or perhaps a separate flag such as --workspace-type docker.
Change the Workspace interaction with the schedulers to allow for setting more than one available workspace type per scheduler.
class FooScheduler(Scheduler):
    # Workspace classes (not instances) that this scheduler supports.
    WORKSPACES: Iterable[Type[Workspace]] = (DockerWorkspace, DirWorkspace)
Update the runner to be aware of the new workspace selectors.
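As a rough illustration of how the three steps could fit together, here is a minimal, self-contained sketch. It does not use the real TorchX classes: Workspace, DockerWorkspace, DirWorkspace and FooScheduler are stand-ins, and the NAME attribute, parse_workspace and select_workspace are hypothetical names invented for this example. It also assumes that the first entry in WORKSPACES acts as the default when no type prefix is given, which the proposal does not specify.

```python
from typing import Optional, Tuple, Type


class Workspace:
    """Stand-in for a workspace base class (not the real TorchX class)."""

    NAME = ""  # hypothetical selector name used on the command line


class DockerWorkspace(Workspace):
    NAME = "docker"


class DirWorkspace(Workspace):
    NAME = "dir"


class FooScheduler:
    # Assumption: the first entry is the default workspace type.
    WORKSPACES: Tuple[Type[Workspace], ...] = (DockerWorkspace, DirWorkspace)


def parse_workspace(spec: str, known: Tuple[str, ...]) -> Tuple[Optional[str], str]:
    """Split "docker:." into ("docker", ".").

    Anything whose prefix is not a known workspace type (for example
    "file://." or a plain path) is returned unchanged with type None.
    """
    prefix, sep, rest = spec.partition(":")
    if sep and prefix in known:
        return prefix, rest
    return None, spec


def select_workspace(
    scheduler: Type[FooScheduler], spec: str
) -> Tuple[Type[Workspace], str]:
    """Pick the workspace class for a --workspace value on a given scheduler."""
    by_name = {ws.NAME: ws for ws in scheduler.WORKSPACES}
    ws_type, path = parse_workspace(spec, tuple(by_name))
    if ws_type is None:
        return scheduler.WORKSPACES[0], path
    return by_name[ws_type], path
```

With this sketch, select_workspace(FooScheduler, "dir:/tmp/job") picks DirWorkspace with the path /tmp/job, while a bare "." falls back to the scheduler's first workspace type. One design question it leaves open is whether a prefix the scheduler does not support should raise an error rather than be treated as part of the path.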
Alternatives
Additional context/links
Depends on #589, since it removes workspace-specific configs from runopts.
#588 - LSF scheduler PR