pytorch / torchx

TorchX is a universal job launcher for PyTorch applications. TorchX is designed to have fast iteration time for training/research and support for E2E production ML pipelines when you're ready.
https://pytorch.org/torchx

Support multiple workspace types per scheduler #590

Open d4l3k opened 2 years ago

d4l3k commented 2 years ago

Description

Currently a scheduler is tied to a specific image type. Some schedulers, such as LSF and Slurm, support several different types of images. Ideally we could allow swapping in different workspace backends per scheduler so that this is easier to use.

Detailed Proposal

  1. Add new syntax to the --workspace entrypoint so that the workspace type can be specified, e.g. --workspace=docker:. or perhaps a separate flag such as --workspace-type docker.

  2. Change the Workspace interaction with the schedulers to allow for setting more than one available workspace type per scheduler.

class FooScheduler(Scheduler):
    # Workspace classes (not instances) that this scheduler supports.
    WORKSPACES: Iterable[Type[Workspace]] = (DockerWorkspace, DirWorkspace)

  3. Update the runner to be aware of the new workspace selectors.
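
To make the three steps concrete, here is a rough sketch of how a runner might resolve a `<type>:<path>` selector against a scheduler's supported workspaces. Everything in it is an assumption for illustration only: the stand-in `Workspace` and `Scheduler` classes, the `NAME` attribute, the `KNOWN_WORKSPACES` registry, `BarScheduler`, the `select_workspace` helper, and the rule that the first entry in `WORKSPACES` is the default are hypothetical and are not existing TorchX APIs.

```python
from typing import Dict, Tuple, Type


class Workspace:
    """Hypothetical stand-in for a workspace backend (not the TorchX class)."""
    NAME: str = ""


class DockerWorkspace(Workspace):
    NAME = "docker"


class DirWorkspace(Workspace):
    NAME = "dir"


# Hypothetical registry mapping selector prefixes to workspace classes.
KNOWN_WORKSPACES: Dict[str, Type[Workspace]] = {
    ws.NAME: ws for ws in (DockerWorkspace, DirWorkspace)
}


class Scheduler:
    """Hypothetical stand-in for a scheduler base class."""
    WORKSPACES: Tuple[Type[Workspace], ...] = ()


class FooScheduler(Scheduler):
    # Assumption: the first entry acts as the default workspace type.
    WORKSPACES = (DockerWorkspace, DirWorkspace)


class BarScheduler(Scheduler):
    WORKSPACES = (DirWorkspace,)


def select_workspace(
    scheduler: Type[Scheduler], arg: str
) -> Tuple[Type[Workspace], str]:
    """Resolve a --workspace value such as "docker:." to (workspace class, path)."""
    prefix, sep, rest = arg.partition(":")
    if not sep or prefix not in KNOWN_WORKSPACES:
        # No recognized type prefix (e.g. "." or "file:///tmp"):
        # fall back to the scheduler's default workspace type.
        if not scheduler.WORKSPACES:
            raise ValueError(f"{scheduler.__name__} supports no workspaces")
        return scheduler.WORKSPACES[0], arg
    workspace = KNOWN_WORKSPACES[prefix]
    if workspace not in scheduler.WORKSPACES:
        raise ValueError(
            f"{scheduler.__name__} does not support workspace type {prefix!r}"
        )
    return workspace, rest
```

With this sketch, `select_workspace(FooScheduler, "docker:.")` resolves to `DockerWorkspace` with path `.`, a bare `.` falls back to the scheduler's first listed type, and `select_workspace(BarScheduler, "docker:.")` raises a `ValueError`. Treating an unrecognized prefix as part of the path keeps existing values that contain a colon working, at the cost of silently ignoring typos in the type name; a separate `--workspace-type` flag would avoid that ambiguity.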

Alternatives

Additional context/links

kurman commented 2 years ago

Will workspace still be synonymous with a 'patch' here?

d4l3k commented 2 years ago

Yeah, it would be. This just allows for different types of images/patching with the same scheduler.