pytorch / torchx

TorchX is a universal job launcher for PyTorch applications. TorchX is designed to have fast iteration time for training/research and support for E2E production ML pipelines when you're ready.
https://pytorch.org/torchx

Determine scheduler from component level #824

Open ryxli opened 7 months ago

ryxli commented 7 months ago

Question

Is it possible to determine, or have filled in at runtime, which scheduler is used from within component logic? For example, if I have a ddp component, can I use something like a macro inside the component, before I return specs.AppDef, that tells me which scheduler the component is run with?

For example, I want to set some environment variables, but with different values depending on which scheduler is used.

ryxli commented 7 months ago

For example, if I want component-level behavior specific to local schedulers (local, docker), I could add an additional, redundant parameter --local to the component to specify that logic, and stick with the default behavior for cloud-based schedulers such as aws, gcp, etc.
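A minimal sketch of that workaround, assuming the scheduler name is passed into the component explicitly. The helper name `env_for_scheduler`, the `MY_APP_*` variable names, and the set of schedulers treated as local are all hypothetical and only for illustration:

```python
from typing import Dict

# Scheduler names treated as "local" in this sketch (an assumption).
LOCAL_SCHEDULERS = {"local_cwd", "local_docker"}


def env_for_scheduler(scheduler: str) -> Dict[str, str]:
    """Return role environment variables, varying by scheduler name."""
    # Hypothetical variable shared by every scheduler.
    env = {"MY_APP_LOG_LEVEL": "INFO"}
    if scheduler in LOCAL_SCHEDULERS:
        # Hypothetical local-only setting.
        env["MY_APP_DATA_DIR"] = "/tmp/data"
    return env
```

The component could then pass the returned dict as the `env` of its role before returning specs.AppDef. The drawback is the one described above: the caller has to repeat the scheduler choice as a component argument.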

Instead, I am wondering whether it is possible to determine the scheduler from within the component and define scheduler-specific behavior there, rather than vice versa. That way I can modify the component rather than multiple different schedulers.

Otherwise, the only place where this scheduler metadata is propagated is the TORCHX_JOB_ID environment variable, set as part of the runner API:

        - name: TORCHX_JOB_ID
          value: aws_batch://torchx/job_name

Another non-ideal, hacky solution:

cmd = [
  "$(echo $TORCHX_JOB_ID | cut -d':' -f1)"
]
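The same parsing could also be done in Python inside the running job. This is only a sketch, assuming TORCHX_JOB_ID has the `scheduler://session/job_name` shape shown above; the function name `scheduler_from_job_id` is hypothetical:

```python
import os
from typing import Optional


def scheduler_from_job_id(job_id: Optional[str] = None) -> Optional[str]:
    """Extract the scheduler name from a TORCHX_JOB_ID-style value."""
    if job_id is None:
        # Fall back to the environment variable set for the job.
        job_id = os.environ.get("TORCHX_JOB_ID")
    if not job_id or "://" not in job_id:
        # Variable missing or not in the expected form.
        return None
    # Everything before "://" is taken to be the scheduler name.
    return job_id.split("://", 1)[0]
```

Like the shell version, this only works once the job is running, so it does not help when the component itself needs to know the scheduler while building the AppDef.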

Otherwise, the only other way seems to be string comparisons on macros, which also seems non-ideal:

$( [[ '{specs.macros.rank0_env}' == 'TORCHX_RANK0_HOST' ]] && echo local_docker || echo {default_scheduler_name()})