pytorch / elastic

PyTorch elastic training
BSD 3-Clause "New" or "Revised" License
implement ElasticRole, role args macro substitution #123

Closed kiukchung closed 4 years ago

kiukchung commented 4 years ago

Summary: This diff adds native support for a role that launches with torchelastic (a.k.a. ElasticRole). To make this work, it does the following three things:

  1. Implements ElasticRole (and ElasticRoleFB for fb internal)
  2. Implements macro substitution for ${img_root} and ${app_id}, which can be used in Role.args (this is needed to specify a launch argument such as --elastic_args ${img_root}/my_trainer_binary.par).
  3. Removes redundant object APIs from Session and uses the objects directly as defined in torchelastic.tsm.driver.api.
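
The macro substitution in item 2 can be illustrated with a minimal sketch. This is not the code from this diff: the function name substitute_macros, the sample values, and the use of Python's string.Template are all assumptions made for illustration. It only shows the general idea of expanding ${img_root} and ${app_id} inside a list of role arguments.

```python
from string import Template
from typing import Dict, List


def substitute_macros(args: List[str], values: Dict[str, str]) -> List[str]:
    """Return a copy of args with ${name} macros replaced from values.

    Hypothetical helper for illustration; not the torchelastic API.
    """
    # safe_substitute leaves unknown macros untouched instead of raising KeyError.
    return [Template(arg).safe_substitute(values) for arg in args]


# Example role arguments; the paths and ids below are made up.
role_args = ["--elastic_args", "${img_root}/my_trainer_binary.par", "--job", "${app_id}"]
resolved = substitute_macros(role_args, {"img_root": "/tmp/img", "app_id": "app_123"})
print(resolved)
```

Leaving unknown macros in place, rather than failing, is a design choice in this sketch; a stricter variant could call Template.substitute so that a misspelled macro raises an error.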

Differential Revision: D23701234

facebook-github-bot commented 4 years ago

This pull request was exported from Phabricator. Differential Revision: D23701234

facebook-github-bot commented 4 years ago

This pull request has been merged in pytorch/elastic@64caa5e6b52bffd0938836b15349ee8d96e75494.