pytorch / xla

Enabling PyTorch on XLA Devices (e.g. Google TPU)
https://pytorch.org/xla

inconsistency in calling `get_ordinal` and `world_size` calls #7754

Closed: miladm closed this issue 2 months ago

miladm commented 2 months ago

📚 Usability / API / Documentation

The get_ordinal and world_size calls (1) are inconsistent in which underlying library they use and (2) are being migrated from xla_model to runtime.

1. The two APIs must come from the same underlying package / import.
2. There must be a compelling reason to change user code from xla_model to runtime; if the reason is compelling, we need an RFC for it.
3. Most importantly, we need a proposal to upstream these APIs.

cc @will-cromar

zpcore commented 2 months ago

We are supposed to use xr.global_ordinal() and xr.world_size(). I will update the documentation today.
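
For reference, a minimal sketch of the migration, assuming xr is torch_xla.runtime and that the older xla_model spellings (xm.get_ordinal(), xm.xrt_world_size()) are the deprecated call sites being replaced:

import torch_xla.core.xla_model as xm
import torch_xla.runtime as xr

# Deprecated spellings (older call sites):
# rank = xm.get_ordinal()
# world_size = xm.xrt_world_size()

# Recommended spellings:
rank = xr.global_ordinal()
world_size = xr.world_size()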

will-cromar commented 2 months ago

The reasoning for lifting fundamental functionality higher in our API is covered in #6399. torch_xla.core.xla_model is needlessly verbose.

At this point, one import already gets you access to nearly all basic APIs you actually need to write PyTorch/XLA:

import torch_xla as xla

# Examples
xla.sync()                    # execute pending lazy operations
xla.device()                  # current XLA device
xla.runtime.world_size()      # number of participating processes
xla.runtime.global_ordinal()  # global rank of the current process
xla.launch()                  # spawn a function across local devices

The only thing better than having ergonomic APIs is being an invisible backend of upstream PyTorch. For distributed APIs such as rank and world size, the best option is to implement the torch.distributed backend and just use its APIs (dist.get_rank(), dist.get_world_size()) in 90% of cases, but there is still work to be done there.
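
As a rough illustration of that direction, here is a sketch of querying rank and world size through torch.distributed with the XLA process group backend, assuming the process group is initialized with the xla:// init method as in the current distributed examples:

import torch.distributed as dist
import torch_xla.distributed.xla_backend  # registers the "xla" process group backend

dist.init_process_group("xla", init_method="xla://")

rank = dist.get_rank()              # instead of xr.global_ordinal()
world_size = dist.get_world_size()  # instead of xr.world_size()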

zpcore commented 2 months ago

The paths for get_ordinal and world_size are unified in https://github.com/pytorch/xla/pull/7753.