pytorch / xla

Enabling PyTorch on XLA Devices (e.g. Google TPU)
https://pytorch.org/xla

Third party XLA backend support #4406

Closed · kevint324 closed this 1 year ago

kevint324 commented 1 year ago

❓ Questions and Help

Hi folks,

First, Happy new year! Hope you had a wonderful vacation.

We are working on a proprietary AI accelerator and already have a working XLA backend within TensorFlow. Next, we are trying to enable our chip in PyTorch through XLA.

I know PyTorch/XLA supports TPU and GPU, and AWS recently announced their Trainium chip; it seems they also use XLA to hook into PyTorch.

I'm asking if there are any design documents you could share to help us quickly understand the layers and building blocks of the PyTorch/XLA design. I've read some design documents about LTC. I've got a few questions:

  1. From a vendor's perspective, which pieces of code should we look into to understand how to hook a new accelerator in through LTC?
  2. Is there any extra work required for the newly announced Dynamo path?
  3. As for the OpenXLA project, XLA is moving out of the TF repo. Will PyTorch/XLA at some point depend on the OpenXLA repo instead of the TF repo? If so, when might this happen? Q1? Q2?
  4. Right now it seems the lazy tensor IR is converted to XLA HLO IR without MLIR on the PyTorch side. OpenXLA proposes a new StableHLO as an entry IR. Will PyTorch/XLA adopt StableHLO in some way?
  5. From some docs I see that XRT is being deprecated. Is it a good idea for us to just start with PJRT?

Your reply would be greatly appreciated.

Thanks Kevin

JackCaoG commented 1 year ago

Sorry for missing this issue

  1. I think for the most part, vendors need to worry about XLA more than PyTorch/XLA. PyTorch/XLA is a layer that converts PyTorch ops into XLA ops. We are transitioning to the new PJRT runtime; you can check https://github.com/pytorch/xla/blob/master/third_party/xla_client/pjrt_computation_client.h . This is our interface to the hardware: we use it to allocate memory on the device, compile an HLO program, execute an HLO program, etc.
  2. Dynamo should be opaque from a hardware vendor's perspective.
  3. We should start taking a dependency on OpenXLA around the end of Q2, but we are unlikely to drop the existing TF dependency, because the legacy XRT runtime and some profiler bits still live only in TF.
  4. PyTorch/XLA will provide a path to StableHLO in the upcoming quarters.
  5. Yes, please start with PJRT. It is a much better maintained runtime with a much cleaner interface and design (see the sketch below this list).
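
To make items 1 and 5 concrete, here is a minimal sketch of what driving a device through the PJRT runtime looks like from the user-facing Python side: lazy tensor IR is recorded, lowered to HLO, then compiled and executed through the PJRT computation client. The `Linear` model and the `PJRT_DEVICE=CPU` setting are illustrative assumptions, not specifics from this thread; a vendor plugin would expose its own device type.

```python
# Minimal sketch (illustrative, not from this thread): running a PyTorch model
# on an XLA device through the PJRT runtime.
import os

import torch
import torch_xla.core.xla_model as xm

# Select the PJRT backend. "CPU" and "TPU" are the stock values; a vendor
# plugin would register its own device type here (assumption for this sketch).
os.environ.setdefault("PJRT_DEVICE", "CPU")

device = xm.xla_device()              # lazy XLA device handle
model = torch.nn.Linear(4, 2).to(device)
x = torch.randn(8, 4, device=device)
y = model(x)                          # ops are recorded as lazy tensor IR

# mark_step() cuts the graph: the recorded IR is lowered to HLO, handed to the
# PJRT computation client for compilation, and executed on the device.
xm.mark_step()
print(y.cpu())
```

From the vendor's side, the same PJRT client interface referenced above is what backs the allocate/compile/execute steps that this script triggers.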
kevint324 commented 1 year ago

Thanks a lot for the clarification. It helps a lot. Please feel free to close the issue.

JackCaoG commented 1 year ago

Thanks @kevint324 ! Feel free to open new issues if you have any other questions!