openxla / xla

A machine learning compiler for GPUs, CPUs, and ML accelerators
Apache License 2.0

What is the difference between PJRT, IFRT, and TFRT? #15168

Open eaplatanios opened 1 month ago

eaplatanios commented 1 month ago

I'm currently building a library in Rust that leverages XLA and PJRT. While going through the APIs and the implementation of the Python bindings, I got a little confused about the terminology and the various interfaces. Specifically, I ran into get_tfrt_cpu_client and its implementation in the Python bindings, and I have a few questions related to it:

  1. I see that this wraps the PjRtClient from xla/pjrt in an ifrt::PjRtClient from xla/python/pjrt_ifrt when a distributed client is provided. What is the difference between the two? Specifically, the CPU PjRtClient from xla/pjrt seems to support a distributed setting, so what do the Python bindings do differently, and is it necessary or is it a by-product of an older implementation?
  2. I think I understand what PJRT is and its goals. However, I cannot find much information about IFRT and do not understand what that part of the codebase does. It also appears to be specific to the Python bindings. Is this true? And more generally, what does it offer on top of PJRT?
  3. I also see the term TFRT used sometimes in the codebase. I assume this stands for something like "TensorFlow Runtime" (?). Is this also a remnant of an older implementation or does it offer something on top of PJRT?

Thank you!

jyingl3 commented 1 month ago

PJRT is a device runtime interface. IFRT is a distributed runtime interface. IFRT has a global view of arrays and computations that span devices belonging to different hosts, while PJRT only has a local view of a single host.

Global information in PJRT, such as the global topology, is in the process of being moved to IFRT to make the separation clearer.

TFRT usually refers to one possible implementation of PJRT.
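
To make the local versus global distinction concrete, here is a minimal conceptual sketch in plain Python. It is not the XLA API: `Device`, `LocalClient`, `GlobalClient`, and the `topology` argument are hypothetical names invented for illustration. The only things taken from the discussion above are that the PJRT-style layer sees a single host and that the IFRT-style layer wraps it and adds a view across hosts.

```python
from typing import Dict, List, NamedTuple


class Device(NamedTuple):
    host: int      # index of the host (process) that owns this device
    local_id: int  # index of the device within its host


class LocalClient:
    """Hypothetical stand-in for a PJRT-style client: sees one host only."""

    def __init__(self, host: int, num_devices: int) -> None:
        self.host = host
        self._devices = [Device(host, i) for i in range(num_devices)]

    def devices(self) -> List[Device]:
        return list(self._devices)


class GlobalClient:
    """Hypothetical stand-in for an IFRT-style client wrapping a local one."""

    def __init__(self, local: LocalClient, topology: Dict[int, int]) -> None:
        # topology maps host index -> device count. How a real runtime
        # learns this is out of scope here; the sketch simply receives it.
        self._local = local
        self._all = [
            Device(host, i)
            for host, count in sorted(topology.items())
            for i in range(count)
        ]

    def addressable_devices(self) -> List[Device]:
        # Devices this process can drive directly: exactly the local view.
        return self._local.devices()

    def devices(self) -> List[Device]:
        # Every device on every host: the global view.
        return list(self._all)


topology = {0: 2, 1: 2}                      # two hosts, two devices each
local = LocalClient(host=1, num_devices=2)   # this process runs on host 1
client = GlobalClient(local, topology)
print(len(local.devices()), len(client.devices()))
```

In this toy model the wrapper still applies when `topology` has a single entry; the global view then simply coincides with the local one.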

@skye @hawkinsp Any other information you want to add? Thanks!

eaplatanios commented 1 month ago

Thanks for clearing up the difference! In that case, I have a few follow-up questions:

  1. Is xla/pjrt/distributed/ going away in the future, meaning I should not bother generating bindings for that part of the code?
  2. Why is IFRT located under xla/python/ifrt and xla/python/pjrt_ifrt? Should I assume that these will move outside of the python package at some point since it sounds like they're not specific to the Python bindings?
  3. Is PJRT still responsible for single-node multi-GPU programs? Or is that also going to move to IFRT?

Thanks!

jyingl3 commented 1 month ago

  1. xla/pjrt/distributed/ is not going away. in_memory_key_value_store.* is still in use (it is just an implementation of the key-value store interface). https://github.com/openxla/xla/blob/main/xla/tsl/distributed_runtime/coordination/coordination_service.h is now used to implement DistributedRuntimeService, so some of the old DistributedRuntimeService code is no longer used. Otherwise, most of the files are interfaces or clients, which are still relevant (I think).

  2. I am not sure why ifrt and pjrt_ifrt are located in xla/python and whether they will be moved out :)

  3. JAX mostly interacts only with IFRT (and IFRT wraps the PJRT client, as in xla/python/pjrt_ifrt). Therefore, even for a single-node multi-GPU program, JAX still talks to IFRT.
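
For readers writing bindings, point 1 describes a common pattern: an abstract key-value store interface with an in-memory implementation behind it. The sketch below shows that pattern in plain Python. It does not mirror the actual XLA headers: the class names, the `set`/`get` method names, and the blocking-with-timeout behavior of `get` are all assumptions made for illustration.

```python
import threading
from abc import ABC, abstractmethod
from typing import Dict


class KeyValueStore(ABC):
    """Hypothetical interface; callers depend only on this."""

    @abstractmethod
    def set(self, key: str, value: str) -> None:
        ...

    @abstractmethod
    def get(self, key: str, timeout_s: float) -> str:
        ...


class InMemoryKeyValueStore(KeyValueStore):
    """Hypothetical single-process implementation backed by a dict."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self._cond = threading.Condition()

    def set(self, key: str, value: str) -> None:
        with self._cond:
            self._data[key] = value
            self._cond.notify_all()  # wake any thread waiting in get()

    def get(self, key: str, timeout_s: float) -> str:
        # Assumed behavior: block until the key appears or the timeout expires.
        with self._cond:
            found = self._cond.wait_for(
                lambda: key in self._data, timeout=timeout_s
            )
            if not found:
                raise TimeoutError(f"key {key!r} not set within {timeout_s}s")
            return self._data[key]


store = InMemoryKeyValueStore()
store.set("host0/address", "10.0.0.1:1234")
print(store.get("host0/address", timeout_s=1.0))
```

Because callers see only `KeyValueStore`, a networked implementation could be substituted without changing them, which is one reason to generate bindings against the interface rather than a particular backend.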

devillove084 commented 3 weeks ago

@eaplatanios I'd like to ask you something. I'm also working on Rust bindings. Do you have a repo or anything else available?