openxla / xla

A machine learning compiler for GPUs, CPUs, and ML accelerators
Apache License 2.0

Porting XLA to different backends. #12524

Open abhaskumarsinha opened 3 months ago

abhaskumarsinha commented 3 months ago

Hello,

I'm very new to the XLA project, so pardon my ignorance here. I'm trying to learn more about the project, its optimizations, and their interactions with the hardware. The code everywhere seems well written and properly documented. The project is vast, has been maintained through the effort of a tremendous number of volunteers, and is written in a language that is not as easy to debug as Python, so it's very easy to get lost.

Here's what I've been able to gather about the project. DL frameworks construct graphs of their deep learning models prior to computation, and these graphs of tensor computations then get optimized by XLA-like tools, i.e. linear algebra compilers.

Now, XLA performs these optimizations on HLO, its high-level intermediate representation, and then lowers the result into the executable formats that the target CPU/GPU/TPU expects.
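To make that concrete, here is a minimal sketch (my own illustration, not from the docs) of a framework handing XLA a small tensor computation through the client builder API; the header paths are approximate and may have moved in the current tree:

```cpp
// Build a tiny computation graph with xla::XlaBuilder and turn it into an
// XlaComputation (HLO), which the XLA compiler then optimizes and lowers
// for the chosen backend (CPU/GPU/TPU/...).
#include "xla/client/xla_builder.h"  // path is approximate; may have moved
#include "xla/shape_util.h"

xla::XlaComputation BuildExample() {
  xla::XlaBuilder builder("example");
  // One 2x2 f32 parameter: a single node of the graph the framework builds.
  const xla::Shape shape = xla::ShapeUtil::MakeShape(xla::F32, {2, 2});
  xla::XlaOp x = xla::Parameter(&builder, /*parameter_number=*/0, shape, "x");
  xla::XlaOp y = xla::Add(x, x);   // y = x + x
  xla::XlaOp z = xla::Dot(y, y);   // z = matmul(y, y)
  (void)z;                         // the last op becomes the root of the graph
  return builder.Build().value();  // the HLO handed to the compiler
}
```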


My goal here was to learn how new backends can be ported over quickly; here are some of my learnings:

  1. The whole project depends on the LLVM compiler ecosystem.
  2. An LLVM backend needs to be set up, and xla::Compiler (and xla::CPUCompiler / xla::GPUCompiler) needs to be rewritten according to the target platform (a rough sketch of such a subclass is below). These correspond to the files xla/service/compiler.h, xla/service/cpu/cpu_compiler.h, and similarly for GPU/TPU.
  3. If someone fails to port LLVM and the LLVM libraries, things get ugly: all four header files mentioned here would require porting, given that the stream executor layer also pulls in stream.h and stream_executor_pimpl.h.

Right?
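
Here is a hypothetical skeleton of what point 2 above would mean in code, i.e. a new xla::Compiler subclass for an imaginary accelerator. Everything except xla::Compiler itself is made up, and the signatures are abbreviated from my reading of xla/service/compiler.h, so they may not match the current tree exactly:

```cpp
#include "xla/service/compiler.h"

namespace xla {

// Hypothetical backend compiler for an imaginary accelerator; only the
// central overrides are sketched, other virtual methods are omitted.
class MyAcceleratorCompiler : public Compiler {
 public:
  // Runs the HLO optimization pipeline (target-independent passes plus any
  // passes specific to this hardware).
  absl::StatusOr<std::unique_ptr<HloModule>> RunHloPasses(
      std::unique_ptr<HloModule> module, se::StreamExecutor* executor,
      const CompileOptions& options) override;

  // Lowers the optimized HLO to an Executable, e.g. by emitting LLVM IR and
  // handing it to an LLVM backend for the new ISA.
  absl::StatusOr<std::unique_ptr<Executable>> RunBackend(
      std::unique_ptr<HloModule> module, se::StreamExecutor* executor,
      const CompileOptions& options) override;

  // Ties this compiler to a StreamExecutor platform id for the device.
  se::Platform::Id PlatformId() const override;
};

}  // namespace xla

// The compiler is then registered for its platform (see compiler.h), roughly:
//   xla::Compiler::RegisterCompilerFactory(kMyPlatformId, /*factory=*/...);
```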


Also, please clarify: does one need to adapt ONLY these header files to the new hardware (which doesn't seem to be the case), OR also the other header files that are included by them? (That'd be a bit of effort!) For example, if I manage to get LLVM onto a specific target and then want to port XLA to that particular hardware, do I ONLY need to start porting xla/service/compiler.h and xla/service/cpu/cpu_compiler.h, OR every header file that is included in those headers (which would amount to 18 + 16 = 34 header files in total)?


If there is any error in my understanding of the concepts, or any gap in my explanation, please do point it out.

Best Regards, Abhas Kumar Sinha

abhaskumarsinha commented 3 months ago

I'm a bit confused.

nvptx_compiler.cc, which is supposedly the file to port when targeting a GPU-like ISA, uses CUDA libraries by default, while there is also https://github.com/openxla/xla/blob/main/xla/service/gpu/amdgpu_compiler.cc, probably corresponding to AMD's ROCm.

Is the user supposed to remove the CUDA libraries from nvptx_compiler.cc and include amdgpu_compiler.cc-like libraries when trying to compile XLA for AMD GPUs? And what about gpu_compiler.cc?

abhaskumarsinha commented 3 months ago

Ah Oh,

The docs page "Developing a new backend for XLA" has a bug!

Under the section "Scenario 2", it says:

It is possible to model a new xla::Compiler implementation on the existing xla::CPUCompiler and xla::GPUCompiler classes, since these already emit LLVM IR. Depending on the nature of the hardware, it is possible that many aspects of the LLVM IR generation will have to be changed, but a lot of code can be shared with the existing backends.

A good example to follow is the GPU backend of XLA. The GPU backend targets a non-CPU-like ISA, and therefore some aspects of its code generation are unique to the GPU domain. Other kinds of hardware, e.g. DSPs like Hexagon (which has an upstream LLVM backend), can reuse parts of the LLVM IR emission logic, but other parts will be unique.

The link on xla::GPUCompiler should point to gpu_compiler.cc and NOT nvptx_compiler.cc. NVPTX stands for "NVIDIA Parallel Thread Execution", which is the CUDA ISA and needs to be compiled with CUDA flags during the build. Similarly, amdgpu_compiler.cc exists for ROCm devices in the same directory.


The instructions given for building from source support ONLY CUDA or ROCm devices, but NOT a custom GPU backend that has been added! Please add a section explaining how devices with a custom GPU-like ISA implemented in XLA, neither CUDA nor ROCm, can be compiled.

No wonder I had been scratching my head for the last few days.

GleasonK commented 3 months ago

Hello! Thanks for catching this!

The website is generated from repo markdown docs: https://github.com/openxla/xla/blob/main/docs/developing_new_backend.md

If you have a moment to make that fix, feel free to send a PR. Also, if there are any other things you've learned while digging that you think should be in this doc, all improvements are welcome! Otherwise I'll try to get to this edit next week.

abhaskumarsinha commented 3 months ago

Hello @GleasonK ,

Thank you for the reply. I just realized this morning that gpu_compiler.cc is not a template; rather, it is a common library file that is shared by the ROCm backend (amdgpu_compiler.cc) as well as the CUDA backend (nvptx_compiler.cc). I figure the docs linked the CUDA backend compiler file because it is the more complete one (supporting 16-bit 4D conv operations as well as matmul), whereas AMD's ROCm implementation is a bit more limited, so the doc authors probably wanted readers to get the idea from the CUDA NVPTX compiler files.
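
To record what I found, here is a rough sketch (my own illustration, not a drop-in file) of how the existing GPU compilers relate and how a hypothetical new GPU-like backend would slot in; the hook names in the comments are placeholders, not the exact method names:

```cpp
#include "xla/service/gpu/gpu_compiler.h"  // the shared base, gpu_compiler.cc/.h

namespace xla::gpu {

// In the XLA sources, NVPTXCompiler (nvptx_compiler.cc) and AMDGPUCompiler
// (amdgpu_compiler.cc) both derive from GpuCompiler and add the vendor
// pieces: CUDA libdevice vs. ROCm device libraries, the PTX vs. AMDGPU LLVM
// targets, and the step that turns LLVM IR into a device binary.
//
// A hypothetical backend for another GPU-like ISA would follow the same
// pattern rather than editing nvptx_compiler.cc:
class MyGpuIsaCompiler : public GpuCompiler {
  // Override the target-specific hooks (LLVM target triple, data layout,
  // IR-to-binary compilation, ...); the exact hook names vary by version,
  // see gpu_compiler.h.
};

}  // namespace xla::gpu
```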

Apologies for the oversight; it was totally on me, and I missed it. Unfortunately, there is very little documentation on how to extend XLA to different backends, especially for new devices: template files for new compilers, what classes to implement and where they should live, and how the build/installation should proceed after such additions.

My goal is to get OpenXLA, and subsequently TensorFlow, onto either OpenCL or SYCL, so that open-source GPUs and ASICs can leverage the library as much as hardware from closed vendors like NVIDIA/AMD. I hope the documentation can become more detailed in this regard so it is a bit easier to follow.

Please take this with a grain of salt; I'm not very good at C++, but I've been trying for the past few days to work with the different backends here.

Thank You for the response.