openxla / xla

A machine learning compiler for GPUs, CPUs, and ML accelerators
Apache License 2.0

Add DeferRelocatableCompilationCompilationProvider #19831

Closed copybara-service[bot] closed 3 days ago

copybara-service[bot] commented 3 days ago

This adds a compilation provider that offers limited support for parallel compilation even when the delegate compilation provider doesn't support compilation into a relocatable module.

Parallel compilation works by:

  1. Splitting the LLVM module into smaller modules at function boundaries
  2. Lowering each of the smaller modules to PTX in parallel in a thread pool
  3. Compiling the PTX into relocatable CUBIN modules in parallel
  4. Linking everything together
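
The four steps above can be sketched roughly as follows. This is a toy illustration, not the XLA implementation: strings stand in for LLVM modules, PTX, and CUBINs, all function names (`SplitModule`, `LowerToPtx`, `CompileToRelocatable`, `Link`, `CompileInParallel`) are hypothetical, and `std::async` is used in place of a real thread pool.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <string>
#include <vector>

// Hypothetical stand-ins: plain strings play the role of real artifacts.
using LlvmModule = std::string;
using Ptx = std::string;
using Cubin = std::string;

// Step 1: split at "function boundaries" (in this toy, one function per line).
std::vector<LlvmModule> SplitModule(const LlvmModule& module) {
  std::vector<LlvmModule> parts;
  std::string current;
  for (char c : module) {
    if (c == '\n') {
      if (!current.empty()) parts.push_back(current);
      current.clear();
    } else {
      current += c;
    }
  }
  if (!current.empty()) parts.push_back(current);
  return parts;
}

// Step 2 (toy): lower one small module to PTX.
Ptx LowerToPtx(const LlvmModule& part) { return "ptx(" + part + ")"; }

// Step 3 (toy): compile PTX into a relocatable module.
Cubin CompileToRelocatable(const Ptx& ptx) { return "cubin(" + ptx + ")"; }

// Step 4 (toy): link all relocatable modules into one binary.
Cubin Link(const std::vector<Cubin>& objects) {
  assert(!objects.empty());
  Cubin linked;
  for (std::size_t i = 0; i < objects.size(); ++i) {
    if (i > 0) linked += "+";
    linked += objects[i];
  }
  return linked;
}

Cubin CompileInParallel(const LlvmModule& module) {
  std::vector<std::future<Cubin>> tasks;
  for (const LlvmModule& part : SplitModule(module)) {
    // Steps 2 and 3 run concurrently, one task per split module.
    tasks.push_back(std::async(std::launch::async, [part] {
      return CompileToRelocatable(LowerToPtx(part));
    }));
  }
  std::vector<Cubin> objects;
  for (std::future<Cubin>& task : tasks) objects.push_back(task.get());
  return Link(objects);  // Linking stays sequential.
}
```

Collecting the futures in submission order keeps the link inputs deterministic regardless of which task finishes first.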

Only ptxas and nvptxcompiler allow compilation into relocatable modules, but neither of these methods is always available.

To still benefit from parallel LLVM lowering without writing an entirely new compilation pipeline, this compilation provider defers PTX compilation to the linking step.

PTX compilation will then not happen in parallel, but at least LLVM lowering will.
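
A minimal sketch of the deferral idea, under stated assumptions: the "relocatable module" handed back by the wrapper simply carries the unmodified PTX, and the delegate only compiles it when everything is linked. The types and methods here (`RelocatableModule`, `WholeProgramCompiler`, `DeferringProvider`, `FakeDelegate`) are hypothetical and do not mirror the actual XLA interfaces.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical: in the deferring wrapper this holds PTX, not a real CUBIN.
struct RelocatableModule {
  std::string deferred_ptx;
};

// Hypothetical delegate that can only compile and link PTX in one step.
class WholeProgramCompiler {
 public:
  virtual ~WholeProgramCompiler() = default;
  virtual std::string CompileAndLink(
      const std::vector<std::string>& ptx_inputs) = 0;
};

class DeferringProvider {
 public:
  explicit DeferringProvider(WholeProgramCompiler* delegate)
      : delegate_(delegate) {
    assert(delegate_ != nullptr);
  }

  // No compilation happens here: the PTX is wrapped and returned as-is,
  // so callers running this in parallel pay almost nothing for this step.
  RelocatableModule CompileToRelocatable(const std::string& ptx) const {
    return RelocatableModule{ptx};
  }

  // The deferred PTX compilation happens now, sequentially, in the delegate.
  std::string CompileAndLink(const std::vector<RelocatableModule>& modules) {
    std::vector<std::string> ptx_inputs;
    for (const RelocatableModule& m : modules) {
      ptx_inputs.push_back(m.deferred_ptx);
    }
    return delegate_->CompileAndLink(ptx_inputs);
  }

 private:
  WholeProgramCompiler* delegate_;  // Not owned.
};

// Toy delegate used only for illustration; counts how often it is invoked.
class FakeDelegate : public WholeProgramCompiler {
 public:
  int calls = 0;
  std::string CompileAndLink(
      const std::vector<std::string>& ptx_inputs) override {
    ++calls;
    std::string binary = "cubin[";
    for (const std::string& ptx : ptx_inputs) binary += ptx + ";";
    return binary + "]";
  }
};
```

In this sketch the delegate is touched exactly once, at link time, which is why the PTX step loses its parallelism while the upstream LLVM lowering can still run concurrently.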

The implementation is not new: the same workaround is currently used in nvptxcompiler, and this component will replace that workaround.