Add DeferRelocatableCompilationCompilationProvider

This adds a compilation provider that offers limited support for parallel compilation even when the delegate compilation provider cannot compile into a relocatable module.

Parallel compilation works by:

1. Splitting the LLVM module into smaller modules at function boundaries.
2. Lowering each of the smaller modules to PTX in parallel in a thread pool, and compiling the PTX into relocatable CUBIN modules in parallel.
3. Linking everything together.
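
The steps above can be sketched roughly as follows. This is a toy model, not XLA code: strings stand in for LLVM modules, PTX and CUBINs, and every function name (`split_module`, `lower_to_ptx`, `compile_relocatable`, `link`, `compile_parallel`) is a hypothetical placeholder chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def split_module(module):
    """Split a toy 'module' (a list of function names) at function boundaries."""
    return [[fn] for fn in module]


def lower_to_ptx(part):
    """Stand-in for LLVM lowering of one small module to PTX."""
    return "ptx(" + ",".join(part) + ")"


def compile_relocatable(ptx):
    """Stand-in for compiling PTX into a relocatable CUBIN."""
    return "cubin(" + ptx + ")"


def link(cubins):
    """Stand-in for linking relocatable CUBINs into one binary."""
    return "linked[" + " ".join(cubins) + "]"


def compile_parallel(module, workers=4):
    parts = split_module(module)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each part is lowered and compiled on a pool thread.
        # pool.map returns results in input order.
        cubins = list(
            pool.map(lambda p: compile_relocatable(lower_to_ptx(p)), parts)
        )
    return link(cubins)
```

Only the final link step is sequential here; lowering and per-module compilation both run inside the pool.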

Only ptxas and nvptxcompiler allow compilation into relocatable modules, and neither of them is always available.

To still benefit from parallel LLVM lowering without writing an entirely new compilation pipeline, this compilation provider defers PTX compilation to the linking step.

PTX compilation will then not happen in parallel, but at least LLVM lowering will.
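
A minimal sketch of the deferral idea, under stated assumptions: the class and method names below (`DeferredModule`, `LinkOnlyDelegate`, `DeferringProvider`, `compile_to_relocatable`, `compile_and_link`) are hypothetical and do not mirror the actual XLA interfaces. The wrapper pretends to produce a relocatable module but only stores the PTX, then hands all of the PTX to the delegate when linking is requested.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class DeferredModule:
    """Placeholder 'relocatable module' that only carries uncompiled PTX."""
    ptx: str


class LinkOnlyDelegate:
    """Hypothetical delegate: compiles and links PTX, but has no relocatable output."""

    def compile_and_link(self, ptx_modules):
        # PTX compilation happens here, sequentially, at link time.
        return "linked[" + " ".join("cubin(" + p + ")" for p in ptx_modules) + "]"


class DeferringProvider:
    """Wraps a delegate and defers PTX compilation to the linking step."""

    def __init__(self, delegate):
        self._delegate = delegate

    def compile_to_relocatable(self, ptx):
        # No compilation yet: just remember the PTX.
        return DeferredModule(ptx)

    def compile_and_link(self, modules):
        return self._delegate.compile_and_link([m.ptx for m in modules])


def lower_to_ptx(function_name):
    """Stand-in for LLVM lowering of one function to PTX."""
    return "ptx(" + function_name + ")"


def build(functions, provider, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # LLVM lowering still runs in parallel; the 'compile' call is a cheap wrap.
        deferred = list(
            pool.map(
                lambda f: provider.compile_to_relocatable(lower_to_ptx(f)),
                functions,
            )
        )
    return provider.compile_and_link(deferred)
```

In this model the caller's pipeline is unchanged: it still asks for one relocatable module per split and links them at the end, which is why no new compilation pipeline is needed.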

The implementation is not new: the same workaround is currently used in nvptxcompiler, and this component will replace it.

Add DeferRelocatableCompilationCompilationProvider #19831