Feature Request: Support caching HIP compilation using clang

Similar to NVIDIA CUDA, AMD's HIP can also be compiled with Clang, and I wonder whether the maintainers would accept a new feature to cache HIP compilations. It works basically the same way as caching CUDA with clang: you enable compiling as HIP with -x hip. Caching HIP is very beneficial because HIP compilation time grows linearly with the number of distinct GPU architectures to support, i.e. if the compiled program needs to run on 10 different GPU architectures, it basically needs to be compiled 10 times, once for each architecture :sweat_smile: For a more detailed motivation, please see ROCm/ROCm#2817.
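To make the scaling concrete, here is a minimal sketch of how such a clang invocation could be assembled. The helper name `hip_compile_command`, the file names, and the architecture list are hypothetical and chosen only for illustration; `-x hip` and `--offload-arch=` are the clang options involved.

```python
def hip_compile_command(source, output, gpu_archs, clang="clang++"):
    """Build a clang command line that compiles `source` as HIP.

    Hypothetical helper for illustration; not part of sccache or ROCm.
    """
    # -x hip must come before the source file so clang treats it as HIP.
    cmd = [clang, "-x", "hip", "-c", source, "-o", output]
    # One --offload-arch flag per target GPU; clang runs a separate
    # device compilation for each of them, plus one host compilation.
    cmd += [f"--offload-arch={arch}" for arch in gpu_archs]
    return cmd


# Example architecture list (illustrative only).
archs = ["gfx900", "gfx906", "gfx908", "gfx90a", "gfx1030"]
cmd = hip_compile_command("kernel.hip", "kernel.o", archs)
print(" ".join(cmd))
print(f"device compilations: {len(archs)}")
```

Every extra entry in the architecture list adds another device compilation, which is the work a cache hit would avoid.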

I have already implemented a prototype of this feature, and I'm happy to continue maintaining it. The prototype already caches HIP compilations efficiently and has saved me a massive amount of time when packaging AMD's ROCm software stack for Solus Linux.

The only potential difficulty in the implementation is that HIP compilation may introduce implicit dependencies on some LLVM bitcode files and HIP runtime headers, as shown in the official documentation, but I think this is manageable.
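One way to handle such implicit inputs is to mix their contents into the cache key. The sketch below is an assumption about how that could look, not a description of sccache's actual implementation or of the prototype. The function `hip_cache_key` and the file `device_lib.bc` are hypothetical. It also assumes the preprocessed source is what gets hashed, so implicitly included headers would already be covered and only the bitcode files need explicit handling.

```python
import hashlib
import tempfile
from pathlib import Path


def hip_cache_key(preprocessed_source, compiler_args, implicit_inputs):
    """Hash everything that can change the compiler output (sketch only)."""
    h = hashlib.sha256()
    h.update(hashlib.sha256(preprocessed_source).digest())
    for arg in compiler_args:
        # NUL separator keeps ["-a", "b"] distinct from ["-ab"].
        h.update(b"\0" + arg.encode())
    # Sort so the key does not depend on the order the files were found in.
    for path in sorted(implicit_inputs, key=str):
        h.update(b"\0" + str(path).encode() + b"\0")
        # Hash the file contents, e.g. a device bitcode library.
        h.update(hashlib.sha256(Path(path).read_bytes()).digest())
    return h.hexdigest()


args = ["-x", "hip", "--offload-arch=gfx906"]
with tempfile.TemporaryDirectory() as tmp:
    bitcode = Path(tmp) / "device_lib.bc"  # hypothetical stand-in file
    bitcode.write_bytes(b"bitcode v1")
    key_before = hip_cache_key(b"preprocessed source", args, [bitcode])
    bitcode.write_bytes(b"bitcode v2")
    key_after = hip_cache_key(b"preprocessed source", args, [bitcode])

print(key_before != key_after)  # True: a changed bitcode file invalidates the key
```

With this approach, updating the bitcode shipped with the toolchain changes the key, so stale objects would not be served from the cache.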

mozilla/sccache, issue #2044