intel / llvm

Intel staging area for llvm.org contribution. Home for Intel LLVM-based projects.
Other
1.21k stars 723 forks source link

ICE when compiling for gfx1036 #11203

Open tom91136 opened 11 months ago

tom91136 commented 11 months ago

Describe the bug

This is basically the same issue reported in #8016 which resolves it for gfx1034, but not anything newer because the list is not updated. I'm opening this one to bring some attention to https://github.com/intel/llvm/issues/8112 which would resolve this for future AMD GPUs. For reference, HIP, OpenCL, hipSYCL (both SYCL and stdpar), and even roc-stdpar all work on gfx1036 (AMD 7000's RDNA2 iGPU). I've verified this with BabelStream, miniBUDE, CloverLeaf and TeaLeaf.

JackAKirk commented 11 months ago

I think that this patch is all that is needed so long as @mmoadeli agrees: https://github.com/intel/llvm/issues/8016#issuecomment-1382951219

At the moment this little bug means that if a device does not have a corresponding arch macro as defined here, then compilation fails. As described in that linked document the purpose of these macros is to enable/disable features within the runtime implementation. Practically, at least at the moment, much of DPC++ can be used without such a device macro with the little fix suggested by @al42and , and I think it is likely to remain this way for many types of applications long into the future. I have played with the above mentioned fix, and I do not see any downsides to it. As others have mentioned it will allow us to potentially support new architectures before we have got around to adding a macro for them.

Aside: the idea that an arch macro indicates that a device is officially supported by DPC++, is not consistent with reality: many AMD devices that have arch macros in DPC++ are not officially supported by ROCM, and we have not fully tested them to know to what extend they are supported by the features exposed through DPC++.

Again official support doesn't mean that users cannot use such devices fine for some/all applications, and it would be smart for us to let people use such devices if desired. Aside from the AMD issue, I notice also that it is currently not possible to compile kepler (sm_3x) devices due to the same issue. Now sm_3x is not officially supported, however I expect that for many applications it would work. Since there are still active data centres using kepler devices, which for certain applications these older cards can still deliver useful performance (Gromacs is one that comes to mind), there is no reason for us to disallow people from using them.

mmoadeli commented 11 months ago

I agree with you @JackAKirk. Would be good to have this to support the architectures not having the shortcut.

JackAKirk commented 11 months ago

Hi @al42and

Do you want to open a PR for the small patch you suggested here?: https://github.com/intel/llvm/issues/8016#issuecomment-1382951219

I also don't mind doing it.

al42and commented 11 months ago

@JackAKirk: Done, https://github.com/intel/llvm/pull/11254

I did not run a thorough verification, but it (still) works for me, and you said that you used it too, so hopefully that's enough :)