Open fwyzard opened 1 year ago
@auroraperego FYI
@igorvorobtsov FYI
Hi @fwyzard, thanks for the report.
Lack of proper support for optional kernel features in AOT mode is a known limitation of our toolchain, but it is unlikely to be supported this year, because it depends on a couple of other mechanisms and interferes with some ongoing refactoring of the toolchain internals.
Some technical background here:
Two main things need to be supported in order to implement the request:
The first bullet is already partially implemented: -fsycl-targets supports special values, which specify the exact device, but it has limitations:
Special target values specific to Intel, NVIDIA and AMD Processor Graphics support are accepted, providing a streamlined interface for AOT. Only one of these values at a time is supported.
- intel_gpu_pvc - Ponte Vecchio Intel graphics architecture
- intel_gpu_acm_g12 - Alchemist G12 Intel graphics architecture
- intel_gpu_acm_g11 - Alchemist G11 Intel graphics architecture
- ...
A better implementation, which supports multiple special targets, is being designed in #8658.
The second bullet will be fulfilled by the recently added device config file: #9371, #9846. That work should be expanded further into a full, compile-time-known database of supported optional features per architecture. It should then be connected to the special targets support, with some extra logic in the toolchain to conditionally invoke the AOT compiler.
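A minimal sketch of how such a per-target database could drive the decision to invoke the AOT compiler. Everything here is hypothetical: `TargetInfo`, `device_config` and `should_compile_aot` are illustrative names, not the toolchain's actual device config format. The `spir64_x86_64` sizes are the ones confirmed in this thread; the `intel_gpu_pvc` entry is an assumed placeholder.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical per-target record; the real device config file has its own format.
struct TargetInfo {
  std::vector<int> sub_group_sizes;
};

// Hypothetical database keyed by the value passed to -fsycl-targets.
inline const std::map<std::string, TargetInfo>& device_config() {
  static const std::map<std::string, TargetInfo> table = {
      // Sizes confirmed in this thread for the generic CPU target.
      {"spir64_x86_64", TargetInfo{{4, 8, 16, 32, 64}}},
      // Assumed values, for illustration only.
      {"intel_gpu_pvc", TargetInfo{{16, 32}}},
  };
  return table;
}

// Returns true if a kernel that requires `required_size` should be handed to
// the AOT compiler for `target`. Targets missing from the table are compiled
// unconditionally, because nothing is known about them at compile time.
inline bool should_compile_aot(const std::string& target, int required_size) {
  const auto& table = device_config();
  const auto it = table.find(target);
  if (it == table.end()) return true;
  const auto& sizes = it->second.sub_group_sizes;
  return std::find(sizes.begin(), sizes.end(), required_size) != sizes.end();
}
```

With this table, `should_compile_aot("spir64_x86_64", 128)` returns false, so the kernel would be skipped instead of failing the whole build.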
Hi @AlexeySachkov, thanks for the detailed explanation.
The workaround that we are implementing is to use the new device-specific targets to know the actual device at compile time, and use preprocessor checks to make sure we compile a kernel only if the subgroup size is supported (see https://github.com/alpaka-group/alpaka/pull/1845).
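The shape of that workaround can be sketched as follows. This is not the code from the alpaka pull request. It is a minimal illustration that assumes the compiler defines a target-specific macro such as `__SYCL_TARGET_INTEL_X86_64__` for the AOT CPU target (check the macro names against your compiler version). `cpu_aot_supports`, `kAotCpuTarget` and `launch_with_subgroup_size` are hypothetical names, and the kernel submission is replaced by a placeholder so the sketch compiles as plain C++17 without a SYCL toolchain.

```cpp
#include <cassert>

// Sub-group sizes reported in this thread for the OpenCL CPU runtime.
constexpr bool cpu_aot_supports(int size) {
  return size == 4 || size == 8 || size == 16 || size == 32 || size == 64;
}

// Preprocessor check: with a device-specific AOT target, the device is known
// at compile time. The macro name is an assumption; verify it for your compiler.
#if defined(__SYCL_TARGET_INTEL_X86_64__)
constexpr bool kAotCpuTarget = true;
#else
constexpr bool kAotCpuTarget = false;  // JIT or another target: defer to the runtime
#endif

// Hypothetical launcher. The first branch is where a kernel marked
// [[sycl::reqd_sub_group_size(Size)]] would be submitted; because the
// condition depends on Size, the branch is discarded (and the kernel never
// instantiated) when the size is unsupported on the AOT target.
template <int Size>
bool launch_with_subgroup_size() {
  if constexpr (!kAotCpuTarget || cpu_aot_supports(Size)) {
    // Placeholder for the actual queue.submit(...) call.
    return true;
  } else {
    return false;  // kernel not compiled for this target
  }
}
```

The preprocessor only selects the target; the per-size decision is made with `if constexpr` here, which is a substitution for the purely preprocessor-based checks described above.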
The work on the device config files definitely looks interesting!
We are making progress towards support for optional kernel features in AOT mode. In particular, this issue should be resolved by #14590. We don't yet have specific targets for each of our CPUs, but we have registered a list of known supported sub-group sizes for the generic spir64_x86_64 target.
Your example application (with some changes) was added as a test case to our test suite here: sycl/test-e2e/AOT/reqd-sg-size.cpp.
I will keep this issue open for now in case there are further questions or feedback.
Thank you @AlexeySachkov .
Is the list of supported subgroup sizes 4, 8, 16, 32, 64?
Given the timeline, I assume this is not in oneAPI 2024.2.1, right?
Is the list of supported subgroup sizes 4, 8, 16, 32, 64?
Yes
Given the timeline, I assume this is not in oneAPI 2024.2.1, right?
Correct, it will only be part of the next major release/update and won't be included in hotfix update releases.
OK, thanks.
Describe the bug
When compiling a SYCL/oneAPI application ahead of time for Intel CPUs, the current version of opencl-aot (2023.2.0) fails to compile a kernel that uses a subgroup size that is not supported by the OpenCL runtime.
According to the SYCL specification, all SYCL implementations must be able to compile device code that uses these optional features (various subgroup sizes etc.) regardless of whether the implementation supports the features on any of its devices.
To Reproduce
Please describe the steps to reproduce the behavior:
1. Include code snippet as short as possible:
subgroup_test.cc
2. Specify the command which should be used to compile the program
3. Specify the command which should be used to launch the program
4. Indicate what is wrong and what was expected
The program fails to compile, with the error
The expected behaviour is that the program should compile correctly, compiling the kernel for all the supported subgroup sizes (4, 8, 16, 32, 64), possibly issuing a warning about the unsupported subgroup sizes (128).
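The expected behaviour can be sketched as a partition of the requested subgroup sizes into those that are built and those that only produce a warning. `BuildPlan` and `plan_kernels` are hypothetical names and the warning text is invented for illustration; this is a sketch of the desired outcome, not of how the toolchain works.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Outcome of the build: sizes that get a kernel, plus one warning per skipped size.
struct BuildPlan {
  std::vector<int> built;
  std::vector<std::string> warnings;
};

// Builds a kernel for every requested size the target supports and records a
// warning, rather than an error, for every size it does not support.
inline BuildPlan plan_kernels(const std::vector<int>& requested,
                              const std::vector<int>& supported) {
  BuildPlan plan;
  for (int size : requested) {
    const bool ok =
        std::find(supported.begin(), supported.end(), size) != supported.end();
    if (ok) {
      plan.built.push_back(size);
    } else {
      plan.warnings.push_back("warning: subgroup size " + std::to_string(size) +
                              " is not supported by the target; kernel skipped");
    }
  }
  return plan;
}
```

For the sizes in this report, `plan_kernels({4, 8, 16, 32, 64, 128}, {4, 8, 16, 32, 64})` builds five kernels and records a single warning for 128.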
For completeness, Codeplay's NVIDIA plugin produces only a warning about unsupported subgroup sizes, and builds the kernel correctly for the supported ones:
Environment (please complete the following information):
Additional context
According to the latest SYCL 2020 specification:
(emphasis added)
Note: I would rate this issue as low priority, because the OpenCL CPU runtime supports a wider range of subgroup sizes (4, 8, 16, 32, 64) than any other SYCL backend. So, while the AOT compiler does not follow the SYCL specification, this specific issue is unlikely to cause any real-world problems, as hardly anyone is likely to use subgroup sizes smaller than 4 or larger than 64.