Open al42and opened 2 years ago
@al42and Could you please provide an example of the code which uses such a compile-time macro?
Here is a specific discussion we had with @anton-v-gorshkov about our use case: https://gitlab.com/gromacs/gromacs/-/merge_requests/2248/diffs#note_743987749
The code there might be overly convoluted for historical reasons. But, in essence, something like this is attempted:
template<int subGroupSize>
void submitKernel(sycl::queue q, sycl::global_ptr<float> data, int size) {
    // Submit the kernel which uses sub-group functionality
    // Kernel is complex and takes a long time to compile
}

void doStuff(sycl::device dev, sycl::queue q, sycl::global_ptr<float> data, int size) {
    switch(getVendor(dev)) {
        case Vendor::Nvidia:
#if HAVE_NVIDIA
            return submitKernel<32>(q, data, size);
#else
            assert(false); // Don't instantiate the template for 32, don't waste time compiling it.
#endif
        case Vendor::Intel:
#if HAVE_INTEL
            return submitKernel<16>(q, data, size);
#else
            assert(false); // Don't instantiate the template for 16, don't waste time compiling it.
#endif
    }
}
EDIT: Subgroup size is the most obvious example. There might be other differences, e.g. whether to manually prefetch some values.
EDIT2: As a workaround for faster compilation here, one can do an early return in the kernel (e.g., if (defined(__NVPTX__) && subGroupSize != 32)). But that does not help with other issues, like filtering out incompatible devices early.
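For illustration, the early-return workaround from EDIT2 could look roughly like the sketch below. It is plain C++17 rather than SYCL so that it stands alone: kernelBody and kNvptxDevicePass are hypothetical names, and the body is a placeholder for the real kernel. Only __NVPTX__ comes from the discussion above.

```cpp
// Stand-in for "are we in the NVPTX device pass?". __NVPTX__ is the macro
// mentioned above; on a plain host compiler it is undefined, so this is false.
#if defined(__NVPTX__)
inline constexpr bool kNvptxDevicePass = true;
#else
inline constexpr bool kNvptxDevicePass = false;
#endif

// Hypothetical kernel body, templated on the sub-group size.
template <int subGroupSize>
int kernelBody(int value) {
    if constexpr (kNvptxDevicePass && subGroupSize != 32) {
        // Wrong flavor for this device pass: the expensive branch below is
        // discarded, so the compiler does not spend time on it.
        return 0;
    } else {
        // Placeholder for the complex, slow-to-compile kernel code.
        return value * subGroupSize;
    }
}
```

Because the condition depends on the template parameter, the discarded branch is not instantiated. As noted above, this only trims device-side compile time; it does nothing for the host-side question of which targets were enabled.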
@al42and just to make sure I understand the requirement: you want these macros set during host compilation? For device compilation we have existing macros like __NVPTX__, etc.
Yes, I specifically want to check in the host code which offload architectures are enabled.
@al42and can the type trait any_device_has<aspect> be used for this purpose? It is defined in the SYCL 2020 spec as follows:
The implementation also provides two traits that the application can use to query aspects at compilation time. The trait any_device_has inherits from std::true_type if the compilation environment supports any device which has the specified aspect, and it inherits from std::false_type if no device has the aspect. The trait all_devices_have inherits from std::true_type if all devices supported by the compilation environment have the specified aspect, and it inherits from std::false_type if any device does not have the aspect.
We are considering adding an extended aspect for each device type. For example, we might define aspects "aspect::ext_oneapi_intel_gpu" and "aspect::ext_oneapi_nvidia_gpu". The application could then look like this:
void doStuff(sycl::device dev, sycl::queue q, sycl::global_ptr<float> data, int size) {
    switch(getVendor(dev)) {
        case Vendor::Nvidia:
            if constexpr (sycl::any_device_has_v<sycl::aspect::ext_oneapi_nvidia_gpu>) {
                return submitKernel<32>(q, data, size);
            } else {
                assert(false); // Don't instantiate the template for 32, don't waste time compiling it.
            }
        case Vendor::Intel:
            if constexpr (sycl::any_device_has_v<sycl::aspect::ext_oneapi_intel_gpu>) {
                return submitKernel<16>(q, data, size);
            } else {
                assert(false); // Don't instantiate the template for 16, don't waste time compiling it.
            }
    }
}
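To experiment with this pattern without a SYCL toolchain, here is a minimal standalone sketch in plain C++17. Everything in it is a stand-in and an assumption, not the SYCL API: Aspect, the local any_device_has, submitIfEnabled, and the default HAVE_NVIDIA / HAVE_INTEL values are made up for illustration. The if constexpr sits inside a template so that its condition is dependent and the disabled branch is not instantiated.

```cpp
#include <type_traits>

// Assumption: the build system would define these; defaults are for the demo.
#ifndef HAVE_NVIDIA
#define HAVE_NVIDIA 0
#endif
#ifndef HAVE_INTEL
#define HAVE_INTEL 1
#endif

// Hypothetical stand-ins for the proposed aspects and the vendor query.
enum class Aspect { nvidia_gpu, intel_gpu };
enum class Vendor { Nvidia, Intel };

// Local mock of the trait: false unless the target is enabled.
template <Aspect A> struct any_device_has : std::false_type {};
#if HAVE_NVIDIA
template <> struct any_device_has<Aspect::nvidia_gpu> : std::true_type {};
#endif
#if HAVE_INTEL
template <> struct any_device_has<Aspect::intel_gpu> : std::true_type {};
#endif
template <Aspect A>
inline constexpr bool any_device_has_v = any_device_has<A>::value;

// Placeholder for the expensive kernel; returns the sub-group size it used.
template <int subGroupSize>
int submitKernel() { return subGroupSize; }

// Instantiates submitKernel<subGroupSize> only when aspect A is enabled.
template <Aspect A, int subGroupSize>
int submitIfEnabled() {
    if constexpr (any_device_has_v<A>) {
        return submitKernel<subGroupSize>();
    } else {
        return -1; // target not enabled at compile time
    }
}

// Runtime dispatch; -1 signals a device whose target was not compiled in.
int doStuff(Vendor vendor) {
    switch (vendor) {
        case Vendor::Nvidia: return submitIfEnabled<Aspect::nvidia_gpu, 32>();
        case Vendor::Intel:  return submitIfEnabled<Aspect::intel_gpu, 16>();
    }
    return -1;
}
```

With the default values above, doStuff(Vendor::Intel) runs the 16-wide flavor, while doStuff(Vendor::Nvidia) reports that the target is unavailable instead of falling through to the next case.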
After discussions with the team, the consensus is that we will be implementing the macros as an extension.
@elizabethandrews, the solution with sycl::any_device_has is indeed more elegant and sycl-esque than macros. As far as I can tell, it solves my problem perfectly.
That said, macros are ok too.
I believe macros help avoid all compile-time overhead and offer more flexibility in some cases, so there is some sentiment to support them as well. Users can then choose whatever best suits their application.
Hi!
Any progress on this?
I see that the oneMKL project also has to manually parse the compiler flags in CMake to get the list of targets, and this is, to be honest, not a pretty solution.
Is your feature request related to a problem? Please describe
One or more targets can be passed to -fsycl-targets. For the program being compiled, it can be beneficial to know at compile time which targets were used. In my use case, different flavors of a kernel are used for different architectures (NVIDIA, Intel). If a certain architecture is not among the targets, one can skip compiling the corresponding flavor.
An additional benefit is an ability to early filter-out outright-incompatible devices (sycl::is_compatible is more robust, but does not appear to be working at the moment: #5561).
Describe the solution you would like
Describe alternatives you have considered
constexpr aspect/flag to sycl::backend.