Open jinz2014 opened 4 years ago
AFAIK they can. For example, there is a num_simd_work_items attribute in place that works similarly to the one in OpenCL. @GarveyJoe, please comment on this.
Okay. I know that num_simd_work_items and num_compute_units are attributes placed at the beginning of an OpenCL kernel. I would appreciate advice on where to place them in a SYCL program if that is not documented in oneAPI.
Indeed, I see no description in https://software.intel.com/sites/default/files/Intel-oneAPI-DPCPP-FPGA-Optimization-Guide_3.pdf . Perhaps that is because it's a pretty new feature, added at the end of December. Here's how you can use it:
struct Fun {
void operator()(nd_item<1> it) [[intelfpga::num_simd_work_items(4)]] { ... }
};
// or
auto kernel = [](nd_item<1> it) [[intelfpga::num_simd_work_items(4)]] { ... };
Also there is a spec here: https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/KernelRHSAttributes/SYCL_INTEL_attribute_style.asciidoc that describes usage of RHS attributes in general.
P.S. The num_compute_units attribute is not yet implemented, but you can raise its priority by opening issues like this one :)
I believe it will be implemented one day. There are a few papers that mention optimization using compute-unit duplication:
An Empirically Guided Optimization Framework for FPGA OpenCL; Exploring FPGA-specific Optimizations for Irregular OpenCL Applications; High Performance Computing with FPGAs and OpenCL; Performance-oriented Optimizations for OpenCL Streaming Kernels on the FPGA; Evaluation of MD5Hash Kernel on OpenCL FPGA Platform.
hassan-fpga-opt-irr-opencl-reconfig18.pdf 3204919.3204920.pdf FPT18.pdf High Performance Computing with FPGA.pdf
Note that in many situations where one would use num_compute_units in OpenCL, the same algorithm can be expressed in SYCL using templated functions, because each unique template instantiation of a kernel results in a different physical copy of the hardware. For an example of this technique, see this tutorial: https://github.com/intel/BaseKit-code-samples/blob/master/FPGATutorials/FPGAExtensions/Pipes/pipe_array/src/pipe_array.cpp. The consumer function is templated on an int, ConsumerID, that is analogous to a compute ID in OpenCL. That function is invoked with different template parameters on lines 143-148 to produce multiple instances of the kernel.
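To make the templated-kernel idea concrete, here is a minimal sketch in plain C++17, written without SYCL headers so it builds with any recent compiler. The names ScaleKernel, run_units and scale_with_units are hypothetical and are not taken from the tutorial. In a real DPC++ program each instantiation would presumably be submitted to a queue as its own kernel rather than called directly, so treat this only as an illustration of how one template parameter can produce N distinct copies of a kernel body.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-in for a kernel body. UnitID plays the role of the
// compute ID; every distinct UnitID is a separate template instantiation.
template <int UnitID, int NumUnits>
struct ScaleKernel {
    static_assert(UnitID >= 0 && UnitID < NumUnits, "UnitID out of range");

    // Unit k handles elements k, k + NumUnits, k + 2 * NumUnits, ...
    void operator()(const std::vector<int>& in, std::vector<int>& out) const {
        assert(in.size() == out.size());
        for (std::size_t i = UnitID; i < in.size(); i += NumUnits) {
            out[i] = 2 * in[i];
        }
    }
};

// Different IDs give different types, which is what lets a compiler treat
// them as separate kernels.
static_assert(!std::is_same_v<ScaleKernel<0, 2>, ScaleKernel<1, 2>>,
              "each instantiation is a distinct type");

// Instantiates and runs ScaleKernel<0, N> ... ScaleKernel<N - 1, N>
// using a C++17 fold expression over the comma operator.
template <int... IDs>
void run_units(const std::vector<int>& in, std::vector<int>& out,
               std::integer_sequence<int, IDs...>) {
    (ScaleKernel<IDs, static_cast<int>(sizeof...(IDs))>{}(in, out), ...);
}

// Convenience wrapper: doubles every element using NumUnits instantiations.
template <int NumUnits>
std::vector<int> scale_with_units(const std::vector<int>& in) {
    std::vector<int> out(in.size(), 0);
    run_units(in, out, std::make_integer_sequence<int, NumUnits>{});
    return out;
}
```

Calling scale_with_units<4>(data) instantiates four copies of the body, each covering a strided quarter of the input. Here the copies run one after another on the host. Any hardware replication depends entirely on the FPGA toolchain.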
And num_simd_work_items isn't in the docs yet because it is only supported in the frontend right now (and thus is a no-op). Once the backend support is available it'll be added to the docs.
Otherwise, since SYCL is pure C++, it is possible to just unroll the kernel functor by the amount you want with meta-programming.
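A rough sketch of what unrolling with meta-programming could look like, again in plain C++17 with no SYCL dependency. The helpers unroll, unroll_impl and add_unrolled are hypothetical names made up for this illustration. Whether a given FPGA backend turns the expanded body into wider hardware is an assumption that would need to be checked in the compiler reports.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Calls body(0), body(1), ..., body(Factor - 1); the calls are expanded at
// compile time through a fold expression, so no runtime loop remains.
template <typename Body, std::size_t... Is>
void unroll_impl(Body&& body, std::index_sequence<Is...>) {
    (body(Is), ...);
}

template <std::size_t Factor, typename Body>
void unroll(Body&& body) {
    unroll_impl(body, std::make_index_sequence<Factor>{});
}

// Hypothetical kernel body: element-wise add, processing Factor elements
// ("lanes") per outer iteration.
template <std::size_t Factor>
std::vector<int> add_unrolled(const std::vector<int>& a,
                              const std::vector<int>& b) {
    static_assert(Factor > 0, "Factor must be positive");
    // For brevity this sketch requires the size to be a multiple of Factor.
    assert(a.size() == b.size() && a.size() % Factor == 0);

    std::vector<int> out(a.size());
    for (std::size_t base = 0; base < a.size(); base += Factor) {
        unroll<Factor>([&](std::size_t lane) {
            out[base + lane] = a[base + lane] + b[base + lane];
        });
    }
    return out;
}
```

For instance, add_unrolled<4>(a, b) performs four additions per outer iteration, which is the source-level equivalent of asking for four lanes. Unlike an attribute, the unroll factor is an ordinary template parameter, so it can be swept from a build script without touching the kernel body.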
I am curious how it compares to the alien attribute solution from a QoR perspective...
People are not necessarily familiar with meta-programming. Users would like to apply these attributes in a plain and simple way.
Hi! There have been no updates for at least the last 60 days, though the ticket has assignee(s).
@MrSidims @GarveyJoe, could I ask you to take one of the following actions? :)
Thanks!
I am not sure whether users can enable kernel SIMD vectorization and compute-unit duplication, which are available in the Intel OpenCL SDK for FPGA, in the oneAPI FPGA flow.
Thanks