intel / llvm

Intel staging area for llvm.org contribution. Home for Intel LLVM-based projects.

[SYCL][FPGA] question about vectorization and compute unit duplication #1589

Open jinz2014 opened 4 years ago

jinz2014 commented 4 years ago

It is not clear to me whether users can enable kernel SIMD vectorization and compute-unit duplication, both of which are available in the Intel OpenCL SDK for FPGA, in the oneAPI FPGA flow.

Thanks

MrSidims commented 4 years ago

AFAIK they can. For example, the num_simd_work_items attribute is in place and works similarly to its OpenCL counterpart. @GarveyJoe, please comment on this.

jinz2014 commented 4 years ago

Okay. I know that num_simd_work_items and num_compute_units are attributes placed at the beginning of an OpenCL kernel. Could you advise where to place them in a SYCL program, since this does not seem to be documented in oneAPI? Thanks.

MrSidims commented 4 years ago

Indeed, I see no description in https://software.intel.com/sites/default/files/Intel-oneAPI-DPCPP-FPGA-Optimization-Guide_3.pdf . Perhaps that is because it is a fairly new feature, added at the end of December. Here is how you can use it:

struct Fun {
  void operator()(nd_item<1> it) [[intelfpga::num_simd_work_items(4)]] { ... }
};

// or

auto kernel = [](nd_item<1> it) [[intelfpga::num_simd_work_items(4)]] { ... };

There is also a spec that describes the usage of RHS attributes in general: https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/KernelRHSAttributes/SYCL_INTEL_attribute_style.asciidoc

MrSidims commented 4 years ago

P.S. The num_compute_units attribute is not yet implemented, but you can raise its priority by opening issues like this one :)

jinz2014 commented 4 years ago

I believe it will be implemented one day. There are a few papers that mention optimization using compute-unit duplication.

An Empirically Guided Optimization Framework for FPGA OpenCL
Exploring FPGA-specific Optimizations for Irregular OpenCL Applications
High Performance Computing with FPGAs and OpenCL
Performance-oriented Optimizations for OpenCL Streaming Kernels on the FPGA
Evaluation of MD5Hash Kernel on OpenCL FPGA Platform

hassan-fpga-opt-irr-opencl-reconfig18.pdf 3204919.3204920.pdf FPT18.pdf High Performance Computing with FPGA.pdf

GarveyJoe commented 4 years ago

Note that in many situations where one would use num_compute_units in OpenCL, the same algorithm can be expressed in SYCL using templated functions, because each unique template instantiation of a kernel results in a different physical copy of the hardware. For an example of this technique, see this tutorial: https://github.com/intel/BaseKit-code-samples/blob/master/FPGATutorials/FPGAExtensions/Pipes/pipe_array/src/pipe_array.cpp. The consumer function is templated on an int, ConsumerID, that is analogous to a compute ID in OpenCL. That function is invoked with different template parameters on lines 143-148 to produce multiple instances of the kernel.
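
For readers who want to see the shape of the templated approach without a SYCL toolchain, here is a minimal sketch in plain C++14. It only shows how one template, instantiated with different integer IDs, yields several distinct functions that each handle a slice of the data. The names ComputeUnit, launch_units and launch_replicated are hypothetical and are not taken from the tutorial; in a real SYCL program each instantiation would be submitted to a queue as its own kernel rather than called directly.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical stand-in for a kernel. In a real SYCL program the body would be
// submitted to a queue; here it is an ordinary function so the sketch runs anywhere.
template <int UnitID>
struct ComputeUnit {
    // Unit number UnitID handles elements UnitID, UnitID + num_units, ...
    static void run(const int* in, int* out, std::size_t n, std::size_t num_units) {
        for (std::size_t i = static_cast<std::size_t>(UnitID); i < n; i += num_units) {
            out[i] = in[i] * 2;
        }
    }
};

// Expands the pack IDs... into one call per distinct ComputeUnit<ID> instantiation.
template <int... IDs>
void launch_units(const int* in, int* out, std::size_t n,
                  std::integer_sequence<int, IDs...>) {
    const std::size_t num_units = sizeof...(IDs);
    int expand[] = {0, (ComputeUnit<IDs>::run(in, out, n, num_units), 0)...};
    (void)expand;  // the array exists only to drive the pack expansion
}

// Instantiates and invokes N copies: ComputeUnit<0> ... ComputeUnit<N-1>.
template <int N>
void launch_replicated(const int* in, int* out, std::size_t n) {
    static_assert(N > 0, "need at least one compute unit");
    assert(in != nullptr && out != nullptr);
    launch_units(in, out, n, std::make_integer_sequence<int, N>{});
}
```

Calling launch_replicated<4>(in, out, n) instantiates four separate ComputeUnit types, which is the property the comment above relies on; whether each one becomes a separate hardware copy is up to the FPGA compiler and is not demonstrated by this sketch.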

GarveyJoe commented 4 years ago

And num_simd_work_items isn't in the docs yet because it is only supported in the frontend right now (and thus is a no-op). Once the backend support is available it'll be added to the docs.

keryell commented 4 years ago

Otherwise, since SYCL is pure C++, it is possible to unroll the kernel functor by whatever factor you want with metaprogramming.

I am curious how it compares to the alien attribute solution from a QoR perspective...
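
One way the unrolling idea could look is sketched below in plain C++14, under the assumption that the kernel body can be written as a callable taking a work-item index. The helper names unrolled_call and for_each_unrolled are hypothetical, and the sketch makes no claim about how an FPGA backend would map the unrolled body to hardware or how it compares with num_simd_work_items.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Calls body(base + 0), body(base + 1), ... once per lane, expanded at compile time.
template <typename Body, std::size_t... Lanes>
void unrolled_call(Body& body, std::size_t base, std::index_sequence<Lanes...>) {
    int expand[] = {0, (body(base + Lanes), 0)...};
    (void)expand;  // the array exists only to drive the pack expansion
}

// Applies body to every index in [0, n), Factor indices per loop iteration.
template <std::size_t Factor, typename Body>
void for_each_unrolled(std::size_t n, Body body) {
    static_assert(Factor > 0, "unroll factor must be positive");
    std::size_t i = 0;
    // Main loop: each iteration contains Factor copies of the body.
    for (; i + Factor <= n; i += Factor) {
        unrolled_call(body, i, std::make_index_sequence<Factor>{});
    }
    // Remainder loop for the indices that do not fill a whole group.
    for (; i < n; ++i) {
        body(i);
    }
    assert(i == n);  // every index in [0, n) has been visited exactly once
}
```

A call such as for_each_unrolled<4>(n, body) keeps the unroll factor a compile-time parameter, so changing it does not require touching the body itself.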

jinz2014 commented 4 years ago

People are not necessarily familiar with meta-programming. Users would like to apply these attributes in a plain and simple way.

KornevNikita commented 3 months ago

Hi! There have been no updates for at least the last 60 days, though the ticket has assignee(s).

@MrSidims @GarveyJoe, could I ask you to take one of the following actions? :)

Thanks!

github-actions[bot] commented 1 month ago

Hi! There have been no updates for at least the last 60 days, though the issue has assignee(s).

@MrSidims @GarveyJoe, could you please take one of the following actions:

Thanks!