viennacl / viennacl-dev

Developer repository for ViennaCL. Visit http://viennacl.sourceforge.net/ for the latest releases.

How to achieve partial synchronization? #283

Closed WenyinWei closed 4 years ago

WenyinWei commented 4 years ago

Dear all,

May I ask how to achieve convenient local synchronization, i.e., something similar to the wait list in OpenCL? I have noticed the ViennaCL::ocl::finish() function, but it seems to perform a global synchronization. Is it necessary to use the OpenCL backend directly when I need the wait list concept?

karlrupp commented 4 years ago

OpenCL events are not exposed through ViennaCL, as that would undermine the expression template approach for dealing with vector and matrix operations. You will have to use viennacl::backend::finish().

WenyinWei commented 4 years ago

Hello, Karl. Thanks for your reply.

I have implemented a very small part of the ViennaCV library, and I found that I do not need a wait list at all to make the code run sequentially. ViennaCL knows how to do that.

Great! Thank you for your awesome library ViennaCL.
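The observation above can be sketched with a toy model. This is not ViennaCL code: `InOrderQueue` and `scale_then_shift` are hypothetical names invented for illustration, and the sketch assumes that work is submitted to an in-order queue (the default behaviour of an OpenCL command queue). Under that assumption, a second operation that consumes the result of the first needs no wait list, and a single global `finish()` is enough before the host reads the result.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

// Toy stand-in for an in-order command queue (hypothetical, not part of ViennaCL).
class InOrderQueue {
public:
  // Returns immediately; the work is only recorded, like an asynchronous kernel launch.
  void enqueue(std::function<void()> task) { pending_.push_back(std::move(task)); }

  // Returns only after every recorded task has run, in submission order.
  void finish() {
    while (!pending_.empty()) {
      std::function<void()> task = std::move(pending_.front());
      pending_.pop_front();
      task();
    }
    assert(pending_.empty());
  }

private:
  std::deque<std::function<void()>> pending_;
};

// Enqueues y = 2*x followed by z = y + 1 without any explicit wait list.
std::vector<double> scale_then_shift(const std::vector<double>& x) {
  InOrderQueue queue;
  std::vector<double> y(x.size()), z(x.size());

  // First "kernel": writes y.
  queue.enqueue([&] { for (std::size_t i = 0; i < x.size(); ++i) y[i] = 2.0 * x[i]; });
  // Second "kernel": reads y. Correct only because the queue preserves submission order.
  queue.enqueue([&] { for (std::size_t i = 0; i < y.size(); ++i) z[i] = y[i] + 1.0; });

  queue.finish();  // single global synchronization point before the host reads z
  return z;
}
```

Calling `scale_then_shift({1.0, 2.0, 3.0})` yields `{3.0, 5.0, 7.0}`. An out-of-order queue would not give this guarantee, which is where OpenCL event wait lists come in.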

WenyinWei commented 4 years ago

Hello Karl, one more question.

Suppose I want to migrate the ViennaCL code to other processors, such as a Xilinx FPGA. Would it take a lot of code modification to make the expression template approach work on such hardware? Xilinx FPGAs have their own cl.h and library, so I guess only minor changes to the ViennaCL code would be needed. However, there are some special features, such as wide array transfer, i.e., treating an array of integers as a whole and splitting it into 512-bit types. It may take some more effort to fully exploit the computational power of the FPGA.

I look forward to your opinion on making ViennaCL more universal. WY

karlrupp commented 4 years ago

OpenCL for FPGAs works completely differently. There is no just-in-time compiler; instead, the OpenCL kernels are synthesized into hardware. Last time I checked, this was a process of minutes to hours, depending on the complexity of the kernels.

What is required for FPGAs is a library of carefully crafted OpenCL kernels tuned for the particular hardware, which can then be mixed-and-matched for the particular hardware. I'm not even sure whether such a library is technologically possible (given the constraints on work group sizes and composability) when demanding good performance.

WenyinWei commented 4 years ago

Thanks for your prompt reply. It is a really concise and easy-to-understand answer to my problem.