stfc / PSycloneBench

Various benchmarks used to inform PSyclone optimisations
BSD 3-Clause "New" or "Revised" License

Optimise OpenCL implementations for Xilinx FPGA #70

Closed sergisiso closed 3 years ago

sergisiso commented 3 years ago

This PR improves the PSyclone OpenCL script that generates FPGA code, as well as the OpenCL tasks implementation. It is a step towards #69, but more improvements are still needed to reach acceptable performance: the best result in this PR is still 15x slower than a single CPU core, and the implementation still uses only one memory bank and bundles all the kernel accesses into a single port.

image

sergisiso commented 3 years ago

@arporter This PR is ready for review. It improves the OpenCL task implementation in manual_versions/psykal_opencl/allkernels_tasks.cl to produce the results in the chart above (it is a rather large and verbose implementation, as each buffer is manually created, read and written). In addition, it improves the Xilinx environment files and Makefiles and brings the PSyclone scripts up to date.

sergisiso commented 3 years ago

@arporter This is ready for another look.

As I don't have a Xilinx environment I haven't actually tried building the new/updated version. Please could you confirm that the answers it gives validate?

It gives the correct answer (but is limited to nx,ny=250 for now; I will generalize it once the memory interface performance reaches an appropriate level).