viennacl / viennacl-dev

Developer repository for ViennaCL. Visit http://viennacl.sourceforge.net/ for the latest releases.

Tuning for Qualcomm Adreno 330 #248

sivagnanamn closed this issue 6 years ago

sivagnanamn commented 6 years ago

Below are the ViennaCL benchmark results on an Adreno 330 GPU (OpenCL 1.1 embedded profile):

----------------------------------------------
               Device Info
----------------------------------------------

Name:                QUALCOMM Adreno(TM)
Vendor:              QUALCOMM
Type:                GPU 
Available:           1
Max Compute Units:   4
Max Work Group Size: 512
Global Mem Size:     852883456
Local Mem Size:      8192
Local Mem Type:      1
Host Unified Memory: 1

Benchmark : BLAS
----------------
sCOPY : 0.806 GB/s
sAXPY : 0.756 GB/s
sDOT : 2.43 GB/s
sGEMV-N : 1.17 GB/s
sGEMV-T : 0.974 GB/s
sGEMM-NN : 0.00918 GFLOPs/s
sGEMM-NT : 0.00845 GFLOPs/s
sGEMM-TN : 0.00849 GFLOPs/s
sGEMM-TT : 0.00837 GFLOPs/s
----

Performance seems far from optimal, especially for GEMM. How can I tune ViennaCL's GEMM kernels for this hardware?
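
For context on how to read the numbers above, here is a minimal sketch of how such throughput figures are usually derived from a measured run time. It assumes the conventional counts of 2*N^3 floating-point operations for a square N x N GEMM and three vector accesses (read x, read y, write y) for AXPY; the benchmark's exact accounting may differ. All function names are hypothetical helpers, not part of ViennaCL.

```cpp
#include <cassert>
#include <cstddef>

// Conventional flop count for C = A * B with square N x N matrices:
// N*N output entries, each needing N multiplies and N adds.
// (Assumption: the benchmark may count operations differently.)
double gemm_flops(std::size_t n) {
    return 2.0 * static_cast<double>(n) * n * n;
}

// Convert an operation count and an elapsed time into GFLOP/s.
double gflops_per_second(double flops, double seconds) {
    assert(seconds > 0.0);
    return flops / seconds / 1e9;
}

// Bytes moved by single-precision y = a*x + y under the assumed model:
// read x, read y, write y.
double axpy_bytes(std::size_t n) {
    return 3.0 * n * sizeof(float);
}

// Convert a byte count and an elapsed time into GB/s.
double gbytes_per_second(double bytes, double seconds) {
    assert(seconds > 0.0);
    return bytes / seconds / 1e9;
}

// Inverse view: how long one N x N GEMM takes at a given GFLOP/s rate.
double seconds_for_gemm(std::size_t n, double gflops) {
    assert(gflops > 0.0);
    return gemm_flops(n) / (gflops * 1e9);
}
```

As a usage example under these assumptions, `seconds_for_gemm(1024, 0.00918)` evaluates to roughly 234 seconds for a single 1024 x 1024 multiplication at the reported sGEMM-NN rate. The matrix size here is purely illustrative; the size used by the benchmark is not stated in this thread.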

karlrupp commented 6 years ago

We looked intensively at embedded GPUs two years ago and were unable to get any decent performance via OpenCL, even with our extensive autotuning framework. My assessment is that this GPU is simply too old, and you will see better performance with more recent generations.