Timing of matrix vector multiplication in OpenCL mode

GoogleCodeExporter commented 9 years ago

The option for precise timing of the OpenCL matvec was removed in r1334, which 
makes it hard to track issues with the OpenCL kernels on different devices. A 
desirable goal would be to add an option that switches the command queue into 
profiling mode and obtains precise timings from the events returned by 
clEnqueueNDRangeKernel. It should be possible to do this at runtime, so it can 
be implemented as an option of ADDA itself instead of a compile-time option 
using ifdefs. 
This would help identify performance issues of the kernels on different 
devices.
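A minimal sketch of the runtime-switchable profiling idea, assuming a hypothetical accumulator (kernel_timer and its functions are illustrative names, not part of ADDA). The OpenCL side uses standard calls: the queue is created with the CL_QUEUE_PROFILING_ENABLE property only when the runtime flag is set, and after the kernel has finished (e.g. clWaitForEvents) the device timestamps in nanoseconds are read with clGetEventProfilingInfo using CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END. The sketch keeps the OpenCL calls in comments so the accumulation logic compiles without an OpenCL SDK.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical accumulator for kernel timings (not part of ADDA or OpenCL). */
typedef struct {
    int enabled;          /* runtime switch, e.g. set from a command-line option */
    uint64_t total_ns;    /* sum of (end - start) over all recorded kernels */
    unsigned long count;  /* number of recorded kernel launches */
} kernel_timer;

/* In real code the same flag would select the queue properties, e.g.
 *   props = enabled ? CL_QUEUE_PROFILING_ENABLE : 0;
 *   queue = clCreateCommandQueue(context, device, props, &err); */
void kernel_timer_init(kernel_timer *t, int enabled)
{
    t->enabled = enabled;
    t->total_ns = 0;
    t->count = 0;
}

/* start_ns/end_ns are device timestamps in nanoseconds. In real code, with ev
 * being the event returned by clEnqueueNDRangeKernel:
 *   clWaitForEvents(1, &ev);
 *   cl_ulong start, end;
 *   clGetEventProfilingInfo(ev, CL_PROFILING_COMMAND_START, sizeof(start), &start, NULL);
 *   clGetEventProfilingInfo(ev, CL_PROFILING_COMMAND_END, sizeof(end), &end, NULL); */
void kernel_timer_record(kernel_timer *t, uint64_t start_ns, uint64_t end_ns)
{
    if (!t->enabled) return;   /* profiling switched off at runtime: ignore */
    assert(end_ns >= start_ns);
    t->total_ns += end_ns - start_ns;
    t->count++;
}

/* Mean kernel execution time in seconds (0 if nothing was recorded). */
double kernel_timer_mean_seconds(const kernel_timer *t)
{
    return t->count ? (double)t->total_ns / t->count * 1e-9 : 0.0;
}
```

Keeping the decision in a runtime flag, rather than in ifdefs, means the same binary can run with or without profiling, at the cost of one branch per kernel launch.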

Original issue reported on code.google.com by Marcus.H...@gmail.com on 21 May 2014 at 7:22

GoogleCodeExporter commented 9 years ago

The existing workaround is to take the wall time for the whole ADDA run and 
compare it with the total processor time (the first two times in the log). 
Attributing the difference between them to the matvec computations on the 
OpenCL device, the true time of a single matvec is approximately

  ( [Number of solutions (usually 1 or 2)]*[matvec products (for one solution)]
    + [Total wall time] - [Total time] ) / [Total number of matvecs] .
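The estimate can be written as a small helper, a sketch only: the function name and parameter names are hypothetical, and each argument stands for the bracketed quantity of the same meaning read from the ADDA log.

```c
#include <assert.h>

/* Approximate time of one matvec (seconds) in OpenCL mode.
 *   n_solutions   - [Number of solutions] (usually 1 or 2)
 *   matvec_time   - [matvec products] time reported for one solution
 *   total_wall    - [Total wall time]
 *   total_time    - [Total time] (processor time)
 *   total_matvecs - [Total number of matvecs]
 * Assumes the whole gap between wall and processor time is spent in matvecs
 * on the OpenCL device. */
double estimate_matvec_time(int n_solutions, double matvec_time,
                            double total_wall, double total_time,
                            long total_matvecs)
{
    assert(total_matvecs > 0);
    return (n_solutions * matvec_time + total_wall - total_time) / total_matvecs;
}
```

For example, 2 solutions with 1 s of reported matvec time each, 100 s of wall time, 20 s of processor time and 400 matvecs give (2 + 100 - 20)/400 = 0.205 s per matvec.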

Marcus verified that this approach gives consistent results for 
moderate-to-large runs. There are, however, two drawbacks: 
- the accuracy of [Total wall time] is 1 s, which is insufficient for smaller 
runs;
- a number of other reasons may cause the difference between these two 
timings, including other tasks executed by the operating system.

Original comment by yurkin on 29 May 2014 at 5:33

GoogleCodeExporter commented 9 years ago

In addition to the previous comment: the accuracy of the wall-time results 
should be much better with r1350.
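A wall timer built on the C standard time() function typically has a resolution of one second, which matches the 1 s accuracy mentioned earlier. As a sketch only (this is not necessarily what r1350 does), a POSIX system can provide sub-second wall time via clock_gettime; the helper name wall_seconds is hypothetical.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Wall-clock time in seconds with sub-second resolution (POSIX).
 * CLOCK_MONOTONIC is not affected by adjustments of the system clock,
 * so differences of two readings give elapsed wall time. */
double wall_seconds(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;  /* avoid an unused-variable warning when NDEBUG is defined */
    return (double)ts.tv_sec + ts.tv_nsec * 1e-9;
}
```

Taking the difference of two wall_seconds() readings around the whole run would then give [Total wall time] with much finer granularity than 1 s.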

Original comment by yurkin on 1 Jun 2014 at 4:56

GoogleCodeExporter commented 9 years ago

Original comment by yurkin on 3 Aug 2014 at 4:56

Added labels: Component-Logic, Usability

Timing of matrix vector multiplication in OpenCL mode #197