The current revision r1023 can perform the matrix-vector multiplication on the
GPU as long as no dimension exceeds 512 on Nvidia GPUs or 256 on AMD GPUs.
This restriction comes from the FFT routine used.
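The per-vendor limit above can be expressed as a simple eligibility check. This is a minimal sketch, not adda_ocl code: the `gpu_vendor` enum and the functions `max_fft_dim` and `matvec_fits_on_gpu` are hypothetical names, and it assumes the "dimensions" in question are the three sizes handed to the FFT routine. Only the 512/256 limits come from the comment above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical vendor tag; real device detection in adda_ocl is not shown. */
enum gpu_vendor { GPU_NVIDIA, GPU_AMD };

/* Largest allowed dimension quoted for r1023: 512 on Nvidia, 256 on AMD. */
static int max_fft_dim(enum gpu_vendor vendor)
{
    assert(vendor == GPU_NVIDIA || vendor == GPU_AMD);
    return vendor == GPU_NVIDIA ? 512 : 256;
}

/* True if no dimension exceeds the vendor's limit, i.e. the
 * matrix-vector product could stay on the GPU (assumed criterion). */
static bool matvec_fits_on_gpu(enum gpu_vendor vendor, int nx, int ny, int nz)
{
    int lim = max_fft_dim(vendor);
    return nx <= lim && ny <= lim && nz <= lim;
}
```

For example, a 512 x 128 x 128 problem would pass the check on an Nvidia GPU but fail it on an AMD GPU.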
This might deserve a separate issue, but it also concerns GPU acceleration:
AMD GPUs seem to suffer much more than Nvidia GPUs from unaligned reads and
writes to global GPU memory within a kernel. So the next task is to align the
global memory accesses, using local memory as a cache. Both AMD and Nvidia
GPUs should benefit from this.
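A common illustration of using local memory as a cache to align global memory traffic is a tiled matrix transpose: a work-group reads a tile with contiguous reads, stores it in local memory, synchronizes, and then writes it out with contiguous writes, so that the strided access happens only in fast local memory. The sketch below emulates one work-group's tile buffer in plain C on the CPU. It assumes a tile size of 4, and `TILE` and `transpose_tiled` are hypothetical names; it does not claim to be how adda_ocl's kernels are actually written.

```c
#include <assert.h>
#include <stddef.h>

#define TILE 4 /* hypothetical work-group tile size */

/* Transpose a rows x cols row-major matrix `in` into `out` (cols x rows).
 * Each TILE x TILE block is staged in `tile`, standing in for __local
 * memory, so reads from `in` and writes to `out` both run along rows. */
static void transpose_tiled(const float *in, float *out, int rows, int cols)
{
    float tile[TILE][TILE + 1]; /* +1 pad: on a GPU this avoids bank conflicts */
    assert(rows > 0 && cols > 0);
    for (int by = 0; by < rows; by += TILE) {
        for (int bx = 0; bx < cols; bx += TILE) {
            /* Phase 1: contiguous reads along rows of `in`. */
            for (int ty = 0; ty < TILE; ty++)
                for (int tx = 0; tx < TILE; tx++) {
                    int r = by + ty, c = bx + tx;
                    if (r < rows && c < cols)
                        tile[ty][tx] = in[(size_t)r * cols + c];
                }
            /* In an OpenCL kernel: barrier(CLK_LOCAL_MEM_FENCE) here. */
            /* Phase 2: contiguous writes along rows of `out`; the transpose
             * happens when indexing the tile, not global memory. */
            for (int ty = 0; ty < TILE; ty++)
                for (int tx = 0; tx < TILE; tx++) {
                    int r = bx + ty, c = by + tx; /* position in `out` */
                    if (r < cols && c < rows)
                        out[(size_t)r * rows + c] = tile[tx][ty];
                }
        }
    }
}
```

Compared with a naive transpose, where either the reads or the writes are strided through global memory, both global passes here walk consecutive addresses; the padding column in `tile` only matters on real GPU hardware.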
Original comment by Marcus.H...@gmail.com
on 20 Feb 2011 at 8:26
Original comment by yurkin
on 10 Jun 2011 at 2:05
I have opened a few specific issues for further development of adda_ocl. It is
already quite mature and almost ready for the ADDA 1.1 release, so I am closing
this issue.
Original comment by yurkin
on 16 Apr 2012 at 1:38
Original issue reported on code.google.com by
yurkin
on 2 Dec 2010 at 5:02