Closed · jonesn closed this issue 5 years ago
Hi Nick,
Thank you for reporting this. This bug was fixed two weeks ago. Are you using the latest version (0.25.3)?
Here's the output I've just got from running the code in my REPL:
(large-square-matrix-mult-cuda 128)
"Elapsed time: 0.596927 msecs"
{:sum-of-elements 2097152N,
:matrix #CUGEMatrix[float, mxn:128x128, layout:column, offset:0]}
BTW, unrelated to this: your function leaks memory. Always use with-release instead of let when binding matrix and vector objects, especially for GPU memory.
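A minimal sketch of that pattern (assuming Neanderthal's uncomplicate.commons.core and native namespaces; the body is illustrative):

```clojure
(require '[uncomplicate.commons.core :refer [with-release]]
         '[uncomplicate.neanderthal.native :refer [dge]])

;; with-release frees a and b deterministically when the body exits,
;; instead of leaving native (or GPU) buffers to the whims of the GC,
;; which a plain let binding would do.
(with-release [a (dge 128 128)
               b (dge 128 128)]
  ;; ... use a and b ...
  )
```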
BTW, sum on the CPU is not provided by MKL; it is Clojure code provided for convenience. If you know that all the elements are positive, asum is much faster.
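A quick sketch of the difference (assuming neanderthal.core's sum and asum; asum computes the sum of absolute values via BLAS, so for non-negative data the results coincide):

```clojure
(require '[uncomplicate.neanderthal.core :refer [sum asum]]
         '[uncomplicate.neanderthal.native :refer [dv]])

(let [x (dv [1 2 3])]
  ;; sum is plain Clojure code; asum dispatches to MKL's BLAS asum
  ;; routine, which is much faster on large vectors.
  [(sum x) (asum x)])
```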
Hi,
Yes, bumping from 0.24 to 0.25.3 corrects this. And thanks for the tips on usage.
Sum operation in CUDA versus MKL
Hi Dragan, it looks like the sum operation is only taking the first 32 rows or columns of the matrix in CUDA.
The native backend seems fine. Examples below.
Native
CUDA
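The original example snippets did not survive extraction. A hedged sketch of the kind of CUDA-side check that would expose the discrepancy (namespaces from Neanderthal's CUDA support; the concrete values are illustrative, not the reporter's original code):

```clojure
(require '[uncomplicate.commons.core :refer [with-release]]
         '[uncomplicate.clojurecuda.core :refer [with-default]]
         '[uncomplicate.neanderthal.core :refer [sum entry!]]
         '[uncomplicate.neanderthal.cuda :refer [cuge with-default-engine]])

;; Fill a 128x128 GPU matrix with ones and sum it. The full sum
;; should be 128 * 128 = 16384.0; with the reported bug, only a
;; leading slice of the matrix contributed to the result.
(with-default
  (with-default-engine
    (with-release [a (entry! (cuge 128 128) 1.0)]
      (sum a))))
```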