Matrix-vector library designed for neural network construction. Features: CUDA (GPU) support, OpenMP (multithreaded CPU) support, partial BLAS support, an expression-template-based implementation whose generated PTX code is identical to hand-written kernels, and support for auto-differentiation.
12 stars, 4 forks
Fix Tensor-Broadcasting and injection optimization #13
For example,
y += w * x + b
where y and w are matrices and x and b are vectors, returns an incorrect result.
(w * x, a gemv call, is evaluated directly "into" y, after which the remaining expression is computed as y += y + b, which is incorrect.)
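
The aliasing problem described above can be reproduced outside the library. The following is a minimal sketch, not the library's code. It uses plain `std::vector` in place of the library's tensor types, and the names `gemv`, `buggy_update`, and `correct_update` are hypothetical. It also rests on two assumptions: y is reduced to a single column so no broadcasting is involved, and the injected gemv overwrites its destination.

```cpp
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;  // row-major: Mat[i] is row i

// Plain gemv: out = w * x. Assumption: the destination is overwritten.
void gemv(const Mat& w, const Vec& x, Vec& out) {
    for (std::size_t i = 0; i < w.size(); ++i) {
        double acc = 0.0;
        for (std::size_t j = 0; j < x.size(); ++j) {
            acc += w[i][j] * x[j];
        }
        out[i] = acc;
    }
}

// Models the faulty injection: w * x is written straight into y,
// so the remaining expression is effectively y += y + b.
Vec buggy_update(Vec y, const Mat& w, const Vec& x, const Vec& b) {
    gemv(w, x, y);  // the original contents of y are lost here
    for (std::size_t i = 0; i < y.size(); ++i) {
        y[i] += y[i] + b[i];
    }
    return y;
}

// Intended semantics: evaluate w * x into a temporary,
// then accumulate (w * x + b) into the untouched y.
Vec correct_update(Vec y, const Mat& w, const Vec& x, const Vec& b) {
    Vec tmp(y.size());
    gemv(w, x, tmp);
    for (std::size_t i = 0; i < y.size(); ++i) {
        y[i] += tmp[i] + b[i];
    }
    return y;
}
```

With w = {{1, 2}, {3, 4}}, x = {1, 1}, b = {10, 20}, and y = {100, 200}, the intended update gives {113, 227}, while the injected version gives {16, 34}. The sketch suggests that injection is only safe when the destination does not also appear as an operand of the compound assignment. Whether the fix should use a temporary or a different evaluation order is left open here.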