josephjaspers / blackcat_tensors

Matrix-Vector Library Designed for Neural Network Construction. CUDA (GPU) support, OpenMP (multithreaded CPU) support, partial support of BLAS, expression-template-based implementation with PTX code generation identical to hand-written kernels, and support for auto-differentiation.

Fix Tensor-Broadcasting and injection optimization #13

Closed: josephjaspers closed this issue 5 years ago

josephjaspers commented 5 years ago

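For example:

y += w * x + b

where y and w are matrices and x and b are vectors, returns an incorrect result.

The w * x term is a gemv call, and the injection optimization evaluates it directly "into" y, overwriting y's original contents. The remaining expression is then computed as y += y + b, which is not correct.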
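The failure described above can be reproduced outside the library. The following is a minimal sketch in plain C++, not BlackCat_Tensors code: `Vec`, `Mat`, `gemv`, `correct_update`, and `buggy_update` are hypothetical names introduced for illustration, and y is simplified to a vector so that only the injection problem is shown, without broadcasting.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;  // row-major; hypothetical stand-in for a matrix type

// out = w * x. Overwrites out, as a gemv with beta = 0 would.
void gemv(const Mat& w, const Vec& x, Vec& out) {
    assert(out.size() == w.size());
    for (std::size_t i = 0; i < w.size(); ++i) {
        double acc = 0.0;
        for (std::size_t j = 0; j < x.size(); ++j) {
            acc += w[i][j] * x[j];
        }
        out[i] = acc;
    }
}

// Intended semantics: evaluate w * x into a temporary, then accumulate into y.
Vec correct_update(Vec y, const Mat& w, const Vec& x, const Vec& b) {
    Vec tmp(y.size());
    gemv(w, x, tmp);
    for (std::size_t i = 0; i < y.size(); ++i) {
        y[i] += tmp[i] + b[i];
    }
    return y;
}

// Faulty injection: w * x is written straight into y, destroying its old
// value, and the remainder is then evaluated as y += y + b.
Vec buggy_update(Vec y, const Mat& w, const Vec& x, const Vec& b) {
    gemv(w, x, y);
    for (std::size_t i = 0; i < y.size(); ++i) {
        y[i] += y[i] + b[i];
    }
    return y;
}
```

With w = {{1, 2}, {3, 4}}, x = {1, 1}, b = {10, 20}, and y = {100, 200}, the product w * x is {3, 7}. The correct update gives {113, 227}, while the faulty injection gives {16, 34}: the original y is lost and w * x is counted twice.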

josephjaspers commented 5 years ago

Fixed in https://github.com/josephjaspers/BlackCat_Tensors/commit/3e55649398456d2fc5be135ed1da65d5e0e15dda