Closed ShigekiKarita closed 6 years ago
@9il Thank you for the fast merge! But I still have a question about my `ger` example for the recurrent neural network backpropagation algorithm.
```shell
git clone https://github.com/ShigekiKarita/numir-char-rnn.git
git checkout -b <branch-name> origin/<branch-name>
git submodule update --init --recursive  # lubeck is a submodule
dub clean --all-packages
dub run --compiler=ldc2 -b=release-nobounds
dub run --compiler=ldc2 -b=release-nobounds
```
Timings of `dub run --compiler=ldc2 -b=release-nobounds` per branch:

- https://github.com/ShigekiKarita/numir-char-rnn/blob/mir-blas-gemm/source/app.d#L63-L72 — 59.09s user 0.94s system 395% cpu 15.180 total
- https://github.com/ShigekiKarita/numir-char-rnn/blob/mir-blas-ger/source/app.d#L63-L72 — 46.70s user 0.59s system 395% cpu 11.968 total
- https://github.com/ShigekiKarita/numir-char-rnn/blob/master/source/app.d#L84-L94 — 21.89s user 0.20s system 390% cpu 5.661 total
Do you have any comments on why this `ger` performed inside `mtimes` is so much slower than the raw `ger` call?
@ShigekiKarita
3. Take a look at this line:

```d
auto dh = mtimes(params["Why"].transposed, dy).slice;
```

There are two memory allocations here: the first one is in `mtimes`, the second one is in `slice`.
@9il The mtimes documentation could be adjusted to say "General matrix-matrix multiplication. Allocates result to an uninitialized slice using GC."
I don't know how many of the other functions that will be in lubeck will need to think about these sorts of issues. It probably makes sense to create an Issue for further discussion on pre-allocation or the use of alternate allocators.
Thanks, I see. Do you have a plan to add an optional output argument to store the result in lubeck? As we can see in numpy and torch, such an argument is good for the preallocation strategy.
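For comparison, numpy's optional output argument (mentioned above) looks like the following. This is a minimal sketch of the preallocation strategy in numpy, not the lubeck API, and the array names `Why`, `dy`, `dh` are just illustrative:

```python
import numpy as np

Why = np.random.rand(3, 4)  # illustrative weight matrix
dy = np.random.rand(3, 2)
dh = np.empty((4, 2))       # preallocated result buffer

# Writes Why.T @ dy into the existing buffer dh,
# avoiding a fresh allocation on every call.
np.matmul(Why.T, dy, out=dh)
```

The caller owns `dh` and can reuse it across iterations, which is exactly what a training loop wants.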
The `gemm` wrapper for ndslice from mir-blas does the same.
> @9il The mtimes documentation could be adjusted to say "General matrix-matrix multiplication. Allocates result to an uninitialized slice using GC."

PR is welcome. At the same time, this is a general Lubeck concept. Lubeck was originally created to port a commercial Matlab library to D. Similarity, readability, and simplicity were key features. Speed was too, but it had already been increased more than one hundred times compared with the original Matlab code.
> I don't know how many of the other functions that will be in lubeck will need to think about these sorts of issues. It probably makes sense to create an Issue for further discussion on pre-allocation or the use of alternate allocators.

Currently @EmTee70 is working on matrix classes that can hold different payloads (symmetric, diagonal, and other matrices) and have an overloaded `*` operator.
I have thought a lot about an RC-based separate Matrix type system with clever expressions that would feel like Julia or Matlab. Something like this:
Assume it can hold different types of matrices:
and is clever enough, for example, to fuse at run-time an expression like

```d
Mat C = alpha * J * J.t - beta * B * R;
```

into two BLAS (OpenBLAS) calls: `syrk` (symmetric rank-k update) for `C = alpha * J * J.t`, and then `gemm` (general matrix-matrix multiplication) for `C -= beta * B * R`.
Plus it would be able to solve linear systems using LAPACK:

```d
B /= A;
```
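The proposed two-call fusion can be sketched with scipy's low-level BLAS wrappers (a hedged illustration only; the names, shapes, and the symmetrization step are assumptions for the demo, not part of the proposed D `Mat` type):

```python
import numpy as np
from scipy.linalg.blas import dsyrk, dgemm

rng = np.random.default_rng(0)
n, k = 4, 3
J = rng.standard_normal((n, k))
B = rng.standard_normal((n, n))
R = rng.standard_normal((n, n))
alpha, beta = 1.5, 0.5

# First call -- syrk: C = alpha * J @ J.T.
# syrk writes only the upper triangle, so symmetrize it for checking.
C = dsyrk(alpha, J)
C = np.triu(C) + np.triu(C, 1).T

# Second call -- gemm fused with the accumulation:
# C = (-beta) * B @ R + 1.0 * C
C = dgemm(-beta, B, R, beta=1.0, c=C)
```

Fusing the subtraction into the `gemm` accumulation (`beta=1.0`) is what saves the extra temporary and pass over memory.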
@Laeeth, @firoozye, this may be a good concept for math programming in D.
@9il Submitted PR.
I'm glad progress is being made on general matrix classes. Would Jean-Louis Leroy's open multi-methods library be useful for these run-time features? He has Matrix examples, but I don't see ones that implement operator overloading.
I ran into the following errors in this special case. I think using `ger` inside `mtimes` is the best option.
I think this is related to the `LDB` (the 11th argument) specification in the dgemm documentation (http://www.netlib.org/lapack/explore-html/d1/d54/group__double__blas__level3_gaeda3cbd99c8fb834a60a6412878226e1.html#gaeda3cbd99c8fb834a60a6412878226e1), where `B.shape == [k, n]`.
P.S. The following transposed `B` (1 < k < n) seems to be OK. I think only `k = 1` has the problem.
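On why the `k = 1` case is special: a rank-1 update `A += alpha * x * yᵀ`, which `ger` performs in place, is mathematically the same as a `gemm` whose inner dimension k is 1. A small numpy sketch of that equivalence (illustrative only, not the mir-blas API):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 4, 5
A = rng.standard_normal((m, n))
x = rng.standard_normal(m)
y = rng.standard_normal(n)
alpha = 2.0

# ger-style rank-1 update: A + alpha * outer(x, y)
A_ger = A + alpha * np.outer(x, y)

# The same result via a gemm with inner dimension k = 1:
# (m x 1) @ (1 x n) -> (m x n)
A_gemm = A + alpha * (x.reshape(m, 1) @ y.reshape(1, n))
```

A dedicated rank-1 routine can skip the blocking and inner-loop machinery a general `gemm` sets up, which is one plausible reason the raw `ger` path benchmarks faster here.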