Open 2020zyc opened 5 years ago
Hi! My explanation is that tensor decomposition methods require more mathematical operations: instead of one (highly optimized in PyTorch) matrix multiplication, we have several. I think it is possible to optimize our code for the Tensor Train and Tucker methods and make it faster, but it is not obvious how to do so efficiently.
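To make the "several operations instead of one" point concrete, here is a minimal sketch (with made-up sizes and a hypothetical rank-4, two-core TT factorization, not the repo's actual layer): a dense 1024×1024 layer is a single matmul, while the TT version has far fewer parameters but needs a chain of smaller contractions.

```python
import torch

batch = 16
# Dense layer: one highly optimized matmul, 1024*1024 ~ 1M parameters.
x = torch.randn(batch, 1024)
W = torch.randn(1024, 1024)
y_dense = x @ W

# TT-style layer (hypothetical rank-4, two-core factorization of the same
# 1024x1024 matrix, with input modes 32*32 and output modes 32*32):
# only ~8K parameters, but the forward pass is a chain of contractions.
r = 4
G1 = torch.randn(32, 32, r)   # (i1, j1, r)
G2 = torch.randn(r, 32, 32)   # (r, i2, j2)
xt = x.reshape(batch, 32, 32)                      # split input into two modes
h = torch.tensordot(xt, G1, dims=([1], [0]))       # contract i1 -> (b, i2, j1, r)
y = torch.tensordot(h, G2, dims=([1, 3], [1, 0]))  # contract i2, r -> (b, j1, j2)
y_tt = y.reshape(batch, 1024)

print(W.numel(), G1.numel() + G2.numel())  # 1048576 8192
```

Each `tensordot` is a separate (smaller, less cache-friendly) kernel launch, which is one reason the compressed layer can be slower despite having fewer parameters.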
It could be implemented in one operation with einsum; however, PyTorch does not fully support broadcasting for einsum (it did work for me in NumPy, though).
However, I assume that torch.einsum calls many matmul operations "behind the scenes" (like it does in TensorFlow), so it won't be much better.
I also thought about implementing it as a Numba kernel (but found that Numba does not support einsum either).
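For reference, the NumPy broadcasting mentioned above is done with an ellipsis in the subscript string; a minimal sketch (shapes are arbitrary):

```python
import numpy as np

# Broadcast a batch of matrices against a single weight matrix:
# '...' stands for any leading (batch) dimensions of the first operand.
batch = np.random.rand(5, 3, 4)   # (batch, i, j)
mat = np.random.rand(4, 2)        # (j, k)
out = np.einsum('...ij,jk->...ik', batch, mat)
print(out.shape)  # (5, 3, 2)
```

(Whether `torch.einsum` accepts the same ellipsis syntax depends on the PyTorch version; recent versions do support it.)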
thanks @saareliad
Can einsum accelerate the many matmul operations produced by tt/tucker decomposition?

> It could be implemented in one operation with einsum

And how to implement it with einsum in one operation?
Most of the einsum code runs in C++, so it should be faster. I didn't check extensively. I believe that for top-optimized code one should rewrite the C++/CUDA kernels.
I compared memory consumption against a Python loop with tensordots (the tt-pytorch implementation), and einsum is better.
I can't publish the full code yet because it's under active research. We changed the TT implementation quite a lot from the public GitHub repos and used 4-dimensional tensors as tt.cores. (Note that in this repo the authors "squeezed" the cores into 2-dimensional tensors, to use simple matmuls.)
Something like `torch.einsum('adcbr,rdxk,kcym,mbzn->axyzn', x, *tt.cores)` does the job for d=3. Note that it depends on the shape of `tt.cores`.
Something similar can be implemented for 2-d cores.
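As a sanity check on that subscript string, here is a runnable sketch with made-up sizes (the real mode sizes and TT ranks depend on the layer being decomposed; `inp` plays the role of `x` above, and the three random tensors stand in for `tt.cores`):

```python
import torch

# Hypothetical sizes, chosen only so the einsum subscripts line up:
a, d, c, b = 8, 4, 4, 4   # batch and three input modes
r, k, m = 2, 2, 2         # TT ranks
X, Y, Z, N = 5, 5, 5, 3   # output modes (capitalized to avoid clashing with subscripts)

inp = torch.randn(a, d, c, b, r)          # plays the role of x
cores = [torch.randn(r, d, X, k),         # stand-ins for tt.cores
         torch.randn(k, c, Y, m),
         torch.randn(m, b, Z, N)]

# Single-operation contraction over all input modes and TT ranks:
out = torch.einsum('adcbr,rdxk,kcym,mbzn->axyzn', inp, *cores)
print(out.shape)  # torch.Size([8, 5, 5, 5, 3])
```

The contraction sums out the input modes (d, c, b) and the ranks (r, k, m) in one call, leaving the batch dimension and the output modes.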
I hope that when the research is done it will be published as part of a paper or integrated to nlp-architect.
Hi, I am puzzled about the inference time of the compressed model. Why is the compressed model more time-consuming? Shouldn't it be faster with fewer parameters (about half of the original)?
thx