[Open] SeguinBe opened this issue 4 years ago
Well, as usual: after being blocked on something for hours, as soon as I post about it I find out why (well, partially here).
Replacing:

```cpp
auto mm1 = xt::reshape_view(m1, {m1.shape(0)*m1.shape(1), m1.shape(2)});
auto mm2 = xt::reshape_view(m2, {m2.shape(0)*m2.shape(1), m2.shape(2)});
```

with:

```cpp
xt::xtensor<float, 2> mm1 = xt::reshape_view(m1, {m1.shape(0)*m1.shape(1), m1.shape(2)});
xt::xtensor<float, 2> mm2 = xt::reshape_view(m2, {m2.shape(0)*m2.shape(1), m2.shape(2)});
```
This seems to solve the main difference. So I then have two questions: can `reshape_view` be called so that its result has static rank information, which seemed to be the missing bit here?
Hello,

I'm just starting to explore the possibilities of the xtensor-stack. It was a bit rough at first, as I had not done proper C++ in a while, but I found my way around it after some time. However, I feel I am probably missing some things.

My current setup is based on:

- `xtensor=0.21.4` (conda-forge)
- `xsimd=7.4.7` (conda-forge)
- `xtensor-blas` from source
- MKL as BLAS
- `target_link_libraries(<my-module> PRIVATE xtensor xtensor::optimize xtensor::use_xsimd)` in the CMake
- `gcc=7.3.0`
I am doing a quick comparison with numpy (linked with MKL as well); generally I find I am about 1.5x slower, which is a bit surprising since most of what I do is large matrix computations.

For instance, I was trying to do a tensordot over the last dimension of two 3-d tensors. I tried two methods:
Registered with `pybind` as:

Now trying a simple timing in a Jupyter notebook:

![image](https://user-images.githubusercontent.com/7132817/79462651-150cd180-7ff8-11ea-84e1-205e5ea5f6de.png)
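For reference, the two approaches being timed can be sketched in NumPy terms (the shapes and variable names here are hypothetical; the actual code was C++ exposed via pybind):

```python
import numpy as np

# Hypothetical inputs: contract the last axis of two 3-d tensors.
m1 = np.random.rand(8, 16, 32).astype(np.float32)   # shape (I, J, K)
m2 = np.random.rand(4, 24, 32).astype(np.float32)   # shape (M, N, K)

# Method 1: direct tensordot over the last axes -> shape (I, J, M, N).
out_direct = np.tensordot(m1, m2, axes=([2], [2]))

# Method 2: "manual" version -- flatten the leading axes, do a single
# matrix product against the transposed second operand, reshape back.
mm1 = m1.reshape(m1.shape[0] * m1.shape[1], m1.shape[2])  # (I*J, K)
mm2 = m2.reshape(m2.shape[0] * m2.shape[1], m2.shape[2])  # (M*N, K)
out_manual = (mm1 @ mm2.T).reshape(
    m1.shape[0], m1.shape[1], m2.shape[0], m2.shape[1]
)

assert np.allclose(out_direct, out_manual)
```

Both variants reduce to a single GEMM on the flattened operands, which is why one would expect the xtensor version to match numpy once `xt::linalg::dot` dispatches a single BLAS call.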
The direct `tensordot` is roughly 1.5x slower, which is unfortunately what I seem to get often. But I am more confused by the manually reshaped and transposed version (`tensordot_manual`), which is even faster in numpy but much, MUCH slower with my code.

Any thoughts on what is happening here? Having a look at `xt::linalg::dot`, it seems everything should be mapped to a single BLAS call, as the reshaping and the transposition should be just views of the same data.
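The "just views of the same data" expectation can be checked on the NumPy side of the comparison: a reshape of a C-contiguous array and a transpose both share memory with the original, so no copy is made before the BLAS call. A minimal sketch, using `np.shares_memory`:

```python
import numpy as np

m = np.random.rand(8, 16, 32)

# Reshaping a C-contiguous array returns a view, not a copy...
mm = m.reshape(m.shape[0] * m.shape[1], m.shape[2])
assert np.shares_memory(m, mm)

# ...and transposing only swaps strides; it is still the same buffer.
mmT = mm.T
assert np.shares_memory(mm, mmT)
assert mmT.flags['F_CONTIGUOUS']  # a transposed C-array is Fortran-ordered

# BLAS (reached via the matmul) accepts the transposed operand directly:
# GEMM has a transpose flag, so no materialized copy of mmT is needed.
gram = mm @ mmT
assert gram.shape == (128, 128)
```

The expectation in xtensor is analogous: `xt::reshape_view` and `xt::transpose` are lazy views, so `xt::linalg::dot` should be able to hand a single GEMM to MKL without copying.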