Open sonots opened 6 years ago
The transpose
method returns a view of NArray. In general, operations on view arrays are slow due to discontinuous memory access. I recommend the use of copied array obtained by b.transpose.dup
.
If we can find whether narray data is C contiguous, we can provide a helper at extra.rb like:
unless a.c_contiguous?
b = b.dup
end
a.dot(b)
How about this idea?
FYI: CuPy does this https://github.com/cupy/cupy/blob/f96dacb7f9ff34fb806046072a65ac02d7cdf3d1/cupy/core/core.pyx#L3858
Also, cuBLAS (maybe cBLAS also) has transa
and transb
option, so it is possible to compute
a.dot(b.transpose)
like
a.gemm(b, transb: true)
for faster computation.
I want to provide a helper feature to convert the former into the latter internally automatically if possible. I felt it could be done if narray has a flag which was transposed, but it looks it does not have. Any idea to make it possible?
I feel these two examples are semantically different. It may be possible to recognize as a transposed array by stride length.
It may be possible to recognize as a transposed array by stride length.
I think that it is possible to recognize it as a transposed array with stride length at the following judgment.
$ cat stride.rb
a = Numo::SFloat.new(1, 2, 3).seq(0)
at = a.transpose
att = at.transpose
puts "[Not Transpose Case (View)]"
p att
p att.debug_info
puts "[Transpose Case (View)]"
p at
p at.debug_info
$ ruby stride.rb
[Not Transpose Case (View)]
Numo::SFloat(view)#shape=[1,2,3]
[[[0, 1, 2],
[3, 4, 5]]]
Numo::SFloat:
id = 0x7fcff285daa8
type = 2
flag = [0,0]
size = 6
ndim = 3
shape = 0x7fcff0efb3e0
shape = [ 1 2 3 ]
data = 0x7fcff285dfa8
offset = 0
stridx = 0x7fcff0efb400
stridx = [ 24 12 4 ]
nil
[Transpose Case (View)]
Numo::SFloat(view)#shape=[3,2,1]
[[[0],
[3]],
[[1],
[4]],
[[2],
[5]]]
Numo::SFloat:
id = 0x7fcff285dad0
type = 2
flag = [0,0]
size = 6
ndim = 3
shape = 0x7fcff3004c80
shape = [ 3 2 1 ]
data = 0x7fcff285dfa8
offset = 0
stridx = 0x7fcff0ec2ca0
stridx = [ 4 12 24 ]
nil
If Numo::NArray has a "transpose?" method, I think that it is better to use "transa", "transb" designation after "transpose" instead of "dup".
https://github.com/ruby-numo/numo-linalg/blob/master/lib/numo/linalg/function.rb#L82
def dot(a, b)
a = NArray.asarray(a)
b = NArray.asarray(b)
case a.ndim
when 1
case b.ndim
when 1
func = blas_char(a, b) =~ /c|z/ ? :dotu : :dot
Blas.call(func, a, b)
else
Blas.call(:gemv, b, a, trans:'t')
end
else
case b.ndim
when 1
Blas.call(:gemv, a, b)
else
if a.transpose?
transa='t'
a = a.transpose
else
transa='n'
end
if b.transpose?
transb='t'
b = b.transpose
else
transb='n'
end
Blas.call(:gemm, a, b, transa:transa, transb:transb)
end
end
end
This is considered only for gemm case, but I think that gemv can cope in the same way.
I think this problem has been fixed in numo-narray 0.9.1.4, numo-linalg 0.1.4, but what do you think?
With numo-linalg (I did not investigate without numo-linalg),
issues 12 times of memcpy at
https://github.com/ruby-numo/numo-narray/blob/7bba089c3114ff08d110e47ba8bc448aa775f0d4/ext/numo/narray/ndloop.c#L1127
This drastically slows down matrix dot computation.