sonots commented 6 years ago

With numo-linalg (I did not investigate without numo-linalg),

a = Numo::SFloat.new(3, 4).seq(0)
b = Numo::SFloat.new(3, 4).seq(0)
a.dot(b.transpose)

issues 12 times of memcpy at

https://github.com/ruby-numo/numo-narray/blob/7bba089c3114ff08d110e47ba8bc448aa775f0d4/ext/numo/narray/ndloop.c#L1127

This drastically slows down matrix dot computation.

masa16 commented 6 years ago

The transpose method returns a view of NArray. In general, operations on view arrays are slow due to discontinuous memory access. I recommend the use of copied array obtained by b.transpose.dup.

sonots commented 6 years ago

If we can find whether narray data is C contiguous, we can provide a helper at extra.rb like:

unless a.c_contiguous?
   b = b.dup
end
a.dot(b)

How about this idea?

FYI: CuPy does this https://github.com/cupy/cupy/blob/f96dacb7f9ff34fb806046072a65ac02d7cdf3d1/cupy/core/core.pyx#L3858

sonots commented 6 years ago

Also, cuBLAS (maybe cBLAS also) has transa and transb option, so it is possible to compute

a.dot(b.transpose)

like

a.gemm(b, transb: true)

for faster computation.

I want to provide a helper feature to convert the former into the latter internally automatically if possible. I felt it could be done if narray has a flag which was transposed, but it looks it does not have. Any idea to make it possible?

masa16 commented 6 years ago

I feel these two examples are semantically different. It may be possible to recognize as a transposed array by stride length.

naitoh commented 6 years ago

It may be possible to recognize as a transposed array by stride length.

I think that it is possible to recognize it as a transposed array with stride length at the following judgment.

View Case
offset = 0
stridx[0, 1, ... ,N-2 ,N-1] == [dtype shape[N-1] shape[N-2] .. shape[2] shape[1], dtype shape[N-1] shape[N-2] .. shape[2], : dtype shape[N-1], dtype]
- => Not Transposed
stridx[0, 1, ... ,N-2 ,N-1] == [dtype, dtype shape[0], : dtype shape[0] shape[1] .. shape[N-3], dtype shape[0] shape[1] .. shape[N-3] shape[N-2]]
- => Transposed

Sample Code

$ cat stride.rb
a = Numo::SFloat.new(1, 2, 3).seq(0)
at = a.transpose
att = at.transpose

puts "[Not Transpose Case (View)]"
p att
p att.debug_info
puts "[Transpose Case (View)]"
p at
p at.debug_info

$ ruby stride.rb
[Not Transpose Case (View)]
Numo::SFloat(view)#shape=[1,2,3]
[[[0, 1, 2], 
  [3, 4, 5]]]
Numo::SFloat:
  id     = 0x7fcff285daa8
  type   = 2
  flag   = [0,0]
  size   = 6
  ndim   = 3
  shape  = 0x7fcff0efb3e0
  shape  = [ 1 2 3 ]
  data   = 0x7fcff285dfa8
  offset = 0
  stridx = 0x7fcff0efb400
  stridx = [ 24 12 4 ]
nil
[Transpose Case (View)]
Numo::SFloat(view)#shape=[3,2,1]
[[[0], 
  [3]], 
 [[1], 
  [4]], 
 [[2], 
  [5]]]
Numo::SFloat:
  id     = 0x7fcff285dad0
  type   = 2
  flag   = [0,0]
  size   = 6
  ndim   = 3
  shape  = 0x7fcff3004c80
  shape  = [ 3 2 1 ]
  data   = 0x7fcff285dfa8
  offset = 0
  stridx = 0x7fcff0ec2ca0
  stridx = [ 4 12 24 ]
nil

Not Transposed Case

SFloat = 4 (=dtype)
shape = [ 1 2 3 ]
stridx = [ 24 12 4 ]

Not Transposed Chack => match

stridx[0] == dtype shape[2] shape[1] => 24
stridx[1] == dtype * shape[2] => 12
stridx[2] == dtype => 4

Transposed Chack => not match

stridx[0] != dtype => 4
stridx[1] != dtype * shape[0] => 4
stridx[2] != dtype shape[0] shape[1] => 8

Transposed Case

SFloat = 4 (=dtype)
shape = [ 3 2 1 ]
stridx = [ 4 12 24 ]

Not Transposed Chack => not match

stridx[0] != dtype shape[2] shape[1] => 8
stridx[1] != dtype * shape[2] => 4
stridx[2] != dtype => 4

Transposed Chack => match

stridx[0] == dtype => 4
stridx[1] == dtype * shape[0] => 12
stridx[2] == dtype shape[0] shape[1] => 24

Fix Code Sample

If Numo::NArray has a "transpose?" method, I think that it is better to use "transa", "transb" designation after "transpose" instead of "dup".

https://github.com/ruby-numo/numo-linalg/blob/master/lib/numo/linalg/function.rb#L82

  def dot(a, b)
    a = NArray.asarray(a)
    b = NArray.asarray(b)
    case a.ndim
    when 1
      case b.ndim
      when 1
        func = blas_char(a, b) =~ /c|z/ ? :dotu : :dot
        Blas.call(func, a, b)
      else
        Blas.call(:gemv, b, a, trans:'t')
      end
    else
      case b.ndim
      when 1
        Blas.call(:gemv, a, b)
      else
        if a.transpose?
          transa='t'
          a = a.transpose
        else
          transa='n'
        end
        if b.transpose?
          transb='t'
          b = b.transpose
        else
          transb='n'
        end

        Blas.call(:gemm, a, b, transa:transa, transb:transb)
      end
    end
  end

This is considered only for gemm case, but I think that gemv can cope in the same way.

naitoh commented 5 years ago

I think this problem has been fixed in numo-narray 0.9.1.4, numo-linalg 0.1.4, but what do you think?

ruby-numo / numo-narray

Lots of memcpy are issued on a.dot(b.transpose) #95

Sample Code

Not Transposed Case

Not Transposed Chack => match

Transposed Chack => not match

Transposed Case

Not Transposed Chack => not match

Transposed Chack => match

Fix Code Sample