ruby-numo / numo-narray

Ruby/Numo::NArray - New NArray class library
http://ruby-numo.github.io/narray/
BSD 3-Clause "New" or "Revised" License

Broadcast performance improvement. #94

Closed: naitoh closed this 6 years ago

naitoh commented 6 years ago

I wrote a patch to improve broadcast performance. (See #93 for details.)

Benchmarked code

$ cat broadcast_fp32_0.rb
require 'benchmark'
require 'numo/narray'

num_iteration = 10000

Benchmark.bm 20 do |r|
  x = Numo::SFloat.ones([1000,784])
  y = Numo::SFloat.ones([1000,784])
  r.report "x.inplace + y" do
    num_iteration.times do
      x.inplace + y
    end
  end

  x = Numo::SFloat.ones([1000,784])
  y = Numo::SFloat.ones([1000,784])
  r.report "x.inplace + 1.0" do
    num_iteration.times do
      x.inplace + 1.0
    end
  end

  x = Numo::SFloat.ones([1000,784])
  z = Numo::SFloat.ones([1000,1])
  r.report "x.inplace + z" do
    num_iteration.times do
      x.inplace + z
    end
  end

  x = Numo::SFloat.ones([1000,784])
  y = Numo::SFloat.ones([1000,784])
  r.report "x.inplace - y" do
    num_iteration.times do
      x.inplace - y
    end
  end

  x = Numo::SFloat.ones([1000,784])
  y = Numo::SFloat.ones([1000,784])
  r.report "x.inplace - 1.0" do
    num_iteration.times do
      x.inplace - 1.0
    end
  end

  x = Numo::SFloat.ones([1000,784])
  z = Numo::SFloat.ones([1000,1])
  r.report "x.inplace - z" do
    num_iteration.times do
      x.inplace - z
    end
  end

  x = Numo::SFloat.ones([1000,784])
  y = Numo::SFloat.ones([1000,784])
  r.report "x.inplace * y" do
    num_iteration.times do
      x.inplace * y
    end
  end

  x = Numo::SFloat.ones([1000,784])
  y = Numo::SFloat.ones([1000,784])
  r.report "x.inplace * 1.0" do
    num_iteration.times do
      x.inplace * 1.0
    end
  end

  x = Numo::SFloat.ones([1000,784])
  z = Numo::SFloat.ones([1000,1])
  r.report "x.inplace * z" do
    num_iteration.times do
      x.inplace * z
    end
  end

  x = Numo::SFloat.ones([1000,784])
  y = Numo::SFloat.ones([1000,784])
  r.report "x.inplace / y" do
    num_iteration.times do
      x.inplace / y
    end
  end

  x = Numo::SFloat.ones([1000,784])
  y = Numo::SFloat.ones([1000,784])
  r.report "x.inplace / 1.0" do
    num_iteration.times do
      x.inplace / 1.0
    end
  end

  x = Numo::SFloat.ones([1000,784])
  z = Numo::SFloat.ones([1000,1])
  r.report "x.inplace / z" do
    num_iteration.times do
      x.inplace / z
    end
  end
end
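
For each operator, the benchmark above uses three kinds of right-hand operand: an array of the same shape (`y`), a scalar (`1.0`), and a `[1000,1]` column (`z`). The last two need broadcasting. As a rough illustration of what that means, here is a pure-Ruby sketch on small nested Arrays. The helper `broadcast_add` is hypothetical and written only for this example; it says nothing about how Numo::NArray implements the operation internally.

```ruby
# Pure-Ruby sketch of the three right-hand operand shapes used in the benchmark.
# It only illustrates broadcasting semantics on nested Arrays; it is not the
# Numo::NArray implementation.

# Hypothetical helper: add `rhs` to the 2-D array `x`, broadcasting as needed.
def broadcast_add(x, rhs)
  x.each_with_index.map do |row, i|
    row.each_with_index.map do |value, j|
      operand =
        if rhs.is_a?(Numeric)
          rhs          # scalar: the same value for every element
        elsif rhs[i].size == 1
          rhs[i][0]    # shape [n, 1]: one value reused along the whole row
        else
          rhs[i][j]    # same shape: plain element-wise access
        end
      value + operand
    end
  end
end

x = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # shape [2, 3]
y = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]  # shape [2, 3], like `y` in the benchmark
z = [[10.0], [20.0]]                    # shape [2, 1], like `z` in the benchmark

same_shape = broadcast_add(x, y)    # element-wise, no broadcasting
scalar     = broadcast_add(x, 1.0)  # scalar broadcast to every element
column     = broadcast_add(x, z)    # column broadcast along each row

p same_shape
p scalar
p column
```

In the scalar and column cases the right-hand value is constant along each row, which is the kind of access pattern the `+ 1.0` and `+ z` benchmark rows exercise.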

numo-narray (0.9.1.2)

$ ruby broadcast_fp32_0.rb
                           user     system      total        real
x.inplace + y          7.117035   0.014939   7.131974 (  7.136282)
x.inplace + 1.0        6.789272   0.024060   6.813332 (  6.827599)
x.inplace + z          7.175892   0.017552   7.193444 (  7.202093)
x.inplace - y          7.153403   0.018990   7.172393 (  7.179199)
x.inplace - 1.0        7.126394   0.035196   7.161590 (  7.165015)
x.inplace - z          7.661959   0.027423   7.689382 (  7.691847)
x.inplace * y          7.292331   0.019903   7.312234 (  7.313645)
x.inplace * 1.0        7.400105   0.027464   7.427569 (  7.456064)
x.inplace * z          7.615981   0.014080   7.630061 (  7.648254)
x.inplace / y         21.070260   0.040930  21.111190 ( 21.118321)
x.inplace / 1.0       20.598696   0.037721  20.636417 ( 20.666177)
x.inplace / z         20.315364   0.034728  20.350092 ( 20.356939)

numo-narray (this pull request)

$ ruby broadcast_fp32_0.rb
                           user     system      total        real
x.inplace + y          7.155114   0.016712   7.171826 (  7.186810)
x.inplace + 1.0        3.459453   0.015183   3.474636 (  3.481500)
x.inplace + z          3.663317   0.010254   3.673571 (  3.680733)
x.inplace - y          6.845094   0.017954   6.863048 (  6.870259)
x.inplace - 1.0        3.560169   0.013040   3.573209 (  3.586768)
x.inplace - z          3.646052   0.005849   3.651901 (  3.660764)
x.inplace * y          7.467288   0.023760   7.491048 (  7.519944)
x.inplace * 1.0        3.894445   0.003156   3.897601 (  3.898232)
x.inplace * z          3.689050   0.006646   3.695696 (  3.699812)
x.inplace / y         20.479120   0.030151  20.509271 ( 20.584411)
x.inplace / 1.0        5.516712   0.005548   5.522260 (  5.536777)
x.inplace / z          5.517277   0.013617   5.530894 (  5.537079)
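
To make the two tables above easier to compare, the following sketch computes the speedup (baseline time divided by patched time) for each row. The numbers are the "real" column copied from the two runs above; the variable names are chosen only for this example.

```ruby
# Wall-clock ("real") seconds copied from the two benchmark runs above.
baseline = {  # numo-narray 0.9.1.2
  "x.inplace + y"   => 7.136282,  "x.inplace + 1.0" => 6.827599,
  "x.inplace + z"   => 7.202093,  "x.inplace - y"   => 7.179199,
  "x.inplace - 1.0" => 7.165015,  "x.inplace - z"   => 7.691847,
  "x.inplace * y"   => 7.313645,  "x.inplace * 1.0" => 7.456064,
  "x.inplace * z"   => 7.648254,  "x.inplace / y"   => 21.118321,
  "x.inplace / 1.0" => 20.666177, "x.inplace / z"   => 20.356939
}

patched = {   # this pull request
  "x.inplace + y"   => 7.186810,  "x.inplace + 1.0" => 3.481500,
  "x.inplace + z"   => 3.680733,  "x.inplace - y"   => 6.870259,
  "x.inplace - 1.0" => 3.586768,  "x.inplace - z"   => 3.660764,
  "x.inplace * y"   => 7.519944,  "x.inplace * 1.0" => 3.898232,
  "x.inplace * z"   => 3.699812,  "x.inplace / y"   => 20.584411,
  "x.inplace / 1.0" => 5.536777,  "x.inplace / z"   => 5.537079
}

# Speedup > 1.0 means the patched build is faster for that row.
speedups = baseline.map { |label, before| [label, before / patched.fetch(label)] }.to_h

speedups.each do |label, ratio|
  puts format("%-18s %.2fx", label, ratio)
end
```

Read this way, the same-shape rows (`y`) are unchanged within a few percent, the scalar and `[1000,1]` rows for `+`, `-`, and `*` are about 1.9x to 2.1x faster, and the corresponding `/` rows are about 3.7x faster.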

Environment

kojix2 commented 6 years ago

Wow! I do not understand the code at all, but the performance results look great.

naitoh commented 6 years ago

I have made additional improvements in c61b5e7.

numo-narray (this pull request)

$ ruby broadcast_fp32_0.rb 
                           user     system      total        real
x.inplace + y          6.472457   0.008103   6.480560 (  6.488018)
x.inplace + 1.0        3.067979   0.002201   3.070180 (  3.071858)
x.inplace + z          3.690922   0.005546   3.696468 (  3.697989)
x.inplace - y          6.192707   0.004674   6.197381 (  6.199437)
x.inplace - 1.0        3.379623   0.003397   3.383020 (  3.385782)
x.inplace - z          3.495210   0.004582   3.499792 (  3.502072)
x.inplace * y          6.091861   0.013382   6.105243 (  6.115937)
x.inplace * 1.0        3.299177   0.002403   3.301580 (  3.331252)
x.inplace * z          3.556242   0.004560   3.560802 (  3.564021)
x.inplace / y          6.605212   0.004621   6.609833 (  6.613972)
x.inplace / 1.0        5.443426   0.005829   5.449255 (  5.451560)
x.inplace / z          5.416341   0.009099   5.425440 (  5.430727)

masa16 commented 6 years ago

Thank you so much.

Try2Code commented 6 years ago

@naitoh awesome

naitoh commented 6 years ago

Thank you for the merge.