naitoh closed this 5 years ago
Benchmark updated. (transpose? => f_contiguous?)
$ cat numo_linalg_fp32.yaml
contexts:
  - gems: { numo-linalg: 0.1.3 }
    require: false
    prelude: |
      require 'numo/linalg'
  - gems: { numo-linalg: 0.2.0 }
    require: false
    prelude: |
      require 'numo/linalg'
loop_count: 1000
prelude: |
  X = 1000
  Y = 784
  YY = Y + 1
  a = Numo::SFloat.new(X, Y).seq(0)
  b = Numo::SFloat.new(X, Y).seq(0).transpose
  c = b.dup
  d = Numo::SFloat.new(X, YY).seq(0)[true,1..-1] # contiguous? => false, f_contiguous? => false
  e = Numo::SFloat.new(X, YY).seq(0)[true,1..-1].transpose # contiguous? => false, f_contiguous? => false
  f = e.dup
  sleep 30
benchmark:
  'a.dot(b.transpose) : view (contiguous? => false, f_contiguous? => true) ' : a.dot(b)
  'a.dot(b.transpose.dup) : not view (contiguous? => true) ' : a.dot(c)
  'a.dot(d.transpose) : view (contiguous? => false, f_contiguous? => false)' : a.dot(e)
  'a.dot(f.transpose.dup) : not view (contiguous? => true) ' : a.dot(f)
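The contiguity comments in the prelude follow from the strides of each array. The sketch below reproduces them in plain Ruby without Numo. The helpers `c_strides`, `c_contiguous?` and `f_contiguous?` are hypothetical names for illustration, not the Numo API, and strides are counted in elements rather than bytes.

```ruby
# Hypothetical helpers for illustration only (not part of Numo).
# Strides are counted in elements, not bytes.

# Row-major (C order) strides of a freshly allocated array with this shape.
def c_strides(shape)
  stride = 1
  shape.reverse.map { |n| s = stride; stride *= n; s }.reverse
end

# Row-major contiguous: strides match those of a fresh array.
def c_contiguous?(shape, strides)
  strides == c_strides(shape)
end

# Column-major (Fortran order) contiguous: the transposed layout is
# row-major contiguous.
def f_contiguous?(shape, strides)
  c_contiguous?(shape.reverse, strides.reverse)
end

X = 1000
Y = 784
YY = Y + 1

# [shape, strides] for the arrays built in the benchmark prelude.
LAYOUTS = {
  a: [[X, Y], [Y, 1]],   # fresh X x Y array
  b: [[Y, X], [1, Y]],   # transpose view of a fresh X x Y array
  d: [[X, Y], [YY, 1]],  # column slice of an X x YY array
  e: [[Y, X], [1, YY]]   # transpose view of that slice
}.freeze

LAYOUT_FLAGS = LAYOUTS.transform_values do |(shape, strides)|
  { contiguous: c_contiguous?(shape, strides),
    f_contiguous: f_contiguous?(shape, strides) }
end

LAYOUT_FLAGS.each { |name, flags| puts "#{name}: #{flags}" }
```

Only `b` is Fortran-contiguous: transposing swaps the strides of a fresh array, while slicing a column off first leaves a gap of one element per row that neither ordering can absorb.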
$ benchmark-driver numo_linalg_fp32.yaml
Calculating -------------------------------------
                                                                          numo-linalg 0.1.3  numo-linalg 0.2.0
a.dot(b.transpose) : view (contiguous? => false, f_contiguous? => true)              38.316             59.741 i/s - 1.000k times in 26.098845s 16.739011s
a.dot(b.transpose.dup) : not view (contiguous? => true)                              58.714             57.679 i/s - 1.000k times in 17.031678s 17.337227s
a.dot(d.transpose) : view (contiguous? => false, f_contiguous? => false)             36.577             48.925 i/s - 1.000k times in 27.339320s 20.439631s
a.dot(f.transpose.dup) : not view (contiguous? => true)                              56.454             57.231 i/s - 1.000k times in 17.713388s 17.472999s

Comparison:
a.dot(b.transpose) : view (contiguous? => false, f_contiguous? => true)
  numo-linalg 0.2.0: 59.7 i/s
  numo-linalg 0.1.3: 38.3 i/s - 1.56x slower

a.dot(b.transpose.dup) : not view (contiguous? => true)
  numo-linalg 0.1.3: 58.7 i/s
  numo-linalg 0.2.0: 57.7 i/s - 1.02x slower

a.dot(d.transpose) : view (contiguous? => false, f_contiguous? => false)
  numo-linalg 0.2.0: 48.9 i/s
  numo-linalg 0.1.3: 36.6 i/s - 1.34x slower

a.dot(f.transpose.dup) : not view (contiguous? => true)
  numo-linalg 0.2.0: 57.2 i/s
  numo-linalg 0.1.3: 56.5 i/s - 1.01x slower
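As a cross-check, the "x slower" figures can be recomputed from the i/s columns above. The short labels and the `slowdown` helper are names chosen here for illustration; the numbers are copied from the benchmark output.

```ruby
# Iterations per second from the output above, as [0.1.3, 0.2.0] pairs.
# The short labels are illustrative, not the benchmark's own labels.
IPS = {
  'b: f_contiguous view' => [38.316, 59.741],
  'c: contiguous copy'   => [58.714, 57.679],
  'e: strided view'      => [36.577, 48.925],
  'f: contiguous copy'   => [56.454, 57.231]
}.freeze

# Ratio of the faster version to the slower one, as printed by the
# "Comparison" section.
def slowdown(ips_pair)
  (ips_pair.max / ips_pair.min).round(2)
end

SLOWDOWNS = IPS.transform_values { |pair| slowdown(pair) }

SLOWDOWNS.each do |label, ratio|
  puts format('%-22s %.2fx', label, ratio)
end

# i/s is loop_count divided by elapsed seconds, e.g. the first 0.1.3 row:
FIRST_ROW_IPS = (1000 / 26.098845).round(3)
```

The two view cases account for all of the meaningful difference between versions; the two copy cases differ by 1 to 2 percent.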
I changed the name of the method I am using. (f_contiguous? => fortran_contiguous?)
Thanks!
I wrote a patch to improve the performance of a.dot(b.transpose). (See https://github.com/ruby-numo/numo-narray/issues/95 for details.)
This patch requires https://github.com/ruby-numo/numo-narray/pull/116 .
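The general idea of such an optimization can be sketched in plain Ruby. This is an illustration under assumptions, not the code in this pull request: `gemm_plan` and the hash it returns are hypothetical. It assumes the backend is a BLAS gemm routine, which accepts a per-operand transpose flag, so a Fortran-contiguous view could be passed without copying.

```ruby
# Hypothetical sketch, not the numo-linalg implementation.
# Decide how an operand could be handed to a BLAS gemm routine,
# given its layout flags.
def gemm_plan(contiguous:, fortran_contiguous:)
  if contiguous
    # Row-major buffer: pass it through unchanged.
    { copy: false, trans: 'n' }
  elsif fortran_contiguous
    # Transposed view of a contiguous buffer: reuse the buffer and
    # let gemm transpose it via the flag instead of copying.
    { copy: false, trans: 't' }
  else
    # Strided view in neither order: make a contiguous copy first.
    { copy: true, trans: 'n' }
  end
end

PLAN_B = gemm_plan(contiguous: false, fortran_contiguous: true)
PLAN_C = gemm_plan(contiguous: true,  fortran_contiguous: false)
PLAN_E = gemm_plan(contiguous: false, fortran_contiguous: false)

puts "b: #{PLAN_B}"
puts "c: #{PLAN_C}"
puts "e: #{PLAN_E}"
```

Under this assumption only the fully strided case still pays for a copy.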
Benchmark code (numo-linalg 0.2.0 : this pull request version)
Benchmark result (numo-linalg 0.2.0 : this pull request version)