Closed: mrkn closed this issue 2 years ago
Hi. I tried the given branch mrkn:ractor_support with my current project and found performance issues. I tried to isolate the issue with this simple benchmark:
```ruby
require 'benchmark'
require 'numo/narray'

Warning[:experimental] = false

puts 'Testing Numo'
data = Ractor.make_shareable Array.new(1_000_000) { Numo::SFloat.new(10).rand(100) }

Benchmark.bm do |bm|
  bm.report('no ractor') do
    4.times { data.each &:mean }
  end

  bm.report('1 ractor') do
    Ractor.new(data) do |arr|
      4.times { arr.each &:mean }
      nil
    end.take
  end

  bm.report('2 ractors') do
    2.times.map do
      Ractor.new(data) do |arr|
        2.times { arr.each &:mean }
        nil
      end
    end.each &:take
  end

  bm.report('4 ractors') do
    4.times.map do
      Ractor.new(data) do |arr|
        arr.each &:mean
        nil
      end
    end.each &:take
  end
end

puts 'Testing core Array'
data = Ractor.make_shareable Array.new(2_000_000) { Array.new(10) { Random.rand } }

Benchmark.bm do |bm|
  bm.report('no ractor') do
    4.times { data.each { |v| v.sum / v.size.to_f } }
  end

  bm.report('1 ractor') do
    Ractor.new(data) do |arr|
      4.times { arr.each { |v| v.sum / v.size.to_f } }
      nil
    end.take
  end

  bm.report('2 ractors') do
    2.times.map do
      Ractor.new(data) do |arr|
        2.times { arr.each { |v| v.sum / v.size.to_f } }
        nil
      end
    end.each &:take
  end

  bm.report('4 ractors') do
    4.times.map do
      Ractor.new(data) do |arr|
        arr.each { |v| v.sum / v.size.to_f }
        nil
      end
    end.each &:take
  end
end
```
Running on Ruby 3.1 on CentOS 8, it produces the following output on an idle 14-core Xeon E5-2680 v4:
```
Testing Numo
                user     system      total        real
no ractor   5.259529   0.021958   5.281487 (  5.292886)
1 ractor    6.039938   0.114778   6.154716 (  6.116115)
2 ractors  17.098116   2.135474  19.233590 ( 10.368513)
4 ractors  27.108385   7.667887  34.776272 ( 10.787219)
Testing core Array
                user     system      total        real
no ractor   1.408945   0.000000   1.408945 (  1.411900)
1 ractor    1.742667   0.028465   1.771132 (  1.774470)
2 ractors   1.458583   0.000000   1.458583 (  0.735995)
4 ractors   1.495018   0.000000   1.495018 (  0.385232)
```
For some reason, the performance of multiple Ractors computing over Numo arrays degrades significantly.
@orlando-labs First, you need to understand that numo-narray is not always faster than Array. Numo-narray is designed for operating on large numeric arrays, so testing with 10-element arrays is very disadvantageous for it.
With the following benchmark, you can see that the running time changes differently for numo-narray and a normal Array: on my machine, numo-narray is slower than a normal Array when array_len < 1000, but faster when array_len > 1000.
```ruby
require 'benchmark'
require 'numo/narray'

array_count = 10000

[10, 100, 1000, 10000].each do |array_len|
  data_numo = Array.new(array_count) { Numo::SFloat.new(array_len).rand(100) }
  data_ary  = Array.new(array_count) { Array.new(array_len) { Random.rand } }

  puts
  puts "# array_len = #{array_len}"
  puts

  Benchmark.bm do |bm|
    bm.report('numo') do
      4.times { data_numo.each &:mean }
    end
    bm.report('array') do
      4.times { data_ary.each { |v| v.sum / v.size.to_f } }
    end
  end
end
```
```
# array_len = 10

            user     system      total        real
numo    0.041033   0.000000   0.041033 (  0.041040)
array   0.004684   0.000000   0.004684 (  0.004685)

# array_len = 100

            user     system      total        real
numo    0.048274   0.000000   0.048274 (  0.048298)
array   0.014097   0.000000   0.014097 (  0.014101)

# array_len = 1000

            user     system      total        real
numo    0.086663   0.000000   0.086663 (  0.086707)
array   0.108927   0.000000   0.108927 (  0.108994)

# array_len = 10000

            user     system      total        real
numo    0.399808   0.000000   0.399808 (  0.400040)
array   1.062646   0.000000   1.062646 (  1.063160)
```
With the following benchmark code, which is similar to yours but uses larger arrays, numo-narray is faster than a normal Array:
```ruby
require 'benchmark'
require 'numo/narray'

Warning[:experimental] = false

array_len = 10000
array_count = 10000

puts 'Testing Numo'
data = Ractor.make_shareable Array.new(array_count) { Numo::SFloat.new(array_len).rand(100) }

Benchmark.bm do |bm|
  bm.report('no ractor') do
    4.times { data.each &:mean }
  end

  bm.report('1 ractor') do
    Ractor.new(data) do |arr|
      4.times { arr.each &:mean }
      nil
    end.take
  end

  bm.report('2 ractors') do
    2.times.map do
      Ractor.new(data) do |arr|
        2.times { arr.each &:mean }
        nil
      end
    end.each &:take
  end

  bm.report('4 ractors') do
    4.times.map do
      Ractor.new(data) do |arr|
        arr.each &:mean
        nil
      end
    end.each &:take
  end
end

puts 'Testing core Array'
data = Ractor.make_shareable Array.new(2*array_count) { Array.new(array_len) { Random.rand } }

Benchmark.bm do |bm|
  bm.report('no ractor') do
    4.times { data.each { |v| v.sum / v.size.to_f } }
  end

  bm.report('1 ractor') do
    Ractor.new(data) do |arr|
      4.times { arr.each { |v| v.sum / v.size.to_f } }
      nil
    end.take
  end

  bm.report('2 ractors') do
    2.times.map do
      Ractor.new(data) do |arr|
        2.times { arr.each { |v| v.sum / v.size.to_f } }
        nil
      end
    end.each &:take
  end

  bm.report('4 ractors') do
    4.times.map do
      Ractor.new(data) do |arr|
        arr.each { |v| v.sum / v.size.to_f }
        nil
      end
    end.each &:take
  end
end
```
```
ruby 3.1.0p0 (2021-12-25 revision fb4df44d16) [x86_64-linux]

Testing Numo
                user     system      total        real
no ractor   0.326137   0.000217   0.326354 (  0.326562)
1 ractor    0.355712   0.000000   0.355712 (  0.355770)
2 ractors   0.383952   0.000106   0.384058 (  0.201436)
4 ractors   0.396320   0.000000   0.396320 (  0.106362)
Testing core Array
                user     system      total        real
no ractor   2.058136   0.000000   2.058136 (  2.059276)
1 ractor    2.062877   0.000000   2.062877 (  2.063872)
2 ractors   2.098526   0.000000   2.098526 (  1.052108)
4 ractors   2.203544   0.000006   2.203550 (  0.560517)
```
Hi, @mrkn. Thanks for the response, I appreciate it. Still, it remains unclear why my example leads to growing processing times: 4 Ractors, each with a quarter of the load, did the job 1.5 times slower than 1 Ractor with the full load. With your 10k-sized arrays, I see the expected speedup.
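One variation worth trying (a sketch I am adding for illustration, using plain Ruby arrays so it runs without numo-narray): instead of every Ractor walking the whole shared array a fraction of the times, give each Ractor its own slice, so each one only ever touches a quarter of the objects:

```ruby
Warning[:experimental] = false

# Hypothetical partitioning variant: each Ractor reduces only its own
# slice of the shared (deeply frozen) array instead of the whole thing.
data = Ractor.make_shareable(Array.new(10_000) { Array.new(10) { Random.rand } })

n = 4
slice = (data.size / n.to_f).ceil
ractors = n.times.map do |i|
  # Pass shareable data plus plain Integers describing this Ractor's slice.
  Ractor.new(data, i * slice, slice) do |arr, offset, len|
    # Sum of per-row means over this Ractor's slice only.
    (arr[offset, len] || []).sum { |row| row.sum / row.size.to_f }
  end
end
mean_of_means = ractors.sum(&:take) / data.size
puts mean_of_means  # close to 0.5 for uniform Random.rand
```

Whether this helps with the Numo degradation above is an open question, but it at least separates "cost of sharing the whole array with every Ractor" from "cost of the reduction itself".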
This is an article that ko1, the developer of Ractor, posted on his company Cookpad's blog about a year ago: https://techlife.cookpad.com/entry/2020/12/26/131858 [Japanese]
In it, he says that using Ractor can be slower than not using it:

> In the previous example, we were able to achieve a speedup of almost 4 times. However, this is a best case, or champion data, example that works well.
He writes that slow constant lookup is one of the reasons why Ractor can be slow:
- The inline cache used for constant lookups was not thread-safe, so the cache is disabled except in the main Ractor.
- The constant table is shared among Ractors, so it is locked; if the lock is contended, it is very slow.
He wrote that he would fix this problem, so constant lookup may no longer be slow.
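ko1's point can be sketched with a hypothetical micro-example (the constant name and iteration count are mine): inside a non-main Ractor, hoisting a constant into a local variable performs the constant lookup once instead of on every iteration, sidestepping the contended constant table on affected Ruby versions:

```ruby
Warning[:experimental] = false

FACTOR = 1.5  # Floats are shareable by default, so non-main Ractors may read this

# Looks up the FACTOR constant on every iteration of the hot loop.
per_iteration = Ractor.new do
  acc = 0.0
  100_000.times { acc += FACTOR }
  acc
end.take

# Hoists the constant into a local once, then only touches the local.
hoisted = Ractor.new do
  f = FACTOR
  acc = 0.0
  100_000.times { acc += f }
  acc
end.take
```

Both loops compute the same sum; whether the hoisted version is measurably faster depends on the Ruby version and on how many Ractors are contending for the constant table.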
As multi-core CPUs become commonplace, the need to express parallel computation is increasing. This phrase has been a standard preamble for more than 10 years, going back to when I was doing research at university. In fact, I don't think anyone would disagree that parallel computing is essential for writing high-performance software.
To perform parallel computation, the program must support it, and that requires parallel programming. Many programming languages already have a mechanism for parallel computing.
numo-narray is probably one of the areas of Ruby where Ractor will be used the most in the future. I think it is very important for the future of Ruby that Ractor be available in numo-narray.
ping: If you don't mind, @ko1, could you take a look at this for us?
Hi, @mrkn, @kojix2. Having used the ractor-compatible branch for 2 months now, I see no issues except the performance ones, which are not related to numo-narray.
Hi all. I'm very interested in this feature. Is it merged into the main branch, or is a new check needed before this code can be used in production? Thank you very much. Pedro Seoane
I'll contact the owner.
Seeing similar slowdown issues when using more than 2 Ractors with Numo. https://github.com/PlummersSoftwareLLC/Primes solution_2 has a single-threaded Numo version, a multithreaded Numo version using Ractor, and a multithreaded version without Numo.
I want to let numo-narray support Ractor in this pull request.
The following changes are made:
- Make `UPCAST` constants sharable in non-main Ractors
- Keep `Numo::RObject` non-sharable, because its instances can have compound objects such as `Array` and `Hash`

@masa16 Could you please take a look?
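The behavior this change targets can be illustrated with core objects (a sketch; on the patched branch, shareable Numo constants behave analogously): a non-main Ractor can only read constants that hold shareable, i.e. deeply frozen, objects.

```ruby
Warning[:experimental] = false

UNSHARED = [1, 2, 3]                        # plain mutable Array
SHARED   = Ractor.make_shareable([1, 2, 3]) # deeply frozen, shareable

# Reading a constant that holds an unshareable object from a non-main
# Ractor raises Ractor::IsolationError inside that Ractor; Ractor#take
# then surfaces it wrapped in Ractor::RemoteError.
cause = begin
  Ractor.new { UNSHARED.sum }.take
  nil
rescue Ractor::RemoteError => e
  e.cause
end

# A constant holding a shareable object works fine.
shared_sum = Ractor.new { SHARED.sum }.take
```

This is why making the `UPCAST` constants sharable matters: without it, any Numo operation that consults them from a non-main Ractor would fail with the same isolation error.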