Closed jzakiya closed 2 months ago
Thank you for reporting the issue!
Could you please share your benchmarking results?
Which version of TruffleRuby are you using?
Related: https://github.com/oracle/truffleruby/issues/1999 We do modular exponentiation now: https://github.com/oracle/truffleruby/blob/51b497f921dacf71441f568ed653a6c5799d2ff0/src/main/ruby/truffleruby/core/integer.rb#L140 But always convert to BigInteger, maybe that's the overhead? If that's the case we could use similar logic to CRuby/JRuby for the small integers cases, but would be good to make sure that's the issue and how much slower it is before importing that code.
First code correction.
def modinv2(r, m)
inv = r
while (r * inv) % m != 1; inv = (inv * r) % m end
inv #% m
end
Here are times on my system for Ruby 3.3 w/wo YJIT, JRuby 9.4.6.0 and TruffleRuby 24.0.0, via rvm. Total times in secs, smaller the faster.
Ruby 3.3 w/o YJIT
1.544888113
1.054920322
0.954105064
3.531902812
0.369263064
Ruby 3.3 w/YJIT
0.375549548
0.383313453
0.36227265
3.213001315
0.235034272
JRuby 9.4.6.0
0.9543389999999999
0.99756
0.870005
1.6784890000000001
0.266971
TruffleRuby 24.0.0
0.222029
0.33000100000000004
0.19171500000000002
2.120432
4.006869
With a slightly modified script which:
#% m
for modinv2, otherwise the result differs from other methodsrequire 'benchmark'
# Enable YJIT if using CRuby >= 3.3"
RubyVM::YJIT.enable if defined?(RubyVM::YJIT.enable)
puts RUBY_DESCRIPTION
def modinv(a0, m0)
return 1 if m0 == 1
a, m = a0, m0
x0, inv = 0, 1
while a > 1
inv -= (a / m) * x0
a, m = m, a % m
x0, inv = inv, x0
end
inv += m0 if inv < 0
inv
end
def modinv1(r, m)
inv = r
while (r * inv) % m != 1; inv %= m; inv *= r end
inv % m
end
def modinv2(r, m)
inv = r
while (r * inv) % m != 1; inv = (inv * r) % m end
inv % m
end
def modinv3(r, m, rescnt)
(r ** (rescnt-1)) % m
end
def modinv4(r, m, rescnt)
r.pow(rescnt-1, m)
end
def tm(&b)
puts
3.times do
puts Benchmark.realtime(&b)
end
end
resp7 = [11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,
97,101,103,107,109,113,121,127,131,137,139,143,149,151,157,163,
167,169,173,179,181,187,191,193,197,199,209,211]
print resp7.map { |r| modinv(r, 210) }
puts
print resp7.map { |r| modinv1(r, 210) }
puts
print resp7.map { |r| modinv2(r, 210) }
puts
print resp7.map { |r| modinv3(r, 210, 48) }
puts
print resp7.map { |r| modinv4(r, 210, 48) }
puts
tm{ 100000.times {resp7.each { |r| modinv(r, 210) } } }
tm{ 100000.times {resp7.each { |r| modinv1(r, 210) } } }
tm{ 100000.times {resp7.each { |r| modinv2(r, 210) } } }
tm{ 100000.times {resp7.each { |r| modinv3(r, 210, 48) } } }
tm{ 100000.times {resp7.each { |r| modinv4(r, 210, 48) } } }
I get:
On Intel:
ruby 3.3.0 (2023-12-25 revision 5124f9ac75) +YJIT [x86_64-linux]
1.347028863034211
1.3474754459457472
1.3452114629908465
1.2710674100089818
1.2754993910202757
1.2733817050466314
1.3521089899586514
1.351813138986472
1.3528226349735633
7.390963806945365
7.495342062029522
7.4823689730255865
0.8994149370118976
0.8945350690046325
0.9004962909966707
truffleruby 24.0.0, like ruby 3.2.2, Oracle GraalVM Native [x86_64-linux]
0.3218597859959118
0.24441085202852264
0.21928530297009274
0.20125181798357517
0.18273464200319722
0.1594632159685716
0.1859166320064105
0.16585971100721508
0.13765042601153255
4.247611464001238
4.127755031979177
4.079420825000852
8.350773187004961
8.356869303970598
8.30608620395651
On AMD Ryzen 7 3700X 8-Core Processor
ruby 3.3.0 (2023-12-25 revision 5124f9ac75) +YJIT [x86_64-linux]
0.6688195530005032
0.6663745859996197
0.6647927419999178
0.7047924210000929
0.7027385810006308
0.70747054200001
0.6540912820000813
0.6548124430000826
0.6545112009998775
5.6871033949992125
5.716956408999977
5.703890113999478
0.46341663200018957
0.46297978599977796
0.46605407200058835
truffleruby 24.0.1, like ruby 3.2.2, Oracle GraalVM Native [x86_64-linux]
0.2904605989997435
0.207193976999406
0.18607605099987268
0.1610029299999951
0.13540141500016034
0.11250887000005605
0.13582420100010495
0.10616527599995607
0.08965392699974473
2.6358709440000894
2.63863470600063
2.630714566000279
5.454730677999578
5.439872395000748
5.455230954999934
truffleruby 24.0.1, like ruby 3.2.2, Oracle GraalVM JVM [x86_64-linux]
0.30870855699959066
0.19738048799990793
0.18506349099970976
0.17732479499954934
0.12824568500036548
0.12031344499973784
0.13241845700031263
0.09785182599989639
0.09070625899948936
2.0937121340002705
1.7953665820004971
1.7865803080003388
2.732251799000551
2.6282954050002445
2.6161076440002944
I noticed YJIT is much slower in my measurements, maybe you are on a different architecture, I guess darwin-aarch64?
But overall it's similar results, TruffleRuby is fastest everywhere except for the last method, modinv4
.
You mentioned regression, do you mean a regression compared to an older version of TruffleRuby, or you mean regression as "slower than CRuby/JRuby"?
Either way modinv4
amounts to (r between 11 and 211).pow(47, 210)
so that's all small integers as inputs, so we should optimize that case too like CRuby/JRuby, it seems the java.math.BigInteger#modPow code doesn't optimize that case well unfortunately.
@andrykonchin Could you port https://github.com/jruby/jruby/blob/9.4.4.0/core/src/main/java/org/jruby/RubyInteger.java#L935-L944 to TruffleRuby? (or from CRuby, but probably more work)
My system is a Lenovo Legion Slimjet laptop (2022), AMD Ryzen 9 5900HX, with Linux.
The issue is that pow
, which is used in modinv4
, is significantly slower in Truffleruby than in [C|J]Ruby.
In [C|J]Ruby using pow
is the fastest among different implementations, but is slowest in Truffleruby.
Fixed in https://github.com/oracle/truffleruby/pull/3552
Now modinv4
is as fast as modinv1
and modinv2
:
On AMD Ryzen 7 3700X 8-Core Processor
truffleruby 24.1.0-dev-52596ca3, like ruby 3.2.2, Oracle GraalVM JVM [x86_64-linux]
0.35175757099932525
0.21934590799719444
0.19075998200059985
0.20475720999820624
0.13995916100247996
0.11859457800164819
0.16480916699947556
0.10902587799864705
0.09228703799817595
2.012781329001882
1.7254886529990472
1.7228568719983741
0.21183235100033926
0.13329798599806963
0.11889045199859538
Same but running 5 times to ensure enough warmup since this is on jargraal and not libgraal:
truffleruby 24.1.0-dev-52596ca3, like ruby 3.2.2, Oracle GraalVM JVM [x86_64-linux]
0.36994781300018076
0.221259508998628
0.18763062200014247
0.19057154599795467
0.18439673299872084
0.19098085299992817
0.1328596249986731
0.11979981799959205
0.11720118900120724
0.11687311399873579
0.1548915470011707
0.10623463299998548
0.10084365199872991
0.09816986200166866
0.09051773600003798
2.040468414998031
1.7358531079989916
1.755651747000229
1.6976114799981588
1.7026588670014462
0.22171137699842802
0.13052473000061582
0.11817796299874317
0.11785270799737191
0.11395291900043958
As a bonus the new code is written in Ruby (which in this case is a lot faster than Java's BigInteger#modPow
!) and IMO looks a lot cleaner than the equivalents in CRuby / JRuby.
The code below shows this issue. On Truffleruby, using
pow(a, m)
is orders of magnitude slower than on [C|J]Ruby, where it's the fastest.