Benchmarks: are memowise results reliable?

tagliala commented 2 months ago

Hello,

I've checked the main branch ac1231e and changed the version number.

diff --git a/lib/memo_wise/version.rb b/lib/memo_wise/version.rb
index 428f25a..a4fb14d 100644
--- a/lib/memo_wise/version.rb
+++ b/lib/memo_wise/version.rb
@@ -1,5 +1,5 @@
 # frozen_string_literal: true

 module MemoWise
-  VERSION = "1.10.0"
+  VERSION = "1.10.0.dev"
 end

then I've tried to run the benchmarks

~/dev/memo_wise/benchmarks on ac1231e [!$]
$ bundle exec ruby benchmarks.rb
Skipping memoist (not loaded on 3.3.3)

Will BENCHMARK_GEMS:
    dry-core (1.0.1)
    memoist3 (1.0.0)
    memo_wise-local (1.10.0)
    alt_memery (2.1.0)
    memery (1.6.0)
    memo_wise-github-main (1.10.0)
ruby 3.3.3 (2024-06-12 revision f1c7b6f435) [arm64-darwin23]

I was expecting to see 1.10.0.dev in memo_wise-local and most important I was expecting to. see 1.00 (or something like that) in the comparison between memowise local and memowise github main, but I'm getting results like:

Method arguments	`memo_wise-github-main` (1.10.0)
`()` (none)	0.99x
`(a)`	1.02x
`(a, b)`	1.00x
`(a:)`	1.01x
`(a:, b:)`	1.00x
`(a, b:)`	1.01x
`(a, *args)`	0.97x
`(a:, **kwargs)`	0.86x
`(a, args, b:, *kwargs)`	1.00x

I'm asking because I'm getting different results for (a, *args) and (a:, **kwargs) almost all times

My config:

different ruby versions, from 2.7 to 3.3 (both arm and x64)
macOS 15.0 (24A335)
rvm 1.29.12-next (master)

JacobEvelyn commented 2 months ago

Hmm good catch; I too am seeing strange behavior. @pboling I expect puts MemoWise::VERSION to not succeed if I put it right after the doff_and_don call, but it prints 1.10.0. I similarly can't get it to print 1.10.0.dev if I modify that locally.

cc @ms-ati

tagliala commented 2 months ago

Thanks for the feedback. I would like to add that I was having this issue (or both issues, with non-expected benchmark results) even with the old approach with Dir.mktmpdir

pboling commented 2 months ago

Interesting... I think this is the result of some confusion.

The local copy is always the 1.00x, and is thus never displayed, and thus its version is never displayed.

The result you are seeing is from the git checkout of github's main branch of the code, (see benchmarks/Gemfile) and that one doesn't have the modified version.

pboling commented 2 months ago

Effectively you can compare local development (at whatever version you have checked out locally, which is never displayed) against github's main branch.

tagliala commented 2 months ago

Effectively you can compare local development (at whatever version you have checked out locally, which is never displayed) against github's main branch.

If I do this change

$ git diff
diff --git a/lib/memo_wise/internal_api.rb b/lib/memo_wise/internal_api.rb
index 3a80a8b..504b643 100644
--- a/lib/memo_wise/internal_api.rb
+++ b/lib/memo_wise/internal_api.rb
@@ -1,5 +1,7 @@
 # frozen_string_literal: true

+raise 'Required'
+
 module MemoWise
   class InternalAPI
     # Create initial mutable state to store memoized values if it doesn't

Should the benchmark raise an error?

pboling commented 2 months ago

I expect puts MemoWise::VERSION to not succeed if I put it right after the doff_and_don call, but it prints 1.10.0

I can't explain that. Perhaps bundler is autoloading it somehow... I can't find any code that would load the gem prior to that line unless the require: false in the Gemfile isn't working.

pboling commented 2 months ago

Oh, I know the issue. Regarding the Local code & GitHub checkout...

The GitHub checkout of the gem is being copied, and re-namespaced, but the original checkout, with its file paths are still there, because it is in the Gemfile. Normally the point of doing require: false in the Gemfile is so that you can manually require the gem later when you need it, and for that it has to be in the load path. But that's not our use case...

So, we have a memo_wise library in the load path already...

Now when we do this in benchmarks.rb:

require_relative "../lib/memo_wise"

It does load the local entry-point of the gem.

But, and this is where it gets very interesting, the gem does not use require_relative to load its own internal files... (as is standard "good" practice for modern Ruby libraries; require is reserved for external files, never internal files).

So when the local gem does this:

require "memo_wise/internal_api"
require "memo_wise/version"

It picks up the github checkout of the gem, which is still there, known to Bundler.

The fix is, and this should be applied generally to every library and app in Ruby, to use require_relative always when loading an internal file, to do this:

require_relative "memo_wise/internal_api"
require_relative "memo_wise/version"

pboling commented 2 months ago

You have found a very niche bug in memo_wise, but it is easy to fix! 🌟

pboling commented 2 months ago

Which reminds me that I need to make the same fix in gem_bench (and many others of my gems), which I originally wrote 11 years ago... before require_relative existed!

require_relative is Ruby >= 1.9.2.

tagliala commented 2 months ago

The fix is, and this should be applied generally to every library and app in Ruby, to use require_relative always when loading an internal file, to do this:

Thanks,

but I guess that in order to use GemBench, PR to hundreds of libraries should be done, given the fact that require_relative is not so used in this context

I've checked previously bundled gems like logger and it appears to have switch to require_relative for performance

https://github.com/ruby/logger/commit/1e2aab4bea73c24a2e26bc2c6b97c7ede74d471e

There is an interesting case in webrick:

https://github.com/ruby/webrick/blame/master/lib/webrick.rb

with this commit: https://github.com/ruby/webrick/blame/f5faca9222541591e1a7c3c97552ebb0c92733c7/lib/webrick.rb#L214-L232

But Rails, SimpleForm (just to give some examples) are not using require_relative

@JacobEvelyn are you going to change to require_relative here?

Another interesting PR:

https://github.com/ruby/psych/pull/522

https://github.com/activeadmin/arbre/pull/622

pboling commented 2 months ago

I looked at what gem_bench does to copy gems in the Jersey class, and the required file path it uses is an absolute path starting with /, which is a different use case, and ignores the $LOAD_PATH of Ruby.

When it loads gems that themselves use require instead of require_relative (which is a lot of them) it can have problems like this, but that is unavoidable. The ruby community has a lot of work to do.

And that includes gem_bench itself loading itself... I will have a new release with that fix soon!

tagliala commented 2 months ago

@pboling beside the require_relative issue, there is another one related to significant variation of results according to the orders of execution

I've tried to summarize it here: tagliala/memo_wise@b849956

Reporting the commit message here in markdown for simplicity:

According to the order of run, we have 14% difference among executions for (a:, **kwargs)

GitHub main tested before local

$ bundle exec ruby benchmarks.rb 

Will BENCHMARK_GEMS:
  memo_wise-github-main (1.10.0)
  memo_wise-local (1.10.0.dev)
ruby 3.3.5 (2024-09-03 revision ef084cc8f4) [x86_64-darwin23]
Warming up --------------------------------------
memo_wise-github-main (1.10.0): (a:, **kwargs)
                       125.193k i/100ms
memo_wise-local (1.10.0.dev): (a:, **kwargs)
                       143.649k i/100ms
Calculating -------------------------------------
memo_wise-github-main (1.10.0): (a:, **kwargs)
                          1.146M (±23.0%) i/s  (872.35 ns/i) -      5.508M in   5.067507s
memo_wise-local (1.10.0.dev): (a:, **kwargs)
                          1.309M (±18.0%) i/s  (764.21 ns/i) -      6.464M in   5.169169s

Comparison:
memo_wise-local (1.10.0.dev): (a:, **kwargs):  1308539.2 i/s
memo_wise-github-main (1.10.0): (a:, **kwargs):  1146322.9 i/s - same-ish: difference falls within error

Method arguments	`memo_wise-github-main` (1.10.0)
`(a:, **kwargs)`	1.14x

local tested before GitHub main

$ bundle exec ruby benchmarks.rb 

Will BENCHMARK_GEMS:
  memo_wise-local (1.10.0.dev)
  memo_wise-github-main (1.10.0)
ruby 3.3.5 (2024-09-03 revision ef084cc8f4) [x86_64-darwin23]
Warming up --------------------------------------
memo_wise-local (1.10.0.dev): (a:, **kwargs)
                       130.824k i/100ms
memo_wise-github-main (1.10.0): (a:, **kwargs)
                       150.842k i/100ms
Calculating -------------------------------------
memo_wise-local (1.10.0.dev): (a:, **kwargs)
                          1.184M (±23.6%) i/s  (844.93 ns/i) -      5.756M in   5.139460s
memo_wise-github-main (1.10.0): (a:, **kwargs)
                          1.381M (±17.8%) i/s  (723.94 ns/i) -      6.637M in   5.026159s

Comparison:
memo_wise-github-main (1.10.0): (a:, **kwargs):  1381332.2 i/s
memo_wise-local (1.10.0.dev): (a:, **kwargs):  1183526.0 i/s - same-ish: difference falls within error

Method arguments	`memo_wise-github-main` (1.10.0)
`(a:, **kwargs)`	0.86x

pboling commented 2 months ago

Interesting, so whichever is tested first has a significant (0.14x) advantage. I'm not sure what could be the cause of this...

tagliala commented 2 months ago

Interesting, so whichever is tested first has a significant (0.14x) advantage. I'm not sure what could be the cause of this...

as you can notice, for some reasons, even if the latter is 14% faster, ips considers them "same-ish", like that 14% does not matter, which is surprising, but at the same time the correct behavior:

memo_wise-github-main (1.10.0): (a:, **kwargs):  1381332.2 i/s
memo_wise-local (1.10.0.dev): (a:, **kwargs):  1183526.0 i/s - same-ish: difference falls within error

pboling commented 2 months ago

Oh! I fully read those results wrong. It is the one that is run last that performs best each time.

I would guess that this is due to Ruby's internal optimizations, and that since the two versions of memo_wise tested are nearly identical, the one running second will benefit from the previous run due to various low-level Ruby optimizations.

There is supporting evidence of this as well, in that the first run benefits from itself, as in both cases whichever runs first has a much greater variance in iterations/second.

memo_wise-github-main (1.10.0): (a:, **kwargs)
                          1.146M (±23.0%) i/s  (872.35 ns/i) -      5.508M in   5.067507s
memo_wise-local (1.10.0.dev): (a:, **kwargs)
                          1.309M (±18.0%) i/s  (764.21 ns/i) -      6.464M in   5.169169s

and

memo_wise-local (1.10.0.dev): (a:, **kwargs)
                          1.184M (±23.6%) i/s  (844.93 ns/i) -      5.756M in   5.139460s
memo_wise-github-main (1.10.0): (a:, **kwargs)
                          1.381M (±17.8%) i/s  (723.94 ns/i) -      6.637M in   5.026159s

Around 23% variance for whichever ran first, vs 18% variance for whichever ran second. That's a delta of about a quarter in the variance between first and second runs, which tracks almost exactly in both cases.

I think what you are seeing is just Ruby being Ruby. Maybe there is a way to clear/reset Ruby in between runs, but then I'm not sure of the point of the warmup stage.

pboling commented 2 months ago

I think we can safely chalk this up to "all benchmarks are bullshit (to some degree)", and differences of 14% aren't anything to worry about, because they aren't that precise (in this kind of scenario).

tagliala commented 2 months ago

but then I'm not sure of the point of the warmup stage.

This is surprising me too, too little warmup?

pboling commented 2 months ago

Perhaps bumping up the warmup could help... Ideally I'd expect to see similar variance for first and last runs.

Technically, with gem_bench you could even benchmark a gem against itself at the same exact version, and then the only difference possible would be how Ruby is optimized internally when it sees the same code (strings, symbols, etc).

JacobEvelyn commented 2 months ago

Thanks for digging in! I'm open to trying a longer warmup period to try to stabilize the results a bit more. I can make a PR for that.

tagliala commented 2 months ago

I can make a PR for that.

Does that actually help?

Answering myself: it does

    x.config(suite: suite, warmup: warmup)

x64 Ruby:

2 seconds: ~13%
3 seconds: ~8%
4 seconds: ~4%
5 seconds: ~2%
6 seconds: ~2%
7 seconds: ~2%

arm64 Ruby:

2 seconds: ~19%
3 seconds: ~11%
4 seconds: ~4%
5 seconds: ~0%
6 seconds: ~0%
7 seconds: ~0%

pboling commented 2 months ago

That is so awesome. Well done!

panorama-ed / memo_wise

Benchmarks: are memowise results reliable? #349

GitHub main tested before local

local tested before GitHub main