On hexapdf benchmark, speed destabilizes and memory usage grows over time

oracle / truffleruby

A high performance implementation of the Ruby programming language, built on GraalVM.

https://www.graalvm.org/ruby/


On hexapdf benchmark, speed destabilizes and memory usage grows over time #3130

Open XrXr opened 1 year ago

XrXr commented 1 year ago

On a benchmark we have that uses hexapdf, TruffleRuby's iteration times seem to become more and more variable the longer the benchmark runs. Memory usage also seems to grow over time, though I don't know whether it stops growing at some point. In any case, I'm filing this because it looks like a performance bug.

Version: truffleruby 23.0.0, like ruby 3.1.3, Oracle GraalVM Native [x86_64-linux]

Benchmark in question: https://github.com/Shopify/yjit-bench/tree/tr-hexpdf-problem
SHA at the time of bug submission: https://github.com/Shopify/yjit-bench/commit/c663283bcd268736f4ae9a4510f06b2f8af57865

To run, make sure ruby is TruffleRuby and do:

$ ruby run_benchmarks.rb --harness=rss hexapdf

It runs for 5 minutes but you can tweak the code if desired. The gap between fastest and slowest iteration time seems to grow the longer the benchmark runs.

For reference, here is the same graph from running the benchmark using CRuby (interpreter only)
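One way to put a number on the growing gap between fastest and slowest iterations is to split the recorded iteration times into consecutive windows and compare the max-min spread of each window. The following is only a minimal sketch: `spread_per_window` is a hypothetical helper that is not part of yjit-bench, and the sample numbers are made up for illustration, not measurements from this benchmark.

```ruby
# Hypothetical helper (not part of yjit-bench): split iteration times into
# consecutive windows and return the max-min gap inside each window.
def spread_per_window(times_ms, window_size)
  times_ms.each_slice(window_size).map do |window|
    window.max - window.min
  end
end

# Made-up iteration times in milliseconds, for illustration only.
sample = [100, 102, 101, 99, 100, 110, 95, 120, 90]

# A spread that keeps growing from window to window matches the
# "more and more variable" behaviour described in the report.
p spread_per_window(sample, 3) # prints [2, 11, 30]
```

If the spread levels off after warmup, the run has stabilized; if it keeps increasing, that is the symptom reported here.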

eregon commented 1 year ago

Thank you for the report. I wonder if the problem could be related to the fact that hexapdf creates so many Fibers: https://github.com/Shopify/yjit-bench/pull/47#issuecomment-1478313399

If the Fibers are not executed to completion (we have to check, I don't know if that happens for the hexapdf benchmark), then TruffleRuby currently does not GC them, unlike CRuby. Doing that safely and with correct semantics seems very difficult, and I would say GC-based resource release is an anti-pattern anyway, because it can have very large delays and cause all sorts of problems (e.g. causing extra GCs).
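To illustrate what "not executed to completion" means here, the sketch below creates one Fiber that suspends partway through and is never resumed, and one that runs to the end. The helper names `make_unfinished_fiber` and `make_finished_fiber` are hypothetical and this is not code from hexapdf; it only shows the two states being discussed.

```ruby
require "fiber" # only needed for Fiber#alive? on older Rubies; harmless on newer ones

# Hypothetical helper: a Fiber that suspends at Fiber.yield and is never resumed again.
def make_unfinished_fiber
  fiber = Fiber.new do
    Fiber.yield :first # suspends here; the rest of the block never runs
    :done
  end
  fiber.resume # runs the block up to Fiber.yield
  fiber
end

# Hypothetical helper: a Fiber whose block runs to the end.
def make_finished_fiber
  fiber = Fiber.new { :done }
  fiber.resume # the block returns, so the Fiber terminates
  fiber
end

puts make_unfinished_fiber.alive? # prints true
puts make_finished_fiber.alive?   # prints false
```

The first kind of Fiber is the one in question: it still holds its suspended state after the program has lost interest in it.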

eregon commented 1 year ago

> we have to check, I don't know if that happens for the hexapdf benchmark

That's not the case: there are always 2 Fibers and 2 Threads when running this benchmark, the main thread and the reference processor thread. Checked using:

# truffleruby_primitives: true
...
p fibers: Primitive.all_fibers_backtraces.map { |fiber,| "#{fiber} of #{Primitive.fiber_thread(fiber)}" }

in benchmarks/hexapdf/benchmark.rb.

eregon commented 1 year ago

I can reproduce it on 23.0.0 both in Native and JVM mode (on Oracle GraalVM). Note that for benchmarking one should use JVM mode: https://github.com/oracle/truffleruby/blob/master/doc/user/benchmarking.md

With --engine.TraceCompilation I noticed we seem to have a deoptimization loop in Truffle::Splitter.add_substring. I will check if that is still the case on master.

eregon commented 1 year ago

On master there are many compilations of Truffle::Splitter.add_substring but it does stabilize.