Open XrXr opened 1 year ago
Thank you for the report. I wonder if the problem could be related to the fact that hexapdf creates so many Fibers: https://github.com/Shopify/yjit-bench/pull/47#issuecomment-1478313399 If the Fibers are not executed to completion (we have to check, I don't know if that happens for the hexapdf benchmark), then TruffleRuby currently does not GC them, unlike CRuby. Doing that safely and with correct semantics seems very difficult, and I would say GC-based resource release is an anti-pattern because it can have very large delays and cause all sorts of problems (e.g. causing extra GCs).
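To make "not executed to completion" concrete, here is a minimal sketch in plain Ruby. It is not taken from hexapdf or from the benchmark; the method name `fiber_liveness` is hypothetical and only illustrates the two Fiber states being discussed.

```ruby
# Hypothetical helper, for illustration only: contrasts a Fiber that is left
# suspended with one that runs until its block returns.
def fiber_liveness
  # This Fiber yields once and is never resumed again, so its block never
  # finishes. This is the "not executed to completion" case.
  unfinished = Fiber.new do
    Fiber.yield :paused
    :done
  end
  unfinished.resume # runs up to Fiber.yield, then suspends

  # This Fiber's block returns on the first resume, so it is finished.
  finished = Fiber.new { :done }
  finished.resume

  { unfinished: unfinished.alive?, finished: finished.alive? }
end

p fiber_liveness
```

The suspended Fiber still reports `alive?` as true even when nothing will ever resume it, which is the kind of Fiber the comment above says would not be collected.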
"we have to check, I don't know if that happens for the hexapdf benchmark"
That's not the case: there are always 2 Fibers and 2 Threads when running this benchmark (the main thread and the reference processor thread). Checked using:
# truffleruby_primitives: true
...
p fibers: Primitive.all_fibers_backtraces.map { |fiber,| "#{fiber} of #{Primitive.fiber_thread(fiber)}" }
in benchmarks/hexapdf/benchmark.rb.
I can reproduce it on 23.0.0 both in Native and JVM mode (on Oracle GraalVM). Note that for benchmarking one should use JVM mode: https://github.com/oracle/truffleruby/blob/master/doc/user/benchmarking.md
With --engine.TraceCompilation I noticed we seem to have a deoptimization loop in Truffle::Splitter.add_substring. I will check if that is still the case on master.
On master there are many compilations of Truffle::Splitter.add_substring but it does stabilize.
On a benchmark we have using hexapdf, TruffleRuby's speed seems to become more and more variable the longer one lets the benchmark run. Also, memory usage seems to grow over time, though I don't know if it stops growing at some point. In any case, I'm filing this as it seems like a performance bug.
Version: truffleruby 23.0.0, like ruby 3.1.3, Oracle GraalVM Native [x86_64-linux]
Benchmark in question: https://github.com/Shopify/yjit-bench/tree/tr-hexpdf-problem
SHA at the time of bug submission: https://github.com/Shopify/yjit-bench/commit/c663283bcd268736f4ae9a4510f06b2f8af57865
To run, make sure ruby is TruffleRuby and do
$ ruby run_benchmarks.rb --harness=rss hexapdf
It runs for 5 minutes but you can tweak the code if desired. The gap between fastest and slowest iteration time seems to grow the longer the benchmark runs.
For reference, here is the same graph from running the benchmark using CRuby (interpreter only).
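One way the widening gap between fastest and slowest iterations could be put into numbers is to compute the spread per window of consecutive iterations. The sketch below assumes the iteration times are already available as an array. The helper name `spread_per_window` and the sample values are made up for illustration and are not output from the benchmark harness.

```ruby
# Hypothetical helper: for each window of consecutive iterations, return the
# gap between the slowest and the fastest iteration time in that window.
def spread_per_window(times_ms, window_size)
  times_ms.each_slice(window_size).map { |window| window.max - window.min }
end

# Made-up iteration times in milliseconds, not measured data.
sample_times_ms = [100, 102, 101, 99, 100, 120, 95, 140, 90, 160, 85, 180]

p spread_per_window(sample_times_ms, 4) # => [3, 45, 95]
```

A spread that keeps increasing from window to window would match the behavior described above, while a roughly flat spread would match a run that has stabilized.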