oracle / truffleruby

A high performance implementation of the Ruby programming language, built on GraalVM.
https://www.graalvm.org/ruby/
Other
3.02k stars 185 forks source link

NonProfiledCodeHeapSize warning benchmarking Rails 7 scaffold app #2805

Open headius opened 1 year ago

headius commented 1 year ago

Set-up instructions below, with 16-way concurrency/DB on server and siege hitting it with 16 threads. I could not find a way to set this flag. This is TruffleRuby 22.3 JVM CE.

$ RAILS_ENV=production RUBYOPT='--jvm --experimental-options --cexts-lock=false' rails s
=> Booting Puma
=> Rails 7.0.4 application starting in production 
=> Run `bin/rails server --help` for more startup options
Puma starting in single mode...
* Puma version: 5.6.5 (truffleruby 22.3.0 - ruby 3.0.3) ("Birdie's Version")
*  Min threads: 16
*  Max threads: 16
*  Environment: production
*          PID: 4195
* Listening on http://0.0.0.0:3000
Use Ctrl-C to stop
[551.055s][warning][codecache] CodeHeap 'non-profiled nmethods' is full. Compiler has been disabled.
[551.055s][warning][codecache] Try increasing the code heap size using -XX:NonProfiledCodeHeapSize=
OpenJDK 64-Bit Server VM warning: CodeHeap 'non-profiled nmethods' is full. Compiler has been disabled.
OpenJDK 64-Bit Server VM warning: Try increasing the code heap size using -XX:NonProfiledCodeHeapSize=
CodeHeap 'non-profiled nmethods': size=119472Kb used=114825Kb max_used=118198Kb free=4646Kb
 bounds [0x0000000117530000, 0x000000011e9dc000, 0x000000011e9dc000]
CodeHeap 'profiled nmethods': size=119456Kb used=114672Kb max_used=114672Kb free=4783Kb
 bounds [0x0000000110088000, 0x0000000117530000, 0x0000000117530000]
CodeHeap 'non-nmethods': size=6832Kb used=1802Kb max_used=2725Kb free=5029Kb
 bounds [0x000000010f9dc000, 0x000000010fc9c000, 0x0000000110088000]
 total_blobs=24270 nmethods=23071 adapters=742
 compilation: disabled (not enough contiguous free space left)
              stopped_count=1, restarted_count=0
 full_count=0

Set-up of application

RAILS_ENV=production rake db:create # if necessary for target database
RAILS_ENV=production rake db:migrate
RAILS_ENV=production rake assets:precompile
RAILS_ENV=production RAILS_MAX_THREADS=16 rails s # with appropriate options for target runtime
siege -b --no-parser -c 16 -t 60s http://localhost:3000/posts/1 > /dev/null
headius commented 1 year ago

Running on 22.3 GraalVM CE on MacOS 13 on M1.

eregon commented 1 year ago

Thank you for the report. I wonder if this is architecture-specific or so, we'll check.

headius commented 1 year ago

Happens as well on x86_64 Linux.

eregon commented 1 year ago

Does it work if you increase it like the message suggests? It should be something like --vm.XX:NonProfiledCodeHeapSize=512m, in TRUFFLERUBYOPT or RUBYOPT or as a command line argument. I can't double-check myself right now.

eregon commented 1 year ago

Could you share the siege command line and which database & gem is used? Even better would be a repository I can just use to reproduce.

eregon commented 1 year ago

I cannot reproduce it with the instructions in the description on truffleruby 22.3.1, like ruby 3.0.3, GraalVM EE Native [x86_64-linux] (JDK 17) and guessing the rest. I did:

I don't see any warning about CodeHeap.

I downloaded EE via:

bash <(curl -sL https://get.graalvm.org/ee-token)                                                                 
bash <(curl -sL https://get.graalvm.org/jdk)

as documented on https://www.graalvm.org/downloads/

There are several problems with this repro though:

This would all be addressed by sharing a github repository, and e.g. a script containing the siege command line.

headius commented 1 year ago

I will push a repository later but also I will update the instructions in the other bug.

headius commented 1 year ago

The other bug is closed so I added instructions for building the app here. It is a standard scaffolded blog post app with one post, running 16-way on default sqlite configuration.

eregon commented 1 year ago

Thank you for the more detailed repro instructions, I can now reproduce it. It needs quite a few siege runs, and letting it compile for a while without hitting it with siege made it happen for me.

Unfortunately we are very close to RubyKaigi so I don't think I will have time to investigate much further before the conference. Maybe there is a deoptimization loop in 22.3 that causes the high usage of JITed code (notably there seems to be many repeated compilations of Truffle::Splitter methods). That's possibly fixed on master, I will try to check.

Setting only NonProfiledCodeHeapSize alone does not work, as I guess you saw:

$ export RUBYOPT='--jvm --experimental-options --cexts-lock=false --vm.XX:NonProfiledCodeHeapSize=512m'                                 
$ RAILS_ENV=production RAILS_MAX_THREADS=16 rails s                                                    
Error occurred during initialization of VM
Invalid code heap sizes: NonNMethodCodeHeapSize (4K) + ProfiledCodeHeapSize (4K) + NonProfiledCodeHeapSize (524288K) = 524296K is greater than ReservedCodeCacheSize (327680K).

But setting ReservedCodeCacheSize as well does work:

$ export RUBYOPT='--jvm --experimental-options --cexts-lock=false --vm.XX:NonProfiledCodeHeapSize=512m --vm.XX:ReservedCodeCacheSize=1g'

Obviously this is not tuned for the best memory usage, that would need a proper Xmx and finding a good value for NonProfiledCodeHeapSize.

For benchmarking, use TruffleRuby --jvm EE, as mentioned on https://github.com/oracle/truffleruby/blob/master/doc/user/benchmarking.md#use-truffleruby-ee. CE is not representative of TruffleRuby's best performance (a bit like JRuby without invokedynamic).

eregon commented 1 year ago

From the output with --engine.TraceCompilation, the many repeated compilations of Truffle::Splitter methods seems the problem, which sounds like a deoptimization loop. That could explain the high amount of generated code, memory usage, and likely also lower performance, as it means some part often runs in the interpreter.

eregon commented 1 year ago

To give an idea, on 22.3.1 CE I see ~4500 RPS and on master EE I see ~7850 RPS, on a linux-amd64 machine with 18 physical cores.

That run on master EE didn't run into that deoptimization loop issue, but it would need more runs to be confident it's solved as it seems to not always happen. There is also current work in the compiler to avoid problems with deoptimization loops.