Closed bjfish closed 7 months ago
Some sample memory/cpu sampling below.
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sampling Histogram. Recorded 160968 samples with period 1ms.
Self Time: Time spent on the top of the stack.
Total Time: Time spent somewhere on the stack.
Opt %: Percent of time spent in compiled and therefore non-interpreted code.
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Thread: Thread[main,5,main]
Name | Total Time | Opt % || Self Time | Opt % | Location
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Parser::Lexer#advance | 14072ms 8.7% | 0.0% || 11500ms 7.1% | 0.0% | ../../truffleruby-ws/truffleruby/mxbuild/truffleruby-native/jre/languages/ruby/lib/gems/gems/parser-2.7.1.4/lib/parser/lexer.rb~10965-23583:456549-780083
yaml_parser_update_buffer | 10082ms 6.3% | 0.2% || 10072ms 6.3% | 0.2% | ../../truffleruby-ws/truffleruby/src/main/c/psych/yaml/reader.c~142:3763-3825
yaml_parser_scan_plain_scalar | 7749ms 4.8% | 2.5% || 6336ms 3.9% | 1.8% | ../../truffleruby-ws/truffleruby/src/main/c/psych/yaml/scanner.c~3389:92048-92120
rb_funcallv_public | 11309ms 7.0% | 9.0% || 3716ms 2.3% | 1.6% | ../../truffleruby-ws/truffleruby/src/main/c/cext/call.c~26:885-968
parse | 52508ms 32.6% | 26.5% || 3586ms 2.2% | 4.6% | ../../truffleruby-ws/truffleruby/src/main/c/psych/psych_parser.c~251:5989-6041
yaml_parser_scan_to_next_token | 7246ms 4.5% | 37.9% || 2824ms 1.8% | 1.3% | ../../truffleruby-ws/truffleruby/src/main/c/psych/yaml/scanner.c~1919:51908-51960
Truffle::StringOperations.copy_from | 2719ms 1.7% | 0.0% || 2205ms 1.4% | 0.0% | resource:/truffleruby/core/truffle/string_operations.rb~150-161:4432-5008
String#inspect_char | 2361ms 1.5% | 0.0% || 2124ms 1.3% | 0.0% | resource:/truffleruby/core/string.rb~562-636:14822-16557
yaml_parser_fetch_more_tokens | 26478ms 16.4% | 24.5% || 2054ms 1.3% | 4.4% | ../../truffleruby-ws/truffleruby/src/main/c/psych/yaml/scanner.c~800:24651-24702
yaml_parser_fetch_next_token | 24340ms 15.1% | 26.6% || 2046ms 1.3% | 2.4% | ../../truffleruby-ws/truffleruby/src/main/c/psych/yaml/scanner.c~860:25970-26020
Hash#[] | 10496ms 6.5% | 3.9% || 1989ms 1.2% | 1.8% | (core)~1:0
Hash#[]= | 3042ms 1.9% | 18.5% || 1829ms 1.1% | 13.7% | (core)~1:0
Truffle::POSIX.getcwd | 2244ms 1.4% | 33.3% || 1820ms 1.1% | 34.5% | resource:/truffleruby/core/posix.rb~134:3842-3897
Class#new | 47618ms 29.6% | 66.5% || 1791ms 1.1% | 50.6% | (core)~1:0
yaml_parser_parse_node | 1696ms 1.1% | 0.8% || 1694ms 1.1% | 0.8% | ../../truffleruby-ws/truffleruby/src/main/c/psych/yaml/parser.c~529:17086-17151
String#inspect | 8655ms 5.4% | 0.0% || 1670ms 1.0% | 0.0% | resource:/truffleruby/core/string.rb~516-560:13654-14819
Truffle::CExt#rb_funcallv_public | 5191ms 3.2% | 12.8% || 1657ms 1.0% | 15.9% | ../../truffleruby-ws/truffleruby/mxbuild/truffleruby-native/jre/languages/ruby/lib/truffle/truffle/cext.rb~838-840:18075-18156
Kernel#gem_original_require | 10362ms 6.4% | 0.0% || 1630ms 1.0% | 0.0% | resource:/truffleruby/core/kernel.rb~226-238:7091-7471
rb_ary_store | 1788ms 1.1% | 49.5% || 1530ms 1.0% | 51.0% | ../../truffleruby-ws/truffleruby/src/main/c/cext/array.c~69:1516-1572
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Location Histogram with Allocation Counts. Recorded a total of 7960056 allocations.
Total Count: Number of allocations during the execution of this element.
Self Count: Number of allocations in this element alone (excluding sub calls).
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Name | Self Count | Total Count | Location
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
rb_ary_store | 815085 10.2% | 815090 10.2% | ../../truffleruby-ws/truffleruby/src/main/c/cext/array.c~69:1516-1572
rb_funcallv_public | 576002 7.2% | 1745427 21.9% | ../../truffleruby-ws/truffleruby/src/main/c/cext/call.c~27:979
RSpec::Core::ExampleStatusParser#split_line | 523112 6.6% | 523112 6.6% | ../../truffleruby-ws/truffleruby/mxbuild/truffleruby-native/jre/languages/ruby/lib/gems/gems/rspec-core-3.9.2/lib/rspec/core/example_status_persister.rb~231:7295-7329
RSpec::Core::FilterManager#file_scoped_include? | 521060 6.5% | 521060 6.5% | ../../truffleruby-ws/truffleruby/mxbuild/truffleruby-native/jre/languages/ruby/lib/gems/gems/rspec-core-3.9.2/lib/rspec/core/filter_manager.rb~103:3400-3439
RSpec::Core::Metadata::HashPopulator#populate_location_attributes | 376106 4.7% | 376106 4.7% | ../../truffleruby-ws/truffleruby/mxbuild/truffleruby-native/jre/languages/ruby/lib/gems/gems/rspec-core-3.9.2/lib/rspec/core/metadata.rb~155:5706-5783
Parser::Base#parse | 336930 4.2% | 648944 8.2% | ../../truffleruby-ws/truffleruby/mxbuild/truffleruby-native/jre/languages/ruby/lib/gems/gems/parser-2.7.1.4/lib/parser/base.rb~189:5203-5216
rb_ary_new_capa | 336716 4.2% | 336716 4.2% | ../../truffleruby-ws/truffleruby/src/main/c/cext/array.c~30:686
RSpec::Core::ExampleStatusDumper#formatted_row_from | 289863 3.6% | 359078 4.5% | ../../truffleruby-ws/truffleruby/mxbuild/truffleruby-native/jre/languages/ruby/lib/gems/gems/rspec-core-3.9.2/lib/rspec/core/example_status_persister.rb~177:5815-5886
RSpec::Core::Metadata.relative_path | 235314 3.0% | 235334 3.0% | ../../truffleruby-ws/truffleruby/mxbuild/truffleruby-native/jre/languages/ruby/lib/gems/gems/rspec-core-3.9.2/lib/rspec/core/metadata.rb~45:1840-1901
rb_str_new | 218499 2.7% | 218499 2.7% | ../../truffleruby-ws/truffleruby/src/main/c/cext/string.c~64:1560
RSpec::ExampleGroups.base_name_for | 195732 2.5% | 228672 2.9% | ../../truffleruby-ws/truffleruby/mxbuild/truffleruby-native/jre/languages/ruby/lib/gems/gems/rspec-core-3.9.2/lib/rspec/core/example_group.rb~862:31082-31130
RSpec::Core::ExampleStatusMerger#sort_value_from | 143661 1.8% | 143661 1.8% | ../../truffleruby-ws/truffleruby/mxbuild/truffleruby-native/jre/languages/ruby/lib/gems/gems/rspec-core-3.9.2/lib/rspec/core/example_status_persister.rb~134:4683-4742
block in Object#<top (required)> | 103470 1.3% | 7862788 98.8% | ../../truffleruby-ws/truffleruby/mxbuild/truffleruby-native/jre/languages/ruby/lib/gems/gems/bundler-1.17.2/exe/bundle~30:902-943
RuboCop::YAMLDuplicationChecker.traverse | 85038 1.1% | 103084 1.3% | lib/rubocop/yaml_duplication_checker.rb~23:679-753
RSpec::Core::ExampleStatusParser#initialize | 82844 1.0% | 82844 1.0% | ../../truffleruby-ws/truffleruby/mxbuild/truffleruby-native/jre/languages/ruby/lib/gems/gems/rspec-core-3.9.2/lib/rspec/core/example_status_persister.rb~213:6929-6984
RuboCop::ConfigLoader.yaml_safe_load | 78474 1.0% | 1107677 13.9% | lib/rubocop/config_loader.rb~246:8538-8572
RuboCop::Cop::Badge.parse | 76520 1.0% | 76520 1.0% | lib/rubocop/cop/badge.rb~30:922-961
RSpec::Core::FlatMap#flat_map
Thanks a lot @bjfish for looking into this. I wanted to ask how I can get these samples while running RSpec? I was checking the profiling documentation, but it wasn't clear to me and I would like to dig a little bit more into these performance issues.
You can just pass --cpusampler
to TruffleRuby: https://github.com/oracle/truffleruby/blob/master/doc/user/tools.md#cpu-sampler
Can be done with TRUFFLERUBYOPT=--cpusampler
too so you don't need to figure out how to pass it through RSpec.
The top hit is:
Parser::Lexer#advance
self time 11500ms 7.1%, never compiled (Opt % = 0.0%), parser-2.7.1.4/lib/parser/lexer.rb~10965-23583
.
So that's a 12618 lines Ruby method! That's just too big to compile as a single unit under basically any compiler.
The Parser::Lexer issue can easily be reproduced with:
gem install parser
ruby --engine.TraceCompilation --cpusampler -rparser/current -e '20000.times { Parser::CurrentRuby.parse("2 + 2") }'
which shows
[engine] opt done Parser::Lexer#advance |ASTSize 24619 |Time 14586(2069+12517)ms |Tier Last |DirectCallNodes I 0/D 558 |GraalNodes 41290/185329 |CodeSize 652038 |CodeAddress 0x7f7e74018000 |Source lexer.rb:10965
So actually it does compile! (maybe due to this extremely simple code to lex). But it took 14.5 seconds to compile and it's 24619 AST nodes, that's huge.
-----------------------------------------------------------------------------------------------------------------------------------------
Sampling Histogram. Recorded 11569 samples with period 1ms.
Self Time: Time spent on the top of the stack.
Total Time: Time spent somewhere on the stack.
Opt %: Percent of time spent in compiled and therefore non-interpreted code.
-----------------------------------------------------------------------------------------------------------------------------------------
Thread: Thread[main,5,main]
Name | Total Time | Opt % || Self Time | Opt % | Location
-----------------------------------------------------------------------------------------------------------------------------------------
Parser::Lexer#advance | 9575ms 82.8% | 15.2% || 9478ms 81.9% | 15.3% | .rubies/truffleruby-dev/lib/gems/gems/parser-2.7.1.4/lib/parser/lexer.rb~10965-23583:456549-780083
So here we see some time was spent in compiled code (Opt % > 0).
It's still huge though, and it seems worth checking if there are other Ragel flags to produce a more reasonable method that would compile faster and more efficiently.
I'm a bit surprised at the 2nd and 3rd place... I imagine this is from reading RuboCop's config? It has 4k lines, but should be read once only (maybe a few times more in some tests), but I don't get it...
I think the sampling by @bjfish was only on a small period of time (160s) so it's probably highly focused on startup performance. (and reading YAML is rather slow until compiled)
It would be useful to run the full tests in GitHub Actions with TRUFFLERUBYOPT=--cpusampler
to get more insights.
=> Running on https://github.com/eregon/rubocop/actions (https://github.com/eregon/rubocop/runs/1041307575?check_suite_focus=true)
Trying on a larger file:
$ ruby --engine.TraceCompilation --cpusampler -rparser/current -e 'code=File.read("src/main/ruby/truffleruby/core/string.rb"); 20000.times { Parser::CurrentRuby.parse(code) }'
...
[engine] opt done Parser::Lexer#advance <split-5d55eaac> |ASTSize 31540 |Time 34964(4806+30158)ms |Tier Last |DirectCallNodes I 0/D 862 |GraalNodes 92299/334191 |CodeSize 1192658 |CodeAddress 0x7fccfa9e3000 |Source lexer.rb:10965
This method just gets larger in number of nodes (31540 nodes) and time to compile (35s, 1MB of native code) as more branches are actually used.
I'm seeing lots of split Array#[]
and String#[]
, we are already planning to AST-inline those, which should help warmup and maybe reduce a bit that method's size in AST nodes.
The whole suite seems to take long and maybe running out of memory: https://github.com/eregon/rubocop/runs/1050668543?check_suite_focus=true Is there a smaller subset suite I could run that's still more than just one spec file to profile?
@eregon how about running it with bundle exec rspec spec/rubocop/cop/
?
Found a specific spec file that is slower than the average: bundle exec rspec spec/rubocop/cli/cli_options_spec.rb
@jaimerave Could you do a run in your fork with TRUFFLERUBYOPT="--cpusampler"
similar to how I did in https://github.com/eregon/rubocop/commit/d8de6c8b186501b4254795ff8cf4865350a6afb1 but on a subset of the specs?
From https://gist.github.com/jaime-rave-conichi/54d6dddfefa82d090a813be50c49313b
block (2 levels) in Truffle::POSIX.attach_function_eagerly
which is a blocking native call but unfortunately it does not tell us the name of it. Will try to fix that. Maybe it's waitpid()
.String#inspect
which is known to be very slow and we should rewrite.@jaimerave I see that Truffleruby runs much faster now than before, mb it's time to try to enable it on CI again? Results of my recent runs
Finished in 17 minutes 46 seconds (files took 5 minutes 4 seconds to load)
This seems much improved, and when RuboCop defaults to Prism will be significantly faster (the parser gem is slow on truffleruby due to the humongous advance
method).
From comment: https://github.com/oracle/truffleruby/issues/1415#issuecomment-682061107