iliabylich closed this 1 year ago.
This looks interesting, I'll have a look this week
I replied in https://github.com/whitequark/parser/issues/871#issuecomment-1385570115 to keep all my measurements together (I hope you don't mind). In short, this is great and makes every metric on TruffleRuby significantly better.
Please correct me if I'm reading it wrong, but CodeSize is roughly the same. I changed most rules to have beginning_of_literal_pattern c_any with delegation to the sub-lexer, but it looks like it's not a bottleneck.
The CodeSize compared to what? To https://github.com/whitequark/parser/issues/871#issuecomment-1362218586 maybe? We can't really compare if we use different TruffleRuby versions (I use a local build of TruffleRuby master) and runtime configs (I use --jvm; I think you are using --native above).
In my measurements the CodeSize is significantly smaller in this PR.
But the CodeSize is not the metric that matters the most.
The metric that matters the most for TruffleRuby is the GraalGraphSize. That's not shown by TraceCompilation; you only see "[engine] opt failed ... Graph too big to safely compile ..." if it's above the limit, and "[engine] opt done" if it's within the limit.
This PR brings the GraalGraphSize from 180k to 153k, so this is awesome.
Performance on MRI, master branch:
0.22344800014980137
0.22889799997210503
0.22801800002343953
0.22188099985942245
0.215658999979496
0.23656500014476478
0.22172700008377433
0.2242219999898225
0.21810900000855327
0.2218509998638183
0.22798700002022088
0.22605099994689226
Performance on MRI, patch:
0.23099999991245568
0.2553759999573231
0.23117999988608062
0.2359160000924021
0.22565700020641088
0.24311299994587898
0.23387099988758564
0.2327790001872927
0.2283179999794811
0.23300900007598102
0.2435709999408573
0.23459000000730157
0.23156400001607835
The slowdown comes from introducing indirection (things that used to be ivars/locals are now part of a different sub-lexer), but I think it's acceptable; I've wanted to do this refactoring for a long time.
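To put a number on the MRI slowdown, here is a small Ruby sketch that computes the mean and median of the Benchmark.realtime samples listed above (in seconds) and the relative difference between master and the patch. It is only a rough summary: the two runs have different sample counts (12 vs 13), no warmup iterations are discarded, and no statistical test is applied.

```ruby
# Benchmark.realtime samples (seconds) copied from the MRI runs above.
MASTER = [
  0.22344800014980137, 0.22889799997210503, 0.22801800002343953,
  0.22188099985942245, 0.215658999979496,   0.23656500014476478,
  0.22172700008377433, 0.2242219999898225,  0.21810900000855327,
  0.2218509998638183,  0.22798700002022088, 0.22605099994689226
].freeze

PATCH = [
  0.23099999991245568, 0.2553759999573231,  0.23117999988608062,
  0.2359160000924021,  0.22565700020641088, 0.24311299994587898,
  0.23387099988758564, 0.2327790001872927,  0.2283179999794811,
  0.23300900007598102, 0.2435709999408573,  0.23459000000730157,
  0.23156400001607835
].freeze

def mean(xs)
  xs.sum / xs.size
end

# Middle element for odd counts, average of the two middle elements for even counts.
def median(xs)
  sorted = xs.sort
  mid = sorted.size / 2
  sorted.size.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
end

# How much slower +other+ is than +base+, in percent (negative means faster).
def slowdown_pct(base, other)
  (other / base - 1.0) * 100
end

mean_pct   = slowdown_pct(mean(MASTER), mean(PATCH))
median_pct = slowdown_pct(median(MASTER), median(PATCH))

puts format("mean:   master %.4fs, patch %.4fs (%+.1f%%)", mean(MASTER), mean(PATCH), mean_pct)
puts format("median: master %.4fs, patch %.4fs (%+.1f%%)", median(MASTER), median(PATCH), median_pct)
```

With these samples it reports the patch as roughly 4 to 5% slower than master (about +4.8% by mean and +4.1% by median).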
Performance on TruffleRuby, master branch:
8.871330896159634
6.567537926137447
[engine] opt done Parser::Lexer#advance |AST 14159|Time 7165(2129+5035)ms|Tier 2|Inlined 0Y 175N|IR 41506/87802|CodeSize 356400|Addr 0x11d000000|Src lexer-F0.rb:11507
[engine] opt deopt Parser::Lexer#advance |AST 14159|Calls/Thres 47382/ 3|CallsAndLoop/Thres 931543/ 1000|Src lexer-F0.rb:11507
3.492685919860378
4.992397190071642
[engine] opt done Parser::Lexer#advance |AST 14159|Time 6075(1958+4117)ms|Tier 2|Inlined 0Y 175N|IR 41516/88231|CodeSize 350344|Addr 0x12044b000|Src lexer-F0.rb:11507
1.803598809055984
0.9968494030181319
Performance on TruffleRuby, patch:
8.562273496063426
7.1901286309584975
[engine] opt done Parser::Lexer#advance |AST 10937|Time 5058(2158+2901)ms|Tier 2|Inlined 0Y 113N|IR 34209/66939|CodeSize 271855|Addr 0x12adea000|Src lexer-F0.rb:8543
[engine] opt deopt Parser::Lexer#advance |AST 10937|Calls/Thres 45337/ 3|CallsAndLoop/Thres 817912/ 1000|Src lexer-F0.rb:8543
2.910490944981575
4.281942526111379
[engine] opt done Parser::Lexer#advance |AST 10937|Time 4525(1771+2754)ms|Tier 2|Inlined 0Y 113N|IR 34219/67368|CodeSize 265671|Addr 0x1299f9000|Src lexer-F0.rb:8543
1.076639185892418
0.9211365671362728
0.933931423118338
CodeSize went down from 356400 to 271855 (23%) and compilation time improved from 7165ms to 5058ms (29%), so it's definitely worth it. Merging.
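The percentages can be double-checked with a few lines of Ruby. The figures are the ones quoted in this thread (the first Tier 2 compilation of Parser::Lexer#advance on master vs. the patch, plus the approximate GraalGraphSize values mentioned earlier); this is only arithmetic on those numbers, not a new measurement.

```ruby
# Before/after figures quoted in this thread for Parser::Lexer#advance on TruffleRuby.
MEASUREMENTS = {
  "CodeSize (first Tier 2 compile)"      => [356_400, 271_855],
  "Compilation time, ms (first compile)" => [7_165, 5_058],
  "GraalGraphSize (approx.)"             => [180_000, 153_000]
}.freeze

# Percentage reduction going from +before+ to +after+.
def reduction_pct(before, after)
  (before - after) * 100.0 / before
end

MEASUREMENTS.each do |name, (before, after)|
  puts format("%-40s %8d -> %8d  (-%.1f%%)", name, before, after, reduction_pct(before, after))
end
```

This gives 23.7% for CodeSize and 29.4% for compilation time (quoted above as 23% and 29%), and 15% for the GraalGraphSize drop from 180k to 153k.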
I'll take a look at performance (for the first time :D) once I'm done with size optimisations.
@eregon @headius I suppose it's now OK to "inline" methods back into the Ragel actions in the extracted sub-lexer, right? The sub-lexer is quite small, but this magic with taking/returning p makes it less readable, so I'd like to revert those changes.
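A minimal, hypothetical Ruby sketch of the readability trade-off being discussed: when a Ragel action is extracted into a method, the scan position p (assumed here to be a local variable of the generated scanning loop) has to be passed in, and the caller must reassign it from the return value. The class and method names below are made up for illustration and are not the parser gem's code; both variants produce the same result.

```ruby
# Hypothetical example, not parser gem code: two ways to write the same scanner.
class TinySubLexer
  def initialize(source)
    @source = source
  end

  # Extracted-helper style: p is a local in the caller's loop, so the helper
  # receives it as an argument and returns the advanced position.
  def skip_spaces(p)
    p += 1 while p < @source.length && @source[p] == " "
    p
  end

  def words_with_helper
    words = []
    p = 0
    while p < @source.length
      p = skip_spaces(p) # the caller must remember to reassign p
      start = p
      p += 1 while p < @source.length && @source[p] != " "
      words << @source[start...p] if p > start
    end
    words
  end

  # Inlined style: the same space-skipping logic written directly in the loop.
  def words_inlined
    words = []
    p = 0
    while p < @source.length
      p += 1 while p < @source.length && @source[p] == " "
      start = p
      p += 1 while p < @source.length && @source[p] != " "
      words << @source[start...p] if p > start
    end
    words
  end
end
```

The inlined variant avoids the p = helper(p) dance, at the cost of duplicating that code at every place it is needed, which is the code-size concern raised in the replies.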
"Quite small" is still 4810-3108 = 1702 lines for LexerStrings#advance.
Ragel is pretty "efficient" at generating huge amounts of code: it turns a 938-line .rl file into 5203 lines :/
Tweaking the command line to show both #advance methods:
truffleruby 23.0.0-dev-f3db7ba6, like ruby 3.1.3, GraalVM CE JVM [x86_64-linux]
$ truffleruby --jvm --engine.TraceCompilation --experimental-options --engine.MaximumGraalGraphSize=153000 -Ilib -rparser/current -rbenchmark -e 'code=File.read("test/test_lexer.rb"); 300.times { p Benchmark.realtime { Parser::CurrentRuby.parse(code) } }' |& grep -E '#advance|^[0-9]'
[engine] opt done id=5978 LexerStrings#advance |Tier 1|Time 1047( 610+437 )ms|AST 4413|Inlined 0Y 57N|IR 6565/ 17508|CodeSize 88918|Addr 0x7f46b3884060|Timestamp 1660571591456|Src lexer-strings.rb:3108
[engine] opt deopt id=5978 LexerStrings#advance | |Timestamp 1660802465675|Src lexer-strings.rb:3108
[engine] opt done id=5978 LexerStrings#advance |Tier 2|Time 1491( 594+897 )ms|AST 4413|Inlined 28Y 63N|IR 8210/ 19793|CodeSize 108376|Addr 0x7f46b3b31ea0|Timestamp 1662199674045|Src lexer-strings.rb:3108
[engine] opt deopt id=5978 LexerStrings#advance | |Timestamp 1662213063821|Src lexer-strings.rb:3108
[engine] opt done id=5978 LexerStrings#advance |Tier 1|Time 870( 444+426 )ms|AST 4413|Inlined 0Y 63N|IR 6959/ 18826|CodeSize 93653|Addr 0x7f46b3b898c0|Timestamp 1663086958447|Src lexer-strings.rb:3108
[engine] opt done id=5978 LexerStrings#advance |Tier 2|Time 1447( 580+867 )ms|AST 4413|Inlined 27Y 64N|IR 8092/ 19772|CodeSize 102448|Addr 0x7f46b3c928e0|Timestamp 1664634103981|Src lexer-strings.rb:3108
[engine] opt done id=5939 Lexer#advance |Tier 1|Time 4399(1526+2873)ms|AST 10580|Inlined 0Y 108N|IR 14318/ 43010|CodeSize 240189|Addr 0x7f46b405ce40|Timestamp 1668163715899|Src lexer-F0.rb:8555
[engine] opt done id=5939 Lexer#advance |Tier 2|Time 4104(1440+2665)ms|AST 10580|Inlined 0Y 108N|IR 12251/ 36971|CodeSize 207576|Addr 0x7f46b469e720|Timestamp 1672271968370|Src lexer-F0.rb:8555
0.06205050200014739
0.06224363499995889
0.06184562899989032
0.06213130400010414
0.06224232599993229
To compare numbers more easily:
LexerStrings#advance |Tier 1|Time 870( 444+426 )ms|AST 4413|Inlined 0Y 63N|IR 6959/ 18826|CodeSize 93653|Src lexer-strings.rb:3108
Lexer#advance |Tier 1|Time 4399(1526+2873)ms|AST 10580|Inlined 0Y 108N|IR 14318/ 43010|CodeSize 240189|Src lexer-F0.rb:8555
LexerStrings#advance |Tier 2|Time 1447( 580+867 )ms|AST 4413|Inlined 27Y 64N|IR 8092/ 19772|CodeSize 102448|Src lexer-strings.rb:3108
Lexer#advance |Tier 2|Time 4104(1440+2665)ms|AST 10580|Inlined 0Y 108N|IR 12251/ 36971|CodeSize 207576|Src lexer-F0.rb:8555
We should look at the Tier 1 numbers here, because in Tier 2 LexerStrings#advance inlines 27 methods and is naturally bigger as a result.
So in Tier 1, LexerStrings#advance is about half the size of Lexer#advance in number of AST nodes (4413 vs 10580; maybe the most intuitive notion of size), about half in number of compiler nodes (IR 6959 vs 14318), more than half in number of calls (Inlined 63N vs 108N), and a bit less than half in CodeSize (93653 vs 240189).
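The ratios can be computed directly from the TraceCompilation lines. The Ruby sketch below parses the |Key value| fields of the two Tier 1 lines above and prints each LexerStrings#advance metric as a percentage of the Lexer#advance one. It assumes the field layout shown in this output (other GraalVM versions may print these lines differently), takes the first number of the IR field, and counts calls as Y + N from the Inlined field.

```ruby
# Tier 1 TraceCompilation lines copied from the output above.
STRINGS_T1 = "LexerStrings#advance |Tier 1|Time 870( 444+426 )ms|AST 4413|Inlined 0Y 63N|IR 6959/ 18826|CodeSize 93653|Src lexer-strings.rb:3108"
LEXER_T1   = "Lexer#advance |Tier 1|Time 4399(1526+2873)ms|AST 10580|Inlined 0Y 108N|IR 14318/ 43010|CodeSize 240189|Src lexer-F0.rb:8555"

# Split "name |Key value|Key value|..." into a hash and pull out a few metrics.
# Assumes the layout shown above; this is not an official log format contract.
def parse_trace(line)
  fields = line.split("|").drop(1).to_h do |field|
    key, value = field.strip.split(" ", 2)
    [key, value.to_s]
  end
  {
    ast:      Integer(fields.fetch("AST")),
    ir:       Integer(fields.fetch("IR")[/\A\d+/]),          # first IR number
    calls:    fields.fetch("Inlined").scan(/\d+/).sum(&:to_i), # inlined + not inlined
    codesize: Integer(fields.fetch("CodeSize"))
  }
end

strings = parse_trace(STRINGS_T1)
lexer   = parse_trace(LEXER_T1)

strings.each_key do |metric|
  pct = 100.0 * strings[metric] / lexer[metric]
  puts format("%-8s %7d / %7d = %4.0f%%", metric, strings[metric], lexer[metric], pct)
end
```

It prints roughly 42% for AST, 49% for IR, 58% for calls and 39% for CodeSize. Since parse_trace only relies on the |Key value| structure, it also accepts the older layout used in the master/patch logs earlier in the thread.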
LexerStrings#advance is much faster to compile, which is great (these measurements are on JVM CE without libgraal, so compile time is not precise, as it includes the warmup of Graal itself).
A CodeSize of 94 KB in Tier 1 is huge for a single method (that's a lot of assembly code).
So from a JIT point of view, LexerStrings#advance is already huge, but at the same time much better than Lexer#advance.
From a JIT point of view, smaller methods are always better (down to a reasonable size, let's say ~10 lines, below which it's less clear) because they compile faster (compilation time is not strictly linear in the amount of code, e.g., due to the register allocator) and because the JIT can then choose to inline what makes the most sense and optimize that part better.
If the method becomes bigger, the JIT has less room: it is of course forced to include everything in the bigger method, but then it might not be able to inline the most important part(s) of the method, and the included slow paths might prevent some optimizations (e.g., it might give up on loop peeling or duplication because the graph is already so big). We can see here that TruffleRuby (Graal) chose to inline 27 calls and not to inline 64 (Inlined 27Y 64N in Tier 2), so 1) it's likely too much code to inline all of them and/or 2) some calls are less important and it's better for performance not to inline those.
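To make the fast path / slow path point concrete, here is a hypothetical Ruby sketch (not parser gem code): the hot loop only copies ordinary characters and delegates the rare backslash case to a separate method. Whether a given JIT actually inlines decode_escape is its own decision; the sketch only shows the code shape that leaves it that choice instead of forcing the slow path into the hot method.

```ruby
# Hypothetical example: a tiny unescaper with the rare path split out.
ESCAPES = { "n" => "\n", "t" => "\t", "\\" => "\\" }.freeze

# Slow, rarely taken path: decode one escape at index +i+ (where src[i] is a
# backslash). Unknown escapes map to the character itself. Returns
# [decoded_string, next_index].
def decode_escape(src, i)
  char = src[i + 1]
  raise ArgumentError, "dangling backslash at #{i}" if char.nil?

  [ESCAPES.fetch(char) { char }, i + 2]
end

# Hot path: a small loop that copies plain characters and only calls out
# to decode_escape when it sees a backslash.
def unescape(src)
  out = +""
  i = 0
  while i < src.length
    if src[i] == "\\"
      decoded, i = decode_escape(src, i)
      out << decoded
    else
      out << src[i]
      i += 1
    end
  end
  out
end
```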
It would be useful if @headius could check whether LexerStrings#advance compiles on JRuby.
I suspect LexerStrings#advance might be too big for JRuby as it is.
So I would suggest keeping it as it is. For TruffleRuby, making it bigger is worse for warmup and possibly worse for peak performance, but it should still compile. You are the maintainer though, so it's your decision. I would suggest keeping separate methods unless that proves too much of a hassle in practice.
That said, from a quick (and maybe wrong) regexp search (\bp\n), it seems there are only two or a few methods that return p; I only found extend_string_eol_heredoc_intertwined and e_heredoc_nl.
Both of these are small, so I think it's fine to inline those and it won't change things significantly.
But maybe I missed some other methods returning p?
With a slightly better search (\bp =):
p = e_heredoc_nl(p)
p = current_literal.heredoc_e - 1
p = extend_string_eol_heredoc_intertwined(p)
p = extend_interp_var(current_literal)
heredoc_e seems to be an attribute reader, so it's not relevant.
extend_interp_var is not trivial and is called about 8 times in the .rb, so inlining it would blow up the size quite a bit.
extend_string_eol_heredoc_intertwined seems to be called 9 times in the .rb, so that's still a sizable increase given the code would be "force-inlined" 9 times.
e_heredoc_nl doesn't seem to be called at all?
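The \bp = search can be reproduced with a short Ruby helper that tallies which methods are called in the form p = some_method(...). The sample input below just reuses the lines quoted above (with one line repeated to show the counting); for real numbers it would be run on the generated lexer source instead, e.g. via File.read. Like the original search, it is a rough regexp match rather than a Ruby parser, so it can over- or under-count.

```ruby
# Tally method names used as "p = method_name(...)" in a Ruby source string.
# Attribute-style reads such as "p = current_literal.heredoc_e - 1" are skipped
# because the captured name must be followed directly by "(".
def p_reassigning_calls(source)
  source.scan(/\bp = (\w+)\(/).flatten.tally
end

# Illustrative input built from the lines quoted above (one line duplicated).
sample = <<~RUBY
  p = e_heredoc_nl(p)
  p = current_literal.heredoc_e - 1
  p = extend_string_eol_heredoc_intertwined(p)
  p = extend_interp_var(current_literal)
  p = extend_string_eol_heredoc_intertwined(p)
RUBY

p_reassigning_calls(sample).each do |name, count|
  puts "#{name}: #{count} call site(s)"
end
```

Multiplying each call count by the helper's body length gives a crude estimate of how much force-inlining would grow the generated method.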
e_heredoc_nl doesn't seem called at all?
You are still on the old version of the code; e_heredoc_nl doesn't exist anymore.
I used this PR's commit, 5b16cd0a8a50e1df7092cf6e62c1c1602cdcf9fa for the above.
The new changes look great! With no flags to JRuby, everything compiles except the two advance methods, which stay interpreted while everything else is allowed to compile. Even when forcing the LexerStrings advance method to JVM bytecode, the JVM decides it is still too big to compile natively. I will play with some flags and see if I can get more of this to compile in JRuby with the recent changes.
WIP, but there's something to share.
In this PR I've extracted everything related to strings into a sub-lexer (lexer-strings.rl). The code in this branch doesn't handle all cases yet, but it is able to lex most string literals and to parse test_parser.rb. As I understand it, this doesn't change anything: @eregon @headius please correct me if I'm reading it wrong, but CodeSize is roughly the same. I changed most rules to have beginning_of_literal_pattern c_any with delegation to the sub-lexer, but it looks like it's not a bottleneck.