Closed headius closed 1 year ago
There's lots of repetition of these escape calls, but I'm unsure whether there's enough to warrant outlining them. For example:
@escape = encode_escape(tok(p - 2, p).to_i(16))
...appears 9 times. Reducing it to a single call would drop maybe a half-dozen instructions on JRuby, so outlining just this only reduces the instruction count by a few dozen. In aggregate, however, outlining similar sequences may amount to hundreds of instructions.
Sorry to say but it makes code cryptic. Also there's no way to "call" any Ragel DSL from extracted methods without inlining them again into actions, so I can't promise that it will not get re-written back in a month.
Actually, some parts look good enough to be merged as is (extracting constants, turning the whole action into a single method), but all methods that take (or even return) p
look too complex, not because it's complex on its own, but because of how much context should be tracked to understand and change anything, and now there's more.
Does Ragel support ivars for p
? Let me check it quickly first. If it doesn't hurt performance most of you changes will be OK, other locals can also be turned into ivars hopefully.
It's variable p @p;
, with a some local changes I was able to run a full test suite, benchmarks (parsing test_parser.rb):
Before:
0.26340699987486005
0.24227400007657707
0.2389680000487715
0.25204400019720197
0.24021800002083182
0.2268710001371801
0.22673400002531707
0.2313620001077652
0.23082199995405972
0.23081399989314377
0.22557200002484024
0.2313079999294132
0.22624099999666214
0.2364000000525266
0.2290109999012202
After:
0.23141700006090105
0.22681099991314113
0.24080900009721518
0.22510300017893314
0.23191100009717047
0.2234749998897314
0.23476299992762506
0.2259360000025481
0.22546899993903935
0.23276899987831712
0.22822699998505414
0.22566900006495416
0.22536400007084012
0.2339080001693219
0.22188700013794005
Looks like there's no difference, so I'll send a PR in a minute. With this change it'll be possible to convert most (if not all) actions into methods and call them with %{ action }
instead of %action
(i.e. instead of going to action for the source code we'll go to the method definition), which sounds good enough for me.
I've merged #900 , so p
is now @p
everywhere, and taking and passing it back is not necessary anymore. Could you rebase you PR please? I understand that my change affected your branch a lot, so if you have no time I could do prepare a PR myself.
Also current_literal
is taken from literal
everywhere which is a method, so it doesn't have to be passed as well.
The change you made makes this PR impossible to rebase, unfortunately. Nearly every commit breaks.
Could you undo that and reapply it on top of mine?
FWIW, @p
will have more overhead on all implementations than just passing p
so we might reduce code complexity slightly but probably end up slower.
If there's somewhere we could live chat about this it would be a great help. I have just pushed the last change I intend to make using this "flay" methodology, but there are other less-valuable changes possible.
On Ragel DSL that prevents us from turning everything into methods, I wonder if there's a performant way to maybe somehow "return" information about needed fnext
/fbreak
/fgoto
from the action as a value and "dispatch" it maybe from a lambda that has access to lvars that are mutated by this DSL. Pseudo-code:
NO_ACTION = 0
FBREAK = 1
FNEXT_EXPR_BEGIN = 2
FNEXT_EXPR_VALUE = 4
# ...
def on_pattern_x
# do something ...
return FBREAK | FNEXT_EXPR_BEGIN
end
dispatch = ->(action) {
if action & FBREAK
fbreak;
end
if action & FNEXT_EXPR_BEGIN
fnext expr_begin;
end
# ...
}
'pattern_x' => { dispatch(on_pattern_x) }
I know, it's terrible
I've done as much as I can do with this methodology. I got about halfway down the "flay" report before the "mass" got too low for it to be worth further refactoring.
Overall, this represents a reduction in F1 size from 19235 instructions to 13849, a reduction of nearly 5400 instructions. It also reduces the F0 lexer from 11691 to 9872, less 1819 instructions.
It cannot be merged now because p
became @p
. If given a diff based on this PR I could apply it, or #900 could be reverted and remade atop my PR.
FWIW, @p will have more overhead on all implementations than just passing p so we might reduce code complexity slightly but probably end up slower.
From my naive POV ivar/lvar access sounds like an array/hash lookup, so it should be roughly the same in terms of performance. Or is it because locals can be stored in registers on some platforms after JITing? I see no difference on MRI and as I understand on 3.2 with object shapes ivars will be even faster.
If there's somewhere we could live chat about this it would be a great help.
Sure, feel free to drop me an email with a link to any public chat and we can talk. Or is there a public JRuby/TruffleRuby chat that I could join?
Pseudo-code
Yeah with the work in this PR a large part of the remaining duplicated code is entangled with state changes, so it's hard to go further without a way to avoid that. I'm not sure moving more of these variables into instance variables is the right approach, and allocating a carrier object would reduce code size but mean another allocation for every call to advance
.
I would like to get this PR merged before we look at other more creative ways to reduce the lexer size.
Also current_literal is taken from literal everywhere
Aha, good catch! We can apply that on top of this PR once it merges again.
The p
change is good too, but pretty please can we do it after this PR? I have 20 commits I would have to rebase.
The p change is good too, but pretty please can we do it after this PR? I have 20 commits I would have to rebase.
Done, there are no conflicts now.
allocating a carrier object would reduce code size but mean another allocation for every call to advance
I thought there are no allocations, it's a pure bit arithmetic that IIRC doesn't allocate any Ruby objects (because small numbers are "embedded" in VALUE
in MRI)
ivar/lvar access sounds like an array/hash lookup, so it should be roughly the same
Local variable access will indeed be the fastest in nearly all situations, since it does not have any additional lookup to do (the index of the variable in the stack, or the register in JIT, are emitted directly into the code).
Instance variables have improved over time, but even with all the latest changes in CRuby it will still require at least a couple memory indirections, plus some cache validation checks.
Array access should be slightly faster than the best instance variable optimizations on CRuby, but probably not by much.
Hash access is very slow in comparison to all of these.
Your benchmark of p
vs @p
may indicate that it's not enough overhead to be a problem, but I know @p
will always be slower than p
even on the best JITs because it will at least be a memory access (rather than a register) unless the object is completely inlined too.
is there a public JRuby/TruffleRuby chat
JRuby's chat is on Matrix: https://matrix.to/#/#jruby:matrix.org
I'm active there today along with @enebo who has also been looking at this work.
it's a pure bit arithmetic that IIRC doesn't allocate any Ruby objects
Hmm... perhaps we can figure out a way to pack all the state variables into a single 64-bit Integer? That would be a single return value, immediate for most values on CRuby, and we could update multiple states at once.
This is well outside my understanding of Ragel's state machine, though.
Done, there are no conflicts now.
This PR is ready to merge any time! I ran rake test
for each change along the way, so it should be pretty clean.
Your p
change will require some methods to be tweaked.
Thank you for the naming suggestions. I just did a best guess on what to call some of these. I will get these methods renamed after lunch.
Updated method names based on review comments. All set!
@headius Thanks!
I missed this until now (and subscribed to the repo to see the next changes), it looks great :)
Array access should be slightly faster than the best instance variable optimizations on CRuby, but probably not by much.
A small note: Array access is typically slower because it needs to check bounds (no need with a Shape), normalize the index if negative and it has one more indirection.
Indeed local vars are still much better than ivars, it's a register access or less (e.g. the JIT can see where the value goes and not even need a register) vs an actual read/write into memory + the Shape check + other checks if not using Shapes. OTOH if there is any block in the method and that's not completely escape analyzed (unlikely here), that will mean local variables become memory accesses because those blocks can observe them, so then they become very similar to ivars with Shapes, but still potentially with more information about what can access it when (e.g., avoids redundant "shape checks" for ivars if no block in between).
See #871 for longer discussion about the size of the
advance
method and the problems it causes for JIT-enabled runtimes.This PR is a start at "outlining" (as opposed to inlining) repetitive code blocks from the
advance
method by moving them into utility methods. Each commit details the change made plus the reduction in the F1advance
instruction count on JRuby 9.4.My process for outlining, for anyone that wishes to reproduce it:
At the time of this PR creation, it reduces the F1
advance
size from 19235 to 15573. It may improve performance on JRuby slightly, but the method is still too large to fully JIT.This PR also reduces the size of the F0
advance
method from 11691 to 10235—still far too large to JIT but a good reduction.