Open daniels220 opened 9 months ago
Thanks @daniels220, nice description! It seems to be a general problem with methods returning literals.
The problem is during the bc-to-ast cache building:
OCBytecodeToASTCache >> generateForNode: aNode
| methodOrBlockIr currentBcOffset |
methodOrBlockNode := aNode.
methodOrBlockIr := aNode ir.
bcToASTMap := Dictionary new.
firstBcOffset := methodOrBlockIr compiledMethod initialPC.
...
It is built based on node IR. This IR is generated and also cached in the node.
If you generate a new IR, you get one similar to the real bytecodes in the method:
label: 1
pushLiteral: 10
returnTop
But, when you ask for ir compiledMethod
(as it is done during bc-to-ast cache building), it has to generate it (because was never generated for the IR).
So the IR is fixed using a IRFix
, here the class comment:
Transforms:
store, pop => popInto
some returns => quick returns
And it transform the return of a literal to a "quick returns" (?, always. So, after that, the IR is affected and transformed to
label: 1
returnLiteral: 10
I don't know why it is like this, but the bytecode is good generated, so at some point this IR works.
But generating the bc-to-ast cache from that "fixed" IR is more difficult, and finish with these unexpected consequences.
I'll investigate more...
Okay, that makes sense I think. I think the reason it's like this is that for some literals there is a single bytecode that effectively performs both the push and return. However this set is very limited—just nil
, true
and false
—while the IR performs this transformation for all literal returns. See IRTranslator>>returnLiteral:
which translates the optimized IR by calling IRBytecodeGenerator>>returnConstant:
, which can only generate a return-special-literal for those three cases.
So the obvious fix would be to in some way modify IRPushLiteral>>canBeQuickReturn
to answer true
for only those three cases. The problem I see with this is that technically that list of quick-returnable literals could differ between bytecode encodings, and the IR is, I assume, explicitly not supposed to be coupled to the bytecode encoding—so we kinda have no way to get the relevant information at that point, as not even IRFix
has access to the chosen bytecode encoder, never mind the IR nodes themselves.
More subtly, it seems that if the whole method is just returning a literal, this can be accomplished by a "quick method" (kinda like a primitive, you probably understand this better than I do), and in that case the list of possible literals includes -1
, 0
, 1
, 2
as well. And IRTranslator>>returnLiteral:
is also responsible for setting a special lastSpecialReturn
variable so that this can be recognized once bytecode generation finishes. So even setting aside the issue of dependency on the encoding, a naive change limiting things to nil
,true
,false
would also disable quick-returns of those four integers. Or we could allow them too, leaving this issue still in effect for methods returning -1
,0
,1
,2
—admittedly almost a total non-issue in practice, but it rubs me the wrong way.
(Somewhat annoyingly there is yet a third set of special literals, those which can be pushed with a single bytecode, which includes 0
and 1
but not -1
or 2
—but that's neither here nor there for this issue.)
So, it becomes a matter of fixing the bcToAstCache generation. There I think the basic problem is that we have one IR instruction that corresponds to two AST nodes (or, looked at another way, an AST node without a corresponding IR node, as the IR node is most correctly associated with the return node, if we have to pick one, leaving no IR node for the literal node), and generates two bytecodes, such that it really, truly needs two separate entries in the BC-to-AST map. If it were just a matter of making sure the entry for the return bytecode points to the return node, that would be doable if not trivial—I'm quickly getting lost in figuring out how IRInstruction>>bytecodeIndex
is set, but I'm sure I could figure it out eventually, and someone with more knowledge of the compiler should have no trouble. But the real problem is that the line bcToASTMap at: ir bytecodeOffset ifAbsentPut: [ ir sourceNode ].
would need to be a loop, it might need to put two entries in the map.
...one option would be to have an IRUnfix
that we run on the IR before generating the BC-to-AST map (we already check if the method is quick and take a completely different approach if so, so we don't have to worry about that). It wouldn't want to reverse everything that IRFix does—the pop-and-store transformation doesn't have any of these problems as pop
instructions don't correspond to any AST node anyway. And we still have to contend with needing to examine the bytecode encoding to handle the cases where there is in fact a single-bytecode form, lest we erroneously generate a map like:
45 -> RBLiteralValueNode(nil)
46 -> RBReturnNode(^nil)
for a method whose bytecodes actually look like:
45 <5B> return: nil
I feel like there may be other considerations here too, but that's all I have time for right now. Hope it's helpful!
Yes, I agree with you @daniels220 to have the Opal IR "agnostic" to VM implementation.
However, the compiler is already coupled to it, because it is generating the optimized bytecode sequence at the end.
The problem here is when the IR is different to the bytecode that the compiler generates:
IRBytecodeGenerator >> quickMethodPrim
.This cause that, for the examples in this issue, quick methods (as primitives) and normal methods (returning literals) have the same IR but different bytecodes. So, building the cache and the bytecode is more complex.
So, we perform those checks during IR construction, and cache and bytecode generation are easier after.
I would like to know what @MarcusDenker thinks about it 🤔
See also:
IRBytecodeGenerator >> returnConstant:
IRBytecodeGenerator >> pushLiteral:
Oh, great idea! As a quick sketch it looks like this would mean modifying OCASTTranslator>>visitReturnNode:
to, after visiting the value of the return node, check if the last IR instruction emitted is a push that can be converted to a quick-return, and do so if so. We would need to either expose IRBuilder.currentSequence
, or add some API for interacting with it in a more controlled way, but then this code would end up looking very similar to what IRFix is doing—with the usage of IRPush*>>quickRetNode
and InstructionSequence>>replaceNode:withNode:
. To keep the IR truly bytecode-agnostic, perhaps we should change canBeQuickReturn
-> canBeQuickReturnWithEncoding: aBytecodeEncoder
so we can account for potentially-differing lists of quick-returnable literals. I'm also not exactly sure how to handle the whole quick-method distinction as this is a nonlocal property—it depends on the entire method containing nothing else, and we don't strictly speaking even know if we're done when we process the return node. I suppose we would need to add something at the top level of OCASTTranslator>>visitMethodNode:
that re-examines things at that point.
Then there's probably some cleanup to do in IRBytecodeGenerator
to remove codepaths that will no longer be taken as the IR will no longer be asking for something the bytecode can't deliver.
Just some notes.
Indeed, keeping the IR as agnostic from the target language (the bytecode and its particular encoding) as possible is a general rule that's good to follow.
However, eventually code needs to be lowered to the target, and sometimes doing transformations at the level of the IR can simplify code generation. In that case (which is this case), we end up at some point with an IR that has some coupling with the target... One way to manage this is to keep transformations that are dependent on the target closer to the back end of the compiler, doing them at the very end. Fortunately in this case Opal is not really a multi-pass optimizing compiler, so even without that property this should be easy to manage ^^.
Another thing to notice is that we have a single bytecode encoding right now, so I don't believe that it's worth to let encoders creep more inside the compiler.
So, it becomes a matter of fixing the bcToAstCache generation. There I think the basic problem is that we have one IR instruction that corresponds to two AST nodes (or, looked at another way, an AST node without a corresponding IR node, as the IR node is most correctly associated with the return node, if we have to pick one, leaving no IR node for the literal node), and generates two bytecodes, such that it really, truly needs two separate entries in the BC-to-AST map. If it were just a matter of making sure the entry for the return bytecode points to the return node, that would be doable if not trivial—I'm quickly getting lost in figuring out how
IRInstruction>>bytecodeIndex
is set, but I'm sure I could figure it out eventually, and someone with more knowledge of the compiler should have no trouble. But the real problem is that the linebcToASTMap at: ir bytecodeOffset ifAbsentPut: [ ir sourceNode ].
would need to be a loop, it might need to put two entries in the map.
Also, this is a real problem that we have right now, and it would be nice to have a working solution.
All optimized bytecode sequences (peephole optimizations like you note) but also loops and conditionals (specially loops) have a very bad debugging experience. Try stepping on a whileTrue:
:)
Bug description Under some circumstances, the method containing a clean block gets confused about the PC -> source node mapping such that
CompiledBlock>>sourceNodeInOuter
returns something that isn't anRBBlockNode
. This has lots of interesting consequences, most obviously the block's answer tosourceCode
contains extraneous text. It also gives the wrong AST in the AST inspector, and sometimes breaks the IR inspector with e.g.RBReturnNode DNU #ir
, or displays the IR for the whole method.I think the common factor is that the block is returned immediately, by a
^
directly to the left of the opening[
(i.e. not the result of a containing conditional), or the last statement in a DoIt.To Reproduce
^
is highlighted, not just the block. AST inspector shows an RBReturnNode, and IR inspector fails withRBReturnNode DNU #ir
.RBDoItMethodNode
, so nothing fails but it's even more wrong.Expected behavior All cases correctly retrieve the RBBlockNode and display correct IR.
Additional notes Looking at the two broken examples, the outer method's bytecodes are the same:
The
CompiledBlock
'spcInOuter
is correctly determined to be33
, but thebcToASTCache
has only one entry, mapping33
to the return node (with an explicit return), or the whole method (if the return is implicit to the DoIt). This is wrong,33
should map to the block node and34
to the return node/DoItMethod.As a further clue, if we look at:
we get bytecodes:
The
CompiledBlock
'spcInOuter
is still correct—44
—andbcToASTCache
map looks like (with the explicit return):So again, the return node is shifted back one instruction, it should be 45 leaving 44 for the block.
Even more interestingly, examining the
true ifTrue: [^[...]]
case, we get bytecodes:pcInOuter
is43
, and the map is:So now the return node is in the right place, it's just also in the wrong place, stomping over where the block node should be.
Version information: Any OS, Pharo 12 for sure
Expected development cost I've gotten this far but I do not feel like diving into the bc-to-AST-map generation, sorry :/. I hope this helps!