Answer to #ast is wrong for clean blocks returned at end of method

Bug description Under some circumstances, the method containing a clean block gets confused about the PC -> source node mapping such that CompiledBlock>>sourceNodeInOuter returns something that isn't an RBBlockNode. This has lots of interesting consequences, most obviously the block's answer to sourceCode contains extraneous text. It also gives the wrong AST in the AST inspector, and sometimes breaks the IR inspector with e.g. RBReturnNode DNU #ir, or displays the IR for the whole method.

I think the common factor is that the block is returned immediately, by a ^ directly to the left of the opening [ (i.e. not the result of a containing conditional), or the last statement in a DoIt.

To Reproduce

Inspect:

<compilerOptions: #( optionCleanBlockClosure )>
^[ :n :m | n + 1 ].

Notice that the ^ is highlighted, not just the block. AST inspector shows an RBReturnNode, and IR inspector fails with RBReturnNode DNU #ir.

Inspect:

<compilerOptions: #( optionCleanBlockClosure )>
[ :n | n + 1 ].

Now the ast returned is the entire RBDoItMethodNode, so nothing fails but it's even more wrong.
Add some bytecodes after the block, even if it's still the actual return value, like:
```
<compilerOptions: #( optionCleanBlockClosure )>
^true ifTrue: [[ :n | n + 1 ]].
```
Now everything works as expected.

But move the return right next to the block:

<compilerOptions: #( optionCleanBlockClosure )>
true ifTrue: [^[ :n | n + 1 ]].

And the problem is back.

Expected behavior All cases correctly retrieve the RBBlockNode and display correct IR.

Additional notes Looking at the two broken examples, the outer method's bytecodes are the same:

33 <20> pushConstant: [ :n :m | n + 1 ].
34 <5C> returnTop

The CompiledBlock's pcInOuter is correctly determined to be 33, but the bcToASTCache has only one entry, mapping 33 to the return node (with an explicit return), or the whole method (if the return is implicit to the DoIt). This is wrong, 33 should map to the block node and 34 to the return node/DoItMethod.

As a further clue, if we look at:

<compilerOptions: #( optionCleanBlockClosure )>
self isNil.
[ :n | n + 1 ].

we get bytecodes:

41 <4C> self
42 <80> send: isNil
43 <D8> pop
44 <21> pushConstant: self isNil.
[ :n :m | n + 1 ].
45 <5C> returnTop

The CompiledBlock's pcInOuter is still correct—44—and bcToASTCache map looks like (with the explicit return):

41  RBVariableNode(self)
42  RBMessageNode(self isNil)
43  RBMessageNode(self isNil)
44  RBReturnNode(^ [ :n :m | n + 1 ])

So again, the return node is shifted back one instruction, it should be 45 leaving 44 for the block.

Even more interestingly, examining the true ifTrue: [^[...]] case, we get bytecodes:

41 <4D> pushConstant: true
42 <C1> jumpFalse: 45
43 <20> pushConstant: [ :n | n + 1 ]
44 <5C> returnTop
45 <4F> pushConstant: nil
46 <5C> returnTop

pcInOuter is 43, and the map is:

41  RBLiteralValueNode(true)
42  RBMessageNode(true ifTrue: [ ^ [ :n | n + 1 ] ])
43  RBReturnNode(^ [ :n | n + 1 ])
44  RBReturnNode(^ [ :n | n + 1 ])
45  RBMessageNode(true ifTrue: [ ^ [ :n | n + 1 ] ])
46  {the whole DoItMethod}

So now the return node is in the right place, it's just also in the wrong place, stomping over where the block node should be.

Version information: Any OS, Pharo 12 for sure

Expected development cost I've gotten this far but I do not feel like diving into the bc-to-AST-map generation, sorry :/. I hope this helps!

Thanks @daniels220, nice description! It seems to be a general problem with methods returning literals.

The problem is during the bc-to-ast cache building:

OCBytecodeToASTCache >> generateForNode: aNode
    | methodOrBlockIr currentBcOffset |
    methodOrBlockNode := aNode.
    methodOrBlockIr := aNode ir.
    bcToASTMap := Dictionary new.
    firstBcOffset := methodOrBlockIr compiledMethod initialPC.
...

It is built based on node IR. This IR is generated and also cached in the node.

If you generate a new IR, you get one similar to the real bytecodes in the method:

label: 1
pushLiteral: 10
returnTop

But, when you ask for ir compiledMethod (as it is done during bc-to-ast cache building), it has to generate it (because was never generated for the IR). So the IR is fixed using a IRFix, here the class comment:

Transforms:

store, pop => popInto
some returns => quick returns

And it transform the return of a literal to a "quick returns" (?, always. So, after that, the IR is affected and transformed to

label: 1
returnLiteral: 10

I don't know why it is like this, but the bytecode is good generated, so at some point this IR works.

But generating the bc-to-ast cache from that "fixed" IR is more difficult, and finish with these unexpected consequences.

I'll investigate more...

Okay, that makes sense I think. I think the reason it's like this is that for some literals there is a single bytecode that effectively performs both the push and return. However this set is very limited—just nil, true and false—while the IR performs this transformation for all literal returns. See IRTranslator>>returnLiteral: which translates the optimized IR by calling IRBytecodeGenerator>>returnConstant:, which can only generate a return-special-literal for those three cases.

So the obvious fix would be to in some way modify IRPushLiteral>>canBeQuickReturn to answer true for only those three cases. The problem I see with this is that technically that list of quick-returnable literals could differ between bytecode encodings, and the IR is, I assume, explicitly not supposed to be coupled to the bytecode encoding—so we kinda have no way to get the relevant information at that point, as not even IRFix has access to the chosen bytecode encoder, never mind the IR nodes themselves.

More subtly, it seems that if the whole method is just returning a literal, this can be accomplished by a "quick method" (kinda like a primitive, you probably understand this better than I do), and in that case the list of possible literals includes -1, 0, 1, 2 as well. And IRTranslator>>returnLiteral: is also responsible for setting a special lastSpecialReturn variable so that this can be recognized once bytecode generation finishes. So even setting aside the issue of dependency on the encoding, a naive change limiting things to nil,true,false would also disable quick-returns of those four integers. Or we could allow them too, leaving this issue still in effect for methods returning -1,0,1,2—admittedly almost a total non-issue in practice, but it rubs me the wrong way.

(Somewhat annoyingly there is yet a third set of special literals, those which can be pushed with a single bytecode, which includes 0 and 1 but not -1 or 2—but that's neither here nor there for this issue.)

So, it becomes a matter of fixing the bcToAstCache generation. There I think the basic problem is that we have one IR instruction that corresponds to two AST nodes (or, looked at another way, an AST node without a corresponding IR node, as the IR node is most correctly associated with the return node, if we have to pick one, leaving no IR node for the literal node), and generates two bytecodes, such that it really, truly needs two separate entries in the BC-to-AST map. If it were just a matter of making sure the entry for the return bytecode points to the return node, that would be doable if not trivial—I'm quickly getting lost in figuring out how IRInstruction>>bytecodeIndex is set, but I'm sure I could figure it out eventually, and someone with more knowledge of the compiler should have no trouble. But the real problem is that the line bcToASTMap at: ir bytecodeOffset ifAbsentPut: [ ir sourceNode ]. would need to be a loop, it might need to put two entries in the map.

...one option would be to have an IRUnfix that we run on the IR before generating the BC-to-AST map (we already check if the method is quick and take a completely different approach if so, so we don't have to worry about that). It wouldn't want to reverse everything that IRFix does—the pop-and-store transformation doesn't have any of these problems as pop instructions don't correspond to any AST node anyway. And we still have to contend with needing to examine the bytecode encoding to handle the cases where there is in fact a single-bytecode form, lest we erroneously generate a map like:

45 -> RBLiteralValueNode(nil)
46 -> RBReturnNode(^nil)

for a method whose bytecodes actually look like:

45 <5B> return: nil

I feel like there may be other considerations here too, but that's all I have time for right now. Hope it's helpful!

Yes, I agree with you @daniels220 to have the Opal IR "agnostic" to VM implementation.

However, the compiler is already coupled to it, because it is generating the optimized bytecode sequence at the end.

The problem here is when the IR is different to the bytecode that the compiler generates:

The IR is always optimizing the return with literals.
But then, during bytecode generation, there are some checks to generate a deoptimized version from it. See IRBytecodeGenerator >> quickMethodPrim.

This cause that, for the examples in this issue, quick methods (as primitives) and normal methods (returning literals) have the same IR but different bytecodes. So, building the cache and the bytecode is more complex.

I suggest to move that logic to the IR generation (OCASTTranslator).

So, we perform those checks during IR construction, and cache and bytecode generation are easier after.

I would like to know what @MarcusDenker thinks about it 🤔

See also:

IRBytecodeGenerator >> returnConstant:
IRBytecodeGenerator >> pushLiteral:

Oh, great idea! As a quick sketch it looks like this would mean modifying OCASTTranslator>>visitReturnNode: to, after visiting the value of the return node, check if the last IR instruction emitted is a push that can be converted to a quick-return, and do so if so. We would need to either expose IRBuilder.currentSequence, or add some API for interacting with it in a more controlled way, but then this code would end up looking very similar to what IRFix is doing—with the usage of IRPush*>>quickRetNode and InstructionSequence>>replaceNode:withNode:. To keep the IR truly bytecode-agnostic, perhaps we should change canBeQuickReturn -> canBeQuickReturnWithEncoding: aBytecodeEncoder so we can account for potentially-differing lists of quick-returnable literals. I'm also not exactly sure how to handle the whole quick-method distinction as this is a nonlocal property—it depends on the entire method containing nothing else, and we don't strictly speaking even know if we're done when we process the return node. I suppose we would need to add something at the top level of OCASTTranslator>>visitMethodNode: that re-examines things at that point.

Then there's probably some cleanup to do in IRBytecodeGenerator to remove codepaths that will no longer be taken as the IR will no longer be asking for something the bytecode can't deliver.

Just some notes.

Indeed, keeping the IR as agnostic from the target language (the bytecode and its particular encoding) as possible is a general rule that's good to follow.

However, eventually code needs to be lowered to the target, and sometimes doing transformations at the level of the IR can simplify code generation. In that case (which is this case), we end up at some point with an IR that has some coupling with the target... One way to manage this is to keep transformations that are dependent on the target closer to the back end of the compiler, doing them at the very end. Fortunately in this case Opal is not really a multi-pass optimizing compiler, so even without that property this should be easy to manage ^^.

Another thing to notice is that we have a single bytecode encoding right now, so I don't believe that it's worth to let encoders creep more inside the compiler.

So, it becomes a matter of fixing the bcToAstCache generation. There I think the basic problem is that we have one IR instruction that corresponds to two AST nodes (or, looked at another way, an AST node without a corresponding IR node, as the IR node is most correctly associated with the return node, if we have to pick one, leaving no IR node for the literal node), and generates two bytecodes, such that it really, truly needs two separate entries in the BC-to-AST map. If it were just a matter of making sure the entry for the return bytecode points to the return node, that would be doable if not trivial—I'm quickly getting lost in figuring out how IRInstruction>>bytecodeIndex is set, but I'm sure I could figure it out eventually, and someone with more knowledge of the compiler should have no trouble. But the real problem is that the line bcToASTMap at: ir bytecodeOffset ifAbsentPut: [ ir sourceNode ]. would need to be a loop, it might need to put two entries in the map.

Also, this is a real problem that we have right now, and it would be nice to have a working solution. All optimized bytecode sequences (peephole optimizations like you note) but also loops and conditionals (specially loops) have a very bad debugging experience. Try stepping on a whileTrue: :)

pharo-project / pharo

Answer to #ast is wrong for clean blocks returned at end of method #15946

I suggest to move that logic to the IR generation (OCASTTranslator).