Open ghost opened 2 years ago
Let's examine the following Scala code.
import java.util.concurrent.atomic.AtomicInteger
object main {
val counter: AtomicInteger = new AtomicInteger(0)
def str(): Any = {
counter.incrementAndGet()
"abc"
}
def main(args: Array[String]): Unit = {
str() match {
case s: String =>
println(s) // Breakpoint here.
case n: Int =>
println(n)
case _ =>
}
}
}
The important things to note are the side-effectful def str(): Any
method and the pattern match on its possible types.
Let's suppose that we have a breakpoint on line 14 (println(s)
).
Invoking the debugger on this code snippet will stop on the provided breakpoint, as expected. Both the IntelliJ and the Metals debugger show only this
and args
as the values in scope.
A much better user experience would be to also show the s
variable, which is bound within the case
. Here's a screenshot of the same code compiled in Scala 3.
The difference comes from the emitted bytecode (which we will examine a bit later in much more detail) between Scala 2 and Scala 3.
Unfortunately, showing bound variables in this context is only one problem. Expression evaluation is also negatively affected by this. Here's where the side-effectful nature of def str(): Any
comes into play. As we can see from the screenshots, on line 14, the expression str()
as part of the match
has already been evaluated and its value stored in the variable s
. However, because of the missing debugging information, the debugger is unable to find what local variable this value has been written to, in order to display its value, because this variable does not have a name.
In theory, all of the local variables of a method can be (and are) queried according to their index, and not their name. This could be exploited to examine what local variables are written to right after the instanceof
checks, and subsequently query those variables.
Unfortunately, I have found examples where the debugger is unable to read unnamed local variables and throwing cryptic exceptions, with varying behavior in situations like reordered case
s (case n: Int =>
coming before case s: String =>
). This might be an issue with the IntelliJ Platform Debugger (the code that the Scala plugin is built on top of), or it might be some sort of bytecode patterns that are produced by Scala that the debugger is unable to work with, I cannot tell at the moment, because that is very difficult to debug. In any case, that approach of examining unnamed local variables is very brittle in practice and I believe is not used by debuggers (whether it's the IntelliJ Java debugger, or the debugger used in VS Code).
Coming back to the expression evaluation example. Because the debugger is unable to find the value of s
, it has to resort to traversing the Scala source up to the match
ed expression (in this case str()
) and evaluate that, in order to extract the value. However, as you can see, that expression can be side-effectful and thus reevaluating it changes the state of the world that we're trying to observe. The ideal solution (and what is available in Scala 3) is to just read the value of the evaluated match
expression from the local variable it was saved it.
public void main(java.lang.String[]);
descriptor: ([Ljava/lang/String;)V
flags: (0x0001) ACC_PUBLIC
Code:
stack=2, locals=6, args_size=2
0: aload_0
1: invokevirtual #40 // Method str:()Ljava/lang/Object;
4: astore_3
5: aload_3
6: instanceof #42 // class java/lang/String
9: ifeq 33
12: aload_3
13: checkcast #42 // class java/lang/String
16: astore 4
18: getstatic #47 // Field scala/Predef$.MODULE$:Lscala/Predef$;
21: aload 4
23: invokevirtual #51 // Method scala/Predef$.println:(Ljava/lang/Object;)V
26: getstatic #57 // Field scala/runtime/BoxedUnit.UNIT:Lscala/runtime/BoxedUnit;
29: astore_2
30: goto 77
33: goto 36
36: aload_3
37: instanceof #59 // class java/lang/Integer
40: ifeq 67
43: aload_3
44: invokestatic #65 // Method scala/runtime/BoxesRunTime.unboxToInt:(Ljava/lang/Object;)I
47: istore 5
49: getstatic #47 // Field scala/Predef$.MODULE$:Lscala/Predef$;
52: iload 5
54: invokestatic #69 // Method scala/runtime/BoxesRunTime.boxToInteger:(I)Ljava/lang/Integer;
57: invokevirtual #51 // Method scala/Predef$.println:(Ljava/lang/Object;)V
60: getstatic #57 // Field scala/runtime/BoxedUnit.UNIT:Lscala/runtime/BoxedUnit;
63: astore_2
64: goto 77
67: goto 70
70: getstatic #57 // Field scala/runtime/BoxedUnit.UNIT:Lscala/runtime/BoxedUnit;
73: astore_2
74: goto 77
77: return
StackMapTable: number_of_entries = 5
frame_type = 253 /* append */
offset_delta = 33
locals = [ top, class java/lang/Object ]
frame_type = 2 /* same */
frame_type = 30 /* same */
frame_type = 2 /* same */
frame_type = 255 /* full_frame */
offset_delta = 6
locals = [ class main$, class "[Ljava/lang/String;", class scala/runtime/BoxedUnit, class java/lang/Object ]
stack = []
LineNumberTable:
line 12: 0
line 13: 5
line 14: 18
line 13: 33
line 15: 36
line 16: 49
line 15: 67
line 17: 70
line 12: 77
LocalVariableTable:
Start Length Slot Name Signature
0 78 0 this Lmain$;
0 78 1 args [Ljava/lang/String;
MethodParameters:
Name Flags
args final
The important part is the LocalVariableTable
down at the bottom, which only lists the implicit this
argument (which is available for all instance methods on the JVM) and the args
function parameter of main
. There's no other mention of named local variables, even though the function can use up to 6 local variables, as described in stack=2, locals=6, args_size=2
.
public void main(java.lang.String[]);
descriptor: ([Ljava/lang/String;)V
flags: (0x0001) ACC_PUBLIC
Code:
stack=2, locals=5, args_size=2
0: aload_0
1: invokevirtual #49 // Method str:()Ljava/lang/Object;
4: astore_2
5: aload_2
6: instanceof #51 // class java/lang/String
9: ifeq 27
12: aload_2
13: checkcast #51 // class java/lang/String
16: astore_3
17: getstatic #56 // Field scala/Predef$.MODULE$:Lscala/Predef$;
20: aload_3
21: invokevirtual #60 // Method scala/Predef$.println:(Ljava/lang/Object;)V
24: goto 57
27: aload_2
28: instanceof #62 // class java/lang/Integer
31: ifeq 54
34: aload_2
35: invokestatic #68 // Method scala/runtime/BoxesRunTime.unboxToInt:(Ljava/lang/Object;)I
38: istore 4
40: getstatic #56 // Field scala/Predef$.MODULE$:Lscala/Predef$;
43: iload 4
45: invokestatic #72 // Method scala/runtime/BoxesRunTime.boxToInteger:(I)Ljava/lang/Integer;
48: invokevirtual #60 // Method scala/Predef$.println:(Ljava/lang/Object;)V
51: goto 57
54: goto 57
57: return
StackMapTable: number_of_entries = 3
frame_type = 252 /* append */
offset_delta = 27
locals = [ class java/lang/Object ]
frame_type = 26 /* same */
frame_type = 2 /* same */
LineNumberTable:
line 11: 0
line 12: 0
line 13: 5
line 14: 17
line 15: 27
line 16: 40
line 17: 54
LocalVariableTable:
Start Length Slot Name Signature
17 10 3 s Ljava/lang/String;
40 14 4 n I
0 58 0 this Lmain$;
0 58 1 args [Ljava/lang/String;
Signature: #46 // ([Ljava/lang/String;)V
MethodParameters:
Name Flags
args final
Examining the LocalVariableTable
in the Scala 3 bytecode shows both s
and n
as named local variables within their validity scope (from 17 to 27 (length 10) and 40 to 54 (length 14) respectively). The debugger is able to examine the values of these variables and display them, as well as evaluate expressions based on s
and n
without changing the semantics of the program by reevaluating side effects.
I hope that I've been able to present a clear improvement with this report. I expect the other instances that I will file to follow a similar style. Please do not hesitate to ask for more clarifications and suggest further improvements. Thanks for reading.
I'll repeat one thing I said on Discord, namely:
Would backport PRs (of Scala 3 back end improvements, moved back to Scala 2) be reviewed and likely merged, is this an area in which Scala 2 remains open to change? The answer to that is yes, absolutely. Examples of recently merged backend PRs in scala/scala: 9930, 9976, 10022, 10025, ...
I'm wondering if this overlaps at all with https://github.com/scala/scala/pull/9121 ? (which seems abandoned now for Scala 2, though similar work is going forward for Scala 3)
As far as I can tell, that PR (and the one in Scala 3) mainly deal with enabling better inspections in inlined code, which is a whole can of worms on its own. 😆
@vasilmkd-jetbrains having bytecode for matches that the debugger understands better would be excellent. I'm happy to assist where I can.
@lrytz Please assume that I have no knowledge of the bytecode generation backend. Where would I start with something like the above? Thanks.
Without looking in detail - is this possibly "only" about the local variables table? There's a method emitLocalVarScope
in the backend, you can probably work your way back from there to find out why pattern bound locals are not emitted.
Thanks, that's still a helpful pointer.
To my knowledge, we don't have learning resources oriented towards src/compiler/scala/tools/nsc/backend
in particular.
The absolute best way to learn is to ask very specific questions on issues and PRs, especially your own PRs. Half-finished PRs and not-even-half-finished PRs in the scala/scala repo are totally fine to open and then ask questions on, so we can help you get unstuck and take the next steps.
I haven't done much work on the back end itself (Lukas is by far the most expert on that), but I have built an alternate back end, so I know a lot about the tree shapes that come in to the back end. There is other knowledge spread around among Scala 2 team & contributors and the Scala 3 ones too.
One thing that perhaps you already know, but I'll state explicitly just in case, is that the Scala 2 and Scala 3 back ends are basically the same code. They've drifted a bit out of sync with each other, as improvements haven't always been forward-ported or back-ported. But they're not drastically different. Moving improvements (and knowledge) back and forth between the two is often pretty straightforward.
(Though for pattern matching in particular, the Scala 2 tree shapes and the Scala 3 tree shapes are rather different...)
So, I took a look at the backend. It seems that BCodeBodyBuilder.genLoadTo
is the method that generates code for ValDef
, and it even has a helpful comment:
if (!isSynth) { // there are case <synthetic> ValDef's emitted by patmat
varsInScope ::= (sym -> localVarStart)
}
I compared this code to the Dotty backend, and it seems to be the same.
This hints to me that Scala 2 is marking patmat vals as synthetic
, while Scala 3 does not. How do we proceed from here? Thanks.
~perhaps https://github.com/scala/scala/commit/f6ee85bed766dff518e88ca216893992987f1b3a (which I found by grepping the git history with ... | grep -i pat | grep -i synthet
) should be reverted? unfortunately the original commit lacks an explanation of why the change seemed desirable~ (never mind)
~note that I'm just eyeballing this, I'm not actually 100% sure that what you're talking about is the same thing Paul was talking about. nor am I sure that https://github.com/scala/scala/pull/1666 (which turned up in the same search) isn't relevant. wdyt?~ (never mind)
Should I ask the Scala 3 team as well, for their thoughts on the matter?
Um, not sure. It's hard to be sure who you would ask exactly, as you're asking about the absence of a flag. You might look through the git history in their repo and see if you turn up any leads (though again, it's hard to gather evidence about an absence). Perhaps you could ask them a meta-question about whether someone in particular is knowledgeable in this area.
(It's often tempting to say "ask Guillaume" since he often seems to know everything, but of course always asking Guillaume doesn't scale...)
My idea was more or less to have a high level discussion about whether patmat variables should be synthetic or not. My claim is that there is merit to them being not synthetic and visible. It would also shine a light on whether Scala 3's behavior is a bug or a feature. 😄
It seems reasonable that patmat variables are not synthetic, since they are just like other variables. @vasilmkd-jetbrains I encourage you to make them not synthetic in a PR. Then, I am not sure exactly how we could assess the impact of such a change. Do you guys think there could be some tooling relying on patmat variables being flagged as synthetic…?
Exactly, I'm also a bit vary of such a change having a larger impact than wanted or anticipated. It's also why I would still like to invite someone from the Scala 3 team to discuss it.
I'm kind of responsible for the pattern matcher in Scala 3, although I did not write its first iteration. I can't tell how the history of synthetic-ness for patmat variables happened.
That said, I'm pretty sure that any definition visible in the source code should not be synthetic. So I'd be in favor of making patmat variables non-synthetic.
Thank you @sjrd for chiming in!
I just took a closer look at https://github.com/scala/scala/commit/f6ee85bed766dff518e88ca216893992987f1b3a and I no longer think it's relevant — it looks to apply only to synthetic temporaries. Sorry for the misdirection.
I'm pretty sure that any definition visible in the source code should not be synthetic
Agree.
I asked the following question on the Scala Discord and I was invited by @SethTisue to open a discussion here, in hopes of getting more eyes on the issue.
I'm including the text from the Discord message verbatim:
I would like to use this tracking issue to present current pain points with the Scala 2 debugging experience that I believe are feasible to solve on the compiler side. I will post those examples as follow up comments to this issue.
I haven't worked on the compiler backend to date, but I'm willing to learn and help in the process, as well as provide feedback and testing of the implementations. Thanks in advance.