scala / scala-dev

Scala 2 team issues. Not for user-facing bugs or directly actionable user-facing improvements. For build/test/infra and for longer-term planning and idea tracking. Our bug tracker is at https://github.com/scala/bug/issues
Apache License 2.0
130 stars 15 forks source link

Generating better JVM bytecode for debuggers #811

Open ghost opened 2 years ago

ghost commented 2 years ago

I asked the following question on the Scala Discord and I was invited by @SethTisue to open a discussion here, in hopes of getting more eyes on the issue.

I'm including the text from the Discord message verbatim:

Hi. I'm working on the IntelliJ Scala plugin team, and one of my main longer term goals is improving the debugger experience. Since we're focusing on improving Scala 3 support, I've already opened a few issues in the Scala 3 repo that tackle changes that benefit all debuggers (both IntelliJ and Metals). These usually come in the form of backend changes and fixes which output JVM bytecode that lends itself for better inspection. Despite bugs and optimization opportunities (this is not a criticism, just pointing out the fact that Scala 3 is a new compiler), I'm pleased to say that Scala 3 has indeed given tooling developers (talking about debuggers) more and better bytecode to work with (even though not all of these opportunities are taken advantage of, I for one am working on that). My question is, geared more towards the Scala 2 backend people, how feasible would it be and is there any willingness to backport some of these changes, so that the debugging experience in Scala 2 can also be improved. I'm asking here, because I don't think these are necessarily bugs, but more like "tooling focused feature requests" without any language changes (and so this question doesn't really belong on the GitHub issue tracker). I would love to discuss some examples. Looking forward to your responses. I'm tagging @lrytz (looking at the GitHub readme for Scala 2, they are a person to talk to about the backend), but also feel free to tag other people that have relevant knowledge of this area, and of course, anyone else is welcome to chime in. Thanks in advance.

I would like to use this tracking issue to present current pain points with the Scala 2 debugging experience that I believe are feasible to solve on the compiler side. I will post those examples as follow up comments to this issue.

I haven't worked on the compiler backend to date, but I'm willing to learn and help in the process, as well as provide feedback and testing of the implementations. Thanks in advance.

ghost commented 2 years ago

Match expressions on types don't generate enough debugging information

Let's examine the following Scala code.

import java.util.concurrent.atomic.AtomicInteger

object main {
  val counter: AtomicInteger = new AtomicInteger(0)

  def str(): Any = {
    counter.incrementAndGet()
    "abc"
  }

  def main(args: Array[String]): Unit = {
    str() match {
      case s: String =>
        println(s) // Breakpoint here.
      case n: Int =>
        println(n)
      case _ =>
    }
  }
}

The important things to note are the side-effectful def str(): Any method and the pattern match on its possible types.

Let's suppose that we have a breakpoint on line 14 (println(s)).

Invoking the debugger on this code snippet will stop on the provided breakpoint, as expected. Both the IntelliJ and the Metals debugger show only this and args as the values in scope.

Screen Shot 2022-05-16 at 12 50 18 Screen Shot 2022-05-16 at 12 50 52

A much better user experience would be to also show the s variable, which is bound within the case. Here's a screenshot of the same code compiled in Scala 3.

Screen Shot 2022-05-16 at 12 53 23

The difference comes from the emitted bytecode (which we will examine a bit later in much more detail) between Scala 2 and Scala 3.

How it affects expression evaluation

Unfortunately, showing bound variables in this context is only one problem. Expression evaluation is also negatively affected by this. Here's where the side-effectful nature of def str(): Any comes into play. As we can see from the screenshots, on line 14, the expression str() as part of the match has already been evaluated and its value stored in the variable s. However, because of the missing debugging information, the debugger is unable to find what local variable this value has been written to, in order to display its value, because this variable does not have a name.

In theory, all of the local variables of a method can be (and are) queried according to their index, and not their name. This could be exploited to examine what local variables are written to right after the instanceof checks, and subsequently query those variables.

Unfortunately, I have found examples where the debugger is unable to read unnamed local variables and throwing cryptic exceptions, with varying behavior in situations like reordered cases (case n: Int => coming before case s: String =>). This might be an issue with the IntelliJ Platform Debugger (the code that the Scala plugin is built on top of), or it might be some sort of bytecode patterns that are produced by Scala that the debugger is unable to work with, I cannot tell at the moment, because that is very difficult to debug. In any case, that approach of examining unnamed local variables is very brittle in practice and I believe is not used by debuggers (whether it's the IntelliJ Java debugger, or the debugger used in VS Code).

Coming back to the expression evaluation example. Because the debugger is unable to find the value of s, it has to resort to traversing the Scala source up to the matched expression (in this case str()) and evaluate that, in order to extract the value. However, as you can see, that expression can be side-effectful and thus reevaluating it changes the state of the world that we're trying to observe. The ideal solution (and what is available in Scala 3) is to just read the value of the evaluated match expression from the local variable it was saved it.

Examining Scala 2 and Scala 3 bytecode

Scala 2 bytecode

public void main(java.lang.String[]);
    descriptor: ([Ljava/lang/String;)V
    flags: (0x0001) ACC_PUBLIC
    Code:
      stack=2, locals=6, args_size=2
         0: aload_0
         1: invokevirtual #40                 // Method str:()Ljava/lang/Object;
         4: astore_3
         5: aload_3
         6: instanceof    #42                 // class java/lang/String
         9: ifeq          33
        12: aload_3
        13: checkcast     #42                 // class java/lang/String
        16: astore        4
        18: getstatic     #47                 // Field scala/Predef$.MODULE$:Lscala/Predef$;
        21: aload         4
        23: invokevirtual #51                 // Method scala/Predef$.println:(Ljava/lang/Object;)V
        26: getstatic     #57                 // Field scala/runtime/BoxedUnit.UNIT:Lscala/runtime/BoxedUnit;
        29: astore_2
        30: goto          77
        33: goto          36
        36: aload_3
        37: instanceof    #59                 // class java/lang/Integer
        40: ifeq          67
        43: aload_3
        44: invokestatic  #65                 // Method scala/runtime/BoxesRunTime.unboxToInt:(Ljava/lang/Object;)I
        47: istore        5
        49: getstatic     #47                 // Field scala/Predef$.MODULE$:Lscala/Predef$;
        52: iload         5
        54: invokestatic  #69                 // Method scala/runtime/BoxesRunTime.boxToInteger:(I)Ljava/lang/Integer;
        57: invokevirtual #51                 // Method scala/Predef$.println:(Ljava/lang/Object;)V
        60: getstatic     #57                 // Field scala/runtime/BoxedUnit.UNIT:Lscala/runtime/BoxedUnit;
        63: astore_2
        64: goto          77
        67: goto          70
        70: getstatic     #57                 // Field scala/runtime/BoxedUnit.UNIT:Lscala/runtime/BoxedUnit;
        73: astore_2
        74: goto          77
        77: return
      StackMapTable: number_of_entries = 5
        frame_type = 253 /* append */
          offset_delta = 33
          locals = [ top, class java/lang/Object ]
        frame_type = 2 /* same */
        frame_type = 30 /* same */
        frame_type = 2 /* same */
        frame_type = 255 /* full_frame */
          offset_delta = 6
          locals = [ class main$, class "[Ljava/lang/String;", class scala/runtime/BoxedUnit, class java/lang/Object ]
          stack = []
      LineNumberTable:
        line 12: 0
        line 13: 5
        line 14: 18
        line 13: 33
        line 15: 36
        line 16: 49
        line 15: 67
        line 17: 70
        line 12: 77
      LocalVariableTable:
        Start  Length  Slot  Name   Signature
            0      78     0  this   Lmain$;
            0      78     1  args   [Ljava/lang/String;
    MethodParameters:
      Name                           Flags
      args                           final

The important part is the LocalVariableTable down at the bottom, which only lists the implicit this argument (which is available for all instance methods on the JVM) and the args function parameter of main. There's no other mention of named local variables, even though the function can use up to 6 local variables, as described in stack=2, locals=6, args_size=2.

Scala 3 bytecode

  public void main(java.lang.String[]);
    descriptor: ([Ljava/lang/String;)V
    flags: (0x0001) ACC_PUBLIC
    Code:
      stack=2, locals=5, args_size=2
         0: aload_0
         1: invokevirtual #49                 // Method str:()Ljava/lang/Object;
         4: astore_2
         5: aload_2
         6: instanceof    #51                 // class java/lang/String
         9: ifeq          27
        12: aload_2
        13: checkcast     #51                 // class java/lang/String
        16: astore_3
        17: getstatic     #56                 // Field scala/Predef$.MODULE$:Lscala/Predef$;
        20: aload_3
        21: invokevirtual #60                 // Method scala/Predef$.println:(Ljava/lang/Object;)V
        24: goto          57
        27: aload_2
        28: instanceof    #62                 // class java/lang/Integer
        31: ifeq          54
        34: aload_2
        35: invokestatic  #68                 // Method scala/runtime/BoxesRunTime.unboxToInt:(Ljava/lang/Object;)I
        38: istore        4
        40: getstatic     #56                 // Field scala/Predef$.MODULE$:Lscala/Predef$;
        43: iload         4
        45: invokestatic  #72                 // Method scala/runtime/BoxesRunTime.boxToInteger:(I)Ljava/lang/Integer;
        48: invokevirtual #60                 // Method scala/Predef$.println:(Ljava/lang/Object;)V
        51: goto          57
        54: goto          57
        57: return
      StackMapTable: number_of_entries = 3
        frame_type = 252 /* append */
          offset_delta = 27
          locals = [ class java/lang/Object ]
        frame_type = 26 /* same */
        frame_type = 2 /* same */
      LineNumberTable:
        line 11: 0
        line 12: 0
        line 13: 5
        line 14: 17
        line 15: 27
        line 16: 40
        line 17: 54
      LocalVariableTable:
        Start  Length  Slot  Name   Signature
           17      10     3     s   Ljava/lang/String;
           40      14     4     n   I
            0      58     0  this   Lmain$;
            0      58     1  args   [Ljava/lang/String;
    Signature: #46                          // ([Ljava/lang/String;)V
    MethodParameters:
      Name                           Flags
      args                           final

Examining the LocalVariableTable in the Scala 3 bytecode shows both s and n as named local variables within their validity scope (from 17 to 27 (length 10) and 40 to 54 (length 14) respectively). The debugger is able to examine the values of these variables and display them, as well as evaluate expressions based on s and n without changing the semantics of the program by reevaluating side effects.

Discussion about the report

I hope that I've been able to present a clear improvement with this report. I expect the other instances that I will file to follow a similar style. Please do not hesitate to ask for more clarifications and suggest further improvements. Thanks for reading.

SethTisue commented 2 years ago

I'll repeat one thing I said on Discord, namely:

Would backport PRs (of Scala 3 back end improvements, moved back to Scala 2) be reviewed and likely merged, is this an area in which Scala 2 remains open to change? The answer to that is yes, absolutely. Examples of recently merged backend PRs in scala/scala: 9930, 9976, 10022, 10025, ...

SethTisue commented 2 years ago

I'm wondering if this overlaps at all with https://github.com/scala/scala/pull/9121 ? (which seems abandoned now for Scala 2, though similar work is going forward for Scala 3)

ghost commented 2 years ago

As far as I can tell, that PR (and the one in Scala 3) mainly deal with enabling better inspections in inlined code, which is a whole can of worms on its own. 😆

lrytz commented 2 years ago

@vasilmkd-jetbrains having bytecode for matches that the debugger understands better would be excellent. I'm happy to assist where I can.

ghost commented 2 years ago

@lrytz Please assume that I have no knowledge of the bytecode generation backend. Where would I start with something like the above? Thanks.

lrytz commented 2 years ago

Without looking in detail - is this possibly "only" about the local variables table? There's a method emitLocalVarScope in the backend, you can probably work your way back from there to find out why pattern bound locals are not emitted.

ghost commented 2 years ago

Thanks, that's still a helpful pointer.

SethTisue commented 2 years ago

To my knowledge, we don't have learning resources oriented towards src/compiler/scala/tools/nsc/backend in particular.

The absolute best way to learn is to ask very specific questions on issues and PRs, especially your own PRs. Half-finished PRs and not-even-half-finished PRs in the scala/scala repo are totally fine to open and then ask questions on, so we can help you get unstuck and take the next steps.

I haven't done much work on the back end itself (Lukas is by far the most expert on that), but I have built an alternate back end, so I know a lot about the tree shapes that come in to the back end. There is other knowledge spread around among Scala 2 team & contributors and the Scala 3 ones too.

One thing that perhaps you already know, but I'll state explicitly just in case, is that the Scala 2 and Scala 3 back ends are basically the same code. They've drifted a bit out of sync with each other, as improvements haven't always been forward-ported or back-ported. But they're not drastically different. Moving improvements (and knowledge) back and forth between the two is often pretty straightforward.

(Though for pattern matching in particular, the Scala 2 tree shapes and the Scala 3 tree shapes are rather different...)

ghost commented 2 years ago

So, I took a look at the backend. It seems that BCodeBodyBuilder.genLoadTo is the method that generates code for ValDef, and it even has a helpful comment:

if (!isSynth) { // there are case <synthetic> ValDef's emitted by patmat
  varsInScope ::= (sym -> localVarStart)
}

I compared this code to the Dotty backend, and it seems to be the same.

This hints to me that Scala 2 is marking patmat vals as synthetic, while Scala 3 does not. How do we proceed from here? Thanks.

SethTisue commented 2 years ago

~perhaps https://github.com/scala/scala/commit/f6ee85bed766dff518e88ca216893992987f1b3a (which I found by grepping the git history with ... | grep -i pat | grep -i synthet) should be reverted? unfortunately the original commit lacks an explanation of why the change seemed desirable~ (never mind)

~note that I'm just eyeballing this, I'm not actually 100% sure that what you're talking about is the same thing Paul was talking about. nor am I sure that https://github.com/scala/scala/pull/1666 (which turned up in the same search) isn't relevant. wdyt?~ (never mind)

ghost commented 2 years ago

Should I ask the Scala 3 team as well, for their thoughts on the matter?

SethTisue commented 2 years ago

Um, not sure. It's hard to be sure who you would ask exactly, as you're asking about the absence of a flag. You might look through the git history in their repo and see if you turn up any leads (though again, it's hard to gather evidence about an absence). Perhaps you could ask them a meta-question about whether someone in particular is knowledgeable in this area.

(It's often tempting to say "ask Guillaume" since he often seems to know everything, but of course always asking Guillaume doesn't scale...)

ghost commented 2 years ago

My idea was more or less to have a high level discussion about whether patmat variables should be synthetic or not. My claim is that there is merit to them being not synthetic and visible. It would also shine a light on whether Scala 3's behavior is a bug or a feature. 😄

julienrf commented 2 years ago

It seems reasonable that patmat variables are not synthetic, since they are just like other variables. @vasilmkd-jetbrains I encourage you to make them not synthetic in a PR. Then, I am not sure exactly how we could assess the impact of such a change. Do you guys think there could be some tooling relying on patmat variables being flagged as synthetic…?

ghost commented 2 years ago

Exactly, I'm also a bit vary of such a change having a larger impact than wanted or anticipated. It's also why I would still like to invite someone from the Scala 3 team to discuss it.

sjrd commented 2 years ago

I'm kind of responsible for the pattern matcher in Scala 3, although I did not write its first iteration. I can't tell how the history of synthetic-ness for patmat variables happened.

That said, I'm pretty sure that any definition visible in the source code should not be synthetic. So I'd be in favor of making patmat variables non-synthetic.

ghost commented 2 years ago

Thank you @sjrd for chiming in!

SethTisue commented 2 years ago

I just took a closer look at https://github.com/scala/scala/commit/f6ee85bed766dff518e88ca216893992987f1b3a and I no longer think it's relevant — it looks to apply only to synthetic temporaries. Sorry for the misdirection.

I'm pretty sure that any definition visible in the source code should not be synthetic

Agree.