Scopes and variable shadowing

hbenl commented 1 year ago

The env proposal would add information about scopes and bindings in the original source to a sourcemap. That proposal satisfies all the scenarios listed here but there is one scenario it doesn't support: variable shadowing, i.e. a variable declared in an outer scope that is shadowed by another variable with the same name in an inner scope. This scenario is quite common because javascript minifiers reuse variable names aggressively.

To recap: The env proposal would add an env field encoding the scopes and bindings:

for each scope, its start and end locations, type (function or block scope), name (in case of a function scope), bindings and child scopes are encoded
for each binding, its type, original variable name and an expression that can be evaluated by the debugger to get the current value are encoded; in most cases, this expression is simply the name of the generated variable that corresponds to the original variable but in some cases (e.g. when variables are inlined or with named imports where the transpiled code may only contain a variable for the imported module but not for the named import), there is no generated variable corresponding to the original variable

Imagine that this original code

function outer() {
  function inner(value) {
    const value_plus_one = value + 1;
    console.log(value_plus_one);
  }
  const num = 1;
  const num_plus_one = num + 1;
  inner(num_plus_one);
}
outer();

was transpiled to

function outer() {
  function inner(a) {
    const b = a + 1;
    console.log(b);
  }
  const a = 1;
  const b = num + 1;
  inner(b);
}
outer();

The scope information encoded according to the env proposal would contain the following binding information (I'm omitting the bindings for the functions here):

for the inner scope: value => a, value_plus_one => b
for the outer scope: num => a, num_plus_one => b

When the debugger is paused at console.log(b);, the scopes would be:

inner function scope: a => 2, b => 3
outer function scope: a => 1, b => 2

Now the debugger tries to reconstruct the original scopes: for num_plus_one in the outer function scope it would evaluate b and get 3 but the correct value is 2. To get the correct value, the debugger would need to understand that it should take the value of b in the outer function scope. In this example it seems rather obvious from which scope the value of b should be taken but in general, with deeper nesting of scopes and the transpiler adding and/or removing scopes there's no way for the debugger to know which generated scope it should use. This could be fixed by adding information about the generated scopes to the sourcemap (in addition to the information about original scopes): to find the value for a variable in a given original scope the debugger would translate the start and end locations of that scope to generated locations and then find the innermost generated scope that contains those locations. It would then evaluate the expression from the binding information in that scope. Note, however, that debuggers usually don't support evaluating in an outer scope of the one that execution is currently paused in, so the debugger would have to "emulate" this evaluation. This would probably only be feasible for very simple expressions, e.g. x, obj.prop and arr[idx], but this would cover a lot of ground already and go beyond the current capabilities of javascript debuggers I've worked with (in the best case they support what can be done by only supporting variable names as binding expressions). Note that the information about generated scopes would only need to contain the start and end locations of the scopes, but not their type, name and bindings (although I would optionally keep the type information because I imagine it could be useful).

hbenl commented 1 year ago

I've had to revise my proposal after realizing something I had overlooked: I wanted to attach binding information (mapping original variables to generated expressions) to the original sources but (the body of) an original function can appear multiple times in the generated source due to function inlining:

original:

function foo(condition) {
  log(condition ? "yes" : "no");
}
foo(true);
foo(bar);

generated:

log("yes");
log("no");

When you're paused in the first line of the generated source you'd expect the debugger to show that you're paused in the second line of the original source and an original scope with a variable condition with the value true but when you're paused in the second line of the generated source, condition should be false. So we need to add two sets of binding information for the scope of foo but where do we put them? The solution is to add them to the generated source and allow "virtual scopes": ranges in the generated source that correspond to original scopes that have been removed by the transpiler. In this example we'd add two "virtual scopes", covering line 1 and 2 of the generated source respectively, with one of them mapping condition to true and the other to false.

So my revised proposal is to add scope information to the generated source with the following information for each scope:

the range of the scope in the generated source
whether this scope appears only in the original source (i.e. it was removed by the transpiler), only in the generated source (i.e. it was added by the transpiler) or both
an optional name (the original function name for function scopes that appear in the original source)
optional binding information, mapping original variable names to generated expressions (only for scopes that appear in an original source)

Points to discuss:

how to specify locations:
- using an index into the mappings array
- using (relative) line and column numbers
where to store the strings from the binding information
- in the names property
- in a new property (e.g. scopeNames)
- this was already discussed, I think we've settled on the second option
how to encode the scopes
- the env proposal specifies an extensible encoding inspired by DWARF
- or a simpler, more "traditional" encoding (e.g. a list of scopes separated by ",", each described by a tuple of VLQ values)

TBD:

what are the implications for sourcemap size?
what are the implications for sourcemap producers and consumers and merging of sourcemaps?
describe the algorithms for symbolizing stack traces and for computing original scopes in a debugger
provide examples covering the scenarios lined out here as well as variable shadowing

jkup commented 1 year ago

Some takeaways from the naming meeting:

We should look into how DWARF handles this
Some discussion around expressions and how maybe we should only support a subset like property access?
@JSMonk mentioned also constant folding as another example.
@hbenl requested some good examples showing source and generated code so we can discuss specifics

JSMonk commented 1 year ago

I mentioned the next case:

// ORIGINAL SOURCE:

const foo = 5
const bar = 6

console.log(foo + bar)

// GENERATED SOURCE:

/* after a constant folding could looks like this: */
console.log(11) 
/*          ^ at this point of debugging we don't have any information about the original values of `foo` and `bar` variables */

The question is, do we want to have the ability to describe debug information for such cases?

hbenl commented 1 year ago

@JSMonk In this case the sourcemap could specify the bindings as foo -> 5, bar -> 6. That's the power of allowing arbitrary expressions as binding values (although this power may create some headaches).

hbenl commented 1 year ago

Here are some more examples (I specify the ranges for the scopes as \:\ - \:\, with 1-based lines and columns)

Removed scopes

Original:

1 {
2   let x = 1;
3   console.log(x);
4   {
5     let x = 2;
6     console.log(x);
7   }
8   console.log(x);
9 }

Generated:

1 {
2   var x1 = 1;
3   console.log(x1);
4   var x2 = 2;
5   console.log(x2);
6   console.log(x1);
7 }

Scopes:

module scope, 1:1 - 7:2
- appears in both original and generated source
- no name
- no bindings
1:1 - 7:2
- appears in both original and generated source
- no name
- bindings:
- x -> x1
4:1 - 5:20
- appears only in original source
- no name
- bindings:
- x -> x2

Inlined functions

Original:

1 function foo(condition) {
2   log(condition ? "yes" : "no");
3 }
4 foo(true);
5 foo(false);

Generated:

1 log("yes");
2 log("no");

Scopes:

module scope, 1:1 - 2:12
- appears in both original and generated source
- no name
- bindings:
- foo -> ??? (I think the sourcemap should specify this binding and somehow mark its value as not available)
1:1 - 1:13
- appears only in original source
- name foo
- bindings:
- condition -> true
2:1 - 2:12
- appears only in original source
- name foo
- bindings:
- condition -> false

Variable shadowing

Original:

1 function outer(num) {
2   function inner(value) {
3     const value_plus_one = value + 1;
4     console.log(value_plus_one);
5   }
6   const num_plus_one = num + 1;
7   inner(num_plus_one);
8 }
9 outer(1);

Generated:

1 function f(a) {
2   function g(a) {
3     const b = a + 1;
4     console.log(b);
5   }
6   const b = a + 1;
7   g(b);
8 }
9 f(1);

Scopes:

module scope, 1:1 - 10:5
- appears in both original and generated source
- no name
- bindings:
- outer -> f
1:1 - 8:2
- appears in both original and generated source
- name outer
- bindings:
- inner -> g
- num -> a
- num_plus_one -> b
2:3 - 5:4
- appears in both original and generated source
- name inner
- bindings:
- value -> a
- value_plus_one -> b

hbenl commented 1 year ago

I've created an example implementation here for computing the original scopes using the scope information encoded in the sourcemap and added two of the examples above.

JSMonk commented 1 year ago

Btw, I've just realized that with this proposal it's also possible to "map" values. This is what I'm trying to achieve with the #51 The potential of this metadata to improve debug experience is huge.

hbenl commented 1 year ago

I have added a decodeScopes() function to this repo showing how scope information could be encoded in a VLQ string: each scope is encoded as a list of VLQ numbers, scopes are separated by ,

the first number encodes the scope type and whether the scope appears in the original and generated sources
if the scope type is ScopeType.NAMED_FUNCTION, the second number is the index of the scope name in the scopeNames array
the next four numbers are startLine, startColumn, endLine and endColumn
for each binding we add the indices of the variable name and the expression in the scopeNames array

JSMonk commented 1 year ago

@hbenl I believe there should also be a source index to indicate the file containing the source scope.

hbenl commented 1 year ago

@JSMonk The scope locations are in the generated source and there is exactly one generated source for a sourcemap. The sourcemap's mappings will tell you the corresponding locations in the original sources (including the source index).

jkup commented 1 year ago

@hbenl @JSMonk would it help if I added the spec notes that Tobias took as well as the screenshots from the Munich meetup to this issue? Or maybe somewhere else?

hbenl commented 1 year ago

@jkup Sure, add them to this issue.

hbenl commented 1 year ago

Tobias' spec notes: https://gist.github.com/sokra/97a53a869b9a421accadbc9681cb26f3

hbenl commented 1 year ago

Here's a particularly tricky example, merging two modules into one and then inlining a function from one module into the other module (produced with rollup and terser and then pretty-printed):

Original files:

one.js

1 import { f } from "./two";
2 let num = 42;
3 f(num);
4 console.log(num++);

two.js

1 let increment = 1;
2 export function f(x) {
3   console.log(x + increment++);
4 }

Generated file:

1 let l = 1;
2 let o = 42;
3 var e;
4 (e = o),
5 console.log(e + l++),
6 console.log(o++);

When paused at the console.log(e + l++) statement, without sourcemaps the debugger shows one frame with two scopes:

the module scope containing l and o
the global scope containing e.

With sourcemaps it should show two frames:

the top frame for a call to f() with three scopes:
- the function scope for f() containing x
- the module scope for two.js containing increment and f
- the global scope
the second frame for the execution of module one.js with two scopes:
- the module scope for one.js containing f and num
- the global scope

The scopes information in the sourcemap would be:

module scope, 1:1 - 6:18
- appears only in generated source
- no name
- no bindings
module scope (for two.js), 1:1 - 1:11
- appears only in original source
- no name
- bindings:
- increment -> l
- f -> (unavailable)
module scope (for one.js), 2:1 - 6:18
- appears only in original source
- no name
- bindings:
- f -> (unavailable)
- num -> o
module scope (for two.js), 3:1 - 5:22
- appears only in original source
- no name
- bindings:
- increment -> l
- f -> (unavailable)
function scope, 3:1 - 5:22
- appears only in original source (i.e. this function was inlined)
- name f
- bindings:
- x -> e

I've also added this example to my example implementation but will have to update the implementation to correctly support function inlining (i.e. recreate the frame that was "lost" due to inlining).

szuend commented 1 year ago

I stumbled over two examples where the current "bindings per scope" approach is not that great for debuggers. The first one is the following:

original:

(function foo() {
  let someVar = 'hello';
  console.log('hello');
  someVar = 'world';
  console.log('world');
})();

generated:

console.log('hello');
console.log('world');

The current proposal would call for one entry in bindings for someVar, but we don't have a value since there are 2 different possible values in the scope of foo. We could split the scope with two different bindings and let them both point back to the same foo original scope, but that feels a bit unwieldy if we use full scope entries to model variable live ranges.

The second example is taken from https://szuend.github.io/scope-proposal-examples/01_simple/simple.html

original:

function addWithMultiply(arg1, arg2, arg3) {
  const intermediate = arg1 + arg2;
  if (arg3 !== undefined) {
   const result = intermediate * arg3;
   return result;
  }
  return intermediate;
}

addWithMultiply(2, 3, 4);

generated:

function n(n,t,e){const r=n+t;if(e!==undefined){const n=r*e;return n}return r}n(2,3,4)

The current proposal calls for the following binding sections for the function and block scope respectively:

"bindings": [["arg1", "n"], ["arg2", "t"], ["arg3", "e"], ["intermediate", "r"]],
"bindings": [["result", "n"]]

Let's say we pause on the return n statement. arg1 is unavailable to us given that result also got renamed to n. What makes this harder for debuggers is that the right-hand side of any binding can be an arbitrary expression. The debugger has to do some parsing to find out which generated variables are actually used in a bindings expression and whether that variable was shadowed. This could be solved if the block scope would mark arg1 as unavailable (["arg1", -1]).

Both of these examples together with issue #61 lead me to believe that we should consider using live ranges instead of "per scope bindings". With live ranges we would specify all the generated ranges where a original variable is available and what expression to use for each range. This would make generators more complicated since they have to figure out liveness, but debuggers become "trivial" since for any given generated position they only have to check the source map for all live ranges at that position to figure out the set of available original variables.

hbenl commented 1 year ago

About the second example:

arg1 is unavailable to us given that result also got renamed to n

Not quite: arg1 is unavailable to code in the inner scope because its generated variable n is shadowed. But it is available to the debugger. Here's a screenshot of Firefox paused at return n:

So the debugger just needs to look up n in the outer scope to get the value for the original variable arg1. This example shows how that would work (although it implements a slightly outdated version of this proposal). Being able to reconstruct the values of original variables that got shadowed by the transpiler is one of the design goals of this proposal.

szuend commented 1 year ago

Sorry for the confusion. I was a bit hung up on how all of this currently works in Chrome DevTools and V8. There we need to distinguish between a debugger's scope view and arbitrary expression evaluation at a breakpoint. The former is also already possible in Chrome DevTools (similar to how you pointed out it works in Firefox). It'll even source map the name already and show you arg1 and the right value.

For expression evaluation Chrome DevTools currently only does a search&replace of variable names from original to compiled, which runs in the aforementioned shadowing problem. We don't provide V8 yet with source mapped scope information (but I aim to fix this once the proposal is further along). With only a search&replace scheme it is currently tricky to figure out what is a valid substitution due to liveness.

Note though that any tool that does constant propagation can't properly express that with "per scope" bindings.

hbenl commented 1 year ago

For expression evaluation Chrome DevTools currently only does a search&replace of variable names from original to compiled, which runs in the aforementioned shadowing problem.

Exactly. The debugger would need to be able to evaluate an expression "in an outer scope" but I haven't seen any debuggers that offer this functionality. It can be emulated though: to evaluate an expression <expr> in a given scope with bindings [[arg1, val1], [arg2, val2], ...], we'd evaluate (arg1, arg2, ...) => (<expr>) and then apply that function to the arguments val1, val2, .... I've included example code for this here.

szuend commented 1 year ago

I was thinking we could modify CDPs Debugger#evaluateOnCallFrame to provide the variable name information. V8's implementation of debug evaluate is basically an eval with with scopes. We can extend those with scope contexts to also include the source mapped names.

hbenl commented 1 year ago

Sure, I just wanted to point out that this can be implemented without adding special support to the debugger protocol.

tc39 / source-map