Open hbenl opened 1 year ago
I've had to revise my proposal after realizing something I had overlooked: I wanted to attach binding information (mapping original variables to generated expressions) to the original sources but (the body of) an original function can appear multiple times in the generated source due to function inlining:
original:
function foo(condition) {
log(condition ? "yes" : "no");
}
foo(true);
foo(bar);
generated:
log("yes");
log("no");
When you're paused in the first line of the generated source you'd expect the debugger to show that you're paused in the second line of the original source and an original scope with a variable condition
with the value true
but when you're paused in the second line of the generated source, condition
should be false
.
So we need to add two sets of binding information for the scope of foo
but where do we put them? The solution is to add them to the generated source and allow "virtual scopes": ranges in the generated source that correspond to
original scopes that have been removed by the transpiler.
In this example we'd add two "virtual scopes", covering line 1 and 2 of the generated source respectively, with one of them mapping condition
to true
and the other to false
.
So my revised proposal is to add scope information to the generated source with the following information for each scope:
Points to discuss:
mappings
arraynames
propertyscopeNames
)TBD:
Some takeaways from the naming meeting:
I mentioned the next case:
// ORIGINAL SOURCE:
const foo = 5
const bar = 6
console.log(foo + bar)
// GENERATED SOURCE:
/* after a constant folding could looks like this: */
console.log(11)
/* ^ at this point of debugging we don't have any information about the original values of `foo` and `bar` variables */
The question is, do we want to have the ability to describe debug information for such cases?
@JSMonk In this case the sourcemap could specify the bindings as foo
-> 5
, bar
-> 6
. That's the power of allowing arbitrary expressions as binding values (although this power may create some headaches).
Here are some more examples (I specify the ranges for the scopes as \
Original:
1 {
2 let x = 1;
3 console.log(x);
4 {
5 let x = 2;
6 console.log(x);
7 }
8 console.log(x);
9 }
Generated:
1 {
2 var x1 = 1;
3 console.log(x1);
4 var x2 = 2;
5 console.log(x2);
6 console.log(x1);
7 }
Scopes:
x
-> x1
x
-> x2
Original:
1 function foo(condition) {
2 log(condition ? "yes" : "no");
3 }
4 foo(true);
5 foo(false);
Generated:
1 log("yes");
2 log("no");
Scopes:
foo
-> ??? (I think the sourcemap should specify this binding and somehow mark its value as not available)foo
condition
-> true
foo
condition
-> false
Original:
1 function outer(num) {
2 function inner(value) {
3 const value_plus_one = value + 1;
4 console.log(value_plus_one);
5 }
6 const num_plus_one = num + 1;
7 inner(num_plus_one);
8 }
9 outer(1);
Generated:
1 function f(a) {
2 function g(a) {
3 const b = a + 1;
4 console.log(b);
5 }
6 const b = a + 1;
7 g(b);
8 }
9 f(1);
Scopes:
outer
-> f
outer
inner
-> g
num
-> a
num_plus_one
-> b
inner
value
-> a
value_plus_one
-> b
I've created an example implementation here for computing the original scopes using the scope information encoded in the sourcemap and added two of the examples above.
Btw, I've just realized that with this proposal it's also possible to "map" values. This is what I'm trying to achieve with the #51 The potential of this metadata to improve debug experience is huge.
I have added a decodeScopes()
function to this repo showing how scope information could be encoded in a VLQ string: each scope is encoded as a list of VLQ numbers, scopes are separated by ,
ScopeType.NAMED_FUNCTION
, the second number is the index of the scope name in the scopeNames
arraystartLine
, startColumn
, endLine
and endColumn
scopeNames
array@hbenl I believe there should also be a source index to indicate the file containing the source scope.
@JSMonk The scope locations are in the generated source and there is exactly one generated source for a sourcemap. The sourcemap's mappings
will tell you the corresponding locations in the original sources (including the source index).
@hbenl @JSMonk would it help if I added the spec notes that Tobias took as well as the screenshots from the Munich meetup to this issue? Or maybe somewhere else?
@jkup Sure, add them to this issue.
Tobias' spec notes: https://gist.github.com/sokra/97a53a869b9a421accadbc9681cb26f3
Here's a particularly tricky example, merging two modules into one and then inlining a function from one module into the other module (produced with rollup and terser and then pretty-printed):
Original files:
1 import { f } from "./two";
2 let num = 42;
3 f(num);
4 console.log(num++);
1 let increment = 1;
2 export function f(x) {
3 console.log(x + increment++);
4 }
Generated file:
1 let l = 1;
2 let o = 42;
3 var e;
4 (e = o),
5 console.log(e + l++),
6 console.log(o++);
When paused at the console.log(e + l++)
statement, without sourcemaps the debugger shows one frame with two scopes:
l
and o
e
.With sourcemaps it should show two frames:
f()
with three scopes:
f()
containing x
increment
and f
f
and num
The scopes information in the sourcemap would be:
increment
-> l
f
-> (unavailable)f
-> (unavailable)num
-> o
increment
-> l
f
-> (unavailable)f
x
-> e
I've also added this example to my example implementation but will have to update the implementation to correctly support function inlining (i.e. recreate the frame that was "lost" due to inlining).
I stumbled over two examples where the current "bindings per scope" approach is not that great for debuggers. The first one is the following:
original:
(function foo() {
let someVar = 'hello';
console.log('hello');
someVar = 'world';
console.log('world');
})();
generated:
console.log('hello');
console.log('world');
The current proposal would call for one entry in bindings
for someVar
, but we don't have a value since there are 2 different possible values in the scope of foo
. We could split the scope with two different bindings and let them both point back to the same foo
original scope, but that feels a bit unwieldy if we use full scope entries to model variable live ranges.
The second example is taken from https://szuend.github.io/scope-proposal-examples/01_simple/simple.html
original:
function addWithMultiply(arg1, arg2, arg3) {
const intermediate = arg1 + arg2;
if (arg3 !== undefined) {
const result = intermediate * arg3;
return result;
}
return intermediate;
}
addWithMultiply(2, 3, 4);
generated:
function n(n,t,e){const r=n+t;if(e!==undefined){const n=r*e;return n}return r}n(2,3,4)
The current proposal calls for the following binding sections for the function and block scope respectively:
"bindings": [["arg1", "n"], ["arg2", "t"], ["arg3", "e"], ["intermediate", "r"]],
"bindings": [["result", "n"]]
Let's say we pause on the return n
statement. arg1
is unavailable to us given that result
also got renamed to n
. What makes this harder for debuggers is that the right-hand side of any binding can be an arbitrary expression. The debugger has to do some parsing to find out which generated variables are actually used in a bindings expression and whether that variable was shadowed. This could be solved if the block scope would mark arg1
as unavailable (["arg1", -1]
).
Both of these examples together with issue #61 lead me to believe that we should consider using live ranges instead of "per scope bindings". With live ranges we would specify all the generated ranges where a original variable is available and what expression to use for each range. This would make generators more complicated since they have to figure out liveness, but debuggers become "trivial" since for any given generated position they only have to check the source map for all live ranges at that position to figure out the set of available original variables.
About the second example:
arg1
is unavailable to us given thatresult
also got renamed ton
Not quite: arg1
is unavailable to code in the inner scope because its generated variable n
is shadowed. But it is available to the debugger. Here's a screenshot of Firefox paused at return n
:
So the debugger just needs to look up n
in the outer scope to get the value for the original variable arg1
.
This example shows how that would work (although it implements a slightly outdated version of this proposal). Being able to reconstruct the values of original variables that got shadowed by the transpiler is one of the design goals of this proposal.
Sorry for the confusion. I was a bit hung up on how all of this currently works in Chrome DevTools and V8. There we need to distinguish between a debugger's scope view and arbitrary expression evaluation at a breakpoint. The former is also already possible in Chrome DevTools (similar to how you pointed out it works in Firefox). It'll even source map the name already and show you arg1
and the right value.
For expression evaluation Chrome DevTools currently only does a search&replace of variable names from original to compiled, which runs in the aforementioned shadowing problem. We don't provide V8 yet with source mapped scope information (but I aim to fix this once the proposal is further along). With only a search&replace scheme it is currently tricky to figure out what is a valid substitution due to liveness.
Note though that any tool that does constant propagation can't properly express that with "per scope" bindings.
For expression evaluation Chrome DevTools currently only does a search&replace of variable names from original to compiled, which runs in the aforementioned shadowing problem.
Exactly. The debugger would need to be able to evaluate an expression "in an outer scope" but I haven't seen any debuggers that offer this functionality. It can be emulated though: to evaluate an expression <expr>
in a given scope with bindings [[arg1, val1], [arg2, val2], ...]
, we'd evaluate (arg1, arg2, ...) => (<expr>)
and then apply that function to the arguments val1, val2, ...
.
I've included example code for this here.
I was thinking we could modify CDPs Debugger#evaluateOnCallFrame
to provide the variable name information. V8's implementation of debug evaluate is basically an eval
with with
scopes. We can extend those with
scope contexts to also include the source mapped names.
Sure, I just wanted to point out that this can be implemented without adding special support to the debugger protocol.
The env proposal would add information about scopes and bindings in the original source to a sourcemap. That proposal satisfies all the scenarios listed here but there is one scenario it doesn't support: variable shadowing, i.e. a variable declared in an outer scope that is shadowed by another variable with the same name in an inner scope. This scenario is quite common because javascript minifiers reuse variable names aggressively.
To recap: The env proposal would add an
env
field encoding the scopes and bindings:Imagine that this original code
was transpiled to
The scope information encoded according to the env proposal would contain the following binding information (I'm omitting the bindings for the functions here):
value
=>a
,value_plus_one
=>b
num
=>a
,num_plus_one
=>b
When the debugger is paused at
console.log(b);
, the scopes would be:a
=>2
,b
=>3
a
=>1
,b
=>2
Now the debugger tries to reconstruct the original scopes: for
num_plus_one
in the outer function scope it would evaluateb
and get3
but the correct value is2
. To get the correct value, the debugger would need to understand that it should take the value ofb
in the outer function scope. In this example it seems rather obvious from which scope the value ofb
should be taken but in general, with deeper nesting of scopes and the transpiler adding and/or removing scopes there's no way for the debugger to know which generated scope it should use. This could be fixed by adding information about the generated scopes to the sourcemap (in addition to the information about original scopes): to find the value for a variable in a given original scope the debugger would translate the start and end locations of that scope to generated locations and then find the innermost generated scope that contains those locations. It would then evaluate the expression from the binding information in that scope. Note, however, that debuggers usually don't support evaluating in an outer scope of the one that execution is currently paused in, so the debugger would have to "emulate" this evaluation. This would probably only be feasible for very simple expressions, e.g.x
,obj.prop
andarr[idx]
, but this would cover a lot of ground already and go beyond the current capabilities of javascript debuggers I've worked with (in the best case they support what can be done by only supporting variable names as binding expressions). Note that the information about generated scopes would only need to contain the start and end locations of the scopes, but not their type, name and bindings (although I would optionally keep the type information because I imagine it could be useful).