Closed fitzgen closed 2 months ago
Here are some cases that I think the source map's environment rematerialization should support:
When there are scopes in the original source language that do not correspond to any scope in the generated JS code. For example, one could imagine an ES6 to ES3 compiler transforming this ES6 code:
{
let x = 1;
console.log(x);
{
let x = 2;
console.log(x);
}
console.log(x);
}
Into this ES3 code:
{
var x1 = 1;
console.log(x1);
var x2 = 2;
console.log(x2);
console.log(x1);
}
Note the nested block scope in the ES6 source that does not exist in the ES3 source. We should be able to recreate this scope.
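A minimal sketch of what recreating that scope could look like, not a proposed format: each original scope is described by the generated line range it covers plus a map from original names to generated names. The `scopes` table, its field names, and `bindingsAt` are hypothetical; line numbers refer to the ES3 snippet above (1-based).

```javascript
// Hypothetical description of the original ES6 scopes, keyed by the
// generated ES3 line range each one covers (1-based, inclusive).
const scopes = [
  { start: 1, end: 7, bindings: { x: "x1" } }, // outer ES6 block
  { start: 4, end: 5, bindings: { x: "x2" } }, // inner ES6 block (no ES3 scope)
];

// Collect the original-name -> generated-name bindings visible at a
// generated line. Narrower (inner) scopes shadow wider (outer) ones.
function bindingsAt(line) {
  const visible = {};
  scopes
    .filter((s) => s.start <= line && line <= s.end)
    .sort((a, b) => (b.end - b.start) - (a.end - a.start)) // widest first
    .forEach((s) => Object.assign(visible, s.bindings));
  return visible;
}

console.log(bindingsAt(3)); // { x: 'x1' }
console.log(bindingsAt(5)); // { x: 'x2' }
console.log(bindingsAt(6)); // { x: 'x1' }
```

Paused on generated line 5, a debugger using this table would show the original `x` with the value of `x2`, even though the ES3 code has no nested scope there.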
We should support simple variable renaming.
For example, a Scheme-to-JS compiler might emit variables with ! replaced by _bang: set-cdr! becomes set_cdr_bang.
Another example: an ES6 to ES3 compiler might bind the this outside of an arrow function to a variable and close over it:
var myArrow = () => this.x;
|
V
var _this = this;
var myArrow = function () { return _this.x; };
The rematerialized scope inside the myArrow function should have a this binding that points to the _this variable.
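As a rough sketch of that renaming case, assume each original binding is stored as a JS snippet that is evaluated in the paused generated frame. `evalInFrame` below is only a stand-in for a debugger's evaluate-in-frame primitive, and `bindings` and `rematerialize` are hypothetical names, not part of any existing source map format.

```javascript
// Stand-in for a debugger's "evaluate in paused frame" primitive: `frame`
// is a plain object holding the generated JS variables that are in scope.
function evalInFrame(frame, snippet) {
  const names = Object.keys(frame);
  const fn = new Function(...names, `return (${snippet});`);
  return fn(...names.map((n) => frame[n]));
}

// Hypothetical binding table for the scope inside `myArrow`:
// original name -> JS snippet that locates its value in the generated code.
const bindings = { this: "_this" };

// Build the original-source environment from the generated frame.
function rematerialize(table, frame) {
  const env = {};
  for (const [name, snippet] of Object.entries(table)) {
    env[name] = evalInFrame(frame, snippet);
  }
  return env;
}

const outerThis = { x: 42 };
const env = rematerialize(bindings, { _this: outerThis });
console.log(env.this === outerThis); // true
console.log(env.this.x); // 42
```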
We should support rematerializing bindings in the original source language that do not have any corresponding bindings in the generated JS code.
First example (Python to JS):
result = [x + 1 for x in list]
|
V
var result = [];
for (var i = 0; i < list.length; i++) {
// Note: no `x` binding in this generated JS code.
result.push(list[i] + 1);
}
Second example (C++ ish to JS, pseudo-code for brevity):
class Point {
int x;
int y;
// `Point lhs += Point rhs` == `lhs.x += rhs.x; lhs.y += rhs.y;`
}
void moveDiagonal(Point &a) {
Point offset(1, 1);
a += offset;
}
|
V
function moveDiagonal(a) {
// Note: `offset` not only doesn't exist as a binding in this generated JS
// code, its members have been exploded and inlined!
a.x += 1;
a.y += 1;
}
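Both examples could be handled, under the assumption that a binding is allowed to be an arbitrary computation over the generated frame rather than a plain rename. In this sketch the locators are ordinary functions; `comprehensionScope`, `moveDiagonalScope`, and `rematerialize` are hypothetical names for illustration.

```javascript
// Hypothetical locators: each takes the generated frame's variables and
// computes the value of an original-source binding.
const comprehensionScope = {
  x: (frame) => frame.list[frame.i], // Python `x` is `list[i]` in the JS
};
const moveDiagonalScope = {
  a: (frame) => frame.a,
  offset: () => ({ x: 1, y: 1 }), // inlined away in the JS; rebuilt on demand
};

// Build the original-source environment from the generated frame.
function rematerialize(scope, frame) {
  const env = {};
  for (const name of Object.keys(scope)) env[name] = scope[name](frame);
  return env;
}

// Paused inside the loop body on the second iteration.
const loopFrame = { list: [10, 20, 30], i: 1, result: [11] };
console.log(rematerialize(comprehensionScope, loopFrame)); // { x: 20 }

// Paused inside the generated `moveDiagonal`.
console.log(rematerialize(moveDiagonalScope, { a: { x: 5, y: 5 } }).offset);
// { x: 1, y: 1 }
```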
Awesome, that's pretty damn comprehensive and covers the vast majority of the use cases developers will see.
Few questions:
- Do we need to call out a use case for source languages that make use of types? It's kind of a nuanced variable-renaming case. Just want to make sure fields are considered in the design.
Types are interesting, especially when the generated JS representations of multiple source-level types are the same. An example where this is true is emscripten's pointers and integers. It seems to me that the only way a pretty printer could do the right thing in this case is if it had the source-level type information. So yes, I agree it is very important.
There's lots of information we could (and I hope to) encode in source maps about variables:
However, trying to bite off everything at once is less than ideal. We'd risk either getting stuck trying to over-engineer the perfect format, or we would ship the wrong things and have no good story for fixing it in future iterations.
I'd prefer if we could figure out the (a) bare minimum set of data points needed to recreate the source environment, and (b) how to ensure that we can extend the format in the future to add the bells and whistles.
My hope is that we can initially add pretty printing functions that take only the value, and independently add environments without optional source-level type information. After we've agreed upon those things, we can add optional type information to the environment, and pass that as a second parameter to the pretty printing functions. In this way, we can continually and incrementally ship improvements to the format without trapping ourselves in a dead end by making future improvements impossible.
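To make the incremental path concrete, here is a sketch of a pretty printing function that works with the value alone and, later, with an optional second parameter carrying type information. The `type` object and its `kind` field are assumptions made up for this illustration; nothing here is a settled API.

```javascript
// Hypothetical pretty printer. First iteration: called with the value only.
// Later iteration: also receives optional source-level type information.
function prettyPrint(value, type) {
  if (type === undefined) return String(value); // no type info available yet
  switch (type.kind) {
    case "pointer":
      // Show pointers as 32-bit hex addresses.
      return "0x" + value.toString(16).padStart(8, "0");
    case "int":
    default:
      return String(value);
  }
}

// The same generated JS number, printed without and with type information.
console.log(prettyPrint(4096)); // 4096
console.log(prettyPrint(4096, { kind: "int" })); // 4096
console.log(prettyPrint(4096, { kind: "pointer" })); // 0x00001000
```

Callers written against the one-argument form keep working when the second parameter is introduced, which is the kind of forward compatibility described above.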
2. Are we considering non-imperative languages that might be source mapped? Specifically CSS. I assume so, but just wanted to be clear.
I have much less of an understanding of how compilers targeting CSS use source maps than I do of compilers targeting JS and JS debuggers consuming source maps.
My understanding is that they work pretty alright, and that there are far fewer deficiencies than in the to-JS case. It would be great if someone who understands this subdomain really well stepped up and took responsibility for ensuring that we provide for the to-CSS needs as well.
I wrote a little bit about how DWARF solves this problem: http://fitzgeraldnick.com/weblog/62/
How does this proposal address or complicate the issue of transitivity? With source maps it's currently trivial to merge transformations between distinct and unrelated JavaScript compilation technology. Once you start encoding scopes it seems to me transitivity becomes increasingly more difficult to preserve. I could be wrong about that and happy to hear that there is prior art or that this is fundamentally a non-issue.
The good news is that adding these scopes and bindings doesn't make it more difficult to compose a source map's location mappings, which as you point out many tools do now. A tool could easily ignore the environment information and nothing would be any different from the situation now. In general, it is a goal of future extensions to be 100% backwards compatible so that existing tooling doesn't break; that tooling just won't take advantage of the shiny new features enabled by such extensions.
If some tool in the pipeline does not modify the environment in any way, then all it needs to do is take the location transformation it already applies to each mapping and apply it to the start and end bounds of each scope as well.
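A small sketch of that case, assuming a tool that only prepends a three-line banner to the generated file. The shape of the `map` object (`mappings`, `scopes`, `start`, `end`) is hypothetical and chosen only to show that one location transformation serves both purposes.

```javascript
// Hypothetical tool step: a three-line banner is prepended, so every
// generated location moves down by three lines.
const transformLocation = (loc) => ({ line: loc.line + 3, column: loc.column });

// Apply the same transformation to location mappings and to scope bounds.
// Bindings are copied untouched because the environment did not change.
function transformSourceMap(map) {
  return {
    mappings: map.mappings.map((m) => ({
      ...m,
      generated: transformLocation(m.generated),
    })),
    scopes: map.scopes.map((s) => ({
      ...s,
      start: transformLocation(s.start),
      end: transformLocation(s.end),
    })),
  };
}

const before = {
  mappings: [
    { generated: { line: 1, column: 0 }, original: { line: 1, column: 0 } },
  ],
  scopes: [
    { start: { line: 1, column: 0 }, end: { line: 7, column: 1 }, bindings: { x: "x1" } },
  ],
};
const after = transformSourceMap(before);
console.log(after.scopes[0].start.line, after.scopes[0].end.line); // 4 10
```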
As far as prior art goes, unfortunately I'm not aware of anything directly related, nor are the ex-gdb folks I asked. Traditional compilers don't really have this issue because there isn't usually any post-processing (such as minification) of the resulting executables. Either libraries are statically compiled into the binary, in which case the compiler generates debug info along with the main program's debug info, or they are dynamically loaded, in which case they already have their own separate debug info.
The next closest tools are things like Valgrind and DTrace which instrument the executable with additional probes after the fact. Valgrind instruments the binary to jump to its own JIT'd code which records its traces and then jumps back to the normal program. If you want to debug with gdb while using valgrind, it actually implements its own gdb server and internally translates whatever shifted offsets happened because of the instrumentation. On the other hand, DTrace bends over backwards to avoid shifting offsets whatsoever. Neither approach seems too relevant to our discussion.
Yacc emits #line pragmas, which is fairly similar to composing source map location information, but punts on scopes/bindings. dwz is a command-line tool to compress DWARF debugging info, but doesn't actually modify the executable.
I'd be interested if you know of any bytecode instrumenters in JVM-land that both modify the environment and maintain source-level debugging of the environment. That certainly seems relevant, but I am ignorant of JVM bytecode instrumentation.
The minifier, or any other tool that takes JS for further processing and changes the environment, is the only thing that understands the changes it makes. Therefore, it would have to propagate that information via the source map, by doing some translation for each scope and binding. I've sketched out an algorithm below:
1) For each scope S:
1.1) For each binding B in S:
1.1.1) Parse the JS snippet for locating B's value
1.1.2) Walk the resulting AST and create a map M mapping from old JS
bindings that snippet relies on to their new, renamed binding
1.1.3) Generate a new JS snippet that first defines `var <old> = <new>;`
for each of the entries in M
1.1.4) Append the original JS snippet for locating B's value to that
snippet
1.1.5) Use that new JS snippet for locating B's value in the new
source map
1.2) Adjust the start and end bounds of S the same way location mappings
are adjusted during source map composition now
Note again that if the tool does not modify the environment, then it can skip step 1.1 and only do 1.2 for each scope. When this is the case, composing source maps is not harder than without scopes and binding information.
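Steps 1.1.2 through 1.1.5 for a single binding might look roughly like the sketch below. It makes two assumptions that are not part of the algorithm above: the minifier exposes its renames as a `Map` from old to new names, and a naive regular expression stands in for parsing the snippet and walking its AST (a real tool would use a proper parser). `translateSnippet` is a hypothetical name.

```javascript
// Steps 1.1.2-1.1.5 for one binding. `renames` maps old generated names to
// the names the minifier replaced them with (assumed input).
function translateSnippet(snippet, renames) {
  // Naive stand-in for 1.1.1/1.1.2: match identifiers that are not property
  // accesses. This ignores strings, comments, and object literal keys.
  const identifiers = snippet.match(/(?<![.\w$])[A-Za-z_$][\w$]*/g) || [];
  const used = [...new Set(identifiers)].filter((id) => renames.has(id));
  // 1.1.3: re-introduce each old name as an alias of its new name.
  const prelude = used
    .map((id) => `var ${id} = ${renames.get(id)};`)
    .join(" ");
  // 1.1.4: append the original snippet. 1.1.5: the caller stores the result
  // as the binding's locator in the new source map.
  return prelude ? `${prelude} ${snippet}` : snippet;
}

const renames = new Map([["_this", "a"], ["list", "b"], ["i", "c"]]);
console.log(translateSnippet("_this.x", renames));
// var _this = a; _this.x
console.log(translateSnippet("list[i]", renames));
// var list = b; var i = c; list[i]
console.log(translateSnippet("42", renames));
// 42
```

Evaluating the translated snippet in the minified frame yields the value of its final expression, so the original locator keeps working without being rewritten itself.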
When the tool does modify the environment (e.g., a minifier shortening variable names), while I wouldn't say this process is super straightforward, it is far from impossible. Furthermore, I don't see a way to encode any environment information without having some kind of process like this when composing source maps and maintaining the ability to rematerialize the source-level environment. And that's regardless of how the environment data is encoded: whether it be purely data, a 100% JS reflection API, or some custom opcode language like what DWARF has.
At the end of the day, I think the benefits outweigh the drawbacks, especially because things can only get better, and not worse, if we maintain backwards compatibility.
I've always thought that it might one day be replaced with something with a different structure more in line with a traditional debug format (a binary format), but something would need to be proposed.
On Mon, Mar 5, 2018 at 7:34 AM, Daniel Ehrenberg notifications@github.com wrote:
It's great to see this investigation. The lack of encoding of variables and scopes was cited https://github.com/WebAssembly/design/pull/1051/files#diff-8e85308ab5cc1e83e91ef59233648be2R338 in the draft WebAssembly/source-map integration as a downside of source-maps, and it seems like there is frequent discussion of this feature in the mailing list (e.g., @rbuckton https://github.com/rbuckton's post https://groups.google.com/forum/#!topic/mozilla.dev.js-sourcemap/NVuynvaFQDY ).
@concavelenz Now that we have the motivation from both @dschuff's WebAssembly integration and the continued widespread use of minimizers and transpilers, should we start this effort to create this new structure?
I believe this is subsumed by the current scopes proposal. If there is anything missing, we should add it as a follow up to the main proposal.
Source maps should be able to encode the original source language's environment, scopes, and variables. They should be encoded in such a way that if a debugger is paused at a given generated JS location, it can restore and display the original source language's variables and parent scopes that are in scope at the paused location.