Encode scopes and variables in source map

fitzgen commented 9 years ago

Source maps should be able to encode the original source language's environment, scopes, and variables. They should be encoded in such a way that if a debugger is paused at a given generated JS location, it can restore and display the original source language's variables and parent scopes that are in scope at the paused location.

fitzgen commented 9 years ago

Here are some cases that I think the source map's environment rematerialization should support:

When there are scopes in JS that do not correspond to any scopes in the original source language. For example, the compiler emitted an immediately-invoked-function-expression as an implementation detail that doesn't reflect any nested function or scope in the original source language.
When there are scopes in the original source language that do not correspond to any scope in the generated JS code. For example, one could imagine an ES6 to ES3 compiler transforming this ES6 code:
```
{
  let x = 1;
  console.log(x);

  {
    let x = 2;
    console.log(x);
  }

  console.log(x);
}
```
Into this ES3 code:
```
{
  var x1 = 1;
  console.log(x1);
  var x2 = 2;
  console.log(x2);
  console.log(x1);
}
```
Note the nested block scope in the ES6 source that does not exist in the ES3 source. We should be able to recreate this scope.
We should support simple variable renaming.

For example, a Scheme-to-JS compiler might emit variables with ! replaced by _bang: set-cdr! becomes set_cdr_bang.

Another example: an ES6 to ES3 compiler might bind the this outside of an arrow function to a variable and close over it:
```
var myArrow = () => this.x;

        |
        V

var _this = this;
var myArrow = function () { return _this.x; };
```
The rematerialized scope inside the myArrow function should have a this binding that points to the _this variable.
We should support hiding bindings in the generated JS code that do not correspond to any bindings in the original source language. These might be gensyms, or temporary variables, or any implementation detail of the compiler's emitted JS code.

We should support rematerializing bindings in the original source language that do not have any corresponding bindings in the generated JS code.

First example (Python to JS):

```
result = [x + 1 for x in list]

        |
        V

var result = [];
for (var i = 0; i < list.length; i++) {
  // Note: no `x` binding in this generated JS code.
  result.push(list[i] + 1);
}
```

Second example (C++-ish to JS, pseudo-code for brevity):

```
class Point {
  int x;
  int y;

  // `Point lhs += Point rhs` == `lhs.x += rhs.x; lhs.y += rhs.y;`
}

void moveDiagonal(Point &a) {
  Point offset(1, 1);
  a += offset;
}

        |
        V

function moveDiagonal(a) {
  // Note: `offset` not only doesn't exist as a binding in this generated JS
  // code, its members have been exploded and inlined!
  a.x += 1;
  a.y += 1;
}
```
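To make the cases above concrete, here is a minimal sketch of what rematerialization data could look like for the ES6-to-ES3 example. Everything in it is an assumption for illustration: the record shape (`start`, `end`, `bindings`), the offsets, and the `bindingsAt` helper are hypothetical and not part of any source map format.

```javascript
// Hypothetical scope records; none of these field names come from a spec.
// Each binding maps an original-source name to a JS expression that a
// debugger could evaluate in the paused generated frame.
const scopes = [
  {
    // Outer block of the ES6 example; offsets are into the generated code
    // and are made up for this sketch.
    start: 0,
    end: 100,
    bindings: { x: "x1" },
  },
  {
    // Nested block that has no scope of its own in the ES3 output.
    start: 40,
    end: 70,
    bindings: { x: "x2" },
  },
];

// Collect the bindings visible at a generated offset. Enclosing scopes are
// applied from largest to smallest, so inner bindings shadow outer ones.
// Generated variables that no record mentions (gensyms, temporaries) are
// never shown.
function bindingsAt(scopes, offset) {
  const visible = {};
  const enclosing = scopes
    .filter((s) => s.start <= offset && offset < s.end)
    .sort((a, b) => (b.end - b.start) - (a.end - a.start));
  for (const scope of enclosing) {
    Object.assign(visible, scope.bindings);
  }
  return visible;
}
```

Under the same assumed shape, the other cases would be entries such as `{ this: "_this" }` for the arrow function, `{ x: "list[i]" }` for the list comprehension, and `{ offset: "({ x: 1, y: 1 })" }` for the inlined `Point`.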

andysterland commented 9 years ago

Awesome, that's pretty damn comprehensive and covers the vast majority of the use cases developers will see.

Few questions:

1. Do we need to call out a use case for source languages that make use of types? Kind of a nuanced variable-renaming case. Just want to make sure fields are considered in the design.
2. Are we considering non-imperative languages that might be source mapped? Specifically CSS. I assume so, but just wanted to be clear.

fitzgen commented 9 years ago

> Do we need to call out a use case for source languages that make use of types? Kind of a nuanced variable-renaming case. Just want to make sure fields are considered in the design.

Types are interesting, especially when multiple source-level types share the same generated JS representation. An example where this is true is with emscripten's pointers and integers. It seems to me that the only way a pretty printer could do the right thing in this case is if it had the source-level type information. So yes, I agree it is very important.

There's lots of information we could (and I hope to) encode in source maps about variables:

Source-level type (should be optional, since not every compile-to-js language is statically typed)
Declaration location
Whether it is a formal parameter vs. constant vs. local definition

However, trying to bite off everything at once is less than ideal. We'd risk either getting stuck trying to over-engineer the perfect format, or we would ship the wrong things and have no good story for fixing it in future iterations.

I'd prefer if we could figure out (a) the bare minimum set of data points needed to recreate the source environment, and (b) how to ensure that we can extend the format in the future to add the bells and whistles.

My hope is that we can initially add pretty printing functions that take only the value, and independently add environments without optional source-level type information. After we've agreed upon those things, we can add optional type information to the environment, and pass that as a second parameter to the pretty printing functions. In this way, we can continually and incrementally ship improvements to the format without trapping ourselves in a dead end by making future improvements impossible.
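As a rough illustration of that incremental path, a pretty printing function could accept the value alone at first and later take an optional second parameter for the source-level type. The function name, the `"pointer"` and `"int"` type strings, and the hex rendering are all assumptions made for this sketch.

```javascript
// Hypothetical pretty printer for a compiler whose pointers and integers are
// both plain JS numbers (as in the emscripten case mentioned above).
// `type` is optional: a first iteration of the format would pass only
// `value`; a later one could add source-level type information.
function prettyPrint(value, type) {
  if (type === "pointer") {
    // Render pointers as zero-padded hexadecimal addresses.
    return "0x" + value.toString(16).padStart(8, "0");
  }
  if (type === "int") {
    return String(value | 0);
  }
  // No type information: fall back to a generic rendering.
  return String(value);
}
```

Callers written against the value-only form keep working unchanged once the second parameter exists, which is the kind of forward compatibility described above.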

fitzgen commented 9 years ago

> Are we considering non-imperative languages that might be source mapped? Specifically CSS. I assume so, but just wanted to be clear.

I have much less of an understanding of how compilers targeting CSS use source maps than I do of compilers targeting JS and JS debuggers consuming source maps.

My understanding is that they work pretty alright, and that there are far fewer deficiencies than in the to-JS case. It would be great if someone who understands this subdomain really well stepped up and took responsibility for ensuring that we provide for the to-CSS needs as well.

fitzgen commented 9 years ago

I wrote a little bit about how DWARF solves this problem: http://fitzgeraldnick.com/weblog/62/

swannodette commented 9 years ago

How does this proposal address or complicate the issue of transitivity? With source maps it's currently trivial to merge transformations between distinct and unrelated JavaScript compilation technology. Once you start encoding scopes it seems to me transitivity becomes increasingly more difficult to preserve. I could be wrong about that and happy to hear that there is prior art or that this is fundamentally a non-issue.

fitzgen commented 9 years ago

> How does this proposal address or complicate the issue of transitivity? With source maps it's currently trivial to merge transformations between distinct and unrelated JavaScript compilation technology. Once you start encoding scopes it seems to me transitivity becomes increasingly more difficult to preserve. I could be wrong about that and happy to hear that there is prior art or that this is fundamentally a non-issue.

The good news is that adding these scopes and bindings doesn't make it more difficult to compose a source map's location mappings, which as you point out many tools do now. A tool could easily ignore the environment information and nothing would be any different from the situation now. In general, it is a goal of future extensions to be 100% backwards compatible so that existing tooling doesn't break; that tooling just won't take advantage of the shiny new features enabled by such extensions.

If some tool in the pipeline does not modify the environment in any way, then all it needs to do is apply to the start and end bounds of each scope the same location transformation that it already applies to each mapping.
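A minimal sketch of that case, assuming scopes are stored as records with `start` and `end` offsets into the generated code (a hypothetical shape) and that `mapOffset` stands in for whatever location transformation the tool already applies to its mappings:

```javascript
// Shift every scope's bounds through the tool's existing location mapping.
// Bindings are copied untouched because the tool did not change the
// environment.
function adjustScopeBounds(scopes, mapOffset) {
  return scopes.map((scope) => ({
    ...scope,
    start: mapOffset(scope.start),
    end: mapOffset(scope.end),
  }));
}
```

For instance, a tool that only prepends a 10-character banner could pass `(offset) => offset + 10` as `mapOffset`.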

As far as prior art goes, unfortunately I'm not aware of anything directly related, nor are the ex-gdb folks I asked. Traditional compilers don't really have this issue because there isn't usually any post-processing (such as minification) of the resulting executables. Either libraries are statically compiled into the binary, in which case the compiler generates debug info along with the main program's debug info, or they are dynamically loaded, in which case they already have their own separate debug info.

The next closest tools are things like Valgrind and DTrace, which instrument the executable with additional probes after the fact. Valgrind instruments the binary to jump to its own JIT'd code, which records its traces and then jumps back to the normal program. If you want to debug with gdb while using Valgrind, Valgrind actually implements its own gdb server and internally translates whatever shifted offsets happened because of the instrumentation. On the other hand, DTrace bends over backwards to avoid shifting offsets whatsoever. Neither approach seems too relevant to our discussion.

Yacc emits #line pragmas, which is fairly similar to composing source map location information, but punts on scopes/bindings. dwz is a command-line tool to compress DWARF debugging info, but doesn't actually modify the executable.

I'd be interested if you know of any bytecode instrumenters in JVM-land that both modify the environment and maintain source-level debugging of the environment. That certainly seems relevant, but I am ignorant of JVM bytecode instrumentation.

The minifier, or any other tool that takes JS for further processing and changes the environment, is the only thing that understands the changes it makes. Therefore, it would have to propagate that information via the source map, by doing some translation for each scope and binding. I've sketched out an algorithm below:

1) For each scope S:
  1.1) For each binding B in S:
    1.1.1) Parse the JS snippet for locating B's value
    1.1.2) Walk the resulting AST and create a map M mapping from old JS
           bindings that snippet relies on to their new, renamed binding
    1.1.3) Generate a new JS snippet that first defines `var <old> = <new>;`
           for each of the entries in M
    1.1.4) Append the original JS snippet for locating B's value to that
           snippet
    1.1.5) Use that new JS snippet for locating B's value in the new
           source map
  1.2) Adjust the start and end bounds of S the same way location mappings
       are adjusted during source map composition now

Note again that if the tool does not modify the environment, then it can skip step 1.1 and only do 1.2 for each scope. When this is the case, composing source maps is not harder than without scopes and binding information.
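Steps 1.1.3 through 1.1.5 of the algorithm above can be sketched as follows. This is only an illustration under stated assumptions: the function name is hypothetical, binding locations are assumed to be JS expression snippets stored as strings, and `renames` plays the role of the map M, which a real tool would build in steps 1.1.1 and 1.1.2 with a JS parser.

```javascript
// Build the new snippet for one binding: a prelude of `var <old> = <new>;`
// definitions (step 1.1.3) followed by the original snippet (step 1.1.4).
// The caller would store the result in the new source map (step 1.1.5).
function rewriteBindingSnippet(snippet, renames) {
  const prelude = Object.entries(renames)
    .map(([oldName, newName]) => `var ${oldName} = ${newName};`)
    .join(" ");
  return prelude ? `${prelude} ${snippet}` : snippet;
}
```

With the snippet `list[i]` and a minifier that renamed `list` to `a` and `i` to `b`, this yields `var list = a; var i = b; list[i]`. The sketch is deliberately naive: if an old name is reused as some other variable's new name (for example, a swap of `a` and `b`), the prelude would clobber values, so a real tool would need more care there.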

When the tool does modify the environment (eg, a minifier shortening variable names), while I wouldn't say this process is super straightforward, it is far from impossible. Furthermore, I don't see a way to encode any environment information without having some kind of process like this when composing source maps and maintaining the ability to rematerialize the source-level environment. And that's regardless of how the environment data is encoded: whether it be purely data, a 100% JS reflection API, or some custom opcode language like what DWARF has.

At the end of the day, I think the benefits outweigh the drawbacks, especially because things can only get better, and not worse, if we maintain backwards compatibility.

littledan commented 6 years ago

It's great to see this investigation. The lack of encoding of variables and scopes was cited in the draft WebAssembly/source-map integration as a downside of source-maps, and it seems like there is frequent discussion of this feature in the mailing list (e.g., @rbuckton's post).

concavelenz commented 6 years ago

I've always thought that it might one day be replaced with something with a different structure, more in line with a traditional debug format (a binary format), but something would need to be proposed.

On Mon, Mar 5, 2018 at 7:34 AM, Daniel Ehrenberg notifications@github.com wrote:

> It's great to see this investigation. The lack of encoding of variables and scopes was [cited](https://github.com/WebAssembly/design/pull/1051/files#diff-8e85308ab5cc1e83e91ef59233648be2R338) in the draft WebAssembly/source-map integration as a downside of source-maps, and it seems like there is frequent discussion of this feature in the mailing list (e.g., [@rbuckton](https://github.com/rbuckton)'s [post](https://groups.google.com/forum/#!topic/mozilla.dev.js-sourcemap/NVuynvaFQDY)).


littledan commented 6 years ago

@concavelenz Now that we have the motivation from both @dschuff's WebAssembly integration and the continued widespread use of minimizers and transpilers, should we start this effort to create this new structure?

jkup commented 2 months ago

I believe this is subsumed by the current scopes proposal. If there is anything missing, we should add it as a follow up to the main proposal.
