musictheory / NilScript

Objective-C-style language superset of JavaScript with a tiny, simple runtime
Other
50 stars 5 forks source link

Architecture: Modify the AST directly #44

Open iccir opened 9 years ago

iccir commented 9 years ago

The oj compiler currently modifies lines of code directly (via the Modifier class), rather than operating on the AST. This is due to historical reasons: before source maps became widely supported, I relied on preserving line numbers between oj source and the generated js source to aid debugging.

Now that source maps are common, and escodegen supports them, the compiler should modify the oj AST directly: converting the oj AST into an ECMAScript 5 AST. We can then pass that AST into escodegen to generate the resulting js source.

We can also pass that AST into ESLint (see #40) or 6to5 (http://6to5.org) without having to go through a codegen/reparse pass. More importantly, we can expose the AST to clients and allow them to write custom modifiers before passing into escodegen.

The oj->TypeScript code generator will still be string-based, as the TypeScript compiler doesn't allow AST input, and there is no real specification for a TypeScript AST.

The long term plan: 1) modifier.js and generator.js are only used by the typechecker. 2) A new replacer.js (or transformer.js ?) is responsible for replacing/transforming the oj AST to a js AST.

Strategy: 1) Don't touch modifier.js and generator.js 2) Clone generator.js to replacer.js. Add debug option to use the replacer rather than the generator 3) Command line tool to spin up ojc with the generator, then with the replacer, and then compare the output 4) Ensure the outputs of our source base (and AL's source base) are functionally identical. 5) Remove ES5 output from generator

iccir commented 9 years ago

This may take awhile, development will occur in the ast branch

iccir commented 8 years ago

One major issue here is that the TypeScript AST is not the same as the Esprima AST. Hence, we still would have to have generator.js around and maintain two very different code paths.

I think this is out of scope for 2.0.

IngwiePhoenix commented 8 years ago

What does the TS AST have to do with the Esprima JS AST? just to make sure I get this right.

iccir commented 8 years ago

There was a lot of frustration that occurred in the hours leading up to that comment :) None of this is infeasible from an implementation standpoint, it's an architectural question of "what's the simplest thing to do that accomplishes our goals" and "how much work should ojc do, how much can we leverage existing ES tools".

I'm hesitant on the long term plan of having two very different paths (one AST based for ES5/6 generation, one string based for type-checking). Right now, generator.js is used by both paths. Although, over time, the number of if (language === LanguageTypechecker) checks inside of generator has grown. Ideally, everything would be AST-based and use the ESTree specification. TypeScript would use an superset of the ESTree specification. transformer.js would be used for both paths.

There's nothing wrong with the current method of generation. It just results in string output, and that string output needs to be reparsed by Babel/Uglify/ESLint/JSHint/etc.

My original comment of: "More importantly, we can expose the AST to clients and allow them to write custom modifiers before passing into escodegen." is still true, but it's also possible to accomplish the same by sending the string output of ojc into Babel and writing a Babel transformer.

IngwiePhoenix commented 8 years ago

I see what you are getting at. In fact, many people and modules seem to use a similar technique. For instance, WebPack 2 knows that Babel emits ES6 and does some more reading to do dead-code stripping - but it doesn't really pick up the AST. So, the concept is similar.

So you used TypeScript for the Type Checker?

iccir commented 8 years ago

Yep, generator.js will spit out TypeScript for the Type Checker. Note that this TypeScript doesn't actually work (don't run the output of tsc), it's just used to get warnings/type hints.

iccir commented 8 years ago

Another issue with the AST approach: JSHint integration relies on its input being a string, as JSHint uses it's own JS parser.

iccir commented 8 years ago

Modified strategy: 1) transformer.js is used for both the ES5 and TypeScript generation. 2) The output TypeScript AST will have a few non-ESTree-standard node types, for dealing with type annotations and casts. 3) The TypeScript AST will be handed into escodegen, which will output a JS string and a sourcemap. 4) We'll use this sourcemap to map errors from tsc back to the original line number.

IngwiePhoenix commented 8 years ago

Sounds solid to me. But have you taken ES6 into account, as in, how big will the efford be to change the ES5 handling to be ES6 capable? Otherwise there doesn't seem to be any problem.

iccir commented 8 years ago

The beauty of this (combined with procrastination) is that TypeScript/escodegen/ESTree/estraverse all handle ES6 now :)

IngwiePhoenix commented 8 years ago

Okay, that super-long name-chain convinced me ;) But hey, that’s quite the good news! Can’t wait to get started using OJ with ES6.

iccir commented 8 years ago

Sadly, escodegen with source maps enabled is incredibly slow (~2 seconds to generate our source bases, compared to ~200ms with 1.1's generator/modifier). That's going to be a deal-breaker for us (at least for now).

The problem may be with source-map rather than escodegen (or specifically, escodegen's approach of wrapping the Array IR with source-map's SourceNode and then having source-map generate the string). I should recheck this once escodegen implements their Dumper (https://github.com/estools/escodegen/issues/214).

iccir commented 8 years ago

(Note: this shouldn't prevent you from using ES6 with OJ. Using the AST vs. using the modifier/compiler has always been about avoiding additional parse/generate steps. Right now you can still use ES6 in the 2.0-wip branch, and you can also pass the result to Babel to transpile back to ES5).

IngwiePhoenix commented 8 years ago

I see. Well, I'd rather wait for a stable 2.0 release so I can update my oj-loader or the extended version, OhSoJuicy with ES6 support.

But I will get to test oj-2.0-alpha soon to see how it'll go. :)

iccir commented 8 years ago

Without this, I can't think of any reason why I would need to bump semver major. So the next release will probably be 1.2 (with Esprima 2.7 et al.)

IngwiePhoenix commented 8 years ago

That works for me. :) Ill I wish is to just make sure that when I update my modules with the new support, that I don't run into too many bugs :)

iccir commented 6 years ago

I've been pondering this issue again recently. There have been speed improvements to the source-map project. We could also use babel-generator rather than escodegen.

However, one glaring issue is that oj has to output Typescript in addition to JS. The current string-based solution of Modifier.js allows us to easily do this.