Proposal: Ecosystem: CoffeeScript in Prettier

jashkenas / coffeescript

Unfancy JavaScript

https://coffeescript.org/

MIT License

16.49k stars 1.98k forks source link

Proposal: Ecosystem: CoffeeScript in Prettier #4984

Closed coffeescriptbot closed 4 years ago

coffeescriptbot commented 6 years ago

From @GeoffreyBooth on 2017-11-25 07:12

An oft-requested improvement to the CoffeeScript ecosystem is support for the language in Prettier. Our own @lydell is also a maintainer of that project, so I asked him what would be required to make it happen. He boiled it down to two major tasks:

Produce a detailed abstract syntax tree (AST)

Something would need to be able to produce a JSON representation of the nodes of the abstract syntax tree (AST). An AST is a representation of all the parts of syntax of a program, like AssignmentExpression; the site astexplorer.net has great examples. You can see a simplified version of CoffeeScript’s AST by running coffee --nodes test.coffee. A fuller version can be seen by going to http://asaayers.github.io/clfiddle/ and clicking the AST tab, then one of the nodes in the tree.

Since the CoffeeScript compiler itself already has the --nodes option, it seems logical to me to extend it to produce this JSON-based output. Currently the Node API for the coffeescript module doesn’t support a nodes option, so we could add one, and have its output be plain JavaScript objects that could be JSON.stringify’ed.

That wouldn’t be the end of the job, however. We would also need to ensure that this AST output is complete, with the same amount of information as the original source code, such that you could reconstruct the original source using nothing but this AST. In the CoffeeScript compiler, some simplifications are made at the lexer stage, before the nodes get generated: numbers lose their original 0x, 0o or 0b prefix (if any), whitespace is lost in multiline strings, multiline regexes are turned into a RegExp() call, etc. These changes would need to be refactored to happen in nodes.coffee, or added detail about the node would need to be saved as a property on the node (like we currently tack on the source maps location data or comments). The goal is that this JSON representation of the source code could then be used to output new source code, formatted as Prettier deems it should be formatted. Which leads us to:

Write a CoffeeScript code generator

Once a JSON version of the AST is available, we’ll need some function that takes it as input and produces a string of CoffeeScript source code as output. You’ve probably seen one of these already: js2coffee takes an AST produced by a JavaScript parser and creates CoffeeScript source code from those nodes. The function that does this is called a code generator, and js2coffee’s is here. With dependencies, it’s over a thousand lines of code. There’s one other CoffeeScript code generator that I’m aware of, cscodegen produced by the CoffeeScriptRedux effort, but it hasn’t been updated since 2012.

Prettier is itself a code generator. If it were to support CoffeeScript, a new code generator would need to be written as part of Prettier itself. Within the Prettier codebase, the code generators for supported languages are in src/printer*.js. One code generator supports all of JavaScript plus TypeScript and Flow, and it’s plain printer.js. It’s 5,000 lines of code. Writing a similar generator for CoffeeScript might not be much simpler, but you would be able to use js2coffee and cscodegen’s codebases as reference (not to mention Prettier’s JavaScript code generator) so you’re not starting from scratch.

So . . .

I would be willing to tackle the first task, outputting a detailed JSON AST, if one or more volunteers were up for the second task. Does anyone desire CoffeeScript support in Prettier strongly enough to invest the time in writing a quality CoffeeScript code generator?

coffeescriptbot commented 6 years ago

From @GeoffreyBooth on 2017-11-25 07:15

By the way, either of these tasks are also investments in the extensibility of CoffeeScript in the future. The js2coffee project was possible in the first place because JavaScript has several excellent parsers that produce detailed ASTs. If js2coffee’s CoffeeScript code generation part could be replaced with a better code generator, js2coffee would be able to support the latest CoffeeScript features (and be more adaptable to future improvements). Coffeelint would be capable of greater things if it had a better AST to work with. And on and on.

CoffeeScriptRedux got so many things right, it’s a shame it never got completed. One of its insights was that code generation should be its own module that took an AST as input. (It supported both cscodegen to generate CoffeeScript, or escodegen to generate JavaScript.) This is also how Babel works, with babel-generator taking an AST and producing JavaScript. This modularity is one of the keys to Babel’s success, and the growth of the ecosystem around it. If the CoffeeScript compiler produced an AST compatible with Babel, the CoffeeScript compiler could outsource the JavaScript code generation to that and therefore jettison nodes.coffee’s 4,000 lines—a quarter of CoffeeScript’s entire codebase!

coffeescriptbot commented 6 years ago

From @lydell on 2017-11-25 10:41

Well summarized!

I forgot to mention that every src/printer*.js file in Prettier basically just defines a single function with a big switch statement in it – with one case for every AST node type. In other words, all the function is doing is saying “Here is how you print a number, here is how you print a string, here is how you print an array, here is how you print an if statement, etc.”. Some cases call this function recursively, such as the array case for every item of the array.

One way to go about this would be to start writing a src/printer-coffeescript.js file, and see where you bump into problems. Then, go and improve the CoffeeScript parser around all those problems.

coffeescriptbot commented 6 years ago

From @vendethiel on 2017-12-29 02:21

CSR did get a lot right, yes. We probably need Concrete Syntax Tree, but our lexer/rewriter code is... hairy to say the least.

coffeescriptbot commented 6 years ago

From @GeoffreyBooth on 2017-12-29 02:45

What’s a “concrete” syntax tree?

coffeescriptbot commented 6 years ago

From @vendethiel on 2017-12-29 11:21

https://eli.thegreenplace.net/2009/02/16/abstract-vs-concrete-syntax-trees/

A CST is very similar to an AST, but it keeps more parse information around (and doesn't remove seemingly useless nodes).

coffeescriptbot commented 6 years ago

From @GeoffreyBooth on 2017-12-29 15:01

What I was thinking we would do is generate an AST add similar to Babel's as possible. Then we can crib code from the main JavaScript code path in Prettier to parse it. It'll also be useful for working with other tools to have as “standard” of an AST as possible. We would add extra properties to the AST nodes to preserve the info we would need to generate CoffeeScript again from the tree.

Are you interested in helping tackle this?

coffeescriptbot commented 6 years ago

From @vendethiel on 2017-12-29 15:43

The project then becomes pretty much "rewrite the compiler" ... or "upgrade CSR to support all the things CS2 now supports", which is an insane amount of work. Removing JS code generation from the CS compiler is a very noble goal, but a tremendous time-consuming one.

coffeescriptbot commented 6 years ago

From @GeoffreyBooth on 2017-12-29 16:13

That's more ambitious than I had in mind. I was thinking of just doing what is proposed at the top of this thread: create a way for the compiler to output an AST as JSON, similar to how it currently outputs nodes data as text via --nodes; and create a printer file in Prettier for CoffeeScript, similar to its existing ones for JavaScript, TypeScript, Markdown and so on.

coffeescriptbot commented 6 years ago

From @vendethiel on 2017-12-29 16:22

We're discard too much info imho. Just for implicit objects etc.

coffeescriptbot commented 6 years ago

From @GeoffreyBooth on 2017-12-29 16:43

Right, that's what would need to be added to the nodes as extra info. Stuff like whether a boolean was written as true versus on or yes, etc. would all need to be added to the AST.

coffeescriptbot commented 6 years ago

From @GeoffreyBooth on 2018-01-02 06:56

Okay, I’ve taken the first few baby steps in getting CoffeeScript to produce an AST. Check out this branch, then create a test.coffee at the root of the repo with whatever CoffeeScript code you want to see an AST of, then run:

coffee -e "console.log require('util').inspect require('./lib/coffeescript').compile(require('fs').readFileSync('./test.coffee').toString(), nodes: yes), {colors: yes, depth: 10}"

You should see pretty-printed JSON like this:

{ type: 'Block',
  loc: { start: { line: 0, column: 0 }, end: { line: 0, column: 10 } },
  expressions:
   [ { type: 'Assign',
       loc: { start: { line: 0, column: 0 }, end: { line: 0, column: 10 } },
       variable:
        { type: 'IdentifierLiteral',
          loc: { start: { line: 0, column: 0 }, end: { line: 0, column: 5 } },
          value: 'answer' },
       value:
        { type: 'NumberLiteral',
          loc: { start: { line: 0, column: 9 }, end: { line: 0, column: 10 } },
          value: '42' } } ] }

This is the same output as the CLI’s --nodes, in JSON form, with:

the node class names as type, which is how the Babel AST has them
the location data structured the way Babel does it
any primitive (boolean, number, string) properties included, as these are serializable as is into JSON
children included recursively

This was all done by adding just one method on the base node class, and for many nodes this is all the data we need. For more complicated nodes, the next step is to override this method to add additional serializable properties to flesh out the objects for those nodes; and in some of those cases, we’ll have to reach back to the lexer to make sure that currently-discarded data is forwarded along into nodes.coffee. Eventually, all objects for all node types should contain complete enough data that we can recreate the original source from this AST. There’s a ways to go to get from here to there, but it’s very doable. The above took less than 50 lines of code.

coffeescriptbot commented 6 years ago

From @rattrayalex on 2018-01-06 20:59

This is exciting. I may be able to help with some of the prettier parts.

Could we write a test suite to ensure the ast being generated this way is always accurate?

coffeescriptbot commented 6 years ago

From @GeoffreyBooth on 2018-01-06 22:18

@rattrayalex I would love the help. I’ve started a branch in a local copy of the Prettier repo, I’ll clean it up and post it soon and give you access.

We can certainly write tests around this. We could add a test/nodes.coffee in the CoffeeScript repo that calls CoffeeScript.compile(someCode, nodes: yes) and compares the response to some expected object. We should probably strip out stuff like the loc keys before comparing, so the tests don’t break for reasons we don’t care about as the compiler evolves over time.

coffeescriptbot commented 6 years ago

From @rattrayalex on 2018-01-06 22:30

Babylon has a nice ast snapshot suite, might be worth checking out. But may not be needed for this.

loc information should probably be tested somewhere, but I’m not actually sure it’s needed for prettier anyway. Can’t recall for sure.

coffeescriptbot commented 6 years ago

From @GeoffreyBooth on 2018-01-08 07:39

@rattrayalex I’ve created a repo of my Prettier fork and branch here, and added you to it. If anyone else would like to contribute, please let me know and I would be happy to add you. In my fork, the default branch is coffeescript, so you can work in other branches and submit pull requests against coffeescript.

There was some major reorganization of the Prettier codebase in the last few months since I initially started my fork, so some of my work needed to be thrown out; but I had only just barely gotten started anyway. Look in the src/language-coffeescript folder, the files in there are where we’ll want to build out CoffeeScript support. See src/language-js and src/language-vue as points of reference.

coffeescriptbot commented 6 years ago

From @GeoffreyBooth on 2018-01-08 07:49

Once you’ve checked out the Prettier CoffeeScript branch and run yarn to install dependencies, you can see the CoffeeScript code path in action by creating a test.coffee at the root of the repo and then running:

./bin/prettier.js --parser coffeescript test.coffee

Currently I’m just printing the AST, but you have to start somewhere 😄. This is using my nodes branch, so updates to that branch should affect this; you’ll probably want to link the CoffeeScript module installed by Prettier to a local copy, so you can develop in the two in tandem.

coffeescriptbot commented 6 years ago

From @rattrayalex on 2018-01-29 03:29

Coming back to this, I might be able to help a bit in March but probably not earlier than that 😕 any progress in the last few weeks?

coffeescriptbot commented 6 years ago

From @GeoffreyBooth on 2018-01-29 04:30

No, but I hope to have some time before March. I’ll push commits into both branches. Feel free to work in those repos, either in the same branches or other branches we can merge into them.

helixbass commented 6 years ago

@GeoffreyBooth I just came across this issue. I came at it from a slightly different angle but have been working on something similar - I've used Prettier and eslint on ES6-based projects and started thinking about how if we targeted a compatible AST we could lean on Prettier for outputting JS and could also run eslint directly against Coffeescript source (I see the inability to use JS ecosystem toolset like eslint as one of the more legitimate reasons people might object to using Coffeescript)

If the CoffeeScript compiler produced an AST compatible with Babel, the CoffeeScript compiler could outsource the JavaScript code generation to that

So have been hacking on the backend to get it to target the babylon AST and then pass that to Prettier to generate (nicely formatted) JS (I hadn't thought of being able to use Babel for that task like you mention - haven't looked at whether it eg relies on original source for formatting?). I've got the entire Coffeescript test suite passing (with a few caveats/tweaks) when using (a minimally modified version of) Prettier to generate JS

Not sure exactly how what I've been working on relates to feeding a Coffeescript AST into Prettier (for formatting as Coffeescript code) - the basic approach I've been following is to have a compileToBabylon() method on each of our node types (similar to compileNode()) that's responsible for generating the corresponding (piece of the) babylon AST (so basically doing whatever AST transformations are necessary to get from our nodes "AST" to the desired babylon AST)

and therefore jettison nodes.coffee’s 4,000 lines—a quarter of CoffeeScript’s entire codebase!

I think there'd always be the need (if we target a JS AST for outputting JS and want to be able to eg prettier-ify Coffeescript source) for two different ASTs, one which can be used to regenerate Coffeescript source and a transformed one which can be used to generate JS output - but it's interesting to think about how being able to target a legitimate Coffeescript AST might then make the "stages" of what I've been working on more clear-cut - it already feels like I'm replacing an intermixing of AST transformations and code generation that we do in compileNode() with a more "pure" AST transformation, but it still feels like I'm intermixing "structural" AST transformations that are unwindings of our syntactic sugar with "renaming"-style AST transformations to get our node types into the right shape of an object that prettier expects

And then as far as eslint integration, I'm currently doing a rough audit of all standard eslint rules for which ones are compatible with Coffeescript source out-of-the-box (so far >75% of the ones I've looked at are, ignoring the formatting-related rules that eslint-config-prettier suppresses)

Once I'm done with that pass I'll open a preliminary PR to raise specifics about what I've implemented, what realistically would still need to be done to move forward, etc. Hopefully by the end of this week (or if you're curious you can check out my prettier branch). But it's been really fun to dig into the backend of the compiler, learn a bit more about some of the JS ecosystem AST-based tools. And now exciting to see that you've been thinking along similar lines!

GeoffreyBooth commented 6 years ago

Hi @helixbass, this is great to hear. If you could provide links to your branch, and your fork of Prettier, that would be great to have. @zdenko and @rattrayalex have expressed interest in this as well.

I think the near-term goal is just to get the CoffeeScript compiler to take CoffeeScript source code input and generate a detailed AST as output. This is what I’ve started in my nodes branch that you’re all welcome to contribute to. The output AST should have enough detail that you could then pass it to Prettier which could generate new pretty-printed CoffeeScript source; like https://github.com/jashkenas/coffeescript/issues/4984#issuecomment-366673922, but for all node types. We should focus on the AST side first though, before worrying much about Prettier, as the AST is the first step.

Refactoring the compiler such that we don’t need nodes.coffee anymore is fun, but doesn’t really solve any immediate problems. It might make the codebase easier to maintain in the long term, as it would be smaller, but that’s the only advantage that I can see. So I would make this a far lower priority than simply getting a good AST out of the compiler. (Getting a good AST is a prerequisite to removing nodes.coffee anyway.)

Once we get a good AST out of the compiler, the world opens up for ecosystem possibilities: not just Prettier, but easy integrations with linters, a new JS2Coffee, etc. The great ASTs produced by Babylon and Acorn and others are a big reason why the JavaScript ecosystem has bloomed.

helixbass commented 6 years ago

@GeoffreyBooth sure here is my Coffeescript branch which can output JS using this fork of prettier. Should be able to (from top-level coffeescript repo dir):

yarn install dependencies and run bin/cake test:prettier to run Coffeescript test suite using prettier for JS generation
eg yarn run eslint ../path/to/some/file.coffee to run eslint (using committed .eslintrc.yml to specify correct parser: and configure some rules) directly against Coffeescript source

The changes to prettier are minimal and may not all be used, they are basically just to:

slightly tweak the __debug endpoints that are currently the only way to pass an AST to prettier to format
look for a couple "overriding" properties to help with formatting of comments, which is one of the few places prettier seems to look at all at the original source code's formatting

Once I get through a pass of all eslint rules, I'll look at your nodes branch. Feels like perhaps we can come at it from both sides, for me personally eslint integration is a much bigger win than being able to prettify Coffeescript source (I'm getting a huge kick out of being able to actually run eslint against Coffeescript source, looking forward to running against the compiler source code itself once I've gotten through all the eslint rules and fleshed out that .eslintrc.yml with all viable rules). But I see the draw in being able to prettify Coffeescript source. And seems like if we're going to overhaul the backend to target a JS AST, we should include the ability to "target" a legitimate Coffeescript AST as part of that. For one thing, I haven't looked closely at coffeelint but my general thought so far was that for eslint rules that we can't just use directly, we should provide a comparable rule that works for Coffeescript source, and it'd seem cleaner if we could write those against a Coffeescript AST rather than a (transformed, eg including constructs that we generate) JS AST

Was still planning on opening a preliminary PR of my branch soon to allow for some discussion of specifics, but would you rather try and have some of that discussion here?

Eg one thing that comes to mind: it seems like having our Coffeescript AST types stick as close as possible to the corresponding JS AST types is sure to be a win in various regards, but one big question from what I've done so far is which JS AST it makes sense to target. I've been targeting the babylon/Babel AST since it seems to be Prettier's preferred parser/AST. And then I'm using an old version of babel-eslint to transform the Babel AST to the espree/ESTree AST expected by eslint (since newer versions of babel-eslint rely on babylon's estree plugin to directly generate ESTree-style AST rather than doing much actual transformation itself from Babel-style AST to ESTree-style AST). But from reading, it seems like the Flow parser (which Prettier can also use) generates ESTree, so perhaps we could just target ESTree (rather than Babel) AST throughout? Here (under Output) is a brief summary of the differences between Babel and ESTree AST formats

GeoffreyBooth commented 6 years ago

Hey @helixbass, your branch has a tremendous amount of work in it! I wish we had discussed before you got so far in, not necessarily because you did any wasted work, but so that we could potentially collaborate (and avoid duplicative work).

My nodes branch is only 100 lines of code changed, so you should probably look at it sooner than later. In particular, look at the toPlainObject method: https://github.com/jashkenas/coffeescript/compare/master…GeoffreyBooth:nodes#diff-d149b4f007169f06a8ad463f9b2f277bR271. This is essentially doing what you’re doing in a lot of the nodes, of creating an AST object to represent that node (in a generic sense, to be overridden by individual nodes that need more detail). A lot of the work you did in your compileToBabylon methods is the same work I would need to do to extend this method for specific node classes.

I don’t think Prettier and ESLint should be integrated into the CoffeeScript compiler itself. I think we should be following the general design of my nodes branch, where if you pass --nodes (CLI) or nodes: true (Node API) you get a JSON response of the AST. Then that can be passed into Prettier or ESLint or whatever other tool you want to use.

As for the tree formats, you’ve done more research into it, please advise what you think we should target. We’re going to have a unique tree format anyway because we have node types that don’t exist in JavaScript, like Existence; but we might as well follow Bablyon or ESTree’s format as closely as possible for the many other types that overlap.

helixbass commented 6 years ago

I wish we had discussed before you got so far in

@GeoffreyBooth ya I was taking the approach of wanting to have a realistic sense of how viable the Prettier/ESLint integrations were before sharing (and taking it as an excuse to do a pretty deep solo dive into the compiler backend, which I wasn't that familiar with) - just finished an initial pass through the standard ESLint rules so am pretty much at the point I was wanting to get to

In particular, look at the toPlainObject method

This is a nice way to start thinking about how to generically traverse the nodes to generate AST objects. I'm not very well-versed in tree traversal patterns (either in the compiler codebase or in general) but I can kind of picture what we're going for as far as being able to generate both a Coffeescript AST (that can be used to reconstruct/lint Coffeescript source) and a transformed JS AST, where ideally the two AST generations share code and are as simple of a transformation as possible from our nodes to the corresponding AST objects

So then one interesting direction would be to try and start aligning our node class and child naming with the target JS AST's naming figuring that that would greatly reduce the amount of custom toPlainObject() logic required. But that does beg that question of which AST format to target - I may spend some time trying to convince myself that Prettier can in fact consume ESTree, but it may not be the biggest deal if we just target Babylon for now and then switch if needed later, as the differences don't seem massive (and I've run into zero problems using the old babel-eslint for converting Babylon -> ESTree in order to feed to ESLint)

My biggest question as far as thinking about how to work on the Coffeescript AST generation is what to use as a reference point for making progress/having things "working" - with what I've been working on, there were very clear-cut things to measure against: getting existing tests to pass when using Prettier to generate JS, checking if ESLint rules run, etc. I'm not super anxious to dive deeply enough into Prettier to be able to start piecing together a Coffeescript formatting backend side-by-side with getting the Coffeescript compiler to feed it more node types to format, but if Prettier formatting of Coffeescript source is the primary/only target for the generated Coffeescript AST, then I guess that's what makes most sense

What other targets might there be? A linter? Were you envisioning that the Coffeescript compiler would itself know how to reserialize a Coffeescript AST into Coffeescript source? I was picturing that as Prettier's domain, that's sort of the beauty of what I've been working on is that Prettier doesn't care almost at all about your original formatting, so if you can feed it a correctly structured AST your job is pretty much done. To run a linter against Coffeescript AST, you'd clearly need pretty complete source location/text info (as well as a structurally correct AST)

I don’t think Prettier and ESLint should be integrated into the CoffeeScript compiler itself

I figured that the ESLint integration code would belong in a separate package/repo a la babel-eslint, I've just been including things as dependencies for now to get them wired up in some way that's useable in dev. But I wonder whether we could avoid having Prettier as a dependency (when using Prettier for outputting JS) - I see how in theory you could have Coffeescript just finish at dumping a JS AST and then having something else be responsible for wiring it up to pass it as input to Prettier, but it would seem like a big loss to not be able to just install coffeescript and be able to fully compile to JS (I guess that's assuming that we'd tear out the existing JS-generation compileNode() functionality and make the default compiling codepath be to use Prettier)

GeoffreyBooth commented 6 years ago

The compiler already generates JavaScript source code, and there are plenty of tools like Babylon that can take JS source as input and produce a JS AST as output. The only reason for the CoffeeScript compiler to generate a JS AST would be if we wanted to drop support for our compiler generating JS source as output, in order to have Prettier or babel-generator generate JS source instead.

Such an effort would be a lot of work, as you’ve already started to see. All the weird edge cases that make the current nodes.coffee 4,000 lines long would need to be accounted for. It reminds me of the discussion we had about swapping out our parser, where ultimately we decided to keep what we have, warts and all, because the alternative is something that might have endless compatibility problems for people trying to run current code.

And even if we went to all that trouble, we still need a CoffeeScript AST to do the things we want to do. You can use a JS AST to use ESLint, sure, but that will only lint the things that make sense in JavaScript. Look at some of the rules in Coffeelint, like no_implicit_braces; there’s no way to do a check like that from a JS AST. To generate CoffeeScript source, which we need to do both for Prettier and for JS2Coffee, we need AST nodes for CoffeeScript-specific things like Existence and Range.

What I mean by a “CoffeeScript AST” would be basically what you’ve generated, JSON that follows the style of the Babel or ESTree ASTs, but with a handful of CoffeeScript-specific node types included. So there would be a great amount of overlap, and maybe it would be possible to additionally have our compiler generate a fully Babel-compatible JS AST (like with extra methods to convert CS-only node types like Existence into the equivalent sets of JS nodes). But I think we need to start with the CoffeeScript AST first.

helixbass commented 6 years ago

@GeoffreyBooth I agree with a lot of that. I'd just encourage you to be open to adjusting your ideas about what the path forward might look like as a result of the work I've already done. There'd be a lot to do to get from where my branch is to production, but I've already handled the "weird edge cases" (a lot of which was able to effectively just be copied from the corresponding compileNode() logic) to the extent of having the entire test suite passing (the biggest caveat being that there would be potential breakage of weird edge uses of backticked JS syntax since I'm actually parsing it into JS AST rather than just passing it through as text directly into the output)

because the alternative is something that might have endless compatibility problems for people trying to run current code

I think there are huge differences between retargeting the backend and swapping out the frontend as far as compatibility etc

The only reason for the CoffeeScript compiler to generate a JS AST would be if we wanted to drop support for our compiler generating JS source as output

So I understand the attitude that "all we get by generating JS AST is the ability to generate JS, which we can already do", but the reality is that I'm leaning on Prettier for generating JS (more nicely than we'll ever be able to, because that's what it does) and leaning on ESLint (yes, formatting-related rules belong in a Coffee linter, but you turn those off anyway when you're using Prettier, and a lot of the big structural rules work - no-undef, no-unused-vars, no-unused-expressions, etc) today

And I'd argue that in the same way that being able to generate Coffeescript AST just seems like a good idea in terms of ecosystem possibilities, the same holds true for being able to generate JS AST (yes, you can just re-parse generated JS into an AST, but the ESLint integration only works because we generate it directly, with original location info). And targeting it has forced me to clean up a lot of the backend messiness, where we eg create Literals and shuffle things together at the level of string concatenation, into "cleaner" AST transformations

So just bear with me if I keep wanting to pursue it from where I've already gotten with it, and I'll for sure be trying to figure out how to move forward with making the Coffeescript AST a reality

GeoffreyBooth commented 6 years ago

The ESLint integration only works because we generate it directly, with original location info

That’s a very good point. That’s a good reason to have the compiler potentially output an ESLint-compatible “JS AST” as an alternative to its native “CoffeeScript AST.” We should perhaps design this “output AST” feature so that it takes an argument of which type of AST is being requested.

I’m not opposed to the compiler outputting a JS AST. I think you’ll possibly get a lot of pushback if you want to introduce Prettier or babel-generator as a dependency of CoffeeScript in order to outsource our JS source generation, as lots of people including @jashkenas feel very strongly that CoffeeScript should have no dependencies (though I’m not quite clear on why that’s so important). Up till now, at least, CoffeeScript making do with no dependencies has been a hard rule that this project has followed.

If you look at it another way, you could be tackling your effort in stages:

Get the CoffeeScript compiler to output a CoffeeScript-native AST
Get the CoffeeScript compiler to alternatively output a Babel or ESTree-style AST that can be fed as input into Prettier or ESLint
Integrate Prettier into the compiler so that Prettier does the JS source generation

Steps 1 and 2 are uncontroversial, and we can certainly work as a team to achieve them. Step 3 . . . well, either we can convince everyone to drop the “no dependencies” rule or it could be achieved via some kind of wrapper around the CoffeeScript compiler, like a new project called prettier-coffeescript that was as simple as this (psuedocode):

CoffeeScript = require 'coffeescript'
Prettier = require 'prettier'

exports.compile = (coffeeSource) ->
  ast = CoffeeScript.compile coffeeSource, {nodes: 'babel'} # Babel-spec AST
  jsSource = Prettier.format ast
  return jsSource

It’s just that your branch does all three steps at once that feels a bit overwhelming to me. That’s where I think it goes a bit far. If we can break it up into chunks, it’ll be easier both to develop and to reach consensus about.

helixbass commented 6 years ago

@GeoffreyBooth ok that makes total sense. So then I'm inclined for the moment to try and move in two directions (corresponding to steps 1 and 2) from the current state of my branch:

I think I've accepted that getting into the inner workings of Prettier such that I can start getting Prettier to spit back formatted Coffeescript source while I get the compiler to feed it more AST types is the way to move forward here. I'll look closely at what you've done so far as I'm getting started The other upside of starting to dig into Prettier is that the biggest remaining chunk of work that I'm aware of to legitimately use Prettier for JS generation is that it needs to be able to generate sourcemaps itself (thus far I've gotten our sourcemap-related tests passing using the hacky approach outlined here). It's clear from the comments in that and some other Prettier issues that they're open to implementing sourcemaps and other things needed for Prettier to function as the code-generation part of a toolchain, but that it's not necessarily a high priority - so being prepared to implement it myself (inside Prettier) would be the best starting point
means a period of coexistence of having compiler-internal JS-generation code and ability to output JS AST. I haven't been actively trying to share code between compileNode() and compileToBabylon(), as I figured compileNode() might disappear. So there's a lot of duplication there currently. Rather than try to refactor those to live side-by-side (which I think would be pretty messy to maintain regardless of how successfully the duplicated logic could be shared), I'd be inclined to try and more or less feed the one into the other. Not sure exactly the type of code restructuring that'd be involved in delineating these separate "phases", but like I said in my previous comment I've basically cleaned up a lot of the JS-generation logic into stricter node transformations, so seems like you should basically be able to go (initial nodes tree) -> (transformed nodes tree that's shaped like JS output) which can generate either (internally-generated JS) or (JS AST). While that approach would involve significantly more reworking of the existing JS-generation code, it seems much cleaner

I know you will likely push to separate out step 1 from step 2. That may well be how it lands as far as getting stuff merged. But for one thing I think there's actually significantly more trickiness in achieving a full-blown Coffeescript AST (since as has been alluded to it'd seem to involve a lot of restructuring of how we currently pass/consume/transform stuff through the rewriter/parser/node initialization in order to achieve full "preservation"/reconstructability of original source) than there has been in retargeting to a JS AST, where I've had to do a bit of pushing through/preserving additional source info (eg turning on source location ranges, which are expected by the JS AST formats) but for the most part have been fine using what was already available to the initialized nodes tree

GeoffreyBooth commented 6 years ago

This could also be broken down into even smaller chunks. See https://github.com/jashkenas/coffeescript/issues/5019#issuecomment-376751225, where we discuss moving as much of the string logic from the lexer into the nodes. Currently the lexer is doing things like replacing newlines with \n escapes, so it’s not possible to output ES2015 template literals with real newlines in them as opposed to quoted strings with escapes, like "line 1\nline 2". If we ever want to be able to output prettified CoffeeScript source that matches the original string input, we need to preserve everything about the original string, including newlines. Ditto for outputting an AST node that has the newlines as the user originally typed them.

And that’s just one node type. This kind of cleanup work is even more of a prerequisite, and can happen in small PRs (one for strings, one for classes, etc.). It’s probably better that it does, and that in general we have lots of small PRs rather than one giant one, so that it’s easier for others to review.

helixbass commented 6 years ago

A little progress update: reached a nice milestone (with the help of #5079) that I can reformat the whole Coffeescript test suite (ie tests/*.coffee) with Prettier and the tests still pass. That should give you a sense of the coverage of language constructs as far as outputting usable AST nodes. And they're formatted rather nicely (sections below ~~~~~~~~ lines are the reformatted versions)

This is using my prettier-plugin-coffeescript repo and prettier Coffeescript branch

In order to achieve rules like "call parens are optional if the enclosing parent breaks", eg this being ok:

f(
  g h
  i
)

but this g() requiring parens:

f(g(h), i)

, I had to introduce a new formatting primitive to Prettier (opened PR). But with that primitive in place, I'm able to have a rather sophisticated awareness of eg when it's ok to omit parens/braces, which I see as imperative for an opinionated Coffeescript formatter

The biggest remaining chunk of both AST generation and Prettier formatting is comments

reubano commented 4 years ago

@helixbass I created a CLI version of your plugin.

GeoffreyBooth commented 4 years ago

Done via 2.5.0 and prettier-plugin-coffeescript. Amazing work @helixbass! 🎉 🎉 🎉

vendethiel commented 4 years ago

Great work! That's amazing.

helixbass commented 4 years ago

@reubano cool!

Now that v2.5.0 is released, you can bump the dependencies per the updated README (I released as v0.1.4 of the Prettier plugin)

mrmowgli commented 4 years ago

Seriously awesome work! I've been following along and I know a ton of effort has gone into making this happen.

Thank you @helixbass and @geoffreybooth and everyone who has contributed to this release!

What a great way to start off the new year! 🍾🍻🎉

On Tue, Dec 31, 2019, 10:37 PM Geoffrey Booth notifications@github.com wrote:

Done via 2.5.0 https://coffeescript.org/#2.5.0 and prettier-plugin-coffeescript https://github.com/helixbass/prettier-plugin-coffeescript. Amazing work @helixbass https://github.com/helixbass! 🎉 🎉 🎉

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/jashkenas/coffeescript/issues/4984?email_source=notifications&email_token=AAOLVRYFF5WTD22COTBKKUDQ3Q223A5CNFSM4ERJ2F62YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEH465BQ#issuecomment-570027654, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAOLVR7EGCU37WU4SVRAPGLQ3Q223ANCNFSM4ERJ2F6Q .

ghost commented 3 years ago

It would be really great if CoffeeScript emitted an ESTree format AST like js2coffee/js2coffee@47d3159 but with extra types. The js2coffee implementation actually provides some guidance on how this could be done. Would make it much easier to convert CoffeeScript to TypeScript, JSONC, YAML etc.

For example, the following node types are exclusive to estree's CoffeeScript parser:

"CoffeeDoExpression" | "CoffeeEscapedExpression" | "CoffeeListExpression" | "CoffeeLoopStatement" | "CoffeePrototypeExpression"