Discussion: Supporting plugins

Barista, who can build new flavors of coffee for you.

It looks like we agreed in the "TypeScript Output" discussion that a pluggable coffee compiler would be interesting, for the healthy evolution of the language and the CoffeeScript ecosystem.

As I understand it, the CoffeeScript source is an elaborate Jison configuration, and the compiler itself is parser code generated by Jison. Making this generated code pluggable could be hard to do and harder to maintain.

I believe the right path to a pluggable coffee compiler is to have jison as a runtime dependency, with the compiler itself built at runtime.

Well, that will obviously make the coffee compiler heavier and slower.

So I propose a new build task in the Cakefile to create the "barista compiler". It would be the same CoffeeScript compiler source, but not pre-generated by Jison; its API would have plugin-related config options, and its bin's CLI would have some plugin args. And, sure, it must be shipped with a different package.json with "jison" in the "dependencies" list.

How a plugin may work

A plugin must implement methods to parameterize, extend or replace the transpiler units:

Lexer.tokenize(): This looks like the hardest process to make pluggable. The method itself may stay untouchable by plugins, but they must be able to change or parameterize its specialized tokenizers (identifierToken(), stringToken(), ...) or add new ones at any position in the chain of tokenizer calls.
grammar: must be exported as it is and be freely modifiable by plugin code.
nodes module: it looks like the most important parts are already exported and can be changed by a plugin. But it needs to be clear where a new collection of node classes must be attached to be used by the extended transpiler. Some nodes.coffee helpers must be exported to be usable by new node classes, and the module must export a method that lets plugins replace these helpers if needed.
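
To make the Lexer.tokenize() point a bit more concrete, here is a minimal sketch of a tokenizer chain that plugins can extend or replace by name. Everything in it (PluggableLexer, addTokenizer, replaceTokenizer, the matcher helper, the toy token tags) is hypothetical and does not exist in the CoffeeScript compiler; the real lexer's tokenizers are methods with more state than this.

```coffeescript
# Hypothetical sketch, not the real CoffeeScript lexer.
# A tokenizer takes the remaining source and returns [token, consumedLength],
# or null when it does not match at the current position.
matcher = (tag, regex) -> (chunk) ->
  match = regex.exec chunk
  if match then [[tag, match[0]], match[0].length] else null

class PluggableLexer
  constructor: ->
    # Ordered chain of named tokenizers, tried in turn at each position.
    @tokenizers = [
      {name: 'identifier', fn: matcher 'IDENTIFIER', /^[A-Za-z_$][\w$]*/}
      {name: 'number',     fn: matcher 'NUMBER', /^\d+/}
      {name: 'whitespace', fn: matcher 'WS', /^\s+/}
    ]

  # Insert a new tokenizer before an existing one, or at the end of the chain.
  addTokenizer: (name, fn, {before} = {}) ->
    index = @tokenizers.findIndex (t) -> t.name is before
    index = @tokenizers.length if index < 0
    @tokenizers.splice index, 0, {name, fn}

  # Swap the function behind an existing tokenizer name.
  replaceTokenizer: (name, fn) ->
    entry = @tokenizers.find (t) -> t.name is name
    throw new Error "unknown tokenizer: #{name}" unless entry
    entry.fn = fn

  # The driver loop itself stays closed to plugins; only the chain changes.
  tokenize: (source) ->
    tokens = []
    position = 0
    while position < source.length
      chunk = source[position..]
      result = null
      for {fn} in @tokenizers
        break if result = fn chunk
      throw new Error "unexpected input at #{position}" unless result
      [token, consumed] = result
      tokens.push token unless token[0] is 'WS'
      position += consumed
    tokens

# A "plugin" adding a new token ahead of the identifier tokenizer.
lexer = new PluggableLexer
lexer.addTokenizer 'arrow', matcher('ARROW', /^=>/), before: 'identifier'
tokens = lexer.tokenize 'x => 42'
```

The design choice here is that plugins address tokenizers by name and position rather than patching tokenize() itself, which matches the idea of keeping the method untouchable.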

┌────────────────────────────────────────────────────────┐
│ barista + plugin1 + plugin2 + user-src.flavored-coffee │
└────────────────────────────────────────────────────────┘
                          ⇩
              ╔════════════════════════╗
              ║    Barista Compiler    ║
              ╟────────────────────────╢
              ║  ┌──────────────────┐  ║
              ║  │ CoffeeScript Src ├┐ ║
              ║  └┬─────────────────┘│ ║
              ║   └──────────────────┘ ║
              ║           ⇩            ║
              ║  ┏━━━━━━━━━━━━━━━━━━┓  ║
              ║  ┃     Plugin 1     ┃  ║
              ║  ┗━━━━━━━━━━━━━━━━━━┛  ║
              ║           ⇩            ║
              ║  ┏━━━━━━━━━━━━━━━━━━┓  ║
              ║  ┃     Plugin 2     ┃  ║
              ║  ┗━━━━━━━━━━━━━━━━━━┛  ║
              ║           ⇩            ║
              ║  ╔══════════════════╗  ║
              ║  ║  Cache Flavored  ║  ║
              ║  ║   CoffeeScript   ║  ║
              ║  ║    Transpiler    ║  ║
              ║  ╚══════════════════╝  ║
              ╚════════════════════════╝
                          ⇩
              ┌───────────────────────┐
              │  user-src.transpiled  │
              └───────────────────────┘

What do you think? What is missing? Is there a better path? Do you know of specific details that need to be described?

As a (hopefully interesting, though not useful) historical tidbit: CoffeeScript had "extensions" up until the middle of 2010. Michael Ficarra's CoffeeScriptRedux project was also meant to be (more) extensible.

So something that's been discussed over the years (particularly by @lydell if I remember correctly) has been potentially refactoring out the need for jison. Basically look at grammar.coffee and the resulting parser.js; they're not one-to-one like what you'd get from compiling grammar.coffee into grammar.js. But parser.js is basically incomprehensible, a long machine-written switch statement; we would need some way of achieving the same result while still being human-comprehensible. And on top of that, it would need to be not significantly worse performance-wise than what we have now. This is no small task, but it would enable us to provide hooks at any step of the process: the lexing, the rewriting, the parsing, and the output generation (lexer.coffee, rewriter.coffee, grammar.coffee/parser.js, and then nodes.coffee). We could also get rid of the ugly hacks we currently have for “stowing away” extra data properties like comments “through” the parser, since we currently have so little control over the generated parser.js file, but would have total control once we dropped jison.

I say this up front because I think a basic requirement of any plugin architecture is that plugins need to be able to be loaded at runtime, without necessitating a rebuild of CoffeeScript in order to execute. Look at Babel for comparison: you can add and remove Babel plugins via a configuration, and that just causes Babel to load and execute a few more functions (or not) at specified points in its flow, but the Babel code itself never changes. CoffeeScript needs to be the same way. Especially considering how easy it is to screw up the grammar, creating grammars with inconsistencies that jison refuses to build, we need to keep the CoffeeScript core static.

I haven't dug through Babel's code, but that would probably be the blueprint for us to follow. I would assume our version would be something like this:

Load source code for an input (such as a file)
Run any registered plugins to transform source code just after load
Lex the source code, including any additional lexer functions registered by plugins
Rewrite the lexed tokens, including any additional rewriter functions registered by plugins
Parse the rewritten tokens, including any additional grammar rules registered by plugins (this is why I mention dropping jison)
Generate output JS and output AST, including any additional node classes registered by plugins
Run any registered plugins to transform final output
Save/emit output
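
As a rough illustration of the flow listed above, here is a minimal sketch that runs stand-in stages in order and gives plugins a chance to transform the value after each one. The hook names (onSourceLoad, onTokens, onRewrite, onAst, onOutput) and the stage functions are assumptions made up for this example; the stand-ins do not resemble the real lexer, rewriter, parser, or node classes, and the sketch simplifies "additional lexer functions" and friends into plain post-stage hooks.

```coffeescript
# Hypothetical sketch: stage stand-ins and hook names are made up.
identity = (value) -> value
lex      = (source) -> source.split(/\s+/).filter Boolean   # stand-in lexer
parse    = (tokens) -> {type: 'Program', body: tokens}      # stand-in parser
generate = (ast) -> ast.body.join ' '                       # stand-in output

# Each stage runs its core step, then any plugin functions for its hook.
STAGES = [
  {hook: 'onSourceLoad', core: identity}   # source just loaded
  {hook: 'onTokens',     core: lex}
  {hook: 'onRewrite',    core: identity}   # stand-in rewriter
  {hook: 'onAst',        core: parse}
  {hook: 'onOutput',     core: generate}
]

compileWithHooks = (source, plugins = []) ->
  value = source
  for {hook, core} in STAGES
    value = core value
    # Plugins run in the order they were given, each seeing the previous result.
    for plugin in plugins when typeof plugin[hook] is 'function'
      value = plugin[hook] value
  value

# Two toy plugins hooking different points in the flow.
appendFooter =
  onSourceLoad: (source) -> "#{source} footer"
shout =
  onOutput: (js) -> js.toUpperCase()

output = compileWithHooks 'a  b', [appendFooter, shout]
```

Note that the core stages never change when plugins are added or removed, which is the Babel-like property described earlier.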

This is what I'm familiar with from other contexts as a plugin hooks model: CoffeeScript provides a method where plugins register functions with hooks. Like the first hook I mention above could be called something like onSourceLoad and if a plugin defines a function to be run for that hook, CoffeeScript runs the custom function when CoffeeScript's flow gets to that point. A robust plugin architecture would support multiple plugins each registering functions for the same hook, and CoffeeScript running them each in turn. For example, a plugin could look like this:

# This plugin adds a naughty message at the bottom of every source file,
# by defining a function to be run within the `onSourceLoad` hook.
CoffeeScript.registerPlugin
  onSourceLoad: (source) -> "#{source}\nconsole.log 'not!'"

The anonymous function passed to onSourceLoad would be registered, and CoffeeScript would execute it at that point in its process. One thing to keep in mind is that CoffeeScript.compile is synchronous, and can't become async without that being a breaking change for any downstream tools, so all plugins need to also be synchronous. That shouldn't be a problem, I don't think (most things you'd want to do, like new grammars etc., should be sync operations) but it's something to keep in mind.
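
A minimal sketch of what such a registry could look like, with several plugins registering for the same hook and a check that rejects asynchronous results. PluginRegistry, register, and run are hypothetical names chosen for illustration; nothing like this exists in CoffeeScript today.

```coffeescript
# Hypothetical sketch of a synchronous hook registry.
class PluginRegistry
  constructor: ->
    @hooks = {}   # hook name -> list of functions, in registration order

  # A plugin is a plain object whose keys are hook names.
  register: (plugin) ->
    for own hookName, fn of plugin when typeof fn is 'function'
      (@hooks[hookName] ?= []).push fn
    this

  # Run every function registered for a hook, feeding each one the
  # previous result. Thenables are rejected to keep compile synchronous.
  run: (hookName, value) ->
    for fn in @hooks[hookName] ? []
      result = fn value
      if typeof result?.then is 'function'
        throw new Error "#{hookName}: plugins must be synchronous"
      value = result
    value

registry = new PluginRegistry
registry.register onSourceLoad: (source) -> "#{source}\nconsole.log 'first'"
registry.register onSourceLoad: (source) -> "#{source}\nconsole.log 'second'"
output = registry.run 'onSourceLoad', 'x = 1'
```

Running a hook that no plugin registered for simply returns the value unchanged, so the core flow does not need to know which plugins are loaded.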

Wow! Let me get my leaking noob brain back.

Jison looks like a good tool to me; however, keeping the CoffeeScript core static looks better. After my first contribution to the compiler, I would like to have something more solid in my hands. How hard will it be to accomplish this decoupling?

I agree that lex/grammar functions should be sync; however, limiting pre- and post-processing to sync may be very bad for many plugins. As this feature could warrant a major release, I believe it would be OK to break compatibility for users of the CoffeeScript module.

How hard will it be to accomplish this decoupling?

The naïve approach would be to just port parser.js into CoffeeScript, and replace grammar.coffee with that; then update Cakefile to include grammar.coffee compilation like all the other files and get rid of the separate “compile parser” exception. Especially if you used a tool like js2coffee or https://github.com/helixbass/es2coffee, you could do this in an evening.

However the resulting grammar.coffee would be practically unreadable; a switch statement with over a hundred cases. A much better version would be a grammar.coffee that looks at least remotely like what we have today, where the grammar rules are defined in logically-related blocks. Perhaps this can be achieved by defining the blocks first and then using a recursive function with a long nested if statement to drill through them, where the if is ordered like the switch from most complex to least. Alternatively you could do what jison does but on startup at runtime, where the rules are parsed and organized into a data structure similar to that switch statement but in memory. The challenge here would be doing so in a performant way. I'm sure there are probably other approaches; looking at Babel might be useful here.
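
To give a feel for the "build the structure at startup" option, here is a deliberately tiny sketch: rules stay in readable, logically grouped blocks, plugin rules are merged in, and a lookup table is built once in memory. This is not what jison does (there is no LR table, no precedence, no nesting); the rule names, the pattern format, and the buildTable and reduce helpers are all assumptions invented for the example.

```coffeescript
# Hypothetical sketch: a flat pattern lookup, not a real parser generator.
numberLiteral = (value) -> {type: 'NumberLiteral', value}
assign        = (name, _eq, value) -> {type: 'Assign', name, value}

# Core rules in grouped blocks: rule name -> list of [pattern, action].
grammar =
  Literal: [
    ['NUMBER', numberLiteral]
  ]
  Assign: [
    ['IDENTIFIER = NUMBER', assign]
  ]

# Built once at startup; plugin rules are merged before the table exists.
buildTable = (grammar, pluginRules = {}) ->
  table = new Map
  for rules in [grammar, pluginRules]
    for own name, alternatives of rules
      for [pattern, action] in alternatives
        table.set pattern, {name, action}
  table

# Look up the rule whose pattern matches the token tags and run its action.
reduce = (table, tokens) ->
  key = (tag for [tag] in tokens).join ' '
  entry = table.get key
  throw new Error "no rule matches: #{key}" unless entry
  values = (value for [tag, value] in tokens)
  entry.action values...

# A toy plugin contributing one extra rule.
pipe = (input, _op, fn) -> {type: 'Call', callee: fn, args: [input]}
pluginRules =
  Pipe: [
    ['IDENTIFIER |> IDENTIFIER', pipe]
  ]

table = buildTable grammar, pluginRules
node = reduce table, [['IDENTIFIER', 'x'], ['|>', '|>'], ['IDENTIFIER', 'f']]
```

A real version would need to build something much closer to jison's state tables to handle recursion and ambiguity, and that is where the performance challenge mentioned above comes in.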

So the short version is that to do it right, it would be a significant challenge. It doesn't require too much CoffeeScript codebase knowledge, so if you're an experienced developer and want a big task to bite off, it's self-contained albeit challenging.

Alternatively, you could implement a plugin architecture that just avoids the grammar, at least at first. Then you could tackle the grammar refactor as a second stage, or try to find more workarounds like we've already been doing for passing data through the parser. One workaround could be adding more places where PASSTHROUGH_LITERAL is allowed and then having your plugin define new tokens that are passed through as literal JavaScript and then picked up again by the node classes and output as something other than their literal text.

jashkenas / coffeescript

Discussion: Supporting plugins #5320
