mdx-js / mdx

Markdown for the component era
https://mdxjs.com
MIT License

Babel or estree? #1384

Closed wooorm closed 3 years ago

wooorm commented 3 years ago

Subject of the discussion

With https://github.com/mdx-js/mdx/pull/1382, we now have a JavaScript syntax tree.

The tree starts out as estree: because markdown + mdx.js are parsed simultaneously, I needed a JavaScript parser in micromark-extension-mdxjs, and I chose a small and fast one, acorn, which produces estree. Acorn is small: 30kb minzipped. acorn-jsx is 4kb. astring (a generator) is also 4kb.

Previously, in this project, we used Babel for plugins. Babel is giant. @babel/core, which has methods to run Babel plugins, is about 220kb minzipped. @babel/generator is 63kb. @babel/parser is 60kb. @babel/traverse is 165kb (it includes both the parser and the generator).

Estree has the drawback of being a fragmented ecosystem: there are no nice parsers that support comments; there are no tree-walkers or compilers that support JSX. And importantly, as we use JSX, we’d want to turn JSX into function calls (React/preact/vue), but those are all Babel plugins. We could use estree, but then users would still need to run Babel afterwards.

Babel has the drawback of being giant and slow. But the good thing is that the JSX -> JS compilers all live there.

Problem

What should we go with? We can’t turn JSX -> JS unless we’re using Babel (well, we could: the Babel plugin to turn JSX -> _jsx() / React.createElement is about 800 lines). Most users probably want to use Babel plugins to turn their fancy features into whatever. An estree-only system as a base for MDX would be ✨✨✨. @mdx-js/runtime is now 350kb minzipped. That could go down to 100kb or less?

ChristianMurphy commented 3 years ago

Estree has the drawback of being a fragmented ecosystem: there are no nice parsers that support comments; there are no tree-walkers or compilers that support JSX

ESLint's parser and walker have solid ESTree + Comment + JSX support https://github.com/eslint/espree https://github.com/eslint/eslint-visitor-keys

Prettier has an ESTree printer with Comment + JSX support for code gen https://github.com/prettier/prettier/blob/902d524d2f1776efe0b110c1a24813d4d7fcb9d0/src/language-js/printer-estree.js
escodegen is close to having ESTree + JSX support https://github.com/estools/escodegen/pull/391

ChristianMurphy commented 3 years ago

I personally use MDX more as a build tool than as a runtime component, and I like using both proposals and TypeScript features, so I'm drawn more towards Babel. The ability to parse new syntax, the option to support TypeScript syntax, and the broad support for Babel within node/JavaScript tools are all a draw. Because I mostly use it as a build tool, bundle size is less of a priority for me.

If we have to pick just one, I'd lean babel.

That being said, do we need to pick just one? Could the JavaScript parsing strategy be made pluggable?

ChristianMurphy commented 3 years ago

Offering another consideration, if bundle size is the primary goal: Acorn may not be the smallest option. Wasm can pack smaller than JS, for example https://bundlephobia.com/result?p=@swc/core@1.2.40 and it still allows for custom transforms if needed https://swc.rs/docs/usage-plugin. There are also other estree-like JavaScript-based parsers, such as https://github.com/meriyah/meriyah and https://github.com/KFlash/seafox

/cc @ChristopherBiscardi since this approach has some potential tie ins to https://github.com/mdx-js/rust


Edit: correction, bundlephobia ignores wasm. The library may be faster, but it is not smaller https://unpkg.com/browse/@swc/wasm@1.2.40/

johno commented 3 years ago

Thanks for all this research folks! I'd lean towards something smaller than Babel but I'm not very opinionated there. There are lots of client-side usages of MDX that won't go away, and Babel is pretty huge and pretty slow in comparison to other options. Considering we're mostly only using Babel for internals we could port it away without users really needing to know the difference.

Also, with wooorm's new JSX parsing, we can drop a bunch of the internals we use and manipulate the AST directly!

ChristopherBiscardi commented 3 years ago

@ChristianMurphy I definitely wouldn't hold up any changes here based on the work in /rust. If our priority is small, then wasm is probably not the answer at the moment. swc is what I'm planning to use for /rust's js parsing and we could invest there more in the future but it's not a solution for today's in-browser use cases IMO.

That said, swc is hella faster than babel in my experience from working with it in toast (via the Rust APIs), and will work well for node-backed stuff if we're looking for a speed boost at some point in the future (TBD, caveats apply, /rust is an experiment, etc.)

wooorm commented 3 years ago

ESLint's parser and walker have solid ESTree + Comment + JSX support [...] escodegen is close to having ESTree + JSX support [...] - @ChristianMurphy

espree seems to be a tiny wrapper around acorn and acorn-jsx 🤔 And a year-old stalled PR is not really “close” 😅 Those visitor keys are great btw! Especially as espree is ± the same AST as acorn + acorn-jsx!
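To make the visitor-keys idea concrete, here is a rough sketch of a tree walker driven by a table that maps each node type to its child fields. The `keys` table is a hand-written subset for illustration only (an assumption, not the actual data shipped by eslint-visitor-keys), and `walk` and the example tree are hypothetical names.

```javascript
// Hand-written subset of a visitor-keys table (assumption: illustrative only).
// Each node type lists the fields that may contain child nodes.
const keys = {
  Program: ['body'],
  ExpressionStatement: ['expression'],
  JSXElement: ['openingElement', 'children', 'closingElement'],
  JSXOpeningElement: ['name', 'attributes'],
  JSXClosingElement: ['name'],
  JSXAttribute: ['name', 'value'],
  JSXExpressionContainer: ['expression'],
  JSXIdentifier: [],
  JSXText: [],
  Identifier: [],
  Literal: []
};

// Hypothetical helper: depth-first, pre-order walk over an estree-like tree.
function walk(node, visit) {
  visit(node);
  for (const key of keys[node.type] || []) {
    const child = node[key];
    if (Array.isArray(child)) {
      for (const item of child) {
        if (item) walk(item, visit);
      }
    } else if (child && typeof child.type === 'string') {
      walk(child, visit);
    }
  }
}

// Roughly the tree a JSX-aware estree parser would give for `<div>hi<b /></div>`.
const program = {
  type: 'Program',
  body: [{
    type: 'ExpressionStatement',
    expression: {
      type: 'JSXElement',
      openingElement: {
        type: 'JSXOpeningElement',
        name: { type: 'JSXIdentifier', name: 'div' },
        attributes: [],
        selfClosing: false
      },
      closingElement: {
        type: 'JSXClosingElement',
        name: { type: 'JSXIdentifier', name: 'div' }
      },
      children: [
        { type: 'JSXText', value: 'hi', raw: 'hi' },
        {
          type: 'JSXElement',
          openingElement: {
            type: 'JSXOpeningElement',
            name: { type: 'JSXIdentifier', name: 'b' },
            attributes: [],
            selfClosing: true
          },
          closingElement: null,
          children: []
        }
      ]
    }
  }]
};

// Collect the tag names of all JSX elements in document order.
const elementNames = [];
walk(program, (node) => {
  if (node.type === 'JSXElement') elementNames.push(node.openingElement.name.name);
});
console.log(elementNames); // [ 'div', 'b' ]
```

Because the walker only depends on the table, JSX support comes down to having the JSX node types listed in it.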

Porting our internals from Babel to estree is not a lot of work. Three small plugins: https://github.com/mdx-js/mdx/blob/68ff02c8129e2922f48b59bf51f4b967d248f397/packages/mdx/mdx-hast-to-jsx.js#L6-L8.

For a nice JSX serializer, we could look into adding that to escodegen, astring, or whatever else is nice. But as we’re thinking of compiling JSX away, that’s not needed. Rather, forking babel-helper-builder-react-jsx-experimental for estree seems to be the way to go (not sure about Vue though...).
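As a rough idea of what compiling JSX away on estree could look like, the sketch below turns a small subset of JSX nodes into plain call expressions. It is not a port of the Babel helper: `jsxToCall`, `toCallee`, `attributeValue`, and `print` are hypothetical names, and it ignores spread attributes, member names, fragments, and the newer `_jsx()` runtime. The tiny `print` function only exists so the result can be read; a real setup would hand the tree to a generator such as astring.

```javascript
// Build the callee from a pragma string such as 'h' or 'React.createElement'.
function toCallee(pragma) {
  return pragma
    .split('.')
    .map((name) => ({ type: 'Identifier', name }))
    .reduce((object, property) => ({
      type: 'MemberExpression', object, property, computed: false, optional: false
    }));
}

// Attribute values: `<a b />` means true, `{expr}` unwraps, strings pass through.
function attributeValue(value) {
  if (value === null) return { type: 'Literal', value: true };
  if (value.type === 'JSXExpressionContainer') return value.expression;
  return value;
}

// Hypothetical helper: JSX subset -> CallExpression (assumption: no spreads/fragments).
function jsxToCall(node, pragma = 'h') {
  if (node.type === 'JSXText') return { type: 'Literal', value: node.value };
  if (node.type === 'JSXExpressionContainer') return node.expression;
  if (node.type !== 'JSXElement') throw new Error('Unsupported node: ' + node.type);

  const name = node.openingElement.name.name;
  // Lowercase names are host elements (strings); others refer to components.
  const tag = /^[a-z]/.test(name)
    ? { type: 'Literal', value: name }
    : { type: 'Identifier', name };

  const properties = node.openingElement.attributes.map((attribute) => {
    if (attribute.type !== 'JSXAttribute') {
      throw new Error('Unsupported attribute: ' + attribute.type);
    }
    return {
      type: 'Property', kind: 'init', method: false, shorthand: false, computed: false,
      key: { type: 'Identifier', name: attribute.name.name },
      value: attributeValue(attribute.value)
    };
  });
  const props = properties.length > 0
    ? { type: 'ObjectExpression', properties }
    : { type: 'Literal', value: null };

  return {
    type: 'CallExpression',
    callee: toCallee(pragma),
    arguments: [tag, props, ...node.children.map((child) => jsxToCall(child, pragma))],
    optional: false
  };
}

// Minimal printer for the node types produced above (stand-in for a generator).
function print(node) {
  switch (node.type) {
    case 'Literal': return JSON.stringify(node.value);
    case 'Identifier': return node.name;
    case 'MemberExpression': return print(node.object) + '.' + print(node.property);
    case 'ObjectExpression':
      return '{' + node.properties.map((p) => print(p.key) + ': ' + print(p.value)).join(', ') + '}';
    case 'CallExpression':
      return print(node.callee) + '(' + node.arguments.map((a) => print(a)).join(', ') + ')';
    default: throw new Error('Cannot print ' + node.type);
  }
}

// Roughly the tree for `<a href="#">hi</a>`.
const tree = {
  type: 'JSXElement',
  openingElement: {
    type: 'JSXOpeningElement',
    name: { type: 'JSXIdentifier', name: 'a' },
    attributes: [{
      type: 'JSXAttribute',
      name: { type: 'JSXIdentifier', name: 'href' },
      value: { type: 'Literal', value: '#' }
    }],
    selfClosing: false
  },
  closingElement: { type: 'JSXClosingElement', name: { type: 'JSXIdentifier', name: 'a' } },
  children: [{ type: 'JSXText', value: 'hi', raw: 'hi' }]
};

console.log(print(jsxToCall(tree, 'React.createElement')));
// React.createElement("a", {href: "#"}, "hi")
```

Passing a different pragma (for example the default `'h'`) is how the same transform could target preact-style runtimes; Vue would likely need its own handling, as noted above.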