[idea] Disambiguate JS expressions with fenced code blocks

shawnbot commented 5 years ago

The ambiguity of "naked" JS(X) blocks in MDX is, I think, a serious issue; and it seems really hard to parse properly! What if there was a special kind of fenced code block in which JS statements were simply inlined, and which could be syntax highlighted by existing Markdown renderers? E.g.

# Buttons

```js inline
import {Box, Button} from 'my-ui-lib'

This is a button:


Then all the transform would need to do is "unwrap" the `code[inline]` nodes and hoist any imports to the top to produce something like:

```jsx
import {MDXTag} from '@mdx-js/mdx'
import {Button} from 'my-ui-lib'

export default props => [
  <MDXTag tag="h1">Buttons</MDXTag>,
  <p>This is a button:</p>,
  <Button>Click me</Button>
]

Whether or not inline is the right prop for the fenced code block "metastring" is up for debate, obviously. But have y'all considered making the imperative parts of MDX — the import and export statements, and any other JS statements that ride along with them — more explicit? I saw https://github.com/mdx-js/mdx/issues/118, and agree that requiring fenced code blocks for JSX expressions is too much; but this feels like something that would dramatically simplify the parser (and eliminate a lot of existing bugs) and "fix" rendering on GitHub.

johno commented 5 years ago

I can definitely dig this and is something that's been thought about. It comes with the benefit of being a solution for handling edge case parsing issues folks run into. When that happens, throw it in an evaluated block : ). This could even be achieved in userland today!

That said, I've personally grown pretty accustomed to the elegance/terseness of not having to wrap JSX syntax in code blocks and consider that a key feature (though there's no reason both approaches can't live together happily). Though there is overhead when there are two ways to do the same thing.

Generally speaking, the difficulty in parsing is predominantly in the JSX blocks/expression rather than import/exports.

All that said, let's think on this a bit and see what other folks think as well!

A side note: It would need more "first-class" support on GitHub, though, because it'd be impossible to discern what's a code block and what's JSX syntax for MDX since metastrings aren't currently displayed.

johno commented 5 years ago

Also, I wonder if a good start for considering this is writing a plugin to achieve this? I could hack that out sometime this week so there'd be something folks could actually try out.

That way, regardless of whether it enters core, the option is available as a remark plugin.

shawnbot commented 5 years ago

It comes with the benefit of being a solution for handling edge case parsing issues folks run into.

Yes! :boom:

That said, I've personally grown pretty accustomed to the elegance/terseness of not having to wrap JSX syntax in code blocks and consider that a key feature (though there's no reason both approaches can't live together happily).

Yeah, definitely. Since the MDX parsing isn't (typically) done at runtime, there isn't necessarily any need to keep the code lean. But this might be one way to ship a slimmer runtime that works as a more "pure" AST transform.

I guess another thing to consider is handling literal <script> tags, which GitHub just throws out for security purposes? 🤔

A side note: It would need more "first-class" support on GitHub, though, because it'd be impossible to discern what's a code block and what's JSX syntax for MDX since metastrings aren't currently displayed.

Well, maybe while we're working on semantic code analysis and navigation we can look into that 😁

wooorm commented 3 years ago

Hey oldtimers in this discussion! This is the longest currently open issue. 🙂

I’m closing this. With the recent RC for MDX 2 (see https://v2.mdxjs.com, please try it out and give feedback), we have a new AST that allows for such things. Arbitrary import/exports and expressions can be written in MDX, which are exposed as separate and detailed AST nodes. That means a plugin could walk the tree, look for these code blocks (in this example, with inline as meta), parse the JavaScript, and replace the code block with said JavaScript. I’m personally open to experiment more and perhaps land it in core, but this issue isn’t very active so perhaps it can start out in userland?

I’ve made a head start on pseudo-code for a plugin:

import * as jsx from 'acorn-jsx'
import {Parser} from 'acorn'
import {visit} from 'unist-util-visit'

const parser = Parser.extend(jsx())

const lang = new Set([
  'js',
  'jsx',
  'javascript'
])

const esm = new Set([
  'ExportAllDeclaration',
  'ExportDefaultDeclaration',
  'ExportNamedDeclaration',
  'ImportDeclaration'
])

export default function remarkMdxEvalCodeBlock() {
  return (tree) => {
    visit(tree, 'code', (node, index, parent) => {
      if (lang.has(node.lang) && node.meta === 'eval') { // Accept `~~~js eval`
        const program = parser.parse(value, {ecmaVersion: 2020})
        const esm = []
        const replacement = []
        let expression

        // Divide ESM and an expression that could yield a value.
        program.body.forEach((child) => {
          if (allowedAcornTypes.has(child.type)) {
            esm.push(child)
          } else if (child.type === 'ExpressionStatement') {
            if (expression) {
              // Two expressions? Can only eval one value, throw an error?
            } else {
              expression = child
            }
          }
        })

        // Add ESM if there was any as a subtree of an ESM node.
        if (esm.length > 0) {
          replacement.push({type: 'mdxjsEsm', value: '', data: {estree: {type: 'Program', body: esm}}})
        }

        // Add the expression if there was one.
        if (expression) {
          replacement.push({type: 'mdxFlowExpression', value: '', data: {estree: {type: 'Program', body: [expression]}}})
        }

        // Replace the code.
        parent.children.splice(index, 1, ...replacement)
      }
    })
  }
}

👋

mdx-js / mdx

[idea] Disambiguate JS expressions with fenced code blocks #701