wooorm / xdm

Just a *really* good MDX compiler. No runtime. With esbuild, Rollup, and webpack plugins
http://wooorm.com/xdm/
MIT License

How to compile MDX string to JS function, rather than MDX file to JS module #23

Closed lunelson closed 3 years ago

lunelson commented 3 years ago

I just found this project and I'm excited about moving past some of the issues that seem to have stalled at @mdx-js, but I'm not sure how to work with MDX source code that doesn't come from a file.

I'm working with MDX content from a CMS, so I'm doing a two-step process:

  1. compile the MDX at the data-fetching stage to a string (requires both @mdx-js/mdx and @babel/core)
  2. receive this string as props and pass it to an MDXRenderer component (which I also wrote myself, though its API is modelled on gatsby-plugin-mdx). That component takes the code, plus an optional components prop, and executes it as a function.

Can I do this with xdm? The documentation seems aimed at .mdx files, which are imported and thus compiled to modules, but I need a component function. My MDX source strings won't contain any import or export statements, but they will contain components that I'll need to pass in through a context provider or via their components prop.

lunelson commented 3 years ago

Essentially, I think I'm asking: can I compile to a string which has no import or export statements, and just supply these in a new Function() statement, similar to this function in mdx-bundler?

ChristianMurphy commented 3 years ago

Yes. As noted in the docs, compile (https://github.com/wooorm/xdm#compilefile-options) returns JavaScript as a string, and evaluate (https://github.com/wooorm/xdm#evaluatefile-options) returns a runnable function/component. For both, the first parameter, file, can be either a virtual file or a string.

lunelson commented 3 years ago

Thanks @ChristianMurphy but evaluate sounds like it will also run the entire compilation chain, including remark and rehype, in my front end. I'd like to compile in my server-side process, and only evaluate the compiled code on the client. The thing is I want to compile to a function body string, not a module body string. I also need to be able to inject components or provide them via context. Would this be possible?

wooorm commented 3 years ago

evaluate sets a _contain: true option. If you use compile and also use that semi-hidden option, you get a function body out.
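The distinction matters because the Function constructor only accepts code that is valid inside a function body: import and export declarations are syntax errors there, while a top-level return is allowed. The sketch below checks this with two hand-written strings. functionBody is a simplified stand-in for what compiling with _contain: true produces, not actual xdm output, and it assumes the runtime is passed in as the first argument, as the later comments in this thread describe.

```javascript
// Module-style code: valid as an ES module, but not as a function body.
const moduleCode = 'export default function MDXContent() { return "hi" }';

// Function-body-style code (hand-written stand-in, not real xdm output):
// dependencies are read from arguments[0] and the "exports" are returned
// with a top-level return, which a module would not allow.
const functionBody = [
  'const {jsx} = arguments[0];',
  'function MDXContent(props) { return jsx("p", {children: "hi"}); }',
  'return {default: MDXContent};'
].join('\n');

// Returns true if `code` can be wrapped in a function, false on a syntax error.
function canWrapInFunction(code) {
  try {
    new Function(code); // only parses the body; nothing is executed here
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) return false;
    throw error;
  }
}

console.log(canWrapInFunction(moduleCode)); // false
console.log(canWrapInFunction(functionBody)); // true
```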

lunelson commented 3 years ago

Boom! Thanks @wooorm that's what I was looking for!!

wooorm commented 3 years ago

This is a somewhat interesting use case that’s not yet supported in the API, but the tools are there.

Take a look at how evaluate works: https://github.com/wooorm/xdm/blob/main/lib/evaluate.js. You don’t need to use such a splitOptions function of course, as you’re passing stuff yourself. Make sure to pass the runtime (as documented in the readme, for evaluate) to the generated function body when wrapping it. You should also probably use an AsyncFunction on the client to eval your function body: https://github.com/wooorm/xdm/blob/f9c3108bc86590b6f2ee20a5d93bc27107a8455d/lib/run.js#L10
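A minimal sketch of the approach described above, written from scratch and only modeled on the linked run.js (so details may differ from xdm's actual file): the body is wrapped in a Function or AsyncFunction and the runtime is passed as its first argument. The stub runtime is hypothetical; in a React app it would be the exports of react/jsx-runtime.

```javascript
// AsyncFunction is not a global; take its constructor from any async function.
const AsyncFunction = Object.getPrototypeOf(async function () {}).constructor;

// Run a function body synchronously; `runtime` becomes arguments[0] inside it.
function runSync(code, runtime) {
  return new Function(String(code))(runtime);
}

// Run a function body as an async function, so the body may use `await`
// (for example for dynamic import() calls). Returns a promise.
function run(code, runtime) {
  return new AsyncFunction(String(code))(runtime);
}

// Hypothetical stub runtime: jsx/jsxs build plain objects instead of React elements.
const runtime = {
  Fragment: 'fragment',
  jsx: (type, props) => ({ type, props }),
  jsxs: (type, props) => ({ type, props })
};

// Hand-written stand-ins for compiled function bodies.
const syncBody = `
  const {jsx} = arguments[0];
  function MDXContent() { return jsx('h1', {children: 'Hello'}); }
  return {default: MDXContent};
`;
const asyncBody = `
  const {jsx} = arguments[0];
  const name = await Promise.resolve('async world'); // e.g. an awaited import()
  function MDXContent() { return jsx('h1', {children: name}); }
  return {default: MDXContent};
`;

const { default: SyncContent } = runSync(syncBody, runtime);
console.log(SyncContent()); // { type: 'h1', props: { children: 'Hello' } }

run(asyncBody, runtime).then(({ default: AsyncContent }) => {
  console.log(AsyncContent()); // { type: 'h1', props: { children: 'async world' } }
});
```

Note that asyncBody can only go through run: passing it to runSync throws a SyntaxError, because await is not allowed in a plain function body.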

wooorm commented 3 years ago

export does work in xdm btw. If you want imports on the client (e.g., from unpkg or whatever), that can be supported if you pass a baseUrl (see readme)
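Resolving imports against a baseUrl is, at its core, standard URL resolution with the URL constructor. A minimal sketch, assuming relative (./, ../), root-relative (/) and full-URL specifiers are resolved while bare specifiers such as 'react' are left as-is; resolveSpecifier is a hypothetical helper, and xdm's own rules may differ.

```javascript
// Hypothetical helper: resolve an import specifier against a base URL.
function resolveSpecifier(specifier, baseUrl) {
  const isRelative = /^(\.{1,2}\/|\/)/.test(specifier); // ./x, ../x, /x
  const isUrl = /^[a-z][a-z\d+.-]*:/i.test(specifier); // https:, data:, ...
  if (isRelative || isUrl) return new URL(specifier, baseUrl).href;
  // Bare specifiers ('react', 'lodash-es') cannot be resolved by URL rules alone.
  return specifier;
}

const base = 'https://example.com/posts/hello.mdx';
console.log(resolveSpecifier('./chart.js', base)); // https://example.com/posts/chart.js
console.log(resolveSpecifier('../lib/util.js', base)); // https://example.com/lib/util.js
console.log(resolveSpecifier('/assets/x.js', base)); // https://example.com/assets/x.js
console.log(resolveSpecifier('https://unpkg.com/lodash-es?module', base)); // https://unpkg.com/lodash-es?module
console.log(resolveSpecifier('react', base)); // react
```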

lunelson commented 3 years ago

Cool! I'll take a deeper look into this this week. Maybe I can contribute to your README eventually, for other users who have this use-case 😄 !

wooorm commented 3 years ago

Would appreciate that!

The reason that option starts with an underscore is that I do not yet know how this interface should look. The parts are there (compile and run), but I don't know how to make an intuitive api for it (yet). So I'd appreciate your feedback

stevejcox commented 3 years ago

@lunelson I have the exact same use case coming up this week. Did you have any luck?

lunelson commented 3 years ago

@wooorm —CC @stevejcox— so for my Next.js use-case I ended up with the following two functions: the first runs in getStaticProps (which is removed from the front-end bundle), the second runs in the Page component on the received data.

A few questions and observations:

Thoughts?

// /lib/markdown.js

import { useMemo } from 'react';
import * as runtime from 'react/jsx-runtime.js';
import { useMDXComponents } from '@mdx-js/react';
import { runSync } from 'xdm/lib/run';
import { compile } from 'xdm';
import remarkGfm from 'remark-gfm';

export function compileMDXFunction(mdx) {
  return compile(mdx, {
    format: 'mdx',
    _contain: true,
    providerImportSource: '@mdx-js/react',
    remarkPlugins: [remarkGfm],
  }).then((buf) => buf.toString());
}

export function useMDXFunction(code) {
  return useMemo(() => {
    const { default: Component } = runSync(code, {
      ...runtime,
      useMDXComponents,
    });
    return Component;
  }, [code]);
}

FYI, the test Next.js page component:

// /pages/index.jsx

import { compileMDXFunction, useMDXFunction } from '../lib/markdown';
import { MDXProvider } from '@mdx-js/react';

export default function Page({ code }) {
  const MDXContent = useMDXFunction(code);
  return (
    <div>
      <MDXProvider
        components={{
          Foo({ children }) {
            return (
              <p>
                this is the foo component with <span>{children}</span>
              </p>
            );
          },
          wrapper(props) {
            return <div style={{ backgroundColor: 'lightblue' }} {...props} />;
          },
        }}
      >
        <h1>xdm testing</h1>
        <h2>rendered</h2>
        <MDXContent />
        <h2>function body</h2>
        <pre>
          <code>{code}</code>
        </pre>
      </MDXProvider>
    </div>
  );
}

export async function getStaticProps() {
  const code = await compileMDXFunction(
    `
h1 hello mdx

This is ~~some GFM content~~

<Foo>content</Foo>

  `
  );
  return {
    props: {
      code,
    },
  };
}

The resulting view: [screenshot of the rendered page]
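Passing useMDXComponents next to the runtime in useMDXFunction is what lets the compiled content consult the provider. The sketch below mimics that with stubs. The function body is hand-written and only loosely modeled on the compiled output posted further down this thread (where defaults are merged with props.components via Object.assign); the extra _provideComponents() call is an assumption about how the provider is consulted, and real xdm output may differ.

```javascript
// Hand-written stand-in for a function body compiled with a provider import source.
const body = `
  const {jsx, useMDXComponents: _provideComponents} = arguments[0];
  function MDXContent(props = {}) {
    // Later sources win: defaults < provider components < components prop.
    const _components = Object.assign({p: 'p'}, _provideComponents(), props.components);
    return jsx(_components.p, {children: 'hello'});
  }
  return {default: MDXContent};
`;

// Stubs: jsx builds a plain object; useMDXComponents returns what a provider would supply.
const jsx = (type, props) => ({ type, props });
const providedComponents = { p: 'ProvidedParagraph' };
const useMDXComponents = () => providedComponents;

const { default: MDXContent } = new Function(body)({ jsx, useMDXComponents });

// Provider components override the defaults...
console.log(MDXContent().type); // ProvidedParagraph
// ...and a components prop overrides the provider.
console.log(MDXContent({ components: { p: 'PropParagraph' } }).type); // PropParagraph
```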

wooorm commented 3 years ago

Nice!

  1. You don’t need to use the MDX provider in this case. You can pass components in directly:
// …
<div>
  <h1>xdm testing</h1>
  <h2>rendered</h2>
  <MDXContent
    components={{
      Foo({ children }) {
        return (
          <p>
            this is the foo component with <span>{children}</span>
          </p>
        );
      },
      wrapper(props) {
        return <div style={{ backgroundColor: 'lightblue' }} {...props} />;
      },
    }}
  />
  <h2>function body</h2>
  <pre>
    <code>{code}</code>
  </pre>
</div>
// …
  2. Not that important, but it might help your understanding: if you really do want that provider, the value it’s set to doesn’t matter. It does still matter that it’s set, though: providerImportSource: '#' would be fine.

  3. wrapper also gets components in its props, so you might want to pick that out to avoid rendering <div … components="[object Object]">.

  4. I wasn't sure if I was using the right runtime

    You explicitly load it with import * as runtime from 'react/jsx-runtime.js', so what other runtime could it be? 😅

  5. I couldn't figure out how to use the async run function here

    https://stackoverflow.com/questions/61751728/asynchronous-calls-with-react-usememo

  6. All together, your compile function could look like:

export async function compileMDXFunction(mdx) {
  return String(await compile(mdx, {
    _contain: true,
    providerImportSource: '#',
    remarkPlugins: [remarkGfm],
  }));
}
  7. minification

    Terser seems to be able to work with ESTree (https://github.com/terser/terser#estree--spidermonkey-ast), which is the AST we’re using here (through the new recma ecosystem). So it should definitely be possible to make a recma plugin that minifies using terser.
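Regarding wrapper: MDXContent renders the wrapper with all of its own props, so components is among them (compare the Object.assign({}, props, {children: _content}) call in the compiled output later in this thread), and spreading those props onto a DOM element leaks it. A minimal sketch with a stub element factory h standing in for React's jsx; the names are illustrative.

```javascript
// Stub element factory standing in for React's jsx(); returns a plain object.
const h = (type, props) => ({ type, props });

// Leaky wrapper: spreads every prop, so `components` lands on the <div>,
// which React would render as components="[object Object]".
function leakyWrapper(props) {
  return h('div', { style: { backgroundColor: 'lightblue' }, ...props });
}

// Fixed wrapper: destructure `components` out (unused) and spread only the rest.
function wrapper({ components, ...rest }) {
  return h('div', { style: { backgroundColor: 'lightblue' }, ...rest });
}

// Roughly what MDXContent hands to the wrapper.
const wrapperProps = { components: { Foo: () => null }, children: 'content' };

console.log('components' in leakyWrapper(wrapperProps).props); // true
console.log('components' in wrapper(wrapperProps).props); // false
```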

lunelson commented 3 years ago

if you really want that provider ... '#', would be fine.

Yep, that's what I figured from looking at the output, I just put providerImportSource: true 👍

wrapper also gets components in its props, so you might want to pick that out to fix <div … components="[object Object]">

Good tip, I missed that one! 😄

Terser seems to be able to work with estrees...should definitely be possible to make a recma plugin that minifies using terser.

That would be a really nice addition. I'm not familiar with how this would work but maybe I'll find time to dig in to it at some point.

Anyway, thanks again, this worked really well. FWIW, I found a couple of caveats with Next.js: you have to tell it to transpile ESM dependencies explicitly (I had to use the next-transpile-modules package and include both xdm and unist-util-position-from-estree in the list), and you have to be careful not to end up with Node packages in your client-side bundle (at first I ended up with acorn in the bundle, until I copied the runSync function into my own file instead of importing it).

As for the API, I think the _contain option could perhaps be called asFunctionBody (?), and that for integration with certain frameworks you might export hooks like the useMDXFunction one that I made... although perhaps this is a bit too opinionated at this level. If you did decide to do this, you'd have to be careful not to let server-side dependencies end up in the client bundle; it would probably be good to export them from a completely separate path like xdm/client to mitigate this, though in all fairness Next.js needs to resolve its non-support of ESM dependencies at this point.

lunelson commented 3 years ago

P.S. Let me know, if you'd like me to contribute to the README about this use-case

wooorm commented 3 years ago

That would be a really nice addition. I'm not familiar with how this would work but maybe I'll find time to dig in to it at some point.

You can also probably use terser outside of unified/xdm. Take the string, use terser and probably configure it to support top-level return statements (if possible), and get a minified output.

Anyway thanks again, this worked really well. FWIW I found a couple of caveats with Next.js, because you have to tell it to transpile ESM dependencies specifically (I had to use the next-transpile-modules package, and include both xdm and unist-util-position-from-estree in the list)

That’s an issue that Next needs to solve. The ecosystem is moving soon (https://github.com/unifiedjs/unified/issues/121#issuecomment-780320962), and they don’t support it yet.

and you have to be careful that you don't end up with Node packages in your client-side bundle (at first I ended up with acorn in the bundle, until I copied the runSync function out to my own file instead of importing it).

RSC, which is far from ready but Next is also working on, solves this. Also sounds like a Next bug. They should be able to tree shake. (reading the rest of the comment, yep, what you said with “though in all fairness Next.js needs to resolve their non-support of ESM dependencies at this point”)

As for the API, I think the _contain option could perhaps be called asFunctionBody (?)

It definitely needs a better name. I somewhat like asFunctionBody because it describes what it does. But on the other hand I’m not sure users will understand what it means. Maybe outputFormat: 'file' | 'function-body'?

I also need to figure out how to make baseUrl work in both output formats. That’s not related to how you’re using xdm, but does relate to solving this nicely.

and that for integration with certain frameworks you might export hooks like the useMDXFunction one that I made...although perhaps this is a bit too opinionated at this level.

Aside: I think the function you have now is more complex than needed. You’re including 1kb of JS to get a provider, just so you can write <MDXProvider components={{…}}><MDXContent /></MDXProvider> instead of the shorter <MDXContent components={{…}} />? It doesn’t make sense to me.

Also, I don’t get the useMemo, assuming you still have it. Upon some further reading, why not use useEffect such as described here: https://github.com/facebook/react/issues/14326.

Other than these two thoughts, I think those functions can live in userland!

lunelson commented 3 years ago

Maybe outputFormat: 'file' | 'function-body'?

How about outputFormat: 'module' | 'function' then, or outputFormat: 'module-body' | 'function-body'—since this is essentially the difference right?

You’re including 1kb of JS to get a provider

Yes I probably don't need it. I guess I was aiming for parity with existing solutions/patterns, Gatsby etc. Maybe I'll make this an option in my compiler function which defaults to false.

Also, I don’t get the useMemo, assuming you still have it.

I took this from KCD's README for mdx-bundler, he shows usage of his getMDXComponent function this way, so it seemed like a good idea. 🤷‍♂️

Upon some further reading, why not use useEffect such as described here: facebook/react#14326.

That's an interesting thought: so you mean write a hook that uses runAsync in combination with useState and useEffect? Would that allow multiple components to run compiled MDX more-or-less-concurrently with better performance then?

Other than these two thoughts, I think those functions can live in userland!

For sure. I'm thinking about writing a post on dev.to about this because I know this use-case is a thing for Next.js users, and there's a need for a really up-to-date solution for both file-/(module-) and string/(function-)based MDX sources.

wooorm commented 3 years ago

How about outputFormat: 'module' | 'function' then, or outputFormat: 'module-body' | 'function-body'—since this is essentially the difference right?

That’s a great idea, much better! Taking it further, how about outputFormat: 'program' | 'function-body'?

The word “program” is used by estree (the JS AST used by Firefox, Babel, ESLint, and much more) to represent the whole. Whether such a program is a module or a script depends on the environment (.mjs or .cjs; type="module" or type="text/javascript" on <script> elements), and is recorded on that program node (as program.sourceType: 'module' | 'script').

I also think that program is explicit enough, -body is not needed there. On the other hand, function sounds like it includes function (args) { ... } or so, which it doesn’t, so I think I prefer that to be an explicit function-body.

Then the next thing to do would be to split baseUrl, which currently both turns import statements into a dynamic import() and also resolves them, into two things.

import -> import() is most useful in function-body, but because dynamic import() is available in scripts too, program could, assuming top-level await (a stage 3 proposal) lands, yield a file that works in .cjs files! This could either be a) outputType: 'script' | 'module' or b) importStatements: false (defaulting to true).

Then baseUrl needs to work on both import statements and dynamic import().

Would that allow multiple components to run compiled MDX more-or-less-concurrently with better performance then?

I think so. It could be its own little module. You can publish it, too 😅. It gets such a “function-body” from xdm as a code parameter, then it asynchronously runs it. Maybe something like this: https://github.com/streamich/react-use/blob/master/src/usePromise.ts. Async is always slower than sync, but async is sometimes better. Still: I’m not a React developer.
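Independently of React, the core of such a module could be a cache of promises keyed by the code string, so several components rendering the same compiled MDX share a single asynchronous run. A framework-agnostic sketch; createMDXLoader is a hypothetical name, the runtime and function body are stubs, and a React hook (for instance one built on useState and useEffect) would sit on top of it.

```javascript
const AsyncFunction = Object.getPrototypeOf(async function () {}).constructor;

// Hypothetical helper: returns load(code), which runs each distinct function
// body at most once and hands every caller the same promise.
function createMDXLoader(runtime) {
  const cache = new Map();
  return function load(code) {
    if (!cache.has(code)) {
      const promise = new AsyncFunction(code)(runtime);
      // Forget failed runs so a later call can retry.
      promise.catch(() => cache.delete(code));
      cache.set(code, promise);
    }
    return cache.get(code);
  };
}

// Usage with a stub runtime and a hand-written function body.
const load = createMDXLoader({ jsx: (type, props) => ({ type, props }) });
const code = `
  const {jsx} = arguments[0];
  return {default: () => jsx('p', {children: 'shared'})};
`;

const first = load(code);
const second = load(code);
console.log(first === second); // true: both callers share one run

first.then(({ default: Content }) => console.log(Content().props.children)); // shared
```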

For sure. I'm thinking about writing a post on dev.to about this because I know this use-case is a thing for Next.js users, and there's a need for a really up-to-date solution for both file-/(module-) and string/(function-)based MDX sources.

Nice! Yeah, maybe it’s a small hook. A couple lines. Then you don’t need to publish it, people can just copy-paste it in.

wooorm commented 3 years ago

For minification, I landed a PR in terser to add support for accepting and yielding our AST (ESTree).

import {compile} from './index.js'
import {minify} from 'terser'

var code = `export var Thing = () => <>World!</>

# Hello, <Thing />
`

console.log(String(await compile(code)))

console.log(String(await compile(code, {recmaPlugins: [recmaMinify]})))

function recmaMinify() {
  return transform
  async function transform(tree) {
    return (
      await minify(tree, {
        parse: {spidermonkey: true},
        format: {spidermonkey: true, code: false}
      })
    ).ast
  }
}

Yields (first the default output, then the output with recmaMinify):

/*@jsxRuntime automatic @jsxImportSource react*/
import {Fragment as _Fragment, jsx as _jsx, jsxs as _jsxs} from "react/jsx-runtime";
export var Thing = () => _jsx(_Fragment, {
  children: "World!"
});
function MDXContent(props) {
  const _components = Object.assign({
    h1: "h1"
  }, props.components), {wrapper: MDXLayout} = _components;
  const _content = _jsx(_Fragment, {
    children: _jsxs(_components.h1, {
      children: ["Hello, ", _jsx(Thing, {})]
    })
  });
  return MDXLayout ? _jsx(MDXLayout, Object.assign({}, props, {
    children: _content
  })) : _content;
}
export default MDXContent;
import {Fragment as _Fragment, jsx as _jsx, jsxs as _jsxs} from "react/jsx-runtime";
export var Thing = () => {
  return _jsx(_Fragment, {
    children: "World!"
  });
};
function MDXContent(n) {
  const t = Object.assign({
    h1: "h1"
  }, n.components), {wrapper: MDXLayout} = t, s = _jsx(_Fragment, {
    children: _jsxs(t.h1, {
      children: ["Hello, ", _jsx(Thing, {})]
    })
  });
  return MDXLayout ? _jsx(MDXLayout, Object.assign({}, n, {
    children: s
  })) : s;
}
export default MDXContent;

Note that this minifies props and such. This is not a formatter. If you also want to format, it becomes a bit more complex.

A nice alternative is running esbuild after xdm, which is super fast and can do all that too

lunelson commented 3 years ago

@wooorm thanks for this update! Interesting that you mention esbuild: I keep thinking about the best way to use this with Next.js (because of Next's poor support for ESM packages). Do you think it's simpler to just use mdx-bundler in that case (it sounds like it handles the minification concern as well as others)?

Otherwise, I was thinking of doing a package specifically for the Next.js use-case (something like "next-xdm"), which would be a Next.js plugin, exporting the webpack config but also the exports of xdm itself. I would have it built with esbuild using "node10" as a target.

wooorm commented 3 years ago

You can use esbuild both to build xdm into a CJS bundle, and to run it on the results of xdm. mdx-bundler does the last, plus provides some other things. But doing a Next-specific thing might be nice too?

vikie1 commented 2 years ago

This thread is a saviour. Congrats @wooorm and @lunelson.