withastro / roadmap

Ideas, suggestions, and formal RFC proposals for the Astro project.

Markdoc support in content collections #496

Closed matthewp closed 1 year ago

matthewp commented 1 year ago

Body

Summary

This is a proposal to add Markdoc support to content collections.

Background & Motivation

We've received multiple user requests for Markdoc support since its public launch. In fact, we've seen early community projects bringing Markdoc to robust project themes.

Markdoc is also designed to solve existing limitations of MDX in Astro.

  1. Performance suffers at scale. Unlike plain Markdown that outputs a string of HTML, MDX outputs a JavaScript module of JSX components. This requires Astro and Babel preprocessing for even the simplest MDX documents. Notably, this required a bump to our maximum memory usage when building / deploying docs.astro.build after migrating to MDX.
  2. Your content is tied to your UI. MDX can import styles and components directly, which is convenient from a developer standpoint. However, this causes issues when content needs to be reused in multiple contexts. A common example is RSS, where you may want a component-rich version from your blog and a simplified HTML output for your RSS feed.

Markdoc is built to solve (2) by separating content from the components, styles, and assets you choose to render. You can use an Astro component renderer on your site, Markdoc's own html renderer for RSS, or even write your own renderer to traverse Markdoc pages yourself. Whether Markdoc also solves (1) is something we're excited to test, and it will require a thorough performance benchmark.

The content collections API was built generically to support this future, choosing format-agnostic naming like data instead of frontmatter and body instead of rawContent. Because of this, introducing new authoring formats is possible without breaking changes.

Goals

Non-goals

Example implementation

Markdoc will be introduced as an integration. To standardize our process for adding new collection types, we may experiment with a (private) integration helper internally. This example shows an addContentEntryType hook that sets up the .mdoc extension and attaches logic for parsing the data and body properties:

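// @astrojs/markdoc/index.ts
import type { AstroIntegration } from 'astro';

export default function markdoc(): AstroIntegration {
    return {
        name: '@astrojs/markdoc',
        hooks: {
            // `addContentEntryType` is the experimental (private) helper proposed above.
            'astro:config:setup'({ addContentEntryType }) {
                addContentEntryType({
                    extensions: ['.mdoc'],
                    parser: '@astrojs/markdoc/contentEntryParser',
                });
            },
        },
    };
}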

// @astrojs/markdoc/contentEntryParser.ts
import parseFrontmatter from 'gray-matter';
export default {
    getEntryInfo({ contents }) {
        const parsed = parseFrontmatter(contents);
        return {
            // The unparsed data object that can be passed to a Zod schema.
            data: parsed.data,
            // The body of the data file. This should be the raw file contents with metadata (i.e. frontmatter block) stripped
            body: parsed.content,
            // (Optional) The untouched frontmatter block with newlines and formatting preserved. Used for computing error line hints.
            rawData: parsed.matter,
        }
    }
}

// astro.config.mjs
import markdoc from '@astrojs/markdoc';

export default {
    integrations: [markdoc()],
}
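
To make the getEntryInfo contract more concrete, here is a minimal, dependency-free sketch of the three values a parser returns. It assumes a simplified frontmatter format (flat `key: value` lines between `---` fences) and only stands in for gray-matter; the `EntryInfo` type and `splitFrontmatter` helper are hypothetical names, not part of Astro.

```typescript
// Hypothetical shape of the object returned by getEntryInfo().
interface EntryInfo {
  data: Record<string, string>; // unparsed data, ready to pass to a Zod schema
  body: string; // file contents with the frontmatter block stripped
  rawData: string; // untouched frontmatter text, useful for error line hints
}

// Simplified stand-in for gray-matter: handles only flat `key: value` lines.
function splitFrontmatter(contents: string): EntryInfo {
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?/.exec(contents);
  if (!match) {
    // No frontmatter block: the whole file is the body.
    return { data: {}, body: contents, rawData: "" };
  }
  const rawData = match[1];
  const data: Record<string, string> = {};
  for (const line of rawData.split(/\r?\n/)) {
    const separator = line.indexOf(":");
    if (separator === -1) continue; // skip blank or malformed lines
    data[line.slice(0, separator).trim()] = line.slice(separator + 1).trim();
  }
  return { data, body: contents.slice(match[0].length), rawData };
}

const entryInfo = splitFrontmatter(
  "---\ntitle: Hello Markdoc\ndraft: false\n---\n# Heading\n\nBody text.\n"
);
```

A real parser would delegate to a full YAML implementation; the point here is only the split into data, body, and rawData.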

Example Usage

Say you've authored a collection of blog posts using Markdoc. You can store these entries as a blog collection, identically to Markdown or MDX:

src/content/
    blog/
        # Could also use `.md`
        post-1.mdoc
        post-2.mdoc
        post-3.mdoc
...

Then, you can query entry frontmatter with the same getCollection() and getEntryBySlug() APIs:

---
import { getCollection, getEntryBySlug } from 'astro:content';

const blog = await getCollection('blog');
const firstEntry = await getEntryBySlug('blog', 'post-1');
---

Users should also be free to render Markdoc contents using a Content component. This will be exposed from the render() result and will accept two props, config and components:

---
import Title from '../components/Title.astro';
import Marquee from '../components/Marquee.astro';
import { getEntryBySlug } from 'astro:content';

const mdocEntry = await getEntryBySlug('blog', 'test');
const { Content } = await mdocEntry.render();
---

<html lang="en">
    <body>
        <Content
            config={{
                variables: { underlineTitle: true },
            }}
            components={{
                h1: Title,
                marquee: Marquee,
            }}
        />
    </body>
</html>

Sharing config

This solution is flexible, but we expect users to reuse config and components across their project. For this, we will recommend creating a utility component to encapsulate that config. Here is one example that can render any blog collection entry with an {% aside /%} shortcode:

---
// src/components/BlogContent.astro
import Aside from './Aside.astro';
import type { CollectionEntry } from 'astro:content';

type Props = {
    entry: CollectionEntry<'blog'>;
};

const { entry } = Astro.props;
const { Content } = await entry.render();
---

<Content
    config={{
        tags: {
            aside: {
                render: 'Aside',
                attributes: {
                    type: { type: String },
                    title: { type: String },
                },
            },
        },
    }}
    components={{ Aside }}
/>

Now, you can pass any blog collection entry to render the result with this config:

---
import { getEntryBySlug } from 'astro:content';
import BlogContent from '../components/BlogContent.astro';

const mdocEntry = await getEntryBySlug('blog', 'test');
---

<h1>{mdocEntry.data.title}</h1>
<BlogContent entry={mdocEntry} />

See this example video for more.

Advanced use case: component prop mapping

Component renderers can also include a props() function to map Markdoc attributes and AST entries to component props. This is useful when:

This example maps Markdoc's generated data-language attribute for code blocks to the lang prop used by Astro's Code component, and stringifies the contents to HTML for use with Shiki:

---
import { Code } from 'astro/components';
import Title from '../components/Title.astro';
import Markdoc from '@markdoc/markdoc';
...
---

...
<Content
    components={{
        h1: Title,
        pre: {
            component: Code,
            props({ attributes, getTreeNode }) {
                return {
                    lang: attributes['data-language'],
                    code: Markdoc.renderers.html(getTreeNode().children),
                };
            },
        },
    }}
/>
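
As a rough illustration of how a renderer might consume such entries, the sketch below resolves either a bare component or a `{ component, props() }` pair into a component plus final props. Everything here is hypothetical: components are modelled as plain functions returning HTML strings, and `resolveEntry` is an invented helper, not Astro's or Markdoc's implementation.

```typescript
// Hypothetical types: a "component" here is just a function from props to HTML.
type Props = Record<string, unknown>;
type Component = (props: Props) => string;
interface TreeNode {
  name: string;
  attributes: Props;
  children: Array<TreeNode | string>;
}
interface MappedComponent {
  component: Component;
  props(context: { attributes: Props; getTreeNode(): TreeNode }): Props;
}
type ComponentEntry = Component | MappedComponent;

// Resolve the component and final props for one node of the render tree.
function resolveEntry(entry: ComponentEntry, node: TreeNode) {
  if (typeof entry === "function") {
    // Bare component: pass the Markdoc attributes through unchanged.
    return { component: entry, props: node.attributes };
  }
  const props = entry.props({
    attributes: node.attributes,
    getTreeNode: () => node,
  });
  return { component: entry.component, props };
}

// Stand-in for Astro's <Code /> component.
const Code: Component = ({ lang, code }) =>
  `<pre data-lang="${String(lang)}">${String(code)}</pre>`;

const preNode: TreeNode = {
  name: "pre",
  attributes: { "data-language": "js" },
  children: ["const a = 1;"],
};

const resolved = resolveEntry(
  {
    component: Code,
    props: ({ attributes, getTreeNode }) => ({
      lang: attributes["data-language"],
      code: getTreeNode().children.join(""),
    }),
  },
  preNode
);
const html = resolved.component(resolved.props);
```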

delucis commented 1 year ago

Moving my comment from the discussion now that this has moved on to the next stage.


Questions about the DX of custom components

How does a developer know they are passing the right components to <Content />?

I noticed there was no autocomplete [in Ben's Loom walkthrough], and I can imagine it’s hard or impossible to statically analyse the components each Markdoc document is using to generate those types (or maybe I’m wrong and their compiler extracts that somehow!)

So that made me wonder:

How does a developer/author know they are using the right shortcodes in a document?

Is there editor tooling etc. that allows an author to know that Aside is a valid component shortcode, but Adise is not? This seems like something that the local import model of MDX/AFMD made relatively straightforward because each component usage documented the source of the component. We’re obviously intentionally moving away from that with Markdoc, but I'm curious how that DX story translates.

bholmesdev commented 1 year ago

Good questions @delucis! To address each of those:

How does a developer know they are passing the right components to <Content />?

TL;DR: We will follow the convention of Markdoc's React renderer, where undefined components throw a runtime error. This could be at server runtime (not build time) when using SSR!

First, I should probably explain Markdoc's 2 step rendering process:

This separation offers some flexibility in how elements and Markdoc tags are mapped to components. The first option is most straightforward: a tag is mapped to the name of a component, without an HTML element to fall back on. This example maps a {% callout %} tag to a <Callout /> Astro component:

transform: {
  tags: {
    callout: {
      render: "Callout",
    }
  }
}
components: { Callout }

If we omit a components reference, we should follow the convention of Markdoc's existing React renderer: throw a runtime error saying this component is undefined. Markdoc actually throws a cryptic React error today, and I think Astro can give a more streamlined message guiding the user to define the component correctly.

These errors will be raised at server runtime rather than build-time given our runtime rendering strategy. So, you may wonder: "how can I safely guard against undefined components?" The answer is rendering to HTML.

Markdoc supports rendering to custom elements as a baseline by passing a lowercase element by name:

transform: {
  tags: {
    callout: {
      // Render the `{% callout %}` tag to an aside element
      render: "aside",
    }
  }
}

You can then map HTML elements to components, also by name:

components: { aside: Callout }

This falls in line with MDX's components prop, allowing you to render any HTML element to a component of your choosing.
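
The two mapping paths can be sketched as a single lookup. This assumes, hypothetically, that the renderer treats a capitalized render name as a component reference and a lowercase one as a plain HTML element; `renderNode` and the error wording are illustrative only, and components are modelled as functions returning HTML strings.

```typescript
type ComponentMap = Record<string, (children: string) => string>;

// Assumption: capitalized names refer to components, lowercase names to HTML elements.
function isComponentName(name: string): boolean {
  return /^[A-Z]/.test(name);
}

function renderNode(name: string, children: string, components: ComponentMap): string {
  const component = components[name];
  if (component) return component(children);
  if (isComponentName(name)) {
    // No HTML element to fall back on, so fail with an actionable message.
    throw new Error(
      `Unable to render "${name}": no component with that name was passed to the components prop.`
    );
  }
  // Lowercase name without a mapping: render the plain HTML element.
  return `<${name}>${children}</${name}>`;
}

const Callout = (children: string) => `<aside class="callout">${children}</aside>`;

// render: "Callout" with a matching component.
const viaComponent = renderNode("Callout", "Heads up", { Callout });
// render: "aside", mapped to a component by element name.
const viaElement = renderNode("aside", "Heads up", { aside: Callout });
// render: "aside" with no mapping falls back to plain HTML.
const viaFallback = renderNode("aside", "Heads up", {});
```

Calling `renderNode("Callout", "Heads up", {})` throws, which corresponds to the undefined-component case described earlier.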

I'm not sure which strategy we'll recommend in our docs, since there's definitely a gray area depending on usage. For instance, always rendering to an HTML element is preferable when using RSS so we can easily render to a plain HTML string. Curious to hear your thoughts!

How does a developer/author know they are using the right shortcodes in a document?

This definitely comes down to editor tooling, which is still in its infancy for Markdoc's open source offering. There may be more robust tools used by Stripe internally that could unlock autocomplete and even validation in the future. Today, there is runtime validation that throws when using invalid tags (e.g. Adise instead of Aside) or invalid props (e.g. passing type="banana" to an Aside when only note | warning are valid).
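
To illustrate what that kind of runtime validation looks like, the sketch below checks a tag name and its attribute values against a schema. The schema shape loosely follows Markdoc's tags config with a `matches` list, but `validateTag` is a hypothetical helper written for this example, not Markdoc's own validator.

```typescript
interface AttributeSchema {
  matches?: string[]; // allowed values, if restricted
}
type TagSchemas = Record<string, { attributes: Record<string, AttributeSchema> }>;

const tagSchemas: TagSchemas = {
  aside: { attributes: { type: { matches: ["note", "warning"] }, title: {} } },
};

// Returns a list of human-readable problems; an empty list means the tag is valid.
function validateTag(
  tag: string,
  attributes: Record<string, string>,
  schemas: TagSchemas
): string[] {
  const schema = schemas[tag];
  if (!schema) return [`Undefined tag: "${tag}"`];
  const errors: string[] = [];
  for (const [key, value] of Object.entries(attributes)) {
    const attribute = schema.attributes[key];
    if (!attribute) {
      errors.push(`Invalid attribute "${key}" on tag "${tag}"`);
    } else if (attribute.matches && !attribute.matches.includes(value)) {
      errors.push(
        `Attribute "${key}" on tag "${tag}" must be one of: ${attribute.matches.join(" | ")}`
      );
    }
  }
  return errors;
}

const typoErrors = validateTag("adise", {}, tagSchemas);
const valueErrors = validateTag("aside", { type: "banana" }, tagSchemas);
const noErrors = validateTag("aside", { type: "note", title: "Tip" }, tagSchemas);
```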

matthewp commented 1 year ago

@bholmesdev would it be possible to create types for the markdoc components via codegen?

bholmesdev commented 1 year ago

@matthewp Hm, curious what you mean by "types for markdoc components." The VS Code extension for Markdoc doesn't have any way to hook into types currently without writing our own plugin.

delucis commented 1 year ago

Thanks for the extra detail @bholmesdev!

So if I follow the two rendering pathways correctly —

  1. With render: "Callout": if I mess up my components (e.g. I typo Calloot in my document or in my config), my SSR server crashes when a user accesses a page that contains the typo or the component that was misconfigured. This error is “late” in that I’ve already deployed the site.

  2. With render: "aside": if I typo in my document, same as above because that would still be an unknown component. If I typo components: { adise: Callout }, that would fail silently because aside would render out and Markdoc would never find an <adise> to map to Callout. This error is “late” in that it’s probably silent until you spot that it’s happened via some manual or automated testing.

I think 2. bothers me less — a configuration error seems less likely to sneak through and easier to locate/fix. Maybe it could be nice to have something like render: { element: 'aside', component: Callout } to protect against errors, but that goes against the decoupling in Markdoc I guess?

1. bothers me more because it’s an error in content (much more likely to let accidental errors through) and will only be caught when visiting the page. (Assuming SSR, I think in SSG this is all moot.)

I do wonder if during transform Astro can gather the various “render-able nodes” somehow to be able to fail fast if it detects nodes we’re not configuring for? That might kind of be heading in the direction of what @matthewp meant by “types for markdoc components”.
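
One way to picture this fail-fast idea: walk the transformed tree once, collect every component-like node name, and compare the result against the components that were supplied. The tree shape and both helper functions below are hypothetical, and the capitalized-name convention is an assumption.

```typescript
interface RenderNode {
  name: string;
  children: Array<RenderNode | string>;
}

// Collect the names of all component-like (capitalized) nodes in the tree.
function collectComponentNames(node: RenderNode, found = new Set<string>()): Set<string> {
  if (/^[A-Z]/.test(node.name)) found.add(node.name);
  for (const child of node.children) {
    if (typeof child !== "string") collectComponentNames(child, found);
  }
  return found;
}

// Names used in the document that have no entry in the components map.
function findMissingComponents(
  tree: RenderNode,
  components: Record<string, unknown>
): string[] {
  return Array.from(collectComponentNames(tree))
    .filter((componentName) => !(componentName in components))
    .sort();
}

const tree: RenderNode = {
  name: "article",
  children: [
    { name: "h1", children: ["Title"] },
    { name: "Callout", children: ["Careful!"] },
    { name: "section", children: [{ name: "Calloot", children: ["Typo"] }] },
  ],
};

const missing = findMissingComponents(tree, { Callout: () => "" });
```

Run at build time for each entry, a non-empty result could be turned into an error before deployment instead of on first request.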

manzt commented 1 year ago

This is really exciting, nice work! I don't want to distract from this conversation, but I'm wondering if there is anything that can be generalized from this work to more readily support other file types for content collections/pages.

Can't seem to find the RFC I'd submitted a while ago to Astro (not sure if a repo was renamed?), but I recently created a custom Astro page integration for Jupyter notebooks (.ipynb) and needed to copy lots of the mdx integration internals. I wonder if there could be a mechanism by which file types that can render to markdown/mdx more readily hook into content collections/page integrations:

import { defineConfig } from "astro/config";
import { createMarkdownIntegration } from "astro";

const ipynb = createMarkdownIntegration({
  extension: ".ipynb",
  renderToMarkdown(contents: string) {
    // render to markdown/mdx
    // reuse Astro's markdown/mdx integrations for output
  }
})

export default defineConfig({
  integrations: [ipynb()],
}) 

Perhaps this is related to the addContentEntryType?
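
For illustration, the body of such a renderToMarkdown function might look like the sketch below. It assumes the nbformat v4 JSON layout (a `cells` array with `cell_type` and `source`), ignores cell outputs entirely, and says nothing about how the hypothetical createMarkdownIntegration would be wired up.

```typescript
// Minimal subset of the nbformat v4 notebook JSON used by this sketch.
interface NotebookCell {
  cell_type: "markdown" | "code" | "raw";
  source: string | string[];
}
interface Notebook {
  cells: NotebookCell[];
  metadata?: { language_info?: { name?: string } };
}

function renderToMarkdown(contents: string): string {
  const notebook = JSON.parse(contents) as Notebook;
  const language = notebook.metadata?.language_info?.name ?? "";
  const fence = "`".repeat(3); // Markdown code fence
  return notebook.cells
    .filter((cell) => cell.cell_type !== "raw")
    .map((cell) => {
      // nbformat stores source either as one string or as a list of lines.
      const source = Array.isArray(cell.source) ? cell.source.join("") : cell.source;
      return cell.cell_type === "code"
        ? `${fence}${language}\n${source}\n${fence}`
        : source;
    })
    .join("\n\n");
}

const notebookJson = JSON.stringify({
  metadata: { language_info: { name: "python" } },
  cells: [
    { cell_type: "markdown", source: ["# Demo\n", "Some text."] },
    { cell_type: "code", source: "print('hi')" },
  ],
});
const markdown = renderToMarkdown(notebookJson);
```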

bholmesdev commented 1 year ago

@manzt Ah yes, definitely related to that! A secondary goal of the Markdoc integration was to experiment with a generic API to add any file extension as a content collection type. We already have a community member trying out this (undocumented!) hook for a Cooklang integration. Code may be useful to experiment for yourself! Note this API could change without warning, so use at your own risk. Still very curious to hear your feedback 🙌

manzt commented 1 year ago

lovely! so excited to play around with this. thanks for the response.

Princesseuh commented 1 year ago

This proposal is now in Stage 3! https://github.com/withastro/roadmap/pull/508