withastro / roadmap

Ideas, suggestions, and formal RFC proposals for the Astro project.

Content Layer #982

Closed ascorbic closed 1 month ago

ascorbic commented 3 months ago

Summary

Creates a successor to content collections with expanded use cases, including remote data sources and improved performance.

// src/content/config.ts
import { defineCollection, z } from "astro:content";
import { glob, file } from "astro/loaders";
import { feedLoader } from "@ascorbic/feed-loader";

// Loaders can be defined inline 
const countries = defineCollection({
  loader: async () => {
    const response = await fetch("https://restcountries.com/v3.1/all");
    const data = await response.json();
    // Must return an array of entries with an id property, or an object with IDs as keys and entries as values
    return data.map((country) => ({
      id: country.cca3,
      ...country,
    }));
  },
});

// Loaders can also be distributed as packages
const podcasts = defineCollection({
  loader: feedLoader({
    url: "https://feeds.99percentinvisible.org/99percentinvisible",
  }),
});

// The `glob()` loader loads multiple files, with one entry per file
const spacecraft = defineCollection({
  loader: glob({ pattern: "*.md", base: "src/data/spacecraft" }),
  // A schema is optional, but provides validation and type safety for data.
  // It can also be used to transform data before it is stored.
  schema: ({ image }) =>
    z.object({
      title: z.string(),
      description: z.string(),
      heroImage: image().optional(),
    }),
});

// The `file()` loader loads multiple entries from one file
const dogs = defineCollection({
  loader: file("src/data/dogs.json"),
  schema: z.object({
    id: z.string(),
    breed: z.string(),
    temperament: z.array(z.string()),
  }),
});

export const collections = { spacecraft, dogs, countries, podcasts };

krishna-santosh commented 2 months ago

@ascorbic can we define a schema when fetching data from a remote URL?

ascorbic commented 2 months ago

@krishna-santosh yes. You can either define it at the top level (inside the defineCollection object) in the same way as now, or a loader can define it. In that case the loader can generate it dynamically, e.g. by introspecting the API.

simonswiss commented 2 months ago

Multiple entries per file is great — but how about multiple files per entry?

There are many cases where an entry has a heroIntro paragraph which wants to be a rich text field, as well as the main entry body.

The ability to store multiple files as part of the same entry would be great!

Alternatively, storing other Markdown (Markdoc/MDX) fields in the frontmatter (we do this with Keystatic's markdoc.inline field) and providing a similar .render() method on those would be awesome.

Apologies if this is not the right place for this feedback 🤗

joelvarty commented 2 months ago

Let me know if this is the wrong place for this, but I wanted to provide feedback on what I feel would be important for the success of this layer from a CMS perspective:

Dependency Tracking

Function Support

ascorbic commented 2 months ago

@joelvarty yes, we'd love to do dependency tracking at some point. We are in a good place to do that, but it's not in scope for this version.

The data is read-only at runtime, and even though it's implemented as files on disk it's loaded as a virtual module, so it's all handled by rollup and works on serverless runtimes. I haven't tested it with Cloudflare yet though, so we'll need to ensure that works.

joelvarty commented 2 months ago

@ascorbic It sounds like we would need to do a build to get new data into the project. Am I correct in that?

ascorbic commented 2 months ago

@joelvarty Yes, there's no way to update a deployed site without a build. Locally you can sync by running astro build or astro sync, or, when running astro dev, by using the s + enter hotkey.

Suven commented 2 months ago

@ascorbic there is no chance of exporting globalContentLayer, right? I am generally doing SSG, but am currently using an instance of npm run dev for live previews of the CMS. If I had access to the global instance, I could do some trickery like resyncing every n pageviews or every X seconds.

ascorbic commented 2 months ago

@Suven No (and please don't try: you will break things), but I will be exposing a refresh method in astro:server:start for integrations to use.

ascorbic commented 2 months ago

I have a proposed addition to the RFC, which I'd welcome feedback on.

Integration support for content layer

This is a proposal to add support for syncing the content layer to integrations. It would allow integrations to trigger a sync during dev, optionally of just certain loaders. It would also allow them to pass metadata, such as a webhook body, to the loader.

Use cases

API

Adds a syncContent function to the astro:server:setup hook options, with the following signature:

async syncContent(options: { 
   loaders?: Array<string>,
   context?: Record<string, any>
})

loaders is an optional array of loader names. If set, only those loaders will be synced. This allows integrations to selectively sync their own content.

context is an optional object with arbitrary data that is passed to the loader's load function as syncContext.

Usage

This shows an integration that creates a refresh webhook endpoint:

export default function() {
    return {
        name: '@astrojs/my-integration',
        hooks: {
            'astro:server:setup': async ({ server, refreshContent }) => {
                server.middlewares.use('/_refresh', async (req, res) => {
                    if (req.method !== 'POST') {
                        res.statusCode = 405;
                        res.end('Method Not Allowed');
                        return;
                    }
                    let body = '';
                    req.on('data', chunk => {
                        body += chunk.toString();
                    });
                    req.on('end', async () => {
                        try {
                            const webhookBody = JSON.parse(body);
                            await refreshContent({
                                // Include the parsed request body. `webhookBody` is an arbitrary name
                                context: { webhookBody },
                                // Only refresh a particular loader
                                loaders: ['my-loader']
                            });
                            res.writeHead(200, { 'Content-Type': 'application/json' });
                            res.end(JSON.stringify({ message: 'Content refreshed successfully' }));
                        } catch (error) {
                            res.writeHead(500, { 'Content-Type': 'application/json' });
                            res.end(JSON.stringify({ error: 'Failed to refresh content' }));
                        }
                    });
                });
            }
        }
    }
}

Inside the loader:

import { type Loader } from "astro/loaders"
export function myLoader(): Loader {
  return {
    name: "my-loader",
    load: async ({ store, logger, syncContext, meta }) => {
      if(syncContext?.webhookBody?.action) {
        logger.info("Received incoming webhook")
        // do something with the webhook body
      }
      // this is a normal sync...
    }
  }
}

Questions

How should we handle the case where a sync is already in progress? Should it be queued? Should it be skipped?

matthewp commented 2 months ago

Outside of you using refreshContent in one of the examples (which I kind of like better...) I think the idea sounds reasonable and astro:server:setup is the place to do it.

ascorbic commented 2 months ago

Yeah, maybe refreshContent is better!

modestaspruckus commented 2 months ago

Hello, nice feature. Regarding this: "The store is persisted to disk between builds, so loaders can handle incremental updates."

Can we control the path, and where is the store located on disk? We'd like to mount it as a PVC on k8s.

florian-lefebvre commented 2 months ago

It's currently stored in the cacheDir (configurable).

NotWoods commented 2 months ago

Is it possible to set up deferred rendering by passing a function/promise to rendered? The current deferredRender property seems like it's designed around local files, but loaders that pull from a third-party API could get a performance boost by just fetching metadata then fetching the HTML only when needed.

ematipico commented 2 months ago

Is it possible to set up deferred rendering by passing a function/promise to rendered? The current deferredRender property seems like it's designed around local files, but loaders that pull from a third-party API could get a performance boost by just fetching metadata then fetching the HTML only when needed.

Did you have anything in particular in mind? The deferred renderers are designed around virtual modules, so you can pull your content from them?

Suven commented 2 months ago

That proposal sounds great! Regarding your question: I guess triggering a sync while another is in progress is very likely in CMS preview contexts, as the user is continuously making changes. In those cases I guess the user is only interested in the latest version of their change, so cancelling the current sync and starting a new one would make sense.
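A minimal sketch of the cancel-and-restart behaviour suggested here, not how Astro implements it. The names `createLatestWinsSync` and `SyncFn` are hypothetical; the only real API used is the standard `AbortController`. It assumes the sync function checks the signal and stops early when it has been aborted.

```typescript
// Hypothetical helper: each new call aborts the sync that is still running.
type SyncFn = (signal: AbortSignal) => Promise<void>;

function createLatestWinsSync(run: SyncFn) {
  let current: AbortController | undefined;

  return async function sync(): Promise<"completed" | "superseded"> {
    // Ask any in-flight sync to stop; only the newest request matters.
    current?.abort();
    const controller = new AbortController();
    current = controller;

    try {
      await run(controller.signal);
    } catch (error) {
      // Errors from a sync that was cancelled are expected; rethrow the rest.
      if (!controller.signal.aborted) throw error;
    }
    return controller.signal.aborted ? "superseded" : "completed";
  };
}
```

The alternative raised in the question, queueing, would instead chain each call onto the previous promise; the sketch above trades completeness of intermediate syncs for showing the latest change sooner.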

NotWoods commented 2 months ago

Is it possible to set up deferred rendering by passing a function/promise to rendered? The current deferredRender property seems like it's designed around local files, but loaders that pull from a third-party API could get a performance boost by just fetching metadata then fetching the HTML only when needed.

Did you have anything in particular in mind? The deferred renderers are designed around virtual modules, so you can pull your content from them?

The virtual module API is a lot of overhead if you're not working with files on a file system. I'm writing a loader for the Notion REST API and it feels weird to make a virtual module per API call. After chatting with folks on the Discord I get the impression virtual modules aren't designed for this use case.

From an ease-of-loader-implementation perspective a function that can use dynamic imports feels much simpler. I don't know if that makes it harder to optimize the file loader use case.
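To illustrate the function-based shape being asked for, here is a minimal sketch. It is not the Content Layer API: `LazyEntry`, `lazyEntry`, and `fetchHtml` are hypothetical names. The idea is that a loader stores only metadata up front and the HTML is fetched once, on first render.

```typescript
// Hypothetical entry shape: metadata is eager, rendered HTML is lazy.
interface LazyEntry {
  id: string;
  data: Record<string, unknown>;
  render: () => Promise<string>;
}

function lazyEntry(
  id: string,
  data: Record<string, unknown>,
  fetchHtml: (id: string) => Promise<string>,
): LazyEntry {
  let cached: Promise<string> | undefined;
  return {
    id,
    data,
    render: () => {
      // Cache the promise so concurrent renders share a single request.
      if (cached === undefined) {
        cached = fetchHtml(id);
      }
      return cached;
    },
  };
}
```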

werfred commented 2 months ago

Hi everyone,

I recently started using Astro and wanted to achieve SSG with caching so that pages wouldn't rebuild if nothing had changed. I have an API where all the data comes from, and before the ContentLayer feature, I noticed that Astro has a content collection cache. So, I generated a bunch of JSON files for every blog post into a collection. However, unfortunately, it still rebuilds every page during each build and says: 'generating static routes.'

I then converted my collection to try using ContentLayer, but it seems to work the same way. Am I missing something? I was expecting it to NOT rebuild existing pages if the corresponding JSON file hadn't changed.

Sorry if this isn't the right place to ask about it.

yeehaa123 commented 2 months ago

Maybe I'm doing something wrong, but references don't seem to work with this API. Is this correct? Is this something that will be implemented down the line? Is there any way I can help out?

ematipico commented 2 months ago

@werfred this isn't the right place, please come to discord and open a new help thread

twodft commented 2 months ago

I'm wondering if render also works for remote MDX. If not, how can I implement my own loader to correctly render remote MDX content from APIs?

ascorbic commented 2 months ago

Maybe I'm doing something wrong, but references don't seem to work with this API. Is this correct? Is this something that will be implemented down the line? Is there any way I can help out?

They should work in the same way as existing collections. If you can't get them working, can you ask in Discord?

matthewp commented 2 months ago

@ascorbic what happens if there's a conflict between the schema the loader is providing and the schema the user provides in defineCollection?

ascorbic commented 2 months ago

@matthewp the user-defined config will override any loader-defined one
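The precedence rule stated here can be sketched as follows. This is an illustration only, with hypothetical types (`Schema`, `LoaderLike`, `CollectionConfig`) standing in for the real ones; it assumes a loader may provide its schema either directly or as a function that generates it.

```typescript
// Hypothetical stand-ins for illustration; not Astro's internal types.
interface Schema {
  parse(input: unknown): unknown;
}

interface LoaderLike {
  name: string;
  schema?: Schema | (() => Schema);
}

interface CollectionConfig {
  loader: LoaderLike;
  schema?: Schema;
}

function resolveSchema(config: CollectionConfig): Schema | undefined {
  // A schema on the collection itself always wins.
  if (config.schema) return config.schema;
  // Otherwise fall back to whatever the loader provides, if anything.
  const fromLoader = config.loader.schema;
  return typeof fromLoader === "function" ? fromLoader() : fromLoader;
}
```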

ascorbic commented 2 months ago

I've added a section on integration support to the RFC, and have a PR with an implementation.

HiDeoo commented 2 months ago

Tiny feedback on the Content Layer loader API, not sure if it's on purpose or not in this context: if you throw an AstroError with a hint in a loader, the hint is never displayed to the user. I think it would be nice to display it, e.g. to help users with obvious configuration mistakes.

ascorbic commented 1 month ago

I've made a few small changes to the RFC, with the only API change being that the type for the data store is now DataStore. We're now ready for a call for consensus, with a goal to ship this as stable in 5.0. If you have any final comments on the RFC please make them here now. This process will last for a minimum of three days before this PR is merged. Please make any bug reports in the main astro repo.

There will be follow-up RFCs for future features, particularly the libSQL backend.

jcayzac commented 1 month ago

Some user with a lot of content recently asked about it on Discord, if I recall correctly: how about making the loaders generator functions that yield entries, rather than async functions that have to return entire collections?

Edit: actually, anything that returns an AsyncIterable would do the trick?
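A rough sketch of what an AsyncIterable-based loader could look like. Nothing here is part of the current loader API: `pagedEntries`, `storeAll`, and the `fetchPage` callback are hypothetical, and a plain `Map` stands in for the data store. The point is that entries are written as they arrive instead of being collected into one large array first.

```typescript
interface Entry {
  id: string;
  [key: string]: unknown;
}

// Hypothetical loader: yields entries page by page until a page is empty.
async function* pagedEntries(
  fetchPage: (page: number) => Promise<Entry[]>,
): AsyncIterable<Entry> {
  for (let page = 0; ; page++) {
    const entries = await fetchPage(page);
    if (entries.length === 0) return;
    yield* entries;
  }
}

// Hypothetical consumer: writes each entry to the store as soon as it is yielded.
async function storeAll(
  source: AsyncIterable<Entry>,
  store: Map<string, Entry>,
): Promise<number> {
  let count = 0;
  for await (const entry of source) {
    store.set(entry.id, entry);
    count++;
  }
  return count;
}
```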

hfournier commented 1 month ago

The comment on the built-in file loader example is:

// The file loader loads a single file which contains multiple entries. The path is relative to the project root, or an absolute path.
// The data must be an array of objects, each with a unique `id` property, or an object with IDs as keys and entries as values.

So a data structure that may already have some other unique identifier must also have an id property, which seems redundant. It results in data that looks like this:

[
  {
    id: 'abc',
    data: {
      id: 'abc',
      myUniqueId: 'abc',
      name: 'Abc'
    },
    filePath: 'src/myFolder/myData.json',
    collection: 'myData'
  },
  ...

with 3 properties with the same value. Would it be possible to add an optional 2nd param to the file() loader that indicates which property to use as the id? This would eliminate one redundant property and not require existing json files to be altered with an additional id for each entry.
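As a sketch of the idea, assuming a hypothetical `idKey` option (the `file()` loader does not take one in this RFC), the mapping from raw records to entries could look like this. `toEntries` and `RawRecord` are illustrative names only.

```typescript
type RawRecord = Record<string, unknown>;

interface Entry {
  id: string;
  data: RawRecord;
}

// Hypothetical: derive each entry's id from a caller-chosen property.
function toEntries(records: RawRecord[], idKey = "id"): Entry[] {
  const seen = new Set<string>();
  return records.map((record, index) => {
    const raw = record[idKey];
    if (typeof raw !== "string" && typeof raw !== "number") {
      throw new Error(`Record ${index} has no usable "${idKey}" property`);
    }
    const id = String(raw);
    if (seen.has(id)) {
      throw new Error(`Duplicate id "${id}" at record ${index}`);
    }
    seen.add(id);
    // The record is stored unchanged, so no extra `id` field is required.
    return { id, data: record };
  });
}
```

With something like `toEntries(records, "myUniqueId")`, existing JSON files would not need an additional `id` on each entry.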

ascorbic commented 1 month ago

@hfournier that's an interesting idea. I'm not sure about having it as a second argument, but it is something I'll look at

ascorbic commented 1 month ago

This is now included in Astro 5 beta

jurajkapsz commented 1 month ago

Not sure if this is still the right place to give feedback now that the Astro 5 beta has come out (I followed a link from the docs), but I tried out this API and my page styles rendered broken afterwards. I'm on the latest Astro v4 release, v4.15.6, and I use MDX in content collections.

Does this have something to do with @astrojs/mdx, which the Astro 5 beta docs list at v4, while with Astro v4 it's v2.3.1?

Looking forward to this API.

ascorbic commented 1 month ago

@jurajkapsz can you open an issue on the main astro repo please? I'm investigating styles with MDX in content layer at the moment, and it would be helpful if you had a minimal reproduction of the problem.

jurajkapsz commented 1 month ago

@ascorbic OK, will do, atm I am doing some tests on my side to better understand what happened and to eventually write a more precise bug report.

I've noticed that what visually broke pages after switching to the new Content Layer API render was a different final order of processed CSS styles, which changed the style cascade, making e.g. CSS BEM modifiers unusable.

I happen to have certain style definitions in one component, and I modify that component's style definitions in another component. I'd say it is correct to have it that way, but I'll give it a second thought; anyhow, it worked before, and the final styles were somehow in the correct order.