bholmesdev commented 1 year ago

Start Date: 2022-10-03
Status: Draft

Summary

Content Collections are a way to fetch Markdown and MDX frontmatter in your Astro projects in a consistent, performant, and type-safe way.

This is paired with Render Content, a solution to script and style bleed when importing globs of Markdown and MDX Content.

Both proposals are presented below, and are meant to be reviewed and accepted as a pair.

https://user-images.githubusercontent.com/51384119/200002517-c7871c22-74fc-4931-a029-ccb62d8c3e4e.mp4

Links

Content Collections Full Rendered Proposal
Render Content Full Rendered Proposal

natemoo-re commented 1 year ago

Overall I am super thrilled by this, great work!

I have one very nitpicky bikeshed, which is that the ~schema.ts file probably doesn't need the ~ prefix? If we do want to use a prefix, perhaps _schema.ts is better?

JLarky commented 1 year ago

I like the changes that you described in the video. The part that is unclear for me right now is if zod stuff is optional or not.

The ~schema.ts and .astro remind me autoimporting in Vue, so I actually wonder if you can generate schema from frontmatter automatically? At least in a "field X exists in at least one file, let's call it x?: any or like you can do with cli in rails rails g model NameOfModel column_name:datatype column_name2:datatype2 :) see

bholmesdev commented 1 year ago

@JLarky, yes schema files are totally optional! You can still call fetchContent for entries in a collection, but you'll get an array of type any. You'll need a Zod schema to get type checking.

And that's a good suggestion! It was avoided since types like Date or email are hard to infer. That said, we could offer some sensible defaults for strings and numbers. I'd also love to lean into our CLI to generate schema starters / recipes for you, similar to that Ruby example you presented there.

bholmesdev commented 1 year ago

@natemoo-re Fair point! the ~ was a loose preference more than anything, since it:

Indicates schema is "magic"
Future-proofs us if we support collections of ts or js files in the future
Sorts schema definitions to the top of the directory. Always bothered me to have a folder of posts and a schema.ts in sorted alphabetically in the middle.

I understand ~ is not an Astro convention though. I may shy away from _ since it tells Astro to ignore files in src/pages. I've heard @schema.ts floated as well, which would be consistent with CMS conventions I've seen.

matthewp commented 1 year ago

Note that this comment is only about the Render Content proposal.

Also note that this is a critique of the proposed implementation and not of the external user-facing API. My alternative idea would not need to change the API.

The Render Content proposal has a few drawbacks that exist due to the fact that it relies on an implicit dependency (the $$result object) that is added by the compiler. That means that Render Content, as proposed, has the following downsides:

You cannot call renderContent() within a non-Astro file such as a JavaScript or TypeScript module.
- Since the API is an imported module, there's nothing to tell a dev that they couldn't do this.

You can't conditionally use the result of a renderContent() call, for ex. you can't do:

---
const { Content: First } = renderContent(firstNewsletter);
const { Content: Second } = renderContent(secondNewsletter);

const FeaturedNewsletter = Math.random() > .5 ? First : Second;
---

<FeaturedNewsletter />

There's nothing visual about the API that would tell you that you can't do this.

You can't use aliases to import the .astro component since this proposal relies on the compiler statically linking the import statement.
The $$renderContent reassignment would be a bit tricky to implement correctly, but not impossible, as our compiler has traditionally not done deep JS AST transformations. IE you want to account for the possibility of reassignments in user code, const myRenderContent = renderContent.
You can't use renderContent() in a API route. Although it wouldn't be useful to do so today, with a future render(Component) API you might want to do so.
renderContent() only works when used in page components. If you use it in a layout or any nested component, the CSS will not be added.
- This is not obvious at all from the API.

Alternative proposal

Thinking more about this problem I've realized that conceptually we have 3 ways that CSS could be added to a page:

Import based side-effect - import Button from './components/Button.astro'; This is how most CSS is added today but has the flaw that it is added even if you don't use the component; ie this is the Astro.glob problem.
Frontmatter-based side-effect - That's what this proposal adds, CSS is added based on a side-effect of calling a function within a page component.
Completely dynamic - Today this is only if you use <style is:inline> within a component.

Template-based side-effect

My proposal is that instead of the Frontmatter-based side-effect which renders CSS based on calling the renderContent() function, we instead render CSS based on the fact that a component is used within a template at all.

What that looks like is this:

---
const { Content } = renderContent(id);
---

<Content />

This moves the mechanism of adding CSS out of renderContent() and puts it instead inside of <Content />.

You can think of this as "head bubbling" because that's exactly what it is. Today a component gets compiled to something like this:

import Other from '../components/Other.astro';

const Wrapper = createComponent((result, props) => {
  return render`<div>${renderComponent(Other)}</div>`;
});

With this change it would instead be compiled to something like this:

import Other from '../components/Other.astro';

const Wrapper = createComponent((result, props) => {
  return renderHeadAndBody(
    // An array of all components used within a template
    [Other],
    render`<div>${renderComponent(Other)}</div>`
  );
});

The Astro runtime would then recursively discovers the component dependencies array to see if any of them have lazy/dynamic CSS.

Head bubbling

The above provides the plumbing needed to implement head bubbling, but does not propose it as an API. In the future we would be able to design such an API, something like: <astro:head><title>My page</title></astro:head> on top of this work. This would also likely be able to support framework components that have head bubbling (such as Svelte).

Improvements on proposal

This new idea improves on the current proposal in the following ways:

renderContent() can be called within any Astro component, including page components or nested children.
renderContent() can be called in a TypeScript/JavaScript module and then re-exported if people like that pattern.
renderContent() can be called and its results discarded. CSS is not added as a side-effect of calling the function but only when the component is actually used on the page.
No special-casing of the import; the compiler isn't aware of the renderContent() API at all and only changes to add the component dependencies array as part of the render result.

Dynamic CSS usage

The above proposal does not allow you to use a component dynamically within the template, for ex if you tried:

---
const { Content } = renderContent(id);
---

{ someCondition && <Content /> }

The CSS would still be added, since we collect all CSS based on the component being in the template.

For the most part this is what you want as it allows you to get implicit CSS without needing to manually add that CSS to the page.

However, we can support completely dynamic usage by adding another API. For example this:

---
const HeadStuff, { Content } = renderContentHeadAndBody(id);

const = Math.random() > .5;
---

<html>
  <head>
    { shouldUse && <HeadStuff /> }
  </head>
  <body>
   { shouldUse && <Content /> }
  </body>
</html>

Not that renderContentHeadAndBody is not a real name for this API. It's an intentionally bad name to avoid bikeshedding. If this API is desired we should come up with a better name.

In this API, the Content variable does not include CSS as a side-effect of being in the template. Instead renderContentHeadAndBody splits the result into one component that contains the head parts (CSS specifically) and the content HTML.

This way you can choose to add the CSS only if you need it based on runtime logic.

louiss0 commented 1 year ago

Concerns When It Comes To Fetch Content

Hello Ben my Name is Shelton I'm new to Astro I tried reaching out to you guys about some things but did not get an answer at all. I'm here because you are trying to solve a problem that has been plaguing us all since the beginning of astro blog types. You created a new API that allows us to use schema's to define the types for our front matter. You created a function that allows us fetch markdown depending on our types. I have a few problems with what you want to do and I hope this post would convince you to take a different direction. This post has only two main parts Contentions and Suggestions.

Contentions

First of I don't want to be forced to define my folder structure inside of a /content folder.
- I can't just do /posts/anything when it comes to linking to a page or skip to just creating a folder that is about a certain subject if I do and put files in those folders as well.
- What about those who just want to slap the markdown on to the page.
  - I created a plugin that allows me to do just that. You can use it to apply a layout to each file in a folder
  - Here is the project with that file right there Astro Layout Url Injector Example
Second I don't want to have to put my schema in front of a set of pages I'd prefer to just separate them into a file called /schemas
- I can organize them better and they won't clog up my pages folders I could even use one schema to create another one with out ever having to extract one out the folder.
- Putting schemas in a file allows me to navigate a lot better.

Suggestions

You should probably look into the Nuxt Content Page for inspiration for how you should create the fetch content API.
1. Please give us all of those features
2. They would be just awesome I could control my landing pages with close to no trouble
When it comes to the fetch content API I want the power to tell the query how to format each result.
I think I should have the power to pass in the schema to fetchContent() as the second argument

Formatting

Formatting is basically just telling the query how to return the results. You will pass in an object to the fetchContent() by either calling .format() Or it could be passed in as the third argument.

Fetch Content with .format()


    fetchContent()
        .format(
        ({pubDate, slug, ...rest})=>
        ({
            params:frontmatter.slug,
            props: {
                 ...rest             
               localeDate:pubDate.toLocaleString()
               isoDate:pubDate.toISOString()

            }
         })
         )

Fetch Content with the format as the third argument


    const schema = z.object({
        title: z.string()
        description: Z.string().max(60)
    })

    fetchContent("laravel", schema, 
    ({slug, ...rest})=> {params:slug, props:{...rest}})

bholmesdev commented 1 year ago

@natemoo-re Following up on ~schema.ts: I'm coming around to dropping that ~ prefix. It's a pretty non-standard prefix that doesn't bring much added benefit. Given NextJS avoided special characters for their layout RFC, I see no harm in avoiding here as well. I'll change to just schema.ts in the proposal. If anyone objects, please reply with your thoughts!

bholmesdev commented 1 year ago

@louiss0 Thanks for the thorough feedback, and happy you're checking out Astro! To respond to each of these:

I agree Nuxt's content querying is super powerful. Love how they took inspiration from MongoDB, and I think we could explore this in a future RFC! I think implementing these features today would be a bit too much at once though. Still, I think the proposed API sets us up for this future.
Formatting was an early idea that's still on our radar! We referred to this as a transform, where you could take the output of a schema and either modify existing values or compute new values. Again, to keep the scope of this RFC smaller, we tabled this for future exploration. Glad to hear early feedback this is worth trying.
A schema argument seems powerful, but somewhat misses the point of how the collection + schema relationships should work. We modeled this feature off of headless CMSes like Contentful, where every collection entry (Markdown in this case) adheres to a single schema. This makes collections very predictable, where authors can say with confidence that every blog post in content/blog will have the fields specified in the directory's schema.ts. Moving schemas to a fetchContent argument, this is no longer true. Content could have a number of different shapes depending on how you import them, which is much less predictable.

I also understand the tradeoffs of separate schema.ts files as opposed to a single /schema. Some prefer colocation, and others (like yourself) prefer separation. Neither approach is necessarily wrong! But when proposing a large API change like this, we try to stick to one happy path to start. If enough users ask for it, we can give a second happy path to meet their needs, or scrap the first path entirely. I still feel that separate schema files will help users understand what each collection contains, and is in-keeping with file based conventions we use for routing today. I invite others to chime in if they disagree!

louiss0 commented 1 year ago

@louiss0 Thanks for the thorough feedback, and happy you're checking out Astro! To respond to each of these:

I agree Nuxt's content querying is super powerful. Love how they took inspiration from MongoDB, and I think we could explore this in a future RFC! I think implementing these features today would be a bit too much at once though. Still, I think the proposed API sets us up for this future.

Formatting was an early idea that's still on our radar! We referred to this as a transform, where you could take the output of a schema and either modify existing values or compute new values. Again, to keep the scope of this RFC smaller, we tabled this for future exploration. Glad to hear early feedback this is worth trying.

A schema argument seems powerful, but somewhat misses the point of how the collection + schema relationships should work. We modeled this feature off of headless CMSes like Contentful, where every collection entry (Markdown in this case) adheres to a single schema. This makes collections very predictable, where authors can say with confidence that every blog post in content/blog will have the fields specified in the directory's schema.ts. Moving schemas to a fetchContent argument, this is no longer true. Content could have a number of different shapes depending on how you import them, which is much less predictable.

I also understand the tradeoffs of separate schema.ts files as opposed to a single /schema. Some prefer colocation, and others (like yourself) prefer separation. Neither approach is necessarily wrong! But when proposing a large API change like this, we try to stick to one happy path to start. If enough users ask for it, we can give a second happy path to meet their needs, or scrap the first path entirely. I still feel that separate schema files will help users understand what each collection contains, and is in keeping with file-based conventions we use for routing today. I invite others to chime in if they disagree!

I'm glad we agree on these things good luck implementing the querying feature. I was going to either steal the source code from Nuxt Content or try to find a way to do this myself.

I hope that either way I won't have to be restricted to just using the content folder. I want to write my markdown files. Where ever I want to write them without any restriction. I'm ok with the _schema.ts file. Another thing that you introduced was the entry.data thing. Giving people access to the data immediately would reduce boilerplate. Having to do entry.data to access the values that are needed is too much work for the user.

matthewp commented 1 year ago

From the call: the .astro module specifier requires setting up a TypeScript alias to get editor support. This should be listed as a Drawback. A plan to make this automatic would be desired as well.

matthewp commented 1 year ago

For the Content Schemas proposal, the Testing strategy section is missing.

matthewp commented 1 year ago

These proposals are not following the structure of the template: https://github.com/withastro/rfcs/blob/main/template.md

I would like to see this be updated to follow that format. You can still create subheadings that follow your current structure, but I'm finding it hard to read with the Summary and Motivation sections being combined into one.

bholmesdev commented 1 year ago

@matthewp Updated each to match our proposal template. Just 2 differences to call out:

I did keep the Goals sections around per Fred's suggestion
I separated "Detailed design" from "Detailed implementation" for the Content Schemas proposal. There were enough moving parts in each that separating the user API breakdown from the developer breakdown made sense

zadeviggers commented 1 year ago

I love this RFC and idea! Lots of use cases and things that get simplified! There are just a couple of details that I have some thoughts on. Please remember that I've only had one brief read of the RFC so far, so I could very well have missed something or misunderstood how something works. If I have, please correct me! That being said, here are my initial thoughts:

My main concern is the requirement to use zod for the content schemas. I understand all of the reasons and functionality provided by zod, but it seems slightly unusual to me. Elsewhere, Astro uses standard Typescript types for validation (e.g. in component prop types), which feels inconsistent. In many cases, using zod would be very overkill. My suggestion is to have zod be the recommended option, but also the option to export a plain old type instead of using zod for cases when introducing a whole additional package to a project just for some fundamental type validation isn't an option.

My other concern is with the generated .astro directory - I don't like this kind of code generation, but I can see how it's necessary here. My issue is with the naming of the directory - I was baffled when I first saw it in the RFC because it's identical to the end of Astro component file names. This is primarily because it's used in a place where a dev would often see Astro file extensions. I think this naming introduces some unnecessary cognitive load on the developer. Maybe .astro-generated or something similar would be better.

Just my two cents. I love this idea and can't wait to see it land!

bholmesdev commented 1 year ago

@zadeviggers Thanks for the thorough feedback. These are great points! Touching on each of these:

Yes, we've definitely heard this response to Zod and are considering ways to ease adoption. Examples may be auto-generating your schema from your existing posts as a migration step, or a CLI helper with prebuilt schemas to choose from. Still, after refactoring our blog and docs examples to use content schemas, I realized I was using Zod features for even the simplest use cases. Setting default values for props is an especially major benefit, like setting a default language for documentation with .default('en-us') (see example). This is why I think sticking with zod alone for an experimental release is the best path for this RFC. Start with a single happy path, and add an alternative if community need is strong enough.
Yes, I've noticed this can mess with fuzzyfind in your editor as well! Right now, we're trying an idea to place generated types in the src/content/ directory itself (see example). This removes the need for a .astro directory entirely, and should be just as easy to gitignore. Curious to know your thoughts on this.

bholmesdev commented 1 year ago

From the call: the .astro module specifier requires setting up a TypeScript alias to get editor support. This should be listed as a Drawback. A plan to make this automatic would be desired as well.

Note: this should no longer be necessary! We've implemented and tried 2 alternatives:

Add a reference entry to your project's env.d.ts: /// <reference path="../.astro/content.d.ts" />
Remove the .astro directory entirely, and place generated types in the src/content/ directory (see example).

We are also looking to Nuxt v3's implementation, since they have a similar need for type generation. I will update the RFC if we explore this route.

bholmesdev commented 1 year ago

@louiss0 Thanks for the follow-up. To respond:

I hope that either way I won't have to be restricted to just using the content folder.

We will be restricting this by-design for our experimental release. We think that a consistent structure that all Astro projects can adopt will make this feature easiest to understand to new users. Still, we have heard requests to wire schemas to custom glob patterns for docs outside the src/content/ directory. This feels like a future exploration to keep scope to a minimum, but good to hear there's early interest!

Another thing that you introduced was the entry.data thing.

Yes, we've nested frontmatter properties in another object to avoid conflicting with our generated id, slug, and body parameters. This should make it clear which properties are based on your schema (anything inside of data), and which properties are generated by Astro (anything outside of data). We will keep this convention unless there's a strong reason to change it.

JLarky commented 1 year ago

my 5 cents (inflation joke):

I like the idea of default schema, which could be done similarly to tsconfig, so that people who are scared or don't care could rely on something like

import {blogSchema} from "astro/shemas"
export const schema = blogSchema

and

import {blogSchema} from "astro/shemas"
import {z} from "zod"
export const schema = z.object({...blogSchema, tags: z.array(z.string())})

and even

import {anySchema} from "astro/shemas"
import {z} from "zod"
export const schema = anySchema // Record<string, any>

matthewp commented 1 year ago

@JLarky I like that idea a lot. Could see it expanding to be config based:

import { defineConfig } from 'astro/config';

export default defineConfig({
  schemas: {
    // folder name -> schema
    posts: 'astro/schemas/blog'
  }
});

zadeviggers commented 1 year ago

These new changes look great! I especially love the idea of using a virtual module instead of having to import directly from a generated directory. Thanks for taking all this feedback onboard - it shows how much Astro cares about its users.

andersk commented 1 year ago

Is this expected to work with frontmatter that’s generated by a remark plugin, as in this calculated reading time example?

It didn’t work for me, but maybe I did something wrong.

What I tried

With this diff from withastro/astro@7181af51879ba70ac7c8bbc0797216abd94e9160 (current `draft/content` branch), the `{post.data.minutesRead ?? "missing minutesRead"}` only gives “missing minutesRead”. (An `Astro.glob` version of the index shows `{post.frontmatter.minutesRead}` correctly.) ```diff --- a/examples/with-content/astro.config.mjs +++ b/examples/with-content/astro.config.mjs @@ -1,6 +1,11 @@ import { defineConfig } from 'astro/config'; import mdx from '@astrojs/mdx'; +import { remarkReadingTime } from './remark-reading-time.mjs'; export default defineConfig({ + markdown: { + remarkPlugins: [remarkReadingTime], + extendDefaultPlugins: true, + }, integrations: [mdx()] }); --- a/examples/with-content/package.json +++ b/examples/with-content/package.json @@ -12,7 +12,9 @@ }, "dependencies": { "@astrojs/mdx": "^0.11.4", - "astro": "^1.5.2" + "astro": "^1.5.2", + "mdast-util-to-string": "^3.1.0", + "reading-time": "^1.5.0" }, "devDependencies": { "zod": "^3.17.3" new file mode 100644 --- /dev/null +++ b/examples/with-content/remark-reading-time.mjs @@ -0,0 +1,8 @@ +import getReadingTime from 'reading-time'; +import { toString } from 'mdast-util-to-string'; + +export function remarkReadingTime() { + return (tree, { data }) => { + data.astro.frontmatter.minutesRead = getReadingTime(toString(tree)).text; + }; +} --- a/examples/with-content/src/content/blog/index.ts +++ b/examples/with-content/src/content/blog/index.ts @@ -10,5 +10,6 @@ export default defineCollection({ // transform to another data type with `transform` // ex. convert date strings to Date objects publishedDate: z.string().transform((str) => new Date(str)), + minutesRead: z.string().optional(), // marking this required breaks the build }), }); --- a/examples/with-content/src/pages/[...slug].astro +++ b/examples/with-content/src/pages/[...slug].astro @@ -1,5 +1,5 @@ --- -import { getCollection, renderEntry, Collection } from 'astro:content'; +import { getCollection, renderEntry, CollectionEntry } from 'astro:content'; import Layout from '../layouts/Layout.astro'; export async function getStaticPaths() { @@ -11,7 +11,7 @@ export async function getStaticPaths() { } interface Props { - post: Collection<'blog'>; + post: CollectionEntry<'blog'>; } const { post } = Astro.props as Props; --- a/examples/with-content/src/pages/index.astro +++ b/examples/with-content/src/pages/index.astro @@ -15,6 +15,7 @@ const promo = await getEntry('blog', 'promo/launch-week.mdx'); {blog.map(post =>

{post.data.title}

+

{post.data.minutesRead ?? "missing minutesRead"}

{post.data.publishedDate}

)} --- a/pnpm-lock.yaml +++ b/pnpm-lock.yaml @@ -289,10 +289,14 @@ importers: specifiers: '@astrojs/mdx': ^0.11.4 astro: ^1.5.2 + mdast-util-to-string: ^3.1.0 + reading-time: ^1.5.0 zod: ^3.17.3 dependencies: '@astrojs/mdx': link:../../packages/integrations/mdx astro: link:../../packages/astro + mdast-util-to-string: 3.1.0 + reading-time: 1.5.0 devDependencies: zod: 3.19.1 ```

bholmesdev commented 1 year ago

@andersk Ah yes, injected frontmatter will not be available from the getCollection or getEntry results. This is by design; since we want to avoid the running the remark pipeline and MDX compilation for performance, we can't resolve injected frontmatter.

However, I'm working on a change to renderEntry to retrieve injected frontmatter like so:

---
import { getCollection, renderEntry } from 'astro:content';
const blogPosts = await getCollection('blog');
---

{blogPosts.map(post => {
  const {
    injectedFrontmatter, // all properties injected via remark
    headings, // result of `getHeadings`
  } = renderEntry(post);
  const { readingTime } = injectedFrontmatter;
  const h1 = headings.find(h => h.depth === 1);

  return <p>{h1} - {readingTime} min read</p>
})

renderEntry is where anything dependent on remark or rehype will live, including headings and frontmatter injection. This means injected frontmatter will not be type checked, so you'll need to bring your own types similar to Astro.glob today.

In a future RFC, I'd love a new solution to computed properties like readingTime that doesn't require frontmatter injection or even remark plugins. I've excluded that to keep this RFC's scope manageable!

andersk commented 1 year ago

Hmm, does putting it in renderEntry mean it (still) can’t be retrieved without triggering CSS/JS injection?

bholmesdev commented 1 year ago

@andersk We're working on a way to only inject CSS when the <Content /> component is used! Stay tuned.

bholmesdev commented 1 year ago

Alright, thanks for the feedback y'all! We've been taking all of your suggestions into account and made some pretty important changes.

One schema config file

First, we heard the pushback on individual schema.ts files in every collection (shoutout to @JLarky @matthewp and @christian-hackyourshack). This is why we're combining all schemas into a single src/content/config file. This offers some key benefits:

makes collections just about the content, not about the schema or other config surrounding that content. This means less clutter for copy writers that may browse collection directories with little programming background.
sharing schemas and other constants is much simpler. Where you may have needed a structure like this in the current system:
```
src/content/
blog/
schema.ts # imports shared-schemas
docs/
schema.ts # imports shared-schemas
shared-schemas.ts
```

You can now colocate all your shared config in a single file:

src/content/
  blog/
  docs/
  config.ts

See the new "Adding a Schema" section for full details.

`.astro` -> `astro:content`

We also found that importing from the .astro cache directly feels confusing, since .astro is also a file extension (shoutout to @zadeviggers). So, we've introduced a new virtual module: astro:content.

This contains all utilities for configuring and querying your content. It also includes Zod's z utility, so you no longer need to install zod yourself 🏆

 import { z, defineCollection, getCollection, getEntry } from 'astro:content';

See the new Detailed Design for full details.

This RFC is still evolving, so keep the feedback coming! For those that want to try these new APIs early, you can install a preview build using the next--content-schemas flag:

pnpm i astro@next--content-schemas

You can also browse our revamped docs and blog examples for real world examples:

louiss0 commented 1 year ago

I'm confused about what you did with schema but could you more importantly talk about define collection? What else does it do rather than allow you to define a schema? Does it have other configuration details? Oh and good luck. I hope this RFC doesn't hurt performance at all or slow down SSG

 defineCollection({
  schema: {
    title: z.string(),
    slug: z.string(),
    // mark optional properties with `.optional()`
    image: z.string().optional(),
    tags: z.array(z.string()),
    // transform to another data type with `transform`
    // ex. convert date strings to Date objects
    publishedDate: z.string().transform((str) => new Date(str)),
  },
});

bholmesdev commented 1 year ago

@louiss0 Ah yes, I should clarify: we may introduce collection config options other than schema in the future. One that's been raised is a custom slug mapper if you want to compute entry slugs yourself:

// not final code
const blog = defineCollection({
  slug: ({ id, data }) => data.customSlug ?? slugify(slug),
  schema: {...},
})

Nesting schema as its own key should keep doors open like this.

And thanks! Happy to share we're seeing perf gains from content schemas over Astro.glob if anything 👍

naiyerasif commented 1 year ago

Are there any plans to introduce a concept of relationships between collections? For example, a blog collection may have an array of authors which may be part of an author collection. Usually, maintaining such relationships manually is a huge pain and having some good DX around this might be helpful.

Another thing I'd really love is to have some search primitive akin to SQL. For example,

const allBlogPostsAfter2020 = await search(`
  blog.* from blog
  where publishedDate.year > 2022
  order by publishedDate asc
`);

where publishedDate.year gets resolved by a function defined in the schema (if the function does not exist on the primitive itself).

Furthermore, the search API can flatten the getCollection and getEntry into one API.

// this gives you all the blog posts
const allBlogPosts = await search(`blog.* from blog`)

// this gives you an entry
const firstBlogPost = await search(`blog.* from blog where title = "First Blog Post"`)

// this gives you the latest entry
const latestBlogPost = await search(`blog.* from blog order by publishedDate asc limit 1`)

This might also work nicely with relationships using joins.

const blogPostsByAstro = await search(`
  blog.* from blog, author 
  where blog.authorId = author.id 
  and author.name = "Astro"
`)

Lume does something similar using its search and relations plugins.

bholmesdev commented 1 year ago

@naiyerasif Ah, I love these ideas!

Relationships: we've definitely built schemas with relations in mind. Since you can define a type for every field, we could certainly introduce a "reference" type to refer to other collections. This was kept out of the RFC to keep our scope well-defined, but it's a feature we're very excited to explore.
Relational querying: this is an interesting thought, and matches our analogy of collections to database tables. I'd point to Nuxt's Content feature for some prior art here. They decided to use MongoDB's query language to treat content as document-based. I'd be hesitant to 100% mimic SQL querying per your example since it would be difficult to offer intellisense for a generic string vs. helper functions. Still, I'd love to find an answer here that's beginner-friendly, while still powerful enough for advanced users.

louiss0 commented 1 year ago

@naiyerasif Ah, I love these ideas!

Relationships: we've definitely built schemas with relations in mind. Since you can define a type for every field, we could certainly introduce a "reference" type to refer to other collections. This was kept out of the RFC to keep our scope well-defined, but it's a feature we're very excited to explore.

Relational querying: this is an interesting thought, and matches our analogy of collections to database tables. I'd point to Nuxt's Content feature for some prior art here. They decided to use MongoDB's query language to treat content as document-based. I'd be hesitant to 100% mimic SQL querying per your example since it would be difficult to offer intellisense for a generic string vs. helper functions. Still, I'd love to find an answer here that's beginner-friendly, while still powerful enough for advanced users.

How far have you gotten in the last few days? Did you fix the windows problem? Is the magic layouts feature going to go away after this feature is standardized?

naiyerasif commented 1 year ago

Relationships: we've definitely built schemas with relations in mind. Since you can define a type for every field, we could certainly introduce a "reference" type to refer to other collections. This was kept out of the RFC to keep our scope well-defined, but it's a feature we're very excited to explore.

This can be a separate RFC if you think the current RFC may become too big.

Relational querying: this is an interesting thought, and matches our analogy of collections to database tables. I'd point to Nuxt's Content feature for some prior art here. They decided to use MongoDB's query language to treat content as document-based. I'd be hesitant to 100% mimic SQL querying per your example since it would be difficult to offer intellisense for a generic string vs. helper functions. Still, I'd love to find an answer here that's beginner-friendly, while still powerful enough for advanced users.

Any fluent query DSL (like Nuxt's Content) should be fine. I agree that having helper functions for such an API would be immensely helpful. I think this should be a part of this RFC since you're already planning for something similar with getCollection and getEntry.

pilcrowonpaper commented 1 year ago

Is it possible for the collection argument for getCollection() to be a route as well? So if I have content/blog/en, I can use either of these?

const blogs = await getCollection("blog");
const englishBlogs = await getCollection("blog/en");

To not introduce complexity, schemas will still be limited to top-level (blog in this case).

pilcrowonpaper commented 1 year ago

Also, why not move renderEntry() inside the entry object? Does entry have to be a POJO?

// now
const { Content } = renderEntry(entry);
// idea
const { Content } = entry.render();

bholmesdev commented 1 year ago

@pilcrowOnPaper Good questions!

No admittedly. Since collections are considered one level deep, you can only query for the top-level blog collection with the collection argument. However, we do offer a filter function where you can check for /en at the front of each entry slug. More on that in the blue Tip section on the landing page example.
This has been suggested, and was avoided in the initial RFC due to technical limitations. But with @matthewp's recent changes to our content renderer, this may be possible! We're very close to an experimental release so I plan to table this for future refinement. I'm glad to hear there's interest though.

bholmesdev commented 1 year ago

Alright everyone, thank you so much for your input and excitement over these past few weeks. We plan to discuss Content Collections during the RFC call on our discord tomorrow (2pm ET), and hope to reach consensus for an experimental release!

I'll highlight 2 final tweaks that were made:

We now support slug configuration from our src/content/config. This is useful for generating slugs based on frontmatter, or mapping your preferred directory structure (ex. /content/blog/2022-05-10/post.md) to URLs on your site (ex. /content/blog/post). You can use the slug argument like so:

import { defineCollection } from 'astro:content';

const blog = defineCollection({
  slug({ id, data }) {
    return data.slug ?? myCustomSlugify(id);
  },
  schema: {...}
});

export const collections = { blog };

@matthewp has heroically made renderEntry more powerful and stable with some new head-hoisting internals! If that jargon has your head spinning, here's the big takeaway: MDX styles and scripts are only injected when the <Content /> component is used. This means you can safely call renderEntry for headings and injectedFrontmatter without worrying about a bloated bundle (cc @andersk). Read the updated Detailed Design for full details.

And that's it! Hope to see y'all on the call tomorrow 👋

matthewp commented 1 year ago

Things to figure out before unflagging:

How should users colocate related data that is not the content from the schema? Currently suggesting _ folders are ignored.
What about relative image links? Should those be treated differently from images in side of md files outside of the content/ folder.

louiss0 commented 1 year ago

aliases: [content collections criticism,] tags: [] note type: Main created: day: Friday December 16th 2022
time: 20:49:37

Content Collections Criticism

First off Id like to say a good job on this RFC. It is good enough for me to create a project around and good enough for me to maybe scale it. I like the fact that you have decided to put the render function on the entry instead of us having to import it. I like get collectionToPaths() function Please add it! I even like the fact that Zod was chosen for this RFC. You said that maybe in the future other formats could be supported. I hope the next one is JSON. But no tool is perfect. This is only the beginning so I have decided to talk about some big changes to consider.

RSS Feeds

Right now with content collections, I can't seem to get RSS feeds working not that it's that important. But I want to have that power the issue is Can't use RSS with content folder I feel like this issue needs to be fixed immediately. I even copy and pasted the stack trace onto the issue so that the error can be addressed quickly.

Magic Properties

In Astro there is so far two magic frontmatter properties that are available. The first one is draft: and the second one is layout. I believe that both of them should be removed since people are expected to create their blogs by using Content Collections. Or they should at least not be able to be used in the /content folder. The reason why is that these properties are just not needed anymore. If the layout property was to be used it would lead to bad design and coupling. When you use the layout key the render function activates the mechanism responsible for rendering layouts. This RFC is better of being used to make the developer have to import the layout that the person wants to use inside of the page he or she wants to use it. Having magic layouts in the /content folder can only lead to bad behavior.

I could argue about removing the magic properties completely but then when it comes to pages the user would have to write lots of boilerplate code inside of pages. Since the /content folder exists the only things .mdx is good for is just being used as a templating language and importing components. So I think it's just better to just remove them from the content folder. Now as far as draft: is concerned. I'd argue that if someone was thinking about whether or not a page should be a draft of not it should just be put in the content folder instead. But draft is a minor thing. But if it is going to exist as a first-class citizen I'd suggest letting the developer know about it throughout the creation of each collection through types.

Extending and Defining a default Schema

There is a problem with the schema property of define collection. It only allows me to define a ZodRawShape I can't use z.object() on it at all. This means I can't use other features of zod in order to construct my schemas at all. This is not good. This means that if a developer wants to be able to reuse other schema definitions like title author draft updatedDate and even pubDate they would have to rewrite them all over again. Remember the DRY principle.

import { z, defineCollection } from 'astro:content';

// In this example the title is repeated twice 
const releases = defineCollection({
  schema: {
    title: z.string(), // here
    version: z.number(),
  },
});

const engineeringBlog = defineCollection({
  schema: {
    title: z.string(), //and  here
    tags: z.array(z.string()),
    image: z.string().optional(),
  },
});

export const collections = {
  releases: releases,
  // Don't forget 'quotes' for collection names containing dashes
  'engineering-blog': engineeringBlog,
};

I think there should be a default schema for people to use. I'd put it in the astro config but it could also exist in the /config.ts file for the content folder.

{
 contentCollections: {
     defaultSchema: {
         title: z.string().max(90)
         author: z.string("Authors name").default("Authors name")
     } 
 }
}

Users would probably then have the power to extend it by importing it from astro:content


import {defaultSchema, z } from "astro:content"

export const collections = {
    blog: defineCollection({
        schema: defaultSchema 
    })
}

I'm asking for the developer to have more access to zod's features. The schema key expects just a plain object. That makes it so that I can't just use the other functions that zod provides. If you don't intend to let developers gain full access to the API of zod through define collection. You could at least give them back some of its capabilities by using the extends: key in define collections and the omit: key so that you can omit some keys from a schema.

Schema for injected frontmatter

At the moment injected frontmatter is just not typeable at all. I think the user should have access to all the types for the frontmatter that is injected into the pages. I would like to have an injectedFrontmatterSchema: available for collections. If that is not possible I the render() function should have a generic argument that allows the user to pass in a type.

Injected Frontmatter Schema


const blog = defineCollection({
injectedFrontmatterSchema:{
 readingTime: z.string().datetime(),
 author: z.string().default("Shelton Louis"),

    } 
})

Example with the render function


render<T extends Record<string, unknown> >(): Promise<{  
Content: AstroComponentFactory;  
headings: MarkdownHeading[];  
injectedFrontmatter: T;  
}>

Lastly

I don't have many other concerns from here but are going to find a better way of generating types from collections. Generating an entry map for each individual collection seems good in the short run but bad in the long run. The thing is that the map can become huge and ts may not be able to tell us the answers we need and there could be some scalability issues when it comes to writing and erasing types.

I wish there was a way to specify injected front matter as default and each collection as well. That way people don't have to keep having to read the file where they put remark plugins to find which ones they injected. A key to specify injected front Matter Schemas would be nice.

ispringle commented 1 year ago

Would it be possible to provide a function to the collection config so that we can transform/create slugs in our own way? For example, the current slug normalizer that's creating the slug value is not dealing with unicode characters. It's pretty common that websites in languages which have characters beyond the standard ASCII alphanumerals will strip those characters out of URLs. Another example is the current setup doesn't remove whitespaces.

Seems the simplest solution is to continue to provide a simple slugifier and then allow people who want a more advanced one to provide that to the config. Of course you could just map over the collection returned by getCollection, update the slug field, and then use that new object, but this would need to be redone in ever file that uses that collection, but this is less than ideal,.

louiss0 commented 1 year ago

Would it be possible to provide a function to the collection config so that we can transform/create slugs in our own way? For example, the current slug normalizer that's creating the slug value is not dealing with unicode characters. It's pretty common that websites in languages which have characters beyond the standard ASCII alphanumerals will strip those characters out of URLs. Another example is the current setup doesn't remove whitespaces.

Seems the simplest solution is to continue to provide a simple slugifier and then allow people who want a more advanced one to provide that to the config. Of course you could just map over the collection returned by getCollection, update the slug field, and then use that new object, but this would need to be redone in ever file that uses that collection, but this is less than ideal,.

The solution is built in already.

import { defineCollection } from 'astro:content';

const blog = defineCollection({ slug({ id, data }) { return data.slug ?? myCustomSlugify(id); }, schema: {...} });

export const collections = { blog };

bholmesdev commented 1 year ago

@ispringle fair point! Other RFC reviewers raised slug customization as well, so we decided to ship a slug option as part of your collection config (see added section). This should address the more advanced use cases you raised.

Also curious to hear how our default slugger can be improved! We have an existing issue for handling file name spaces, but open to further ideas as well.

bholmesdev commented 1 year ago

Things to figure out before unflagging:

How should users colocate related data that is not the content from the schema? Currently suggesting _ folders are ignored.

What about relative image links? Should those be treated differently from images in side of md files outside of the content/ folder.

Both points have been addressed in the final RFC draft. With these resolved... this RFC is officially accepted and good-to-merge 🥳

Thanks again to everyone for your time and input. You can try on the experimental release with our shiny new Content Collections docs.

We'll also be marking this RFC as closed. So if you have future ideas, we encourage you to start a new discussion. Thanks again 🙌

withastro / roadmap

Content Collections #373

Summary

Links

Alternative proposal

Template-based side-effect

Head bubbling

Improvements on proposal

Dynamic CSS usage

Concerns When It Comes To Fetch Content

Contentions

Suggestions

Formatting

{post.data.title}

One schema config file

`.astro` -> `astro:content`

aliases: [content collections criticism,] tags: [] note type: Main created: day: Friday December 16th 2022
time: 20:49:37

Content Collections Criticism

RSS Feeds

Magic Properties

Extending and Defining a default Schema

Schema for injected frontmatter

Lastly

withastro / roadmap

Content Collections #373

Summary

Links

Alternative proposal

Template-based side-effect

Head bubbling

Improvements on proposal

Dynamic CSS usage

Concerns When It Comes To Fetch Content

Contentions

Suggestions

Formatting

{post.data.title}

One schema config file

.astro -> astro:content

aliases: [content collections criticism,] tags: [] note type: Main created: day: Friday December 16th 2022 time: 20:49:37

Content Collections Criticism

RSS Feeds

Magic Properties

Extending and Defining a default Schema

Schema for injected frontmatter

Lastly

`.astro` -> `astro:content`

aliases: [content collections criticism,] tags: [] note type: Main created: day: Friday December 16th 2022
time: 20:49:37