sdorra / content-collections

Transform your content into type-safe data collections
https://content-collections.dev
MIT License
405 stars 18 forks source link

`transform` timeout causes required file to not be built #214

Closed haydenbleasel closed 1 month ago

haydenbleasel commented 1 month ago

I have the following collection:

const blogs = defineCollection({
  name: 'blog',
  directory: 'content/blog',
  include: '**/*.md',
  schema: (z) => ({
    title: z.string(),
    description: z.string(),
    date: z.string(),
    tags: z.array(z.string()),
  }),
  transform: async (page, context) => {
    const body = await compileMDX(context, page, {
      remarkPlugins: [remarkGfm],
      rehypePlugins: [[rehypeCode, rehypeCodeOptions], remarkHeading],
    });

    const blur = await sqip({
      input: `./public/blog/${page._meta.path}.jpeg`,
      plugins: [
        'sqip-plugin-primitive',
        'sqip-plugin-svgo',
        'sqip-plugin-data-uri',
      ],
    });

    const result = Array.isArray(blur) ? blur[0] : blur;

    return {
      ...page,
      body,
      date: new Date(page.date),
      slug: page._meta.path,
      readingTime: readingTime(page.content).text,
      image: `/blog/${page._meta.path}.jpeg`,
      imageBlur: result.metadata.dataURIBase64 as string,
    };
  },
});

The longest running part of it is of course importing the referenced image and creating a low-quality image placeholder. This takes maybe a minute with the amount of images I'm passing it.

Before the transforms start, .content-collections/generated/index.js is created with the following contents:

// generated by content-collections at Sun Jul 21 2024 00:57:04 GMT-0400 (Eastern Daylight Time)

import allBlogs from "./allBlogs.js";

export { allBlogs };

allBlogs.js is usually generated shortly after, once all the content is done transforming. However, if the transform function takes too long, it times out and allBlogs.js isn't generated, leading to the following build log:

web:build: > web@ build /Users/haydenbleasel/GitHub/eververse/apps/web
web:build: > next build
web:build:
web:build:   ▲ Next.js 14.2.5
web:build:   - Environments: .env.local
web:build:
web:build: Starting content-collections content-collections.ts
web:build:    Creating an optimized production build ...
web:build: build started ...
web:build: Failed to compile.
web:build:
web:build: .content-collections/generated/index.js
web:build: Module not found: Can't resolve './allBlogs.js'
web:build:
web:build: https://nextjs.org/docs/messages/module-not-found
web:build:
web:build: Import trace for requested module:
web:build: ./app/blog/[slug]/page.tsx
web:build:
web:build:
web:build: > Build failed because of webpack errors
web:build:  ELIFECYCLE  Command failed with exit code 1.

Running on Next.js 14.2.5 on Node.js v20.12.2.

sdorra commented 1 month ago

This is strange, because their is no timeout for the transform function. Can you test, if you get the same issue when you use the cli instead of the next adapter?

sdorra commented 1 month ago

You should definitely cache the blur generation, to speed up rebuilds: https://www.content-collections.dev/docs/transform#caching

haydenbleasel commented 1 month ago

@sdorra Seems to work fine when using the CLI. Maybe it's a race condition specifically with the Next.js adapter?

haydenbleasel commented 1 month ago

Thanks for the info on caching! Combining caching with a prebuild script of pnpm content-collections build does the trick. I guess it's an issue specifically with the Next.js adapter.

sdorra commented 1 month ago

@haydenbleasel I've tried to reproduce the issue by using a timeout:

import { defineCollection, defineConfig } from "@content-collections/core";

const characters = defineCollection({
  name: "characters",
  directory: "characters",
  include: "*.md",
  schema: (z) => ({
    name: z.string().min(1),
    origin: z.string().min(1),
    species: z.string().min(1),
    source: z.string().min(1).url(),
  }),
  transform: async (data) => {
    console.log(`transforming ${data.name} ...`)
    for (let i=0; i<60; i++) {
      console.log(`Waiting for 1 second... (${i+1}/60)`);
      await new Promise((resolve) => setTimeout(resolve, 1000));
    }
    return data;
  },
});

export default defineConfig({
  collections: [characters],
});

I've tested with 5 Markdown files, so content collections run for 5 minutes and I've got no timeout. Can you help me reproduce the issue? Is the repository that requires the prebuild step public?

sdorra commented 1 month ago

I'm closing the issue now because I can't reproduce the problem. Please reopen the issue if you can help me reproduce the issue.