[feature request] handle contents failing schema validation, or autofixing

KagamiChan commented 2 months ago

Hi I'm new to velite, and in my use case, I plan to assign a random ID to each markdown file in frontmatter, and prefer the generation is automatic: The file is created without ID and then the ID is written to file after saving.

I wanted to make use of prepare velite config to autofix this but looks like the data available in this hook are all contents that pass the schema validation and those who fail are dropped.

So I'm asking if we could add a new hook or some mechanism to allow fixing validation errors on the fly in velite (watch mode maybe). Here's some thoughts

this hook should have access to the files that did not pass the validation and respetively the reason(s)
this hook should have access to the files that passed the validation because they might be required in the fix (my use case is an example, ID should be unique)
we might need a physical path schema to ease the file writing

If this request is acceptable, I can also help if I find some spare time

zce commented 2 months ago

First of all, thank you for your participation,

I want to confirm your needs first.

Do you want Velite to automatically add some fields such as id to each entity when working, and this process is programmable?

Maybe the transform and superRefine of zod schema can realize your idea.

const plans = {
  name: 'Plan',
  pattern: 'plans/*.md',
  schema: s.object({
    name: s.string(),
    slug: s.slug('plan'),
    icon: s.string(),
    prices: s.object({ yearly: s.number(), monthly: s.number() }),
    description: paragraph,
    featured: s.boolean().default(false),
    benefits: s.array(s.string())
  }).transform(data => ({ ...data, id: uuid() }))
}

If it is not what I understand, can you provide a pseudo code to help me understand your needs

KagamiChan commented 2 months ago

Thanks @zce

I'd like to have a persistent/stable ID for each markdown document, which means the ID should be written to the file the first time it is generated. I feel the transform would produce a random ID each time velite runs, and the ID will change next time. please let me know if I'm wrong.

KagamiChan commented 2 months ago

Looking at the transform and superRefine that you mentioned, I realize I am able to generate the ID then write it to the original file and finally return the document with the key, I'll give it a try, and I'm not sure if this is the best practice for zod

KagamiChan commented 2 months ago

Result: I was using s.unique() for ID, and turns out the document is dropped at unique() validation, so superRefine, transform or prepare would not have chance to fix the file.

Changing to s.optional(s.unique()) could allow the validation pass and the missing data issue could be fixed by myself in the above places

KagamiChan commented 2 months ago

here's my code, hope it could better explain my idea

import fs from 'node:fs/promises'
import path from 'node:path'

import pMap from 'p-map'
import pRetry from 'p-retry'
import matter from 'gray-matter'
import { customAlphabet } from 'nanoid'

const nanoid = customAlphabet('abcdefghijklmnopqrstuvwxyz')

export default defineConfig({
  root: 'contents',
  collections: {
    posts: {
      name: 'Post',
      pattern: 'posts/**/*.mdx',
      schema: s
        .object({
          ...
          id: s.optional(s.unique()),
          path: s.path(),
        })
    },
  },
  prepare: async (data) => {
    const knownIds = data.posts.map((post) => post.id).filter(Boolean)
    const pendingAssignments = data.posts.filter((post) => !post.id)

    await pMap(pendingAssignments, async (post) => {
      const id = await pRetry(() => {
        const id = nanoid()
        if (knownIds.includes(id)) {
          throw new Error('ID collision')
        }
        return id
      })
      const file = path.resolve(process.cwd(), 'contents', `${post.path}.mdx`)
      const content = await fs.readFile(file, 'utf-8')
      const result = matter(content)
      const newContent = matter.stringify(result.content, {
        ...result.data,
        id,
      })
      await fs.writeFile(file, newContent)
      console.log(`Assigned ID ${id} to ${post.title}`)
    })
  },

zce commented 2 months ago

https://github.com/zce/velite/issues/108#issuecomment-2041738210

KagamiChan commented 2 months ago

Thanks for the information. Let me explain my thoughts, ID is not only used to distinct documents.

There do exist a need behind this constant/persistent ID, for example, creating a short URL so that it could be shared more easily, general hashing algorithm usually outputs a long ID as they are designed to reduce collision possibilities

Also if we use hash of the title as ID, then mofidying the title would cause the ID to change, which would confuse people who saved the URL or added it to browser favorites if the ID is used in URL.

I totally agree that ensuring a unique ID does not have to happen inside velite, we might use git hooks, validation tests in CI, etc to handle that. I just feel it is more suitable to be triggered by content generation because we have all content data together as well as validaton failure issues. And this could be generalized as data autofixing capabilities, this ID issue migt not be a proper case though

As my initial requests could be nearly fulfilled (the imperfect part is the id becomes optional in typing), I'm closing this issue. Thanks for you help.

zce commented 2 months ago

As you mentioned, this is the reason why Velite doesn't assertively add an ID for each item.

Honestly, I believe that hardcoding is the most appropriate approach for IDs. If an ID is required, each data should be assigned a constant and unchanging ID at the time of creation.

zce / velite

[feature request] handle contents failing schema validation, or autofixing #133