shikijs / shiki

`options.meta` does not return parsed metadata object in transformer #792

Open nikolailehbrink opened 1 month ago

nikolailehbrink commented 1 month ago

Describe the bug

I am trying to create a Shiki transformer that retrieves metadata like the filename from the options.meta object during the preprocessing step, but I'm encountering an issue.

According to the Shiki documentation on transformers, the options.meta object should return a parsed key-value structure like:

{
  "meta": "here",
  "__raw": "meta=here"
}

However, when I use @shikijs/rehype and attempt to access options.meta, I only receive the __raw string without the parsed key-value pairs. For example:

```tsx twoslash filename=app/routes/test.ts
console.log("Hello, World!");
```

I expect `options.meta` to return:

```json
{
  "twoslash": "something",
  "filename": "app/routes/test.ts",
  "__raw": "twoslash filename=app/routes/test.ts"
}
```

But instead, I get:

```json
{
  "__raw": "twoslash filename=app/routes/test.ts"
}
```

This forces me to manually parse the __raw string, which contradicts the documentation.

Steps to Reproduce:

  1. Use @shikijs/rehype with the following transformer:

    const transformer = () => ({
      name: "transformer",
      preprocess(code, options) {
        console.log(options.meta);
      }
    });

  2. Process a code block with metadata (e.g., twoslash filename=app/routes/test.ts).

Expected Behavior: options.meta should return a parsed object with metadata fields like filename, along with the __raw string.

Actual Behavior: Only the __raw string is returned, without parsed metadata.

Environment:

import { defineConfig } from "vite";
import tsconfigPaths from "vite-tsconfig-paths";
import mdx from "@mdx-js/rollup";
import rehypeShiki from "@shikijs/rehype";

const transformer = () => ({
  name: "transformer",
  preprocess(code, options) {
    console.log(options.meta);
  },
});

export default defineConfig({
  plugins: [
    mdx({
      rehypePlugins: [
        [rehypeShiki, { theme: "nord", transformers: [transformer()] }]
      ]
    }),
    tsconfigPaths(),
  ],
});

Any insights on why the metadata isn't being parsed as expected would be greatly appreciated.

Reproduction

https://github.com/nikolailehbrink/shiki-rehype-bugs

fuma-nama commented 1 month ago

You have to specify the parser with the parseMetaString option; Rehype Shiki doesn't come with a default parser.

/**
 * Custom meta string values
 */
interface MetaValue {
  name: string;
  regex: RegExp; // must capture the value in its first group
}

const metaValues: MetaValue[] = [
  {
    name: 'title',
    regex: /title="(?<value>[^"]*)"/,
  },
  {
    name: 'custom',
    regex: /custom="(?<value>[^"]+)"/,
  },
];

const options = {
  // Receives the raw meta string and returns the parsed key-value pairs
  parseMetaString(meta: string) {
    const map: Record<string, string> = {};

    for (const value of metaValues) {
      const result = value.regex.exec(meta);

      if (result) {
        map[value.name] = result[1];
      }
    }

    return map;
  },
};

olets commented 1 month ago

however, when I use @shikijs/rehype [ OP ]

Same for the @shikijs/markdown-it plugin. Repro: https://stackblitz.com/edit/shikijs-issue-792-markdown-it-repro

You have to specify the parser [ https://github.com/shikijs/shiki/issues/792#issuecomment-2383497350 ]

The docs (linked in the original report) state that no additional work is required.

fuma-nama commented 1 month ago

I think it is more likely a docs problem. The linked section states: Transformers can also access markdown 'meta' strings

It only tells you that transformers can access the string (__raw); it doesn't mention the meta object (e.g. how to parse it). The example below it is a bit misleading, though: it simply references the object you get when a parser is specified.
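To make that distinction concrete, here is a minimal sketch of why the parsed keys only show up once a parser is supplied. It is an illustration under stated assumptions, not Shiki's actual implementation: `buildMeta`, `MetaParser`, and `keyValue` are hypothetical names, and the merge order is a guess chosen to reproduce the object shown in the docs.

```typescript
// Hypothetical types and helpers, for illustration only (not part of Shiki).
type MetaParser = (raw: string) => Record<string, unknown> | undefined

// Assumption: the integration always keeps the raw string under __raw and
// only adds further keys when a parser was provided.
function buildMeta(raw: string, parse?: MetaParser): Record<string, unknown> {
  const parsed = parse ? parse(raw) : undefined
  return { ...parsed, __raw: raw }
}

// A deliberately naive parser: whitespace-separated key=value pairs,
// with bare words stored as `true`. Quoted values are not handled.
const keyValue: MetaParser = (raw) => {
  const out: Record<string, unknown> = {}
  for (const token of raw.split(/\s+/)) {
    if (token === '') continue
    const eq = token.indexOf('=')
    if (eq === -1) out[token] = true
    else out[token.slice(0, eq)] = token.slice(eq + 1)
  }
  return out
}

const withoutParser = buildMeta('meta=here')
const withParser = buildMeta('meta=here', keyValue)

console.log(withoutParser) // only the raw string
console.log(withParser) // raw string plus the parsed key
```

Without a parser the result carries only `__raw`, which matches what the original report observed; with one, the `meta` key from the docs example appears alongside it.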

cworld1 commented 1 week ago

A minimal example that solves this problem (at least, it works as this issue expects):

import type { ShikiTransformer } from 'shiki'

// Thanks to @olets for the code
function parseMetaString(str = '') {
  return Object.fromEntries(
    str.split(' ').reduce((acc: [string, string | true][], cur) => {
      const matched = cur.match(/(.+)?=("(.+)"|'(.+)')$/)
      if (matched === null) return acc
      const key = matched[1]
      const value = matched[3] || matched[4] || true
      acc = [...acc, [key, value]]
      return acc
    }, [])
  )
}

export const processMeta = (): ShikiTransformer => {
  return {
    name: 'shiki-transformer-process-meta',
    preprocess() {
      if (!this.options.meta) return
      const rawMeta = this.options.meta?.__raw
      if (!rawMeta) return
      const meta = parseMetaString(rawMeta)
      Object.assign(this.options.meta, meta)
    }
  }
}

// Then add the transformer to your config, like:
const shikiConfig = {
  // ...
  transformers: [processMeta()]
}

Works well on Astro v4.x here. If the functions are split into separate files and imported, the configuration file needs to be reloaded to pick up any changes.
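One caveat with the snippet above: its regex only matches quoted values, and splitting on spaces breaks quoted values that contain spaces, so the meta string from the original report (`twoslash filename=app/routes/test.ts`) would yield no keys. A possible variant that also accepts unquoted values and bare flags is sketched below. `parseMetaLoose` and `MetaRecord` are hypothetical names, and the accepted syntax is an assumption rather than anything Shiki defines.

```typescript
// Hypothetical variant of the parser; the accepted syntax is an assumption.
type MetaRecord = Record<string, string | true>

function parseMetaLoose(raw = ''): MetaRecord {
  const result: MetaRecord = {}
  // A key, optionally followed by "=" and a "double-quoted",
  // 'single-quoted', or unquoted (whitespace-free) value.
  const token = /([^\s=]+)(?:=(?:"([^"]*)"|'([^']*)'|(\S+)))?/g
  let match: RegExpExecArray | null
  while ((match = token.exec(raw)) !== null) {
    const key = match[1]
    // Exactly one of the three value groups is set when a value is present.
    const value = match[2] ?? match[3] ?? match[4]
    // A key without a value is treated as a boolean flag.
    result[key] = value === undefined ? true : value
  }
  return result
}

const parsed = parseMetaLoose(
  'twoslash filename=app/routes/test.ts title="Hello World"'
)
console.log(parsed)
```

If this behaviour is what you want, the function could stand in for `parseMetaString` inside the `processMeta` transformer, or be passed as the `parseMetaString` option mentioned earlier in the thread.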

olets commented 1 week ago

That parseMetaString is identical to what I authored in the stackblitz I shared (with the addition of annotating acc's type, and two typos I've since fixed). Coincidence, or copied and presented as your own without attribution?

cworld1 commented 1 week ago

@olets I thought that since you're tagged as a contributor, the code was shared for this repository (I hope I can do the same). If you want to assert your own copyright on the code, I'll delete it right away. Very sorry.

olets commented 1 week ago

No need to delete it or stop using it, it isn't copyrighted and it isn't significantly different from what someone else could have come up with. I was just surprised to see something I wrote presented as if it was something someone else wrote. Glad it works for you.

cworld1 commented 1 week ago

My only aim is to provide a solution general enough that it can directly help others.