markedjs / marked

A markdown parser and compiler. Built for speed.
https://marked.js.org

Extension not triggering #3425

Closed nrivard closed 6 days ago

nrivard commented 3 weeks ago

Marked version:

Describe the bug: I have created an extension to detect runs of newlines, but neither the `start` function nor the `tokenizer` function ever seems to get called.

I am using it like this:

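```typescript
import { Marked } from 'marked';

export class Markdown {
  // Register the extension on this instance's defaults
  #marked = new Marked().use({
    extensions: [newlineExtension]
  });

  // other stuff in my class
}
```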

To Reproduce

1. Define your extension as follows:
```typescript
import type { TokenizerExtension } from 'marked';

export const newlineExtension: TokenizerExtension = {
    name: 'newline',
    level: 'block',
    start(src) {
        return src.match(/[\n]/)?.index
    },
    tokenizer(src, tokens) {
        const match = src.match(/^[\n]{2,}$/);
        if (match && match.length > 0) {
            return {
                type: 'newline',
                raw: match[0]
            }
        }
        return undefined;
    }
};
```
2. Pass your extension on to marked:
```typescript
import { Marked } from 'marked';

export class Markdown {
  #marked = new Marked().use({
    extensions: [newlineExtension]
  });

  // other stuff in my class
}
```
3. Pass the following string in:
```typescript
const multiLine = `
First line

Second line
`
```

In this case I still get a token of type `space` with `raw: '\n\n\n\n\n\n\n'`.

Expected behavior: I would expect my extension function(s) to get called and a single `newline` token to be created.

UziTech commented 3 weeks ago

Thank you for the detailed issue!

It looks like the issue is the `$` in your matching regex. Without the `m` flag, `$` only matches at the end of the whole string, so the regex only matches if the newlines are at the end of the input (e.g. if `Second line` were removed). If you change the matching regex to `/^[\n]{2,}/`, it works.
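A quick standalone check of the two regexes, using plain JavaScript `String.prototype.match`. The `remaining` string is an assumption modelled on the reproduction (roughly what the source looks like once the lexer reaches the blank-line run); it is not taken from marked's internals.

```typescript
// Remaining source at the point where the blank-line run begins
// (hypothetical sample modelled on the reproduction above).
const remaining = "\n\nSecond line\n";

// Original regex: without the `m` flag, `$` anchors to the end of the whole
// string, so this only matches when nothing but newlines remain.
const anchored = remaining.match(/^[\n]{2,}$/);

// Suggested fix: drop `$` so only the leading run of newlines must match.
const unanchored = remaining.match(/^[\n]{2,}/);

console.log(anchored); // null
console.log(JSON.stringify(unanchored?.[0])); // "\n\n"
```

Because a block tokenizer is handed everything that is left of the document, an end-of-input anchor almost never holds unless the newlines are the very last thing in the source.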

nrivard commented 2 weeks ago

OK, but even when I change the regex, neither the `start` nor the `tokenizer` function ever seems to get called.

```tsx
import { Marked, type TokenizerExtension } from 'marked';
// MarkdownRenderer, MarkdownParser, Column, Blank and AnyComponent are my own custom code

export const newlineExtension: TokenizerExtension = {
  name: 'newline',
  level: 'block',
  start(src) {
    return src.match(/\n/)?.index;
  },
  tokenizer(src, tokens) {
    const match = src.match(/^[\s]*\n[\s]*\n[\s]*/);
    if (match) {
      return {
        type: 'newline',
        raw: match[0]
      };
    }
    return undefined;
  }
};

export class Markdown {
  #marked = new Marked().use({
    extensions: [newlineExtension]
  });

  constructor() {}

  parse(src: string): AnyComponent {
    // these are custom
    const renderer = new MarkdownRenderer();
    const parser = new MarkdownParser({renderer: renderer});

    const tokens = this.#marked.lexer(src, {async: false, gfm: true});
    const components = tokens.length ? parser.parse(tokens) : [];
    return components.length ? <Column>{components}</Column> : <Blank />;
  }
}
```

One wrinkle, though: I'm only using the lexer. I have custom parser and renderer objects that return something other than a string, so I'm calling the above class like this:

```typescript
const output = new Markdown().parse(str);
```

Is it possible that the tokenizer is never invoked if I'm only using the lexer?

UziTech commented 2 weeks ago

I see. The options object that you pass to `lexer` is used as the complete set of options; for legacy reasons, it is not merged with the instance defaults, which is where the extension was registered. Since both of the options you set are already defaults, you can either omit the options object or pass an object merged with `this.#marked.defaults`:

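```typescript
// Omit the options so the instance defaults (including the extension) are used
this.#marked.lexer(src);
// or
// Spread the instance defaults first, then override individual options
this.#marked.lexer(src, { ...this.#marked.defaults, async: false, gfm: true });
```
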
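The replace-versus-merge behaviour described above can be illustrated with a heavily simplified model. This is not marked's source: `Options`, `BlockTokenizer`, `lex` and the `text`/`space` fallback are hypothetical stand-ins. The only thing the model mirrors is that an options object passed to the lexer replaces the defaults outright, so an object without `extensions` never reaches the extension's tokenizer, while spreading the defaults first keeps it.

```typescript
// Hypothetical, heavily simplified stand-ins for marked's types.
type Token = { type: string; raw: string };
type BlockTokenizer = (src: string) => Token | undefined;
interface Options {
  async?: boolean;
  gfm?: boolean;
  extensions?: { block: BlockTokenizer[] };
}

// Models the reported behaviour: when `options` is given it is used as-is;
// the defaults are only consulted when it is omitted.
function lex(src: string, defaults: Options, options?: Options): Token[] {
  const opts = options ?? defaults;
  const tokens: Token[] = [];
  while (src.length > 0) {
    let token: Token | undefined;
    // Extension tokenizers get the first chance at the remaining source.
    for (const tokenize of opts.extensions?.block ?? []) {
      token = tokenize(src);
      if (token) break;
    }
    if (!token) {
      // Toy fallback: a run of newlines is `space`, otherwise one line of `text`.
      const blank = src.match(/^\n+/);
      token = blank
        ? { type: "space", raw: blank[0] }
        : { type: "text", raw: (src.match(/^[^\n]+/) as RegExpMatchArray)[0] };
    }
    tokens.push(token);
    src = src.slice(token.raw.length);
  }
  return tokens;
}

// Same regex as the extension in this thread.
const newline: BlockTokenizer = (src) => {
  const match = src.match(/^[\s]*\n[\s]*\n[\s]*/);
  return match ? { type: "newline", raw: match[0] } : undefined;
};

const defaults: Options = { async: false, gfm: true, extensions: { block: [newline] } };
const src = "First line\n\nSecond line\n";

// Options passed alone: `extensions` is missing, so the blank line is `space`.
const replaced = lex(src, defaults, { async: false, gfm: true });
// Defaults spread first: the extension survives and emits a `newline` token.
const merged = lex(src, defaults, { ...defaults, async: false, gfm: true });

console.log(replaced.map((t) => t.type).join(", ")); // text, space, text, space
console.log(merged.map((t) => t.type).join(", ")); // text, newline, text, space
```

The spread works because later properties win: `{ ...defaults, gfm: true }` keeps every default key, including `extensions`, and only overrides the ones listed after it.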
UziTech commented 2 weeks ago

We also just added the `hooks.provideParser` hook in v14.1.0. It should let you supply, through an extension, a parser that can return anything, not just a string.

This should allow something like:

```tsx
import { Marked, type TokenizerExtension } from 'marked';
// MarkdownRenderer, MarkdownParser, Column, Blank and AnyComponent are your own custom code

export const newlineExtension: TokenizerExtension = {
  name: 'newline',
  level: 'block',
  start(src) {
    return src.match(/\n/)?.index;
  },
  tokenizer(src, tokens) {
    const match = src.match(/^[\s]*\n[\s]*\n[\s]*/);
    if (match) {
      return {
        type: 'newline',
        raw: match[0],
      };
    }
    return undefined;
  },
};

// Hooks object: provideParser returns the function that turns tokens into output
const reactParser = {
  provideParser() {
    const renderer = new MarkdownRenderer();
    const parser = new MarkdownParser({ renderer });
    return (tokens) => {
      const components = tokens.length ? parser.parse(tokens) : [];
      return components.length ? <Column>{components}</Column> : <Blank />;
    };
  },
};

export class Markdown {
  #marked = new Marked().use({
    extensions: [newlineExtension],
    hooks: reactParser,
  });

  constructor() {}

  parse(src: string): AnyComponent {
    // Unlike lexer(), parse() merges these options with the instance defaults
    return this.#marked.parse(src, { async: false, gfm: true });
  }
}
```
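The idea behind a `provideParser`-style hook can be sketched without marked at all. Everything below (`Hooks`, `ParseFn`, the toy `lex`, `htmlParser`, `Component`) is a hypothetical stand-in, not marked's implementation or types; it only shows the design choice: `parse` asks the hook for a parser, so its return type follows whatever that parser returns instead of being fixed to an HTML string.

```typescript
// Hypothetical model of a provideParser-style hook (illustrative names only).
type Token = { type: string; raw: string };
type ParseFn<R> = (tokens: Token[]) => R;
interface Hooks<R> {
  provideParser?: () => ParseFn<R>;
}

// Toy lexer: each non-empty line becomes a `paragraph` token.
function lex(src: string): Token[] {
  return src
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => ({ type: "paragraph", raw: line }));
}

// Default parser: renders tokens to an HTML string.
const htmlParser: ParseFn<string> = (tokens) =>
  tokens.map((t) => `<p>${t.raw}</p>`).join("");

// Use the hook's parser if one is provided, otherwise fall back to HTML.
function parse<R>(src: string, hooks: Hooks<R> = {}): R | string {
  const tokens = lex(src);
  return hooks.provideParser ? hooks.provideParser()(tokens) : htmlParser(tokens);
}

// A non-string "component tree", standing in for <Column> / <Blank />.
type Component = { kind: "Column"; children: string[] } | { kind: "Blank" };

const componentHooks: Hooks<Component> = {
  provideParser: () => (tokens) =>
    tokens.length
      ? { kind: "Column", children: tokens.map((t) => t.raw) }
      : { kind: "Blank" },
};

const html = parse("First line\n\nSecond line\n");
const tree = parse("First line\n\nSecond line\n", componentHooks);
const empty = parse("\n\n", componentHooks);

console.log(html); // <p>First line</p><p>Second line</p>
console.log(JSON.stringify(tree)); // {"kind":"Column","children":["First line","Second line"]}
console.log(JSON.stringify(empty)); // {"kind":"Blank"}
```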
nrivard commented 2 weeks ago

Wow I will give this a try, thank you!