markedjs / marked

A markdown parser and compiler. Built for speed.
https://marked.js.org

Improve docs with LaTeX parsing for blocks and notations other than just `$` #3434

Closed ildella closed 1 day ago

ildella commented 2 weeks ago

What pain point are you perceiving?

Parsing LaTeX code has been quite a problem. It comes in several notations, and handling it in the renderer alone is not enough, because the source text seems to interfere with tokenization.

The docs address exactly this problem when explaining how to extend tokenizers: https://marked.js.org/using_pro#tokenizer

But the example covers only one simple use case: the dollar notation for inline math. There are three more: double dollars for blocks, plus a completely different notation for both blocks and inline math based on backslash-escaped brackets and parentheses.

Describe the solution you'd like

I've written a few tests to show what happens. All tests depend only on marked.

https://github.com/ildella/svelte-markdown/blob/latex/tests/marked-latex.spec.js

The solution would be to provide some additional info on how to parse at least blocks, not just inline LaTeX. Ideally we could take this further and provide an optional set of marked tokenizers that can then be used with renderers.
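To make the request concrete, here is a minimal sketch of what a block-level tokenizer for `$$...$$` could look like, written as a marked custom extension object (`name`, `level`, `start`, `tokenizer`, `renderer`). The extension name `blockMath`, the regex, and the `math-block` class are all hypothetical, and the renderer is a placeholder rather than a real KaTeX call.

```javascript
// Hypothetical block-level extension for $$ ... $$ (all names are illustrative).
const blockMath = {
  name: 'blockMath',
  level: 'block',
  // Hint to marked where a candidate block may begin, so a paragraph can stop there.
  start (src) {
    const index = src.indexOf('$$')
    return index === -1 ? undefined : index
  },
  tokenizer (src) {
    // The match has to begin at the very start of the remaining source.
    const match = /^\$\$([\s\S]+?)\$\$(?:\n+|$)/.exec(src)
    if (!match) return undefined
    // `raw` is the exact text consumed; `text` is the LaTeX body without delimiters.
    return {type: 'blockMath', raw: match[0], text: match[1].trim()}
  },
  renderer (token) {
    // Placeholder output: a real renderer would pass token.text to a math
    // library (for example katex.renderToString) instead of emitting it as-is.
    return `<div class="math-block">${token.text}</div>\n`
  },
}
```

Assuming marked is installed, an extension of this shape would be registered with `marked.use({extensions: [blockMath]})`. Because the tokenizer consumes the whole `$$...$$` span as one token, the LaTeX body never reaches the paragraph and inline rules.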

For rendering, I was able to render both $ and $$ in my project without new marked tokenizers, using a KaTeX-based renderer for paragraphs:

  import katex from 'katex'
  import 'katex/dist/katex.min.css'

  const baseMathCss = 'katex overflow-x-auto mt-2 mb-2'

  function processMath (text) {
    // Block math: $$...$$ (handled first so the inline pattern does not consume it)
    const withBlocks = text.replace(
      /\$\$([\s\S]+?)\$\$/g,
      (match, p1) => `<div class="${baseMathCss}">${katex.renderToString(p1, {throwOnError: false})}</div>`
    )
    // Inline math: $...$
    const withInline = withBlocks.replace(
      /\$(.+?)\$/g,
      (match, p1) => `<span class="${baseMathCss}">${katex.renderToString(p1, {throwOnError: false})}</span>`
    )
    return withInline
  }

But the \[ and \( notations have to be handled at the tokenizer level. Additionally, even the dollar notation sometimes fails: some commands, such as \nabla or \times, start with \n and \t, so unless we work at the tokenizer level, newline and tab characters mess things up.
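As a rough illustration of tokenizer-level handling for the backslash notations, the sketch below defines one block extension for `\[...\]` and one inline extension for `\(...\)`. The names `bracketBlockMath` and `parenInlineMath`, the regexes, and the CSS classes are hypothetical, and the renderers are placeholders rather than real KaTeX calls.

```javascript
// Hypothetical extensions for the backslash notations (all names are illustrative).
const bracketBlockMath = {
  name: 'bracketBlockMath',
  level: 'block',
  start (src) {
    const index = src.indexOf('\\[')
    return index === -1 ? undefined : index
  },
  tokenizer (src) {
    // Matches \[ ... \] at the start of the remaining source, across newlines.
    const match = /^\\\[([\s\S]+?)\\\](?:\n+|$)/.exec(src)
    if (!match) return undefined
    return {type: 'bracketBlockMath', raw: match[0], text: match[1].trim()}
  },
  renderer (token) {
    // Placeholder: a real renderer would hand token.text to a math library.
    return `<div class="math-block">${token.text}</div>\n`
  },
}

const parenInlineMath = {
  name: 'parenInlineMath',
  level: 'inline',
  start (src) {
    const index = src.indexOf('\\(')
    return index === -1 ? undefined : index
  },
  tokenizer (src) {
    // Matches \( ... \) on a single line.
    const match = /^\\\(([^\n]+?)\\\)/.exec(src)
    if (!match) return undefined
    return {type: 'parenInlineMath', raw: match[0], text: match[1].trim()}
  },
  renderer (token) {
    return `<span class="math-inline">${token.text}</span>`
  },
}

// In an ordinary JavaScript string literal, '\nabla' is a newline followed by
// 'abla'. String.raw keeps the backslash, which matters when writing test input.
const sample = String.raw`\(\nabla \times F\)`
console.log(parenInlineMath.tokenizer(sample).text)
```

Assuming marked is installed, both objects would be registered together with `marked.use({extensions: [bracketBlockMath, parenInlineMath]})`. Whether escaped string literals are what causes the \nabla and \times failures described above is a guess; if the input comes from a file or a text field rather than a string literal, the backslashes are already preserved.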

UziTech commented 2 weeks ago

If you would like to improve the docs a PR would be great 😁👍

For a full katex extension you can check out marked-katex-extension