microsoft / vscode

Visual Studio Code
https://code.visualstudio.com
MIT License
Markdown Preview: ParseError: KaTeX parse error: Expected 'EOF', got '#' #138970

Status: Closed (rioj7 closed 1 year ago)

rioj7 commented 2 years ago

Does this issue occur when all extensions are disabled?: Yes

Steps to Reproduce:

  1. Create a markdown file with the following content:
    * `${dateTimeFormat}` : use the setting `templates.dateTimeFormat` to construct a [date-time](#variable-datetimeformat).
    * <code>${dateTimeFormat:<em>name</em>:}</code> : use a _named_ format in the setting `templates.dateTimeFormat` to construct a [date-time](#variable-datetimeformat). The format properties override what is defined in `templates.dateTimeFormat`.
    * `${input:description:}` : Ask the user some text and use the _`properties`_ part as the description for the InputBox<br/>Example: `${input:Title of this page:}`
    * <code>${input:<em>description</em>:}</code> : Ask the user some text and use the _`properties`_ part as the description for the InputBox<br/>Example: 
    * <code>${input:<em>description</em>:}</code> : Ask the user some text and use the _`properties`_ part as the description for the InputBox<br/>Example: `${input:Title of this page:}`
    * <code>${snippet:<em>definition</em>:}</code> : you can use the full syntax of the [Visual Studio Code snippets](https://code.visualstudio.com/docs/editor/userdefinedsnippets#_snippet-syntax).<br/>A snippet is evaluated after the file is created with the command: **Next Snippet in File** from the Context Menu or Command Palette. The editor needs to be put in _Snippet_ mode. Apply this command for every `${snippet}` or `${cursor}` variable still in the file.<br/>
    Example: `${snippet##${1|*,**,***|} ${TM_FILENAME/(.*)/${1:/upcase}/} ${1}##}`
  2. Open a Preview to the side
  3. Rendering/parsing of line 5 goes wrong
  4. Parsing of line 6 gives the error: ParseError: KaTeX parse error: Expected 'EOF', got '#'
  5. I added lines 3 and 4 to determine what causes the problem: it is the literal text at the end of line 5, ${input:Title of this page:}
  6. Any literal text (single backticks) with content of the form ${xxx} generates the error
  7. If there is a <code> tag on a line, any literal text (single backticks) after it is not recognized by the syntax highlighting in the editor

rioj7 commented 2 years ago

I have just updated the extension File Templates that has this piece of Markdown in the README.

GitHub, the Marketplace, and the Extension Bar all render the Markdown correctly.

Lemmingh commented 2 years ago

I recommend that you disable the built-in Markdown Math (vscode.markdown-math), because it turns the Markdown inside VS Code into a flavor quite different from GFM, which is apparently not what you want.

The Markdown Math was introduced in version 1.58.

Besides, if you install some VS Code extensions that offer math support someday, you will need to check their documentation to disable the corresponding features.


The rendering did not go wrong, to my knowledge. You just discovered the dark side of Markdown.

Markdown is not a single rigorous language, but a collection of various flavors (aka "dialect", "variant") that disagree with each other.

Let's take a glance at its history first, though there should already be more detailed articles on the Internet.

John Gruber and Aaron Swartz created the first specification in 2004. It's the origin of Markdown, and unfortunately ambiguous.

Soon, flavors bloomed.

Many years later, a group led by John MacFarlane began to develop CommonMark (CM; formerly "STMD"). The new spec has become the basis of many flavors in recent years.

Among these derivatives, the current GitHub Flavored Markdown (GFM), debuted in 2017, is the one that you're familiar with. Interestingly, a number of Markdown implementations adopt GFM and modify its extensions to create new flavors, for example, markdown-it.

I don't know exactly which flavor VS Code's Markdown Math tries to follow, but it (mjbvz/markdown-it-katex) appears to take some conventions from pandoc and Jupyter Notebook. Please correct me if I'm wrong.

Now we can see that lots of Markdown flavors are technically incompatible. Thus, it's a pity that Markdown users have to configure their processors to get desired results. Here, in VS Code, the built-in Markdown Language Features (vscode.markdown-language-features) ships a GFM-like environment. You can combine other extensions to get a more GFM-like experience as long as you're careful.

mjbvz commented 2 years ago

Minimal example that demonstrates the issue:

<code>$x</code> `${a}`

mjbvz commented 2 years ago

Marking as a good first issue since this should be a fairly well-scoped bug

Here's the relevant repository where this issue should be fixed: https://github.com/mjbvz/markdown-it-katex

The repo includes tests that you can use to confirm the issue is fixed and also check for regressions

JihongGan commented 2 years ago

I am looking to fix this by not allowing a dangling dollar symbol inside a markdown block to open or close a math span. Do you think I'm on the right path?

Lemmingh commented 2 years ago

I still do not regard the behavior here as a bug.

Things just work this way as per the inline parsing strategy of CommonMark.


From a CommonMark-compliant parser's view, we can say in a non-normative way that

<code>$x</code> `${a}`

is equivalent to

<xxxx>$x<xxxxx> `${x}`

Then, given the enabled rules (markdown-it default, CommonMark HTML, pandoc-style math, and something else), it's recognized as

<xxxx>$xxxxxxxxxx$xxxx

This is correct and exactly reflects the philosophy of CommonMark.
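
A toy model may make the scanning order above easier to follow. The sketch below is not markdown-it or markdown-it-katex code; `scan_inline` and its rules are hypothetical and heavily simplified (for instance, it ignores the whitespace checks that a pandoc-style math rule applies around `$`). It only assumes, as described above, that a raw HTML tag is consumed as an opaque unit and that the math rule gets to pair dollar signs before the code-span rule runs.

```python
import re

# Hypothetical, simplified pattern for an inline raw HTML open/close tag.
HTML_TAG = re.compile(r"</?[A-Za-z][A-Za-z0-9-]*\s*/?>")


def scan_inline(src: str, math_enabled: bool = True) -> list[tuple[str, str]]:
    """Scan left to right and return (kind, content) tokens.

    Assumed rule order at each position: math, code span, raw HTML tag, text.
    """
    tokens: list[tuple[str, str]] = []
    text: list[str] = []

    def flush() -> None:
        # Emit any pending plain text as a single token.
        if text:
            tokens.append(("text", "".join(text)))
            text.clear()

    pos = 0
    while pos < len(src):
        ch = src[pos]
        if ch == "$" and math_enabled:
            end = src.find("$", pos + 1)
            if end > pos + 1:  # a later "$" closes a non-empty math span
                flush()
                tokens.append(("math", src[pos + 1:end]))
                pos = end + 1
                continue
        if ch == "`":
            end = src.find("`", pos + 1)
            if end != -1:
                flush()
                tokens.append(("code", src[pos + 1:end]))
                pos = end + 1
                continue
        if ch == "<":
            match = HTML_TAG.match(src, pos)
            if match:
                flush()
                tokens.append(("html", match.group()))
                pos = match.end()
                continue
        text.append(ch)
        pos += 1
    flush()
    return tokens


if __name__ == "__main__":
    sample = "<code>$x</code> `${a}`"
    print(scan_inline(sample, math_enabled=True))
    print(scan_inline(sample, math_enabled=False))
```

With the math rule on, the first `$` pairs with the `$` inside the backticks, so the would-be code span never forms and the closing `</code>` tag ends up inside the math content. With it off, the same line yields two HTML tags, plain text, and one code span, which is the GFM-like reading.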


This issue ought to be marked as https://github.com/microsoft/vscode/labels/%2Aas-designed

Markdown flavors are not compatible, as I mentioned above. A derivative flavor can inherit the philosophy of its parent, but cannot retain the semantics.

In other words, you cannot enable a math plugin and expect GFM-like rendering at the same time. You can only discuss right or wrong within the same flavor.

If you want GFM-like rendering, disable the built-in Markdown Math (vscode.markdown-math) extension first.

rioj7 commented 2 years ago

@Lemmingh But how do I write in Markdown Math a line that shows me

<code>$x</code> `${a}`

Is there a way to escape the $?

Lemmingh commented 2 years ago

`$x` `${a}`

Lemmingh commented 2 years ago

Avoid plain HTML-like pieces in Markdown. Use proper syntax whenever possible.

In Markdown, HTML-like pieces (the leaf block "HTML block" and the inline structure "Raw HTML") are nothing but a kind of literal content. CommonMark requires implementations to skip them: there are roughly 7 + 6 rules for recognizing HTML-like pieces (7 start conditions for HTML blocks and 6 forms of raw inline HTML). HTML-like pieces are captured and then emitted as-is to the output, without further touching the parser's machine state.

This design is a compromise for compatibility, and causes usability and security issues, like your cases. jgm complained about the terrible fact many years ago.


As for escaping, if your use case is presenting a character in textual content, you can try "backslash escape" like \$, and "entity and character reference" like &dollar; and &#36;.

rioj7 commented 2 years ago

@Lemmingh Maybe I asked with the wrong example. The OP was about some italic (or bold) text inside a code block; it was done with <code>foo<em>bar</em>foo</code>

foobarfoo

When you use a $ character outside backticks and another $ appears further along the line, it goes wrong.

I did not know Markdown allowed HTML entities.

I have now enabled Markdown Math and replaced all $ characters inside <code></code> with &dollar; and now there is no KaTeX parse error
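
The replacement described above can be sketched as a small script. This is only an illustration: the helper name `escape_dollars_in_code_elements` is made up, and it assumes every `<code>` element is written as a plain `<code>...</code>` pair without attributes or nesting. Text in backticks is left alone.

```python
import re

# Assumption: <code> elements carry no attributes and are not nested.
CODE_ELEMENT = re.compile(r"<code>.*?</code>", re.DOTALL)


def escape_dollars_in_code_elements(markdown: str) -> str:
    """Replace every "$" inside <code>...</code> with the entity &dollar;."""
    return CODE_ELEMENT.sub(
        lambda match: match.group().replace("$", "&dollar;"),
        markdown,
    )


if __name__ == "__main__":
    line = "<code>${input:<em>description</em>:}</code> Example: `${input:Title of this page:}`"
    print(escape_dollars_in_code_elements(line))
```

After the replacement, the only literal `$` left on the sample line is the one inside the backticks, so a math rule that needs an opening and a closing `$` has nothing to pair.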

Lemmingh commented 2 years ago

Excuse me? I'm confused now.

So, you add these dangerous pieces to your Markdown document for formatting?

<code>${input:<em>description</em>:}</code>

<code>foo<em>bar</em>foo</code>

Unfortunately, CommonMark Markdown is not a superset of WHATWG HTML. They actually have almost no intersection, and apply nearly opposite handling when encountering pieces that look like HTML tags.

If you decide to have HTML-like things in a Markdown document, you have to be careful, and know how to lead the parser to your desired result.

I thought I had made these basic facts clear, but it seems not. No one understands my comments above.


Before I come up with another way to explain, we can demonstrate something by this sample:

<section>
<!--

Hum?
-->
Oops!
</section>

If it's treated as HTML, there is one section element with a comment and a text node "Oops!" inside.

However, if you feed it to a CommonMark-compliant renderer, you'll get somewhat illegal HTML output:

<section>
<!--
<p>Hum?
--&gt;
Oops!</p>
</section>
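
The output above follows from one block-level rule: an HTML block opened by a tag such as `<section>` ends at the first blank line, and whatever comes after is parsed as ordinary Markdown. The sketch below is a rough model of just that rule plus paragraphs. It is not a CommonMark implementation; `render_blocks` and the short tag list are assumptions made for illustration.

```python
import html
import re

# Assumption: a tiny subset of the block-level tag names that can open an HTML block.
BLOCK_TAG = re.compile(r"^</?(section|div|table|ul|ol)\b", re.IGNORECASE)


def render_blocks(source: str) -> str:
    """Render HTML blocks verbatim and everything else as escaped paragraphs."""
    out: list[str] = []
    lines = source.split("\n")
    i = 0
    while i < len(lines):
        if not lines[i].strip():
            i += 1  # blank lines only separate blocks
            continue
        if BLOCK_TAG.match(lines[i]):
            # HTML block: copy lines as-is until a blank line ends the block.
            while i < len(lines) and lines[i].strip():
                out.append(lines[i])
                i += 1
        else:
            # Paragraph: runs until a blank line or a line that opens an HTML block.
            paragraph: list[str] = []
            while i < len(lines) and lines[i].strip() and not BLOCK_TAG.match(lines[i]):
                paragraph.append(html.escape(lines[i], quote=False))
                i += 1
            out.append("<p>" + "\n".join(paragraph) + "</p>")
    return "\n".join(out)


if __name__ == "__main__":
    sample = "<section>\n<!--\n\nHum?\n-->\nOops!\n</section>"
    print(render_blocks(sample))
```

The blank line inside the comment is what splits the input: `<!--` stays in the first HTML block, while `-->` lands in a paragraph and gets escaped.
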
rioj7 commented 2 years ago

@Lemmingh If they are so dangerous why do they allow HTML tags in CommonMark Markdown?

As soon as you use single or triple backticks, everything inside is treated as literal; AFAIK there is no way to make something inside italic (or bold) using Markdown syntax. I use HTML tags to get the required rendered output.

Lemmingh commented 2 years ago

> I use HTML tags to get the required rendered output

Good to know.

Then, you only need to be careful when configuring a Markdown renderer and authoring Markdown documents. This matters not only for performance but also for conformance.

Avoid using inappropriate parsers, as flavors may follow dramatically different logic. For example, cmark, Python-Markdown, and kramdown can give quite different results for the same input.

The Markdown flavor of GitHub is GFM. GitHub processes user content with cmark-gfm, and then pipes the HTML output through some "filters" to add GitHub-specific features, such as sanitization, heading anchors, emoji codes, issue references, and diagrams.

As I said above, VS Code's vscode.markdown-language-features provides markdown-it default + html: true, which is a few steps away from GitHub. Basically, you just need to install extensions for emoji and task lists, and disable those that don't exist on GitHub (e.g. vscode.markdown-math). Additionally, since VS Code and GitHub generate heading IDs with slightly different methods, a link to a heading might work on one and break on the other.


> If they are so dangerous why do they allow HTML tags in CommonMark

For compatibility.

It was John Gruber, the primary inventor of Markdown, who introduced "HTML" (but not real HTML). The world has no choice but to live with it.

If you ask me to judge Gruber's decision, I'd say the "HTML" in Markdown is a sign of laziness. Besides, thousands of collective developer hours have been wasted due to the many lazy and ambiguous design choices in the original Markdown. He should have defined a generic construct syntax first, and built language structures on top of it.


Probably I'm bad at teaching. But other people have good skill in communication. I recommend their articles: