markedjs / marked

A markdown parser and compiler. Built for speed.
https://marked.js.org

[bug] `CodeBlock` - mismatched end fences behave differently #3514

Closed Bistard closed 3 weeks ago

Bistard commented 3 weeks ago

Marked version: 14.1.2

Describe the bug: Consider lexing the following two texts (both end with a fence of length 4):

// "```\nmismatch\n````"
// "```\nmismatch\n~~~~"

These are the resulting tokens:

{type:"code", raw:"```\nmismatch\n````", lang:"", text:"mismatch"},
{type:"code", raw:"```\nmismatch\n~~~~", lang:"", text:"mismatch\n~~~~"}

Expected behavior: 🤔 I was expecting:

{type:"code", raw:"```\nmismatch\n````", lang:"", text:"mismatch"},
{type:"code", raw:"```\nmismatch\n~~~~", lang:"", text:"mismatch"}

or just

{type:"code", raw:"```\nmismatch\n````", lang:"", text:"mismatch\n````"},
{type:"code", raw:"```\nmismatch\n~~~~", lang:"", text:"mismatch\n~~~~"}

UziTech commented 3 weeks ago

As you can see in these demos, the text property is the code that is displayed.

demo 1

demo 2

Bistard commented 3 weeks ago

Given that the ending fences are mismatched in both cases, shouldn't they both be handled (or ignored) within the text property for simplicity and consistency?

This behavior is confusing because:

  1. In the case with "~", the ending fence with 4 ~ characters is displayed as part of the text.
  2. However, in the case with backticks, the ending fence with 4 backticks does not appear in the text.

Is this behavior defined by CommonMark, or could the behaviors be aligned?

UziTech commented 3 weeks ago

This is the same behavior as CommonMark.

demo 1

demo 2
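
For reference, the CommonMark rule behind this result is that a closing fence must use the same character as the opening fence and be at least as long. So four backticks do close a three-backtick fence, while four tildes do not, and the block then runs to the end of the input. The following is only a minimal sketch of that rule, not marked's actual implementation: `lexFence` and `FencedCode` are hypothetical names, and info strings and indentation stripping are ignored.

```typescript
// Hypothetical token shape, reduced to the two fields discussed in this issue.
interface FencedCode {
  raw: string;
  text: string;
}

// Hypothetical helper: lex one fenced code block that starts on the first line of `src`.
// Simplified sketch of the CommonMark closing-fence rule; ignores info strings.
function lexFence(src: string): FencedCode | null {
  const lines = src.split("\n");
  const open = /^ {0,3}(`{3,}|~{3,})/.exec(lines[0]);
  if (!open) return null;

  const fenceChar = open[1][0];      // "`" or "~"
  const minLength = open[1].length;  // closing fence must be at least this long
  const body: string[] = [];

  for (let i = 1; i < lines.length; i++) {
    const close = /^ {0,3}(`{3,}|~{3,})[ \t]*$/.exec(lines[i]);
    // A line closes the block only if it uses the same fence character
    // and is at least as long as the opening fence.
    if (close && close[1][0] === fenceChar && close[1].length >= minLength) {
      return { raw: lines.slice(0, i + 1).join("\n"), text: body.join("\n") };
    }
    body.push(lines[i]);
  }

  // No valid closing fence: the block extends to the end of the input,
  // so a fence-like line made of the other character stays in the text.
  return { raw: src, text: body.join("\n") };
}
```

Under these assumptions, the backtick input from the report yields a text of "mismatch", and the tilde input keeps the trailing "~~~~" line in its text, which matches the tokens shown in the report.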

Bistard commented 3 weeks ago

OK, that makes sense. I should have checked CommonMark first. Thanks for the response.