markedjs / marked

A markdown parser and compiler. Built for speed.
https://marked.js.org

[Question] Consistent Behavior of End-of-Line Characters Across Block-Level Tokens #3506

Open Bistard opened 3 weeks ago

Bistard commented 3 weeks ago

Marked version: 14.1.2

Background

This is not a bug report but rather a point of confusion on my part. Consider the following text and its tokenization result:

import { Lexer } from 'marked';
const tokens = new Lexer().lex('paragraph1\n');
// tokenization result (tokens[0])
{type:"paragraph", raw:"paragraph1\n", text:"paragraph1", tokens:[
  {type:"text", raw:"paragraph1", text:"paragraph1"}
]}

I noticed that the end-of-line character \n exists only in token.raw; it is not detectable in token.text or in any of the child tokens. A previous issue I opened confirmed this as well.
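
To check this observation mechanically instead of by eye, a small recursive helper can list every raw or text property in a token tree that contains \n. This is a sketch that works on plain objects shaped like the output above (type, raw, text, optional tokens); the name newlineLocations is hypothetical and not part of marked's API.

```javascript
// Hypothetical helper (not part of marked): lists the path of every `raw`
// or `text` string in a token tree that contains a '\n'.
function newlineLocations(token, path = '') {
  const found = [];
  for (const key of ['raw', 'text']) {
    if (typeof token[key] === 'string' && token[key].includes('\n')) {
      found.push(path + key);
    }
  }
  // Recurse into child tokens, if the token has any.
  (token.tokens ?? []).forEach((child, i) => {
    found.push(...newlineLocations(child, `${path}tokens[${i}].`));
  });
  return found;
}

// The paragraph token from the example above, copied as a plain object.
const paragraph = {
  type: 'paragraph', raw: 'paragraph1\n', text: 'paragraph1',
  tokens: [{ type: 'text', raw: 'paragraph1', text: 'paragraph1' }],
};

console.log(newlineLocations(paragraph)); // [ 'raw' ]
```

Feeding it real lexer output (for example tokens[0] from new Lexer().lex(...)) should work the same way, as long as the token keeps its children under tokens.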

Expected behavior

My question is: does this behavior hold for EVERY block-level token? That is, when a block ends with a '\n' character, is that character always accessible and detectable only through the token.raw property?
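
One way to state the question precisely is as a predicate over a single block token: if raw ends with \n, then neither text nor the last child token (where a trailing newline would end up) ends with \n. The sketch below assumes the token shapes shown in this issue, plus marked's convention of keeping list items under items; tokens without a text property count as passing. The names newlineOnlyInRaw and trailingNewlineBelowRaw are made up for illustration.

```javascript
// Hypothetical predicate (not part of marked): true when a block token's
// trailing '\n' is visible only through `raw`, not through `text`
// or through the last child token.
function newlineOnlyInRaw(token) {
  if (!token.raw.endsWith('\n')) return true; // no trailing newline to check
  return !trailingNewlineBelowRaw(token);
}

// True if `text` ends with '\n', or the last child (under `tokens`, or
// `items` for lists) ends with '\n' in its `raw` or further down.
function trailingNewlineBelowRaw(token) {
  if (typeof token.text === 'string' && token.text.endsWith('\n')) return true;
  const children = token.tokens ?? token.items ?? [];
  const last = children[children.length - 1];
  if (last === undefined) return false;
  return last.raw.endsWith('\n') || trailingNewlineBelowRaw(last);
}

// Token objects copied from the results reported in this issue.
const samples = [
  { type: 'paragraph', raw: 'paragraph1\n', text: 'paragraph1',
    tokens: [{ type: 'text', raw: 'paragraph1', text: 'paragraph1' }] },
  { type: 'html', block: true, raw: '<div>hi</div>\n', pre: false,
    text: '<div>hi</div>\n' },
  { type: 'hr', raw: '---' },
];
console.log(samples.map((t) => [t.type, newlineOnlyInRaw(t)]));
```

With these samples the predicate holds for paragraph and hr and fails for html, which matches the observations below. Only the last child is inspected because interior children (for example, non-final list items) may legitimately end with a newline that belongs to the middle of the block.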

Example

I tested the list, paragraph, heading, code block, and blockquote tokens on the official demo website, and they all seem to match my expectation.

For example, here are the tokenization results I got for the heading, code block, and blockquote cases:

// '# heading\n'
{type:"heading", raw:"# heading\n", depth:1, text:"heading", tokens:[
  {type:"text", raw:"heading", text:"heading"}
]}
// '> paragraph1\n' (the demo input began with a stray ', so marked parsed it as a paragraph rather than a blockquote)
{type:"paragraph", raw:"'> paragraph1\n", text:"'> paragraph1", tokens:[
  {type:"text", raw:"'> paragraph1", text:"'> paragraph1"}
]}
// '```ts\nconsole.log(1)\n```\n'
[
{type:"code", raw:"```ts\nconsole.log(1)\n```\n", lang:"ts", text:"console.log(1)"}
]

However, the html token seems to be an exception, since its trailing \n also appears in token.text:

// '<div>hi</div>\n'
[
{type:"html", block:true, raw:"<div>hi</div>\n", pre:false, text:"<div>hi</div>\n"}
]
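
Until the behavior is made consistent, a caller who needs uniform tokens could normalize them after lexing. marked accepts a walkTokens function through marked.use({ walkTokens }), which it calls for each token during parsing; the function below is a hypothetical normalizer for that hook that strips trailing newlines from a block html token's text while leaving raw untouched. Since the HTML renderer emits this text, removing the newline may slightly change the rendered output; that trade-off is an assumption to verify in your own setup.

```javascript
// Hypothetical normalizer (not part of marked), intended for
// marked.use({ walkTokens: stripHtmlTrailingNewline }).
// Mutates block-level html tokens so `text` no longer ends with '\n',
// mirroring how paragraph and heading tokens look; `raw` is unchanged.
function stripHtmlTrailingNewline(token) {
  if (token.type === 'html' && token.block && token.text.endsWith('\n')) {
    // Remove every trailing newline, not just the last one.
    token.text = token.text.replace(/\n+$/, '');
  }
}

// The html token from the example above, copied as a plain object.
const html = {
  type: 'html', block: true, raw: '<div>hi</div>\n', pre: false,
  text: '<div>hi</div>\n',
};
stripHtmlTrailingNewline(html);
console.log(JSON.stringify(html.text)); // "<div>hi</div>"
```

Inline html tokens (block: false) and all other token types are left alone, so the hook only targets the inconsistency described here.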

Additional notes

The hr token has only a token.raw property and no token.text property, so this block-level token falls outside the scope of my question:

// '---\n'
{type:"hr", raw:"---"}

UziTech commented 3 weeks ago

I don't think it is consistent. If you would like to create a PR to make it consistent, we could get it into the next major version. 😁👍

Bistard commented 3 weeks ago

OK. Over the next few days or weeks, I will look through the source code and try to make this consistent in a PR.