syntax-tree / mdast-util-from-markdown

mdast utility to parse markdown
MIT License

Both italics and underline are not reversible #37

Closed ptc-tonyo closed 1 year ago

ptc-tonyo commented 1 year ago

Initial checklist

Affected packages and versions

mdast-util-to-markdown "^2.0.0" => "remark": "^15.0.1", "remark-parse": "^11.0.0", "remark-stringify": "^11.0.0",

Link to runnable example

No response

Steps to reproduce

This unit test fails:

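import { expect, it } from "vitest";
import { remark } from "remark";
import remarkParse from "remark-parse";
import remarkStringify from "remark-stringify";

it("should preserve bold, italics and underscore", () => {
  const markdown1 = "Here's some *bold text*, some _italic text_, and some __underlined text__.";
  // Parse the source string (the original snippet passed an undefined `markdown` variable here).
  const ast = remark().use(remarkParse).parse(markdown1);
  // Serialize the tree back to markdown.
  const markdown2 = remark().use(remarkStringify).stringify(ast);
  expect(markdown1).toEqual(markdown2);  // Fails<<<
});
// markdown1 == "Here's some *bold text*, some _italic text_, and some __underlined text__."
// markdown2 == "Here's some *bold text*, some *italic text*, and some **underlined text**."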

Expected behavior

Both bold and italics are parsed as type: 'emphasis' in the AST, so remark-stringify can't distinguish them when serializing the tree back to markdown, and the underlined text becomes **underlined text**.

{
  "type": "root",
  "children": [
    {
      "type": "paragraph",
      "children": [
        {
          "type": "text",
          "value": "Here's some ",
          "position": {
            "start": {
              "line": 1,
              "column": 1,
              "offset": 0
            },
            "end": {
              "line": 1,
              "column": 13,
              "offset": 12
            }
          }
        },
        {
          "type": "emphasis",
          "children": [
            {
              "type": "text",
              "value": "bold text",
              "position": {
                "start": {
                  "line": 1,
                  "column": 14,
                  "offset": 13
                },
                "end": {
                  "line": 1,
                  "column": 23,
                  "offset": 22
                }
              }
            }
          ],
          "position": {
            "start": {
              "line": 1,
              "column": 13,
              "offset": 12
            },
            "end": {
              "line": 1,
              "column": 24,
              "offset": 23
            }
          }
        },
        {
          "type": "text",
          "value": ", some ",
          "position": {
            "start": {
              "line": 1,
              "column": 24,
              "offset": 23
            },
            "end": {
              "line": 1,
              "column": 31,
              "offset": 30
            }
          }
        },
        {
          "type": "emphasis",
          "children": [
            {
              "type": "text",
              "value": "italic text",
              "position": {
                "start": {
                  "line": 1,
                  "column": 32,
                  "offset": 31
                },
                "end": {
                  "line": 1,
                  "column": 43,
                  "offset": 42
                }
              }
            }
          ],
          "position": {
            "start": {
              "line": 1,
              "column": 31,
              "offset": 30
            },
            "end": {
              "line": 1,
              "column": 44,
              "offset": 43
            }
          }
        },
        {
          "type": "text",
          "value": ", and some ",
          "position": {
            "start": {
              "line": 1,
              "column": 44,
              "offset": 43
            },
            "end": {
              "line": 1,
              "column": 55,
              "offset": 54
            }
          }
        },
        {
          "type": "strong",
          "children": [
            {
              "type": "text",
              "value": "underlined text",
              "position": {
                "start": {
                  "line": 1,
                  "column": 57,
                  "offset": 56
                },
                "end": {
                  "line": 1,
                  "column": 72,
                  "offset": 71
                }
              }
            }
          ],
          "position": {
            "start": {
              "line": 1,
              "column": 55,
              "offset": 54
            },
            "end": {
              "line": 1,
              "column": 74,
              "offset": 73
            }
          }
        },
        {
          "type": "text",
          "value": ".",
          "position": {
            "start": {
              "line": 1,
              "column": 74,
              "offset": 73
            },
            "end": {
              "line": 1,
              "column": 75,
              "offset": 74
            }
          }
        }
      ],
      "position": {
        "start": {
          "line": 1,
          "column": 1,
          "offset": 0
        },
        "end": {
          "line": 1,
          "column": 75,
          "offset": 74
        }
      }
    }
  ],
  "position": {
    "start": {
      "line": 1,
      "column": 1,
      "offset": 0
    },
    "end": {
      "line": 1,
      "column": 75,
      "offset": 74
    }
  }
}

Actual behavior

  const markdown1 = "Here's some *bold text*, some _italic text_, and some __underlined text__.";
  const ast = remark().use(remarkParse).parse(markdown1);
  const markdown2 = remark().use(remarkStringify).stringify(ast);

I expected remarkParse to be reversible by remarkStringify; however, we get this result:

// markdown1 == "Here's some *bold text*, some _italic text_, and some __underlined text__."
// markdown2 == "Here's some *bold text*, some *italic text*, and some **underlined text**."

I'm testing using vitest

Affected runtime and version

node@18.18.0

Affected package manager and version

yarn@1.22.19

Affected OS and version

Windows 10

Build and bundle tools

Vite

ChristianMurphy commented 1 year ago

Welcome @ptc-tonyo! 👋 Sorry you ran into some confusion.

A good starting point would be to understand what Markdown is and what it supports.

Quick version: https://commonmark.org/help/
More formal details: https://spec.commonmark.org/0.30/#emphasis-and-strong-emphasis

Emphasis is surrounded by a single * or _ on either side. Strong emphasis (bold) is surrounded by two, ** or __, on either side. There is no underline construct in markdown.
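To illustrate why the round trip changes the markers, here is a minimal, dependency-free sketch in JavaScript. It is not the remark-stringify implementation: the `serialize` function and its `markers` parameter are hypothetical, and the tree is written by hand to mirror the node types shown in the issue (positions omitted). Because the tree records only `emphasis` and `strong`, not which character produced them, any serializer has to pick one marker per node type.

```javascript
// Hypothetical sketch, NOT the remark-stringify implementation.
// Hand-written tree mirroring the node types from the issue (positions omitted).
const tree = {
  type: "root",
  children: [
    {
      type: "paragraph",
      children: [
        { type: "text", value: "Here's some " },
        { type: "emphasis", children: [{ type: "text", value: "bold text" }] },
        { type: "text", value: ", some " },
        { type: "emphasis", children: [{ type: "text", value: "italic text" }] },
        { type: "text", value: ", and some " },
        { type: "strong", children: [{ type: "text", value: "underlined text" }] },
        { type: "text", value: "." },
      ],
    },
  ],
};

// Serialize with one marker per node type. The tree holds no record of the
// marker used in the source, so the serializer has to choose.
function serialize(node, markers = { emphasis: "*", strong: "*" }) {
  const inner = () =>
    node.children.map((child) => serialize(child, markers)).join("");
  switch (node.type) {
    case "text":
      return node.value;
    case "emphasis":
      return markers.emphasis + inner() + markers.emphasis;
    case "strong":
      return markers.strong.repeat(2) + inner() + markers.strong.repeat(2);
    default:
      // root and paragraph: this sketch simply concatenates the children
      return inner();
  }
}

const withAsterisks = serialize(tree);
const withUnderscores = serialize(tree, { emphasis: "_", strong: "_" });
console.log(withAsterisks);
console.log(withUnderscores);
```

With the default markers the sketch reproduces the `markdown2` string from the issue. Switching both markers to `_` restores `_italic text_` and `__underlined text__` but turns `*bold text*` into `_bold text_`, so a single global choice cannot recover a source that mixes markers. remark-stringify exposes comparable `emphasis` and `strong` options, which may be enough when a document uses one marker style consistently.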

remark/mdast-util-from-markdown will preserve valid and supported markdown structure.
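Since structure rather than exact characters is what survives, a round-trip test may be better phrased as a comparison of trees than of strings. The sketch below assumes a hypothetical `stripPositions` helper (not part of remark or mdast-util-from-markdown) and uses hand-written nodes as stand-ins for parser output.

```javascript
// Hypothetical test helper; not part of remark or mdast-util-from-markdown.
// Returns a deep copy of an mdast-like node without its `position` fields.
function stripPositions(node) {
  const { position, children, ...rest } = node;
  return children
    ? { ...rest, children: children.map(stripPositions) }
    : rest;
}

// Hand-written stand-ins for what a parser would produce (assumption):
// the same emphasis node found at two different places in two documents.
const fromUnderscores = {
  type: "emphasis",
  position: {
    start: { line: 1, column: 31, offset: 30 },
    end: { line: 1, column: 44, offset: 43 },
  },
  children: [{ type: "text", value: "italic text" }],
};
const fromAsterisks = {
  type: "emphasis",
  position: {
    start: { line: 1, column: 1, offset: 0 },
    end: { line: 1, column: 14, offset: 13 },
  },
  children: [{ type: "text", value: "italic text" }],
};

// Compare structure only, ignoring where each node sat in its source.
const sameStructure =
  JSON.stringify(stripPositions(fromUnderscores)) ===
  JSON.stringify(stripPositions(fromAsterisks));
console.log(sameStructure);
```

In a real test the two inputs would come from parsing the original and the re-serialized markdown, and the assertion would check that the stripped trees are equal.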

remark/micromark/mdast-util-from-markdown do support extending markdown with new syntax. Creating your own custom markdown-like flavor that is only supported by your application is not generally advisable, but it is possible: https://github.com/micromark/micromark#extending-markdown

github-actions[bot] commented 1 year ago

Hi! This was closed. Team: If this was fixed, please add phase/solved. Otherwise, please add one of the no/* labels.

ptc-tonyo commented 1 year ago

Thanks. I think I was looking at a different flavour of markdown when I chose the example.